Generating diffusions with fractional Brownian motion

We study fast / slow systems driven by a fractional Brownian motion $B$ with Hurst parameter $H\in (\frac 13, 1]$. Surprisingly, the slow dynamic converges on suitable timescales to a limiting Markov process and we describe its generator. More precisely, if $Y^\varepsilon$ denotes a Markov process with sufficiently good mixing properties evolving on a fast timescale $\varepsilon \ll 1$, the solutions of the equation $$ dX^\varepsilon = \varepsilon^{\frac 12-H} F(X^\varepsilon,Y^\varepsilon)\,dB+F_0(X^\varepsilon,Y^\varepsilon)\,dt\; $$ converge to a regular diffusion without having to assume that $F$ averages to $0$, provided that $H<\frac 12$. For $H>\frac 12$, a similar result holds, but this time it does require $F$ to average to $0$. We also prove that the $n$-point motions converge to those of a Kunita type SDE. One nice interpretation of this result is that it provides a continuous interpolation between the homogenisation theorem for random ODEs with rapidly oscillating right-hand sides ($H=1$) and the averaging of diffusion processes ($H= \frac 12$).


Introduction
The setting considered in this article is as follows. Consider a particle in a rapidly evolving random medium, so that it is governed by a stochastic differential equation of the type $dx_t = A(x_t, t/\varepsilon)\,dt + \sigma(x_t, t/\varepsilon)\,dB$ for a small parameter $\varepsilon > 0$. The situation we are interested in is where, in the "static" case (i.e. when $A$ and $\sigma$ have no explicit time dependence), the system is either super- or subdiffusive. This is the case if the driving noise $B$ is modelled by fractional Brownian motion (fBM) with Hurst parameter $H \neq \frac 12$. Recall that fractional noises (i.e. the time derivative of fBM) can be obtained as scaling limits in statistical mechanics models [JL , Dob , Sin ] and that fBM with Hurst parameter $H$ is a Gaussian process with stationary increments and self-similarity exponent $H$. It is therefore characterised (up to an irrelevant global shift) by the fact that $E(B_t - B_s)^2 = |t-s|^{2H}$, so that it is superdiffusive for $H > \frac 12$ and subdiffusive for $H < \frac 12$. The covariance of its increments, $E(B_{t+1} - B_t)(B_{s+1} - B_s)$, decays at rate $|t-s|^{2H-2}$ for large $|t-s|$ and therefore exhibits long-range dependence when $H > \frac 12$. We furthermore assume that the rapid time evolution of the environment is described by a hidden Markov variable, thus leading to the model with $B$ an fBM with Hurst parameter $H \in (0,1)$ in $\mathbf R^m$ and $F(x,y) \in L(\mathbf R^m, \mathbf R^d)$.
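The variance identity and the decay rate of the increment covariance recalled above can be checked numerically. The following is a minimal sketch, assuming the normalisation $E(B_t - B_s)^2 = |t-s|^{2H}$ together with $B_0 = 0$; the function names `fbm_cov`, `increment_cov` and `decay_ratio` are illustrative and do not come from the article.

```python
def fbm_cov(s, t, H):
    """E[B_s B_t] for fBM with B_0 = 0 and E(B_t - B_s)^2 = |t - s|^(2H)."""
    return 0.5 * (abs(s) ** (2 * H) + abs(t) ** (2 * H) - abs(t - s) ** (2 * H))


def increment_cov(lag, H):
    """E[(B_{t+1} - B_t)(B_{s+1} - B_s)] with lag = t - s (stationary increments)."""
    return 0.5 * (abs(lag + 1) ** (2 * H) + abs(lag - 1) ** (2 * H)
                  - 2 * abs(lag) ** (2 * H))


def decay_ratio(lag, H):
    """Ratio of the exact increment covariance to H(2H-1)|lag|^(2H-2)."""
    return increment_cov(lag, H) / (H * (2 * H - 1) * abs(lag) ** (2 * H - 2))


if __name__ == "__main__":
    # The ratio should approach 1 as the lag grows, for H on either side of 1/2.
    for H in (0.4, 0.75):
        print(H, [round(decay_ratio(lag, H), 4) for lag in (10, 100, 1000)])
```

The sign of `increment_cov` is positive for $H > \frac 12$ and negative for $H < \frac 12$, while for $H = \frac 12$ disjoint unit increments are uncorrelated, in line with the super- / subdiffusive dichotomy described above.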
The stochastic integral appearing in the first term is problematic when $H < \frac 12$: one should really interpret this equation as $x^\varepsilon_t = \lim_{\delta\to 0} x^{\varepsilon,\delta}_t$ with $x^{\varepsilon,\delta}_t$ driven by a smooth approximation $B^\delta$ to $B$ with relevant timescale $\delta \ll \varepsilon \ll 1$, see Section . below. Regarding the fast Markov variable, a prototypical situation is that of a system of the type where $W$ is a Wiener process independent of the fBM $B$ appearing in ( . ). This allows for the case where the variable $x$ feeds back into the evolution of $y$, but for most of this article we assume that there is no $x$-dependence in ( . ). We also assume that $y_t$ admits a unique invariant probability measure $\mu$. In the case with feedback, we have a family of invariant measures $\mu^x$ obtained by "freezing" the value of the variable $x$ in ( . ). It was recently shown by the authors in [HL ] that in the case $H > \frac 12$ the process $x^\varepsilon$ converges in probability to the solution to where the average of any function $h$ is given by $\bar h(x) = \int h(x,y)\,\mu^x(dy)$. The aim of the present article is to investigate the two cases left out by the aforementioned analysis, namely what happens when either $H < \frac 12$ or when $H > \frac 12$ but $\bar F = 0$ in ( . )?
. Description of the model
It turns out that the effect of the rapid oscillatory motion described by the fast variable $y$ is to slow down the motion of $x$ in the superdiffusive case and to speed it up in the subdiffusive case. This can be explained by the following heuristics. For times of order $t \lesssim \varepsilon$, the process $Y$ doesn't evolve much so that, by the scaling property of the driving fBM, one expects the process $x$ to move by about $\varepsilon^H$ in a time of order $\varepsilon$. On large times $t \gg \varepsilon$ on the other hand we will see that the limiting process is actually Markovian, even in the case with long-range dependence. This suggests that over times of order $t$ the process $x$ performs about $t/\varepsilon$ steps of a random walk with step size $\varepsilon^H$ and therefore moves by about $\varepsilon^H \sqrt{t/\varepsilon}$. This suggests that one should multiply $F$ by $\varepsilon^{\frac 12-H}$ in order to obtain a non-trivial limit. As a consequence, the equations we actually study in this article are of the form $$ dX^\varepsilon = \varepsilon^{\frac 12-H} F_i(X^\varepsilon,Y^\varepsilon)\,dB^i+F_0(X^\varepsilon,Y^\varepsilon)\,dt $$ (summation over $i$ is implied), where $B$ is a fractional Brownian motion with Hurst parameter $H$ ranging from $\frac 13$ to $1$, and $Y$ is an independent stationary Markov process with values in some Polish space $\mathcal Y$, invariant measure $\mu$ and generator $-L$. At the moment, we are unfortunately unable to cover the case when $X$ feeds back into the dynamics of $Y$. When $H > \frac 12$, we furthermore assume that $\int F_i(x,y)\,\mu(dy) = 0$ for every $i \neq 0$ and every $x$.
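The scaling heuristic of this subsection can be spelled out as a short computation. This is only a sketch of the arithmetic, assuming (as the Markovian nature of the limit suggests) that the displacements over disjoint time windows of length $\varepsilon$ are approximately uncorrelated; the symbols $N$ and $\Delta_k$ are introduced here purely for illustration. Writing $\Delta_k$ for the displacement over the $k$th window, one has
$$ N = \frac{t}{\varepsilon}\;, \qquad E|\Delta_k|^2 \approx \varepsilon^{2H}\;, \qquad E\Big|\sum_{k=1}^{N} \Delta_k\Big|^2 \approx N \varepsilon^{2H} = t\,\varepsilon^{2H-1}\;, $$
so that the typical displacement over a time $t$ is of order $\varepsilon^{H-\frac 12}\sqrt t$. Multiplying the noise coefficient by $\varepsilon^{\frac 12 - H}$ therefore yields a displacement of order $\sqrt t$, independently of $\varepsilon$, which is the diffusive scaling expected of a non-trivial limit.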
Our main result is that, as ε → 0, solutions to ( . ) converge in law to a limiting Markov process and we provide an expression for its generator. In fact, we have an even stronger form of convergence, namely we show that the flow generated by ( . ) converges to the one generated by a limiting stochastic differential equation of Kunita type (i.e. driven by an infinite-dimensional noise).
We could probably deal with $H \in (\frac 14, \frac 13]$ with our techniques, but this would obscure some of the arguments for relatively little gain. For $H \le \frac 14$, there exists no solution theory even in the absence of $Y$.
The convention of adding a minus sign to the generator simplifies our expressions later on.
The special case when $F_0 = 0$ and the $F_i$ are independent of the $x$-variable yields a functional central limit theorem for stochastic integrals against fractional Brownian motion. This already appears to be new by itself and might be of independent interest.
As already hinted at, the map $t \mapsto F_i(\cdot, Y^\varepsilon_t)$ is too irregular to fit into the standard theory of differential equations driven by a fractional Brownian motion, especially when $H < \frac 12$, so that it is not even completely clear a priori how to interpret ( . ) for fixed $\varepsilon > 0$. These questions will be addressed in more detail in Section below. Let us put these aside for the moment and consider the following ordinary differential equation where $v$ is a smooth stationary Gaussian random process with covariance $C$ such that $C(t) \sim |t|^{2H-2}$ for $|t|$ large. When $H < \frac 12$ we furthermore assume that $\int C(t)\,dt = 0$ and, when $H = \frac 12$, we assume that $C$ decays exponentially and satisfies $\int C(t)\,dt = 1$. One way of obtaining such a process $v$ is to set $v = \phi * \dot B$ for $\phi$ a Schwartz test function integrating to 1 (and $*$ denoting convolution in time). This in particular shows that, at least in law, one has $(\varepsilon\delta)^{H-1} v(\frac{t}{\varepsilon\delta}) = (\phi_{\varepsilon\delta} * \dot B)(t)$, where we set $\phi_\varepsilon(t) = \varepsilon^{-1}\phi(t/\varepsilon)$. Since this converges in law to $\dot B$ as $\varepsilon\delta \to 0$, we can view ( . ) as an approximation to ( . ).
It is then possible to show that the limit $X^\varepsilon = \lim_{\delta\to 0} X^{\varepsilon,\delta}$ exists and our results hold with $X^\varepsilon$ interpreted in this way. Furthermore, we will see that all our results hold uniformly over $\delta \in (0,1]$ as $\varepsilon \to 0$. This in particular shows that the converse limit obtained by first sending $\varepsilon \to 0$ and then $\delta \to 0$ is the same, as are all limits obtained by other ways of jointly sending $\varepsilon, \delta \to 0$.

. Description of the main results
We now give a precise formulation of our main results, albeit with a simplified set of assumptions. The reason is that while the simplified assumptions are straightforward to state, they are very stringent regarding the Markov process Y . The more realistic set of assumptions used in the remainder of the article however is quite technical to formulate. We first recall the following standard definition of the fractional powers of the generator of the process Y .

Definition .
We write $\mathcal H = L^2(\mu)$ with $\mu$ the invariant measure of $Y$ and $\langle\cdot,\cdot\rangle_\mu$ for its scalar product. For $\alpha \in (0,1)$, we then say that $f \in \mathrm{Dom}(L^\alpha)$ if, for every $g \in \mathcal H$, the integral converges and determines a bounded functional on $\mathcal H$ (which we then call $L^\alpha f$). Recall that the generator of the process $Y$ is $-L$, so that $L$ is indeed a positive operator in the reversible case and $L^\alpha$ does then coincide with the definition using functional calculus.
Similarly, for $\alpha \in (-1,0)$, we write $L^\alpha$ for the operator given by Since $t \mapsto t^{-\alpha-1}$ is locally integrable, it follows from the first point of Assumption . below that $L^\alpha$ is a bounded operator on the subspace of $\mathrm{Lip}(\mathcal Y)$ consisting of mean zero functions.
Assuming that $X^\varepsilon$ takes values in $\mathbf R^d$, we then define the $d \times d$ matrix-valued function where $L$ acts on the second argument of $F_k$. As we will see in Remark . , the expression ( . ) is naturally interpreted as the limit $\delta \to 0$ of a "local" Green-Kubo formula associated to the fluctuations of ( . ). Note that the condition $\bar F_k = 0$ is necessary in the case $H > \frac 12$ since the negative power of $L$ appearing in this expression does not make sense otherwise, see also Remark . below. We shall assume mixing conditions and Hölder continuity of the $Y$ variable, see Assumptions . -. below, as well as a regularity condition on $x \mapsto F(x,\cdot)$ (and also $F_0$) as spelled out in Assumption . . A simpler set of conditions is as follows, the first of which is a strengthening of Assumptions . and . , the second is a strengthening of Assumption . , and the last is just a restatement of Assumption . in this context.

Assumption . (Simplified Assumptions)
The functions F i appearing in ( . ) as well as the Markov process Y satisfy the following.
. The Markov semigroup associated to the process $Y$ is strongly continuous and has a spectral gap in $\mathrm{Lip}(\mathcal Y)$, the space of bounded Lipschitz continuous functions on $\mathcal Y$.
. In the case $H < \frac 12$ we assume that, for any $\alpha < H$, the process $t \mapsto Y_t$ admits $\alpha$-Hölder continuous trajectories and its Hölder seminorm (over intervals of length 1 say) has bounded moments of all orders.
. When $H > \frac 12$, we also assume that $\int F_i(x,y)\,\mu(dy) = 0$ for every $i \neq 0$ and every $x$.
. There exists $\kappa > 0$ such that, for every in $\mathrm{Lip}(\mathcal Y)$ and its derivatives of order at most 4 are bounded by $C(1+|x|)^{-\kappa}$ for some $C > 0$.

Remark .
Recall that a Markov semigroup (P t ) t≥0 admits a spectral gap in any given Banach space E ⊂ L 2 (µ) if P t : E → E is a bounded linear operator for every t and if there exist constants c, For this definition to make sense, E of course needs to contain all constant functions.
The reason why we are aiming for a more general result at the expense of a much more technical set of assumptions is that having a spectral gap in Lip(Y) is a very restrictive condition which is not even satisfied for the Ornstein-Uhlenbeck process.
The solution flow of ( . ) converges in law to that of the Kunita-type stochastic differential equation written in Itô form as and where ∂ (2) j denotes differentiation in the jth direction of the second argument.
Proof. As already suggested, this is a special case of our main result, Theorem . below. The fact that Assumptions . -. and . are implied by Assumption . is immediate. (Take E n = Lip(Y) for every n.) As a consequence, we also have the following functional CLT.

Corollary . Let
where the matrix Σ is given by ( . ) (which is independent of x,x in this case).

Remark .
Theorem . characterises $\lim_{\varepsilon\to 0}\lim_{\delta\to 0} X^{\varepsilon,\delta}$ and shows that it is a Markov process with generator $A$ given by Our proof actually carries over with minor modifications to the case when $\varepsilon \to 0$ for fixed $\delta$ (but with convergence bounds that are uniform in $\delta$!), in which case the limit is given by the same expression ( . ), but with the matrix $\Sigma$ given by where $R_\delta(t) = \delta^{2H-2} E v(0)v(t/\delta)$ and $P_t = e^{-Lt}$ denotes the Markov semigroup for $Y$. We will derive this formula in Section . where we will also see that, for frozen values of $x$, it is a special case of the Green-Kubo formula [Kub , PK , KP ]. Note that ( . )-( . ) (in particular the convergence of the flow) is also consistent with [GILN , Theorem . ] where a somewhat analogous situation is considered. It follows from Definition . that $\Sigma_\delta \to \Sigma$ as $\delta \to 0$, so that the two limits commute (in law).
Remark .
There has recently been a surge in interest in the study of slow / fast systems involving fractional Brownian motion. We already mentioned the averaging result [HL ] which considers the case $H > \frac 12$ but with $\bar F \neq 0$. The work [PIX ] considers the case $H \in (\frac 13, 1)$ like the present article, but with the very strong assumption that $F$ is independent of the fast variable, in which case only $F_0$ exhibits rapid fluctuations and one essentially recovers classical averaging results. In [BGS ], the authors consider the case $H > \frac 12$, but with $F$ independent of the slow variable $x$ and, as in [HL ], not necessarily averaging to zero. They obtain a description of the fluctuations for (a generalisation of) such systems in the regime where there is an additional small parameter in front of $F$.
Formula ( . ) holds for the continuum of parameters $H \in (\frac 13, 1)$. There are two special cases that were previously known. The case $H = \frac 12$ reduces of course to the classical stochastic averaging results [Str , Has , Fre , Sko ] which state that the generator of the limiting diffusion is obtained by averaging the generator for the slow diffusion with the $x$ variable frozen against the invariant measure for the fast process. Note that for this to match ( . ) one needs to interpret the stochastic integral in ( . ) in the Stratonovich sense. This is natural given that this is the interpretation that one obtains when replacing $B$ by a smooth approximation, which is consistent with Remark . . The fact that one also has convergence of flows however (in the case without feedback considered here) appears to be new even in this case.
Another set of closely related classical results deals with "time homogenisation", also known as the Kramers-Smoluchowski limit or diffusion creation [Kra , PK ]. There, one considers random ODEs of the type with F averaging to zero against the stationary measure µ for the fast process Y .
In this case, one also obtains a Markov process in the limit $\varepsilon \to 0$ and its generator coincides with ( . ) if one sets $H = 1$. This can be understood by noting that, at least formally, fractional Brownian motion with Hurst parameter $H = 1$ is given by $B(t) = ct$ with $c$ a normal random variable, so that ( . ) reduces to ( . ), except for the random constant $c$, which then appears quadratically in ( . ) and therefore disappears when averaged out.
The standard proofs of averaging / homogenisation results found in the literature tend to fall roughly into two groups. The first contains functional analytic proofs based on general methods for studying singular limits of the form $\exp(tL_\varepsilon)$ for $L_\varepsilon = \varepsilon^{-1} L_0 + L_1$. This of course requires the full process (slow plus fast) to be Markovian and completely breaks down in our situation. The second group consists of more probabilistic arguments, which typically rely on using corrector techniques to construct sufficiently many martingales to be able to exploit the well-posedness of the martingale problem for the limiting Markov process. The latter are in principle more promising in our situation since the limiting process is still Markovian, but the lack of Markov property makes it unclear how to construct martingales from our process. (But see Section . for a construction that does go in this direction.) Instead, our proof relies on rough paths theory [Lyo , FH ], which has recently been used to recover homogenisation results (formally corresponding to the case $H = 1$), for example in [KM ]. See also [BC , CFK + , DOP , FGL ] for more recent results with a similar flavour. In the case when the fast dynamics is non-Markovian and solves an equation driven by a fractional Brownian motion, a collection of homogenisation results were obtained in [GL , GL , GL , GLS ], while stochastic averaging results with non-Markovian fast motions are obtained in [LS b, LS a] for the case $H > \frac 12$.
The former group of results are proved using rough path techniques, but there is of course an extensive literature on functional limit theorems based on either central or non-central limit theorems, see for example [BT , BH , BM , MT , DM , PT , Ros ].
Finally, note that many physical systems can be regarded as slow / fast systems; these include second order Langevin equations and tagged particles in a turbulent random field [CH , PK , KP , KNR , SVY , BHVW , FK ]. They also arise in the context of perturbed completely integrable Hamiltonian systems [FW , Li ] and geometric stochastic systems [Li , LBdS , GGR , Per ]. See also [Kur , E , Ver ] for some review articles / monographs.

Remark .
It may be surprising that, when $H < \frac 12$, even though $X^\varepsilon$ is driven by a fractional Brownian motion and $F(x,y)$ isn't assumed to be centred in the $y$ variable, the limit $\bar X$ is a regular diffusion. This is unlike the case $H > \frac 12$ [HL , LS b] where a non-centred $F$ leads to an averaging result with a process driven by fBm in the limit. This change in behaviour can be understood heuristically as follows. With $\eta$ as in ( . ), the covariance of $t \mapsto f(Y^\varepsilon_t)\dot B_t$ is given by $\zeta_\varepsilon(t-s) = \eta''(t-s)\,g((t-s)/\varepsilon)$ for some function $g(t) = \langle f, P_t f\rangle$, that would typically converge quite fast to a non-zero limit. The scaling properties of $\eta$ then show that for some constant $C_g$ which has no reason to vanish in general. As a consequence, $\varepsilon^{1-2H}\zeta_\varepsilon$ converges pointwise to 0 while its integral remains constant, suggesting that $\varepsilon^{\frac 12-H} f(Y^\varepsilon_t)\dot B_t$ indeed converges to a white noise. When $H > \frac 12$ however, $\eta''$ is not absolutely integrable at infinity and one needs to assume that $g$ vanishes there, which leads to a centering condition. A similar transition from diffusive to super-diffusive behaviour at $H = \frac 12$ was observed in a different context in [KNR ].

Remark .
As explained, our result implies more, namely that the (random) flow induced by the SDE ( . ) converges in law to that induced by the Kunita-type SDE In other words, the flows $\psi^\varepsilon_{s,t}$, where $\psi^\varepsilon_{s,t}(x)$ denotes the solution at time $t$ to the $x$-component of ( . ) with initial condition $x$ at time $s$, converge to a limit $\psi_{s,t}$ which is Markovian in the sense that $\psi_{s,t}$ and $\psi_{u,v}$ are independent whenever $[s,t) \cap [u,v) = \emptyset$. This remark appears to be novel even when $H = \frac 12$, but it is unclear whether it extends to the case when $x$ feeds back into the dynamic of $y$ as in ( . ).

. Heuristics for general slow / fast random ODEs
We now show how to heuristically derive ( . ). Consider a random ODE of the form where $Z^\varepsilon_t = Z(t/\varepsilon)$ for some stationary (but not necessarily Markovian!) stochastic process $Z$ and $\hat F(x,\cdot)$ is assumed to be centred with respect to the stationary measure of $Z$. In the case when $\hat F(x,z) = \hat F(z)$ does not depend on $x$, it follows from the Green-Kubo formula [Kub , PK , KP ] that, at least when $Z$ has sufficiently nice mixing properties, $X^\varepsilon$ converges as $\varepsilon \to 0$ to a Wiener process with covariance This suggests that a natural quantity to consider in the general case is and that the limit of $X^\varepsilon$ as $\varepsilon \to 0$ is a diffusion with generator of the form for some drift term $b$.
To derive the correct expression for the drift $b$, we note that one expects in the regime $\varepsilon \ll \delta t \ll 1$. The left-hand side of this expression is given by To lowest order, one can approximate this expression by replacing $X^\varepsilon_s$ by $X^\varepsilon_t$, but the resulting expression vanishes rapidly for $s \gg t+\varepsilon$ due to the centering condition on $\hat F$. To the next order, one has where the last identity follows from the substitution $u = (s-r)/\varepsilon$ combined with the fact that, provided that $Z$ is sufficiently rapidly mixing, we expect the main contribution from this integral to come from $|u| \approx 1$, while typical values of $s$ are such that $(s-t)/\varepsilon \approx \delta t/\varepsilon \gg 1$. Combining this with ( . ) eventually yields the expression Comparing this with ( . ), we conclude that $b_i(x) = (\partial_j \Sigma_{ji}(x,\cdot))(x)$ (summation over repeated indices is implied) so that ( . ) can be written as which does coincide with the expression ( . ) as desired.
In order to link this calculation with the setting of the previous section, we note that ( . ) (with $F_0 = 0$ for simplicity) can be coerced into the form ( . ) by setting $Z_t = (\delta^{H-1} v(t/\delta), Y_t)$ as well as $\hat F(x,(v,y)) = F(x,y)v$. In this case, one has so that one does indeed recover the expression ( . ) for any fixed $\delta$.
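To make the Green-Kubo formula invoked in this heuristic concrete, here is a minimal numerical sketch in the scalar, $x$-independent case. It assumes the normalisation in which a centred stationary process with autocovariance $C$ leads to the limiting variance $\sigma^2 = 2\int_0^\infty C(u)\,du$, and it uses the exponential autocovariance $C(u) = e^{-|u|}$ purely as an illustrative choice; none of the function names come from the article.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def autocov(u):
    """Illustrative autocovariance C(u) = exp(-|u|), e.g. a stationary OU process."""
    return math.exp(-abs(u))


def green_kubo_variance(cov, cutoff=50.0, n=100_000):
    """sigma^2 = 2 * int_0^infinity C(u) du, with the integral truncated at `cutoff`."""
    return 2.0 * trapezoid(cov, 0.0, cutoff, n)


def rescaled_variance(cov, T, n=100_000):
    """Var(T^{-1/2} int_0^T Z_s ds) = (2/T) int_0^T (T - u) C(u) du for centred stationary Z."""
    return (2.0 / T) * trapezoid(lambda u: (T - u) * cov(u), 0.0, T, n)


if __name__ == "__main__":
    print("Green-Kubo variance:", green_kubo_variance(autocov))
    for T in (10.0, 100.0, 1000.0):
        print("T =", T, "rescaled variance:", rescaled_variance(autocov, T))
```

Since $\varepsilon^{-1/2}\int_0^t Z(s/\varepsilon)\,ds = \varepsilon^{1/2}\int_0^{t/\varepsilon} Z(r)\,dr$, the quantity `rescaled_variance(cov, t / eps)` multiplied by $t$ is the variance of the rescaled integral at time $t$, and its convergence to $t\,\sigma^2$ as $T = t/\varepsilon \to \infty$ is the scalar shadow of the convergence to a Wiener process described above.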
Remark .
The eagle-eyed reader will have spotted that since the stationary measure of $Z$ is $\mathcal N(0,C) \otimes \mu$ for some multiple $C$ of the identity matrix and since $\hat F(x,(v,y))$ is linear in $v$, the centering condition for $\hat F$ is always satisfied, independently of the choice of $F$. This explains why our main result does not require any centering condition when $H \le \frac 12$. When $H > \frac 12$ however, the covariance function $R$ decays too slowly for the heuristic derivation just given to apply. The centering condition for $F$ then guarantees that correlations decay sufficiently fast to justify the second step in ( . ).
The remainder of this article is structured as follows. In Section we introduce the assumptions on the nonlinearities $F_i$ as well as the fast process $Y$, we discuss a few examples, and we provide the statement of our main result. In Section we then show that solutions to ( . ) converge as $\delta \to 0$, which yields in particular a precise interpretation of what we mean by ( . ) when $H < \frac 12$. The strategy of proof is as follows. Given a smooth mollification $B^\delta$ of $B$, we first show convergence of $\int_s^t \int_s^r f(u)\,\dot B^\delta(u)\,du\; g(r)\,\dot B^\delta(r)\,dr$ as $\delta \to 0$ for any deterministic $H$-Hölder continuous functions $f, g$. While we are able to reduce this to existing criteria for canonical rough path lifts of Gaussian processes [FV a, CQ ] in the case where the two fractional Brownian motions appearing in this expression are independent, the case where they are equal requires a bit more care and relies on a simple trick given in Proposition . , which is of independent interest. This then allows us to build an infinite-dimensional rough path $\mathbf Z^\varepsilon$ (taking values in a space of vector fields on $\mathbf R^d$) associated to ( . ) in a similar way as in [KM , Sec. . ] (see also the "nonlinear rough paths" of [NX ] and [GLS ]) and to reformulate ( . ) as an RDE driven by $\mathbf Z^\varepsilon$ with nonlinearity given by point evaluation. Section . provides details of the construction of $\mathbf Z^\varepsilon$, while Section . then uses it to formulate our main technical result, namely Theorem . which shows that $\mathbf Z^\varepsilon$ converges to a certain rough path lift of an infinite-dimensional Wiener process with covariance function given by $\Sigma$. The remainder of the article is devoted to the proof of this convergence statement. Section shows tightness of the family $\{\mathbf Z^\varepsilon\}_{\varepsilon \le 1}$, while we identify its limit in Section . In both sections, the cases $H < \frac 12$ and $H > \frac 12$ are treated in a completely different way.
The fact that we have convergence of the full infinite-dimensional rough path allows us to conclude that we do not just have convergence of solutions for fixed initial conditions, but of the full solution flow. One point of note is that there are two separate sources of randomness, namely the Markov process Y and the fractional Brownian motion B. Our convergence result is "annealed" in the sense that our convergence in law requires both sources, but a number of intermediate results are "quenched" in the sense that they hold for almost every realisation of Y . It is an open question whether our final convergence result also holds in the quenched sense.

Acknowledgements
XML acknowledges partial support from the EPSRC (EP/S / and EP/V / ), while MH gratefully acknowledges support from the Royal Society through a research professorship.

Precise formulation and results
In this section, we collect the precise assumptions on the functions $F_i$ as well as the Markov process $Y$.
Convention. We write $A \lesssim B$ as shorthand for $A \le KB$ with a constant $K$ that will differ from statement to statement.

. Technical assumptions on the fast variable Y
Throughout the article we fix $H \in (\frac 13, 1)$ as well as a sequence $(E_n)_{n \ge 0}$ of Banach spaces such that $E_n \subset E_{n+1}$ and $E_n \subset L^1(\mathcal Y, \mu)$ for every $n \ge 0$, and such that pointwise multiplication is a continuous operation from $E_0 \times E_n$ into $E_{n+1}$ for every $n \ge 0$. We also write simply $E$ instead of $E_0$ and assume $E$ contains constant functions. See Section . below for two classes of examples showing what type of spaces we have in mind here.
First, we impose that Y has "nice" ergodic properties in the following sense, which in particular implies that µ is its unique invariant measure on Y.
For every n ∈ [1, N ), the semigroup P t extends to a strongly continuous semigroup on E n and there exist constants C and c > 0 (possibly depending on n) such that, for In the low regularity case, we also assume that the process Y has some sample path continuity when composed with a function in E 2 .

Assumption . For
for some constant c > 0.
We also need some integrability.
Remark . When combining it with the inclusion of the product, Assumption .
Another consequence of these assumptions is as follows.
Remark . As a consequence of Assumption . , we conclude that if f ∈ E 2 and H < 1 2 , then Recalling the definition of L α from Definition . , it follows that E 2 ⊂ Dom(L α ) for every α < H so that ( . ) is indeed well defined provided that F k (x, ·) ∈ E 2 for every x. This will be guaranteed by Assumption . below.

. Examples of fast variables
One possible concrete framework is as follows. Fix two weights $V\colon \mathcal Y \to [1,\infty]$ and $W\colon \mathcal Y \to (0,\infty)$ and a metric $d$ on $\mathcal Y$ generating its topology with the property that there exists $C > 0$ such that, for all $x, y \in \mathcal Y$ with $d(x,y) \le 1$, one has For $n \ge 1$, we then let $\mathcal B_{V,W}$ be the Banach space of functions $f\colon \mathcal Y \to \mathbf R$ such that One choice of scale of function spaces that is suitable for a large class of Markov processes is to take $E_n = \mathcal B_{V,W}$ for every $n \ge 1$ (and suitably chosen $V$ and $W$), while $E_0$ is chosen to be the space of bounded Lipschitz continuous functions, namely $\mathcal B_{1,1}$. This framework is relatively general since it allows for a wide variety of choices of $V$, $W$, and of distance functions on $\mathcal Y$, see [HMS , HM ]. For example, it was shown in [HM , Thm. . ] that the 2D stochastic Navier-Stokes equations exhibit a spectral gap in such spaces under extremely weak conditions on the driving noise. More precisely, for every $\eta$ small enough there exist constants $C$ and $\gamma$ such that This at first sight appears to fall outside our framework, but one notices that if one sets then the norm $\|\cdot\|_{V,W}$ with $V(x) = \exp(\eta|x|^2)$ and $W(x) = 1/(1+|x|)$ is equivalent to the norm ( . ). The reason for the choice of $d$ as in ( . ), which is then "undone" by our choice of $W$, is to guarantee that ( . ) holds for $V$, which would not be the case for $|x-y| \le 1$ in the Euclidean distance.
To verify Assumption . one can then for example make use of the following.

In particular, on any fixed time interval, we have
where we combined ( . ) with Markov's inequality in the last step.
When $H \ge \frac 12$, Assumption . is empty, so only integrability conditions are required on the spaces $E_n$. This allows for example to use Harris's theorem [Har , MT , HM ] to verify Assumption . for spaces of functions with weighted supremum norms. More precisely, one would then take $E$ to be the space of all bounded Borel measurable functions and $E_n = \mathcal B_V$, the Banach space of In order to verify our assumptions, it then suffices that $V$ is a square integrable Lyapunov function for the Markov process $Y$ and that the sublevel sets of $V$ satisfy a 'small set' condition for the transition probabilities of $Y$ [MT ].

. Main results
One final assumption we need is that the nonlinearities F and F 0 appearing in ( . ) are sufficiently nice E-valued functions of their first argument. More precisely, we assume the following.

Assumption .
The map $x \mapsto F(x,\cdot)$ is of class $C^4$ with values in $E$ and there exists an exponent κ > 16d p⋆ with p⋆ as in Assumption . (and simply $\kappa > 0$ when $H > \frac 12$) such that, for every multi-index $\ell$ of length at most 4, The same is assumed to hold true for $F_0$. When $H > \frac 12$, we further assume that $\int F_i(x,y)\,\mu(dy) = 0$ for every $i \neq 0$ and every $x$.
The condition $F \in C^4$ is of course suboptimal and could probably be lowered to $F \in C^\beta$ for $\beta > \max\{H^{-1}, 2\}$ and $F_0 \in C^\beta$ for $\beta > 1$, at least if enough integrability is assumed in Assumption . . We also now fix a Schwartz function $\phi$ integrating to 1 and set $\phi_\delta(t) = \frac 1\delta \phi(t/\delta)$. We then write $B^\delta$ for the convolution of $B$ with this mollifier, namely With this notation, the solutions to ( . ) are equal in law to the process given by Since $B^{\varepsilon\delta}$ is smooth, this equation should be interpreted as an ordinary differential equation that just happens to have random coefficients. With all these preliminaries at hand, our main result is the following.
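The mollification step just described can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes the standard Gaussian density as the Schwartz function integrating to 1, and it uses the deterministic Hölder continuous path $s \mapsto |s|^{0.4}$ as a stand-in for a sample path of $B$; the names `phi`, `phi_delta` and `mollify` are hypothetical.

```python
import math


def phi(t):
    """Standard Gaussian density: a Schwartz function integrating to 1 (illustrative choice)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def phi_delta(t, delta):
    """Rescaled mollifier phi_delta(t) = phi(t / delta) / delta."""
    return phi(t / delta) / delta


def mollify(path, t, delta, half_width=8.0, n=2000):
    """Approximate (phi_delta * path)(t) = int phi_delta(t - s) path(s) ds by the midpoint rule.

    The integral is truncated to |t - s| <= half_width * delta, where the
    Gaussian tails are negligible.
    """
    a = t - half_width * delta
    h = 2.0 * half_width * delta / n
    total = 0.0
    for k in range(n):
        s = a + (k + 0.5) * h
        total += phi_delta(t - s, delta) * path(s)
    return total * h


if __name__ == "__main__":
    def rough(s):
        # Hoelder continuous stand-in for a sample path (an assumption of this sketch).
        return abs(s) ** 0.4

    for delta in (0.1, 0.03, 0.01):
        print(delta, abs(mollify(rough, 0.5, delta) - rough(0.5)))
```

The printed errors shrink as $\delta \to 0$, reflecting that $\phi_\delta$ keeps total mass 1 while concentrating at the origin, so that the mollified path converges to the original one at every point of continuity.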
Proof. The convergence in probability of the flow X ε,δ → X ε is the content of Proposition . below. The proof of the conclusion of Theorem . , namely the convergence in law of the flow for ( . ) as ε → 0 is the content of Corollary . .
We first address the question of the convergence in probability of solutions to ( . ) to those of ( . ) as δ → 0 for ε > 0 fixed. In fact, we will directly provide an interpretation of ( . ) and show that this interpretation is sufficiently stable to allow for the approximation ( . ).
Our convergence proof relies on the theory of rough paths; we refer to [FH ] for an introduction. The main insight of this theory is that even though, for $H \le \frac 12$, the solution map $B \mapsto X$ for equations of the type ( . ) isn't continuous when viewing $B$ as an element of any classical function space large enough to contain typical sample paths of fractional Brownian motion, it does become continuous when enhancing $B$ with its iterated integrals $\mathbb B = \int B \otimes dB$ and endowing the space of pairs $(B, \mathbb B)$ with a suitable topology.
For this, consider for any $x \in \mathbf R^d$ the processes Here, the first integral is interpreted as a Wiener integral which makes sense also when $\delta = 0$ and, when $\delta > 0$, coincides with the Riemann-Stieltjes integral. Recall that the Wiener integral of a deterministic (or independent) integrand against any Gaussian process $B$ is well-defined provided that the integrand belongs to the reproducing kernel Hilbert space $\mathcal H_B$ of $B$ and provides an isometric embedding $\mathcal H_B \ni f \mapsto \int f\,dB \in L^2(\Omega, \mathbf P)$. In the case of fractional Brownian motion, it is known that $L^2 \subset \mathcal H_B$ when $H \ge \frac 12$ while for $H < \frac 12$ one has $C^\alpha \subset \mathcal H_B$ for every $\alpha > \frac 12 - H$. The fact that for fixed $x$ and $\varepsilon > 0$, $t \mapsto F(x, Y^\varepsilon_t)$ belongs to $\mathcal H_B$ for all $H > \frac 13$ is then a simple consequence of Assumptions . and . combined with Kolmogorov's continuity criterion (when $H < \frac 12$).
Given a final time $T > 0$ and $\alpha \in (\frac 13, \frac 12)$, we define the space $\mathcal C^\alpha([0,T], \mathcal B \oplus \mathcal B_2)$ of $\alpha$-Hölder rough paths in the usual way [FH , Def. . ], but with all norms of level-2 objects in $\mathcal B_2$. Recall that an $\alpha$-Hölder rough path $(X, \mathbb X)$ is a pair of functions We define the second-order processes $\mathbb Z^{\varepsilon,\delta}$ and $\bar{\mathbb Z}^\varepsilon$ by (the differentials are taken in the $r$ variable) and we define $\mathbf Z^{\varepsilon,\delta} = (Z^{\varepsilon,\delta}, \mathbb Z^{\varepsilon,\delta})$, $\bar{\mathbf Z}^\varepsilon = (\bar Z^\varepsilon, \bar{\mathbb Z}^\varepsilon)$. Note here that $r \mapsto Z^{\varepsilon,\delta}_{s,r}(x)$ is smooth and $r \mapsto \bar Z^\varepsilon_{s,r}(x)$ is Hölder continuous for any exponent strictly less than 1, so these integrals should be interpreted as regular Riemann-Stieltjes integrals. In Section . below we will give a proof of the following result. , 1), and let Assumptions . -. , and . hold. Then, $\mathbf Z^{\varepsilon,\delta}$ For now, we take this result as granted. With this result in place, we obtain the following convergence result as $\delta \to 0$.
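Part of the standard definition of a rough path in [FH ] is an algebraic constraint, Chen's relation $\mathbb X_{s,t} - \mathbb X_{s,u} - \mathbb X_{u,t} = X_{s,u} \otimes X_{u,t}$, which iterated integrals of smooth paths satisfy automatically. The sketch below checks it for left-point Riemann sums along an illustrative smooth path in $\mathbf R^2$; the path and all function names are assumptions made for this example and are not objects from the article.

```python
import math


def path(t):
    """Illustrative smooth path in R^2 (an assumption of this example)."""
    return (math.cos(t), math.sin(2.0 * t))


def increment(ts, i, j):
    """First level X_{s,t} = X_t - X_s between the grid times ts[i] and ts[j]."""
    xs, xt = path(ts[i]), path(ts[j])
    return (xt[0] - xs[0], xt[1] - xs[1])


def second_level(ts, i, j):
    """Left-point Riemann sum for the 2x2 matrix int_s^t (X_r - X_s) (x) dX_r."""
    xs = path(ts[i])
    out = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(i, j):
        xr, xn = path(ts[k]), path(ts[k + 1])
        for a in range(2):
            for b in range(2):
                out[a][b] += (xr[a] - xs[a]) * (xn[b] - xr[b])
    return out


def chen_defect(ts, i, m, j):
    """Largest entry of XX_{s,t} - XX_{s,u} - XX_{u,t} - X_{s,u} (x) X_{u,t}."""
    full = second_level(ts, i, j)
    left = second_level(ts, i, m)
    right = second_level(ts, m, j)
    x_su, x_ut = increment(ts, i, m), increment(ts, m, j)
    return max(
        abs(full[a][b] - left[a][b] - right[a][b] - x_su[a] * x_ut[b])
        for a in range(2)
        for b in range(2)
    )


if __name__ == "__main__":
    grid = [k / 1000.0 for k in range(1001)]
    print(chen_defect(grid, 0, 300, 1000))
```

For these Riemann sums the relation holds up to floating-point rounding at the discrete level already, since the sum of the increments between `ts[m]` and `ts[j]` telescopes; the analytic content of the rough path framework lies instead in the Hölder-type bounds imposed on the two levels.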

Proposition . The second claim of Theorem . holds.
Proof. With the space B as above, let δ : . We then claim that, for any ε, δ > 0, ( . ) can be rewritten as the rough differential equation (RDE) driven by the infinite-dimensional rough paths Z ε,δ andZ ε defined above and given by Note that since α + β > 1, there is no need to specify cross-integrals between Z ε,δ andZ ε since they can be defined in a canonical way using Young integration [You ].
To check that this RDE is well-posed for any rough paths Z ε,δ andZ ε belonging to C α ([0, T ], B ⊕ B 2 ) and C β ([0, T ], B ⊕ B 2 ) respectively, we note first that one readily verifies that the map δ is Fréchet differentiable, and actually even C 3 b . Its differential Dδ at x ∈ R d in the direction of y ∈ R d is given by for a suitable partial trace tr. This shows that Dδ · δ extends continuously to a (Since B 2 differs from the projective tensor product of B with itself, this doesn't automatically follow from the fact that δ itself is C 3 b .) Retracing the standard existence and uniqueness proof for RDEs, [FH , Sec. . ] then shows that ( . ) admits unique (global) solutions for every initial condition and every driving path. Furthermore, if the sample path t → Y (t) is given by any continuous Y-valued function then, under the stated regularity conditions on F, F 0 , it is straightforward to verify that the solutions to ( . ) coincide with those of ( . ).
Since the RDE solution is a jointly locally Lipschitz continuous function of both the initial data x 0 and the driving paths Z ε,δ andZ ε into C α (R + , R d ), the claim that X ε,δ → X ε in probability then follows immediately from Proposition . .

Remark .
This shows that it is consistent to define solutions to ( . ) in the general case as simply being a shorthand for solutions to ( . ) driven by the pair of rough paths Z ε andZ ε . This is the interpretation that we will use from now on. The fact that in the special case H = 1 2 this coincides with the Stratonovich interpretation of the equation follows as in [FH , Thm . ].

. Preliminary results
In this section we present a few general results that will be used in the proof of Proposition . . We start with the following elementary property of the second Wiener chaos.
exists in L 2 . Furthermore, the limit in ( . ) depends only on the limitK and not on the approximating sequence K δ and one has the bound EK 2 ≤ 2EK 2 .
Proof. The Gaussian probability space generated by the pair (B,B) has Cameron-Martin space H⊕H whereH is a copy of H. Since K δ is bilinear and K δ (B,B) has vanishing expectation, it belongs to the second homogeneous Wiener chaos, so that there existsK δ ∈ (H ⊕H) ⊗ s (H ⊕H) with K δ (B,B) = I 2 (K δ ), where I k denotes the usual isometry between kth symmetric tensor power and kth homogeneous Wiener chaos, see [Nua ]. Note now that, interpretingK δ as a Hilbert-Schmidt operator on H ⊕H, there exists K δ ∈H ⊗ H such thatK δ = ιK δ , where ι :H ⊗ H → (H ⊕H) ⊗ s (H ⊕H) is given by $\iota K = \frac12 \begin{pmatrix} 0 & \tau K \\ K & 0 \end{pmatrix}$, with the obvious matrix notation and τ :H⊗H → H⊗H the transposition operator. This is because the first diagonal block is obtained by testing against k, The second diagonal block vanishes for the same reason with the roles of B andB exchanged. (Here we denote by x → x, f the unique element of L 2 (E) which is linear on a set of full measure containing H and coincides with f, · there. In fact, On the other hand, one has where this time I 2 refers to the isometry between H ⊗ s H and the second chaos generated by B only. Since I 2 and τ are both isometries, it immediately follows that E(K δ ⋄ (B, B)) 2 ≤ K δ 2 = 2 ιK δ 2 = 2E (K δ (B,B)) 2 , and similarly for differences K δ − K δ ′ , so that the claim follows.
Remark . It follows immediately from ( . ) that if we replace K δ by the same sequence of bilinear maps, but with their two arguments exchanged, the limit one obtains is the same.
Before we turn to the precise statement, we introduce the following notation which will be used repeatedly in the sequel. We write η for the distribution on R given by and η ′′ for its second distributional derivative. For a < 0 < b, we will then make the abuse of notation Lemma . Let a < 0 < b and H ∈ ( 1 3 , 1 2 ). Setting α H = H(1 − 2H) and ϕ 0 = ϕ(0), the limit above is given by independently of the choice of mollifier, thus justifying the notation. For a = 0, we set which can be justified in a similar way, provided that the mollifier one uses is symmetric. (This in turn is the case if we view η as the limit of covariances of smooth approximations to fractional Brownian motion.) For H = 1 2 , one similarly has b a η ′′ (t) ϕ(t) dt = ϕ 0 and b 0 η ′′ (t) ϕ(t) dt = 1 2 ϕ 0 , while for H ∈ ( 1 2 , 1), η ′′ is given by the locally integrable function t → −2α H |t| 2H−2 .
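The formula quoted above for H ∈ ( 1 2 , 1) can be checked numerically away from the origin. The following is only a minimal sketch: it assumes that η is the function t → |t|^{2H} (the display defining η is not available here, so this is an assumption made for illustration), and the helper names `eta`, `second_difference` and `claimed_density` are hypothetical. It compares a centred second difference of η with −2α_H |t|^{2H−2}, where α_H = H(1 − 2H).

```python
def eta(t, H):
    # Assumed form of eta for this sketch: t -> |t|^(2H).
    return abs(t) ** (2 * H)


def second_difference(t, H, h=1e-4):
    # Centred finite-difference approximation of the second derivative of eta at t != 0.
    return (eta(t + h, H) - 2.0 * eta(t, H) + eta(t - h, H)) / h ** 2


def claimed_density(t, H):
    # The expression -2 * alpha_H * |t|^(2H - 2) with alpha_H = H * (1 - 2H).
    alpha_H = H * (1.0 - 2.0 * H)
    return -2.0 * alpha_H * abs(t) ** (2 * H - 2)


if __name__ == "__main__":
    for H, t in [(0.8, 0.7), (0.6, -1.3)]:
        print(H, t, second_difference(t, H), claimed_density(t, H))
```

For H < 1 2 the same identity holds pointwise for t ≠ 0, but the distribution η ′′ then also carries a singular part at the origin, which a pointwise finite difference does not capture.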
We now show that for any fixed ε > 0, the processes Z ε satisfy a suitable form of Hölder regularity. To keep notations shorter, we define the collection of processes indexed by R m -valued functions f that belong to the reproducing kernel space of the fractional Brownian motion B. Here {e i } is an o.n.b. of R m and B r = (B 1 r , . . . , B m r ). We start our analysis with a preliminary result for the irregular case H < 1 2 .

Lemma .
Let H ∈ ( 1 3 , 1 2 ) and let f ∈ C β ([0, 1], R m ) for some β > (1 − 2H) ∨ 0. The processes Z f satisfy the Coutin-Qian conditions [CQ , FV a, Def. ] in the sense that for all 0 < s < t < 1 and all h ∈ (0, t − s]. Proof. The mixed second order distributional derivative of E(B δ s B δ t ) is given by , the convolution of 1 2 η ′′ with a symmetric mollifier at scale δ. Mollifying B and taking limits shows that we have the identity (with summation over the components of f implied). For H < 1 2 , this yields the bound as required.
Regarding the covariance, we have when 0 < h ≤ t − s, so that the intervals overlap in at most one point, as required.
IfB is an independent copy of B, we can combine this result with those of [FV a] to conclude that there is a canonical rough path associated with the path ( f (r) dB(r), g(r) dB(r)) for any f, g ∈ C β ([0, 1], R m ). We now show that there also exists a canonical lift for (Z f , Z g ), where the integrals are defined with respect to the same fractional Brownian motion B. With this notation, we then have the following result. Then, for any finite collection given by ( . ) to a geometric rough path for every q ≥ 1. This is obtained by taking the limit as δ → 0 of the canonical lift of the smooth paths Z δ,f defined as in ( . ) but with B replaced by B δ .
Remark . As usual, "geometric" here means that Z f is the limit of canonical lifts of smooth functions. Indeed, for B δ the convolution of B with a mollifier at scale δ > 0, Z f is given by and this limit is independent of the choice of mollifier (and therefore "canonical").
Proof. We only need to show ( . ) for q = 2 since Z and Z belong to a Wiener chaos of fixed order. (Recall that the f i are considered deterministic here.) We start with the case H ∈ ( 1 3 , 1 2 ). LetB denote an independent copy of the fractional Brownian motion B and letZ f,g = (Z f ,Z g ) whereZ g is defined like Z g but with B replaced byB. By Lemma . , for any f, g ∈ C β , we can then apply [FV a, Thm ] to construct a second-order processZ f,g s,t satisfying the Chen identityZ f,g s,t −Z f,g s,u −Z f,g u,t = Z f s,uZ g u,t . It furthermore coincides with the Wiener integral which makes sense since the Coutin-Qian condition guarantees that r → Z f s,r belongs to the reproducing kernel space of Z g . It is furthermore such that smooth approximations to ( . ) (replace B andB by B δ andB δ , obtained by convolution with a mollifier at scale δ → 0) converge to it in L 2 . In particular,Z f,g belongs to the second Wiener chaos generated by (B,B) and is of the form of the limits considered in Proposition . . We now want to replaceB by B. For an approximation B δ as mentioned above, setting where η δ is an even δ-mollification of t → |t| 2H , so that in particular ∞ 0 η ′′ δ (s) ds = 0, since η ′ δ (0) = 0. Similarly to ( . ), we can then rewrite ( . ) as It follows immediately that one has which is bounded by some multiple of Combining this with Proposition . , we conclude that exists in probability, is independent of the choice of mollification, and satisfies the bound It now suffices to set Z f,ij s,t = Z f i ,f j s,t . Both the fact that Chen's relation holds and the fact that the resulting rough path is geometric follow at once from the fact that these properties hold for the smooth approximations.
Combining ( . ) with the fact that the rough path obtained from [FV a, Thm ] satisfies the bound ( . ) with α = H as a consequence of Lemma . , the claim follows.
We now turn to the case H = 1 2 where it is well known that Z δ,f,g s,t defined in ( . ) converges to the Stratonovich integral t s r s f (u) dB(u) g(r) • dB(r), so that A simple consequence of Hölder's inequality then leads to the bounds . This shows again that the bound ( . ) holds, this time with α = 1 2 − 1 p (and q arbitrary), and our condition on p guarantees that this is greater than 1 3 . For H > 1 2 , the first identity in ( . ) above combined with the positivity of the distribution −η ′′ and Hölder's inequality yields the bound Similarly, again as a consequence of the positivity of −η ′′ , we have the bound which shows again that ( . ) holds.
. Construction and convergence of the rough driver as δ → 0
The aim of this section is to construct the rough path Z ε (this is the content of Proposition . ) and to show that this construction enjoys good stability properties. This is done by stitching together the "canonical" rough path lift for the collection {Z ε (x)} x∈R d obtained in Proposition . . For H > 1 2 , this is just iterated Young integrals. For H = 1 2 the iterated integrals are considered in the Stratonovich sense and, thanks to the independence of Y and B, the first order process can be interpreted indifferently as either an Itô or a Stratonovich integral.
In order to make use of Proposition . , we use the following lemma, where Y ε t denotes the Markov process from Section . .

Lemma .
Let U be as in Proposition . and let Assumption . hold for some p ⋆ . When H < 1 2 , we further assume E ⊂ L p⋆ and β < H − 1 p⋆ , where U = C β . Then, given f ∈ E and settingf (t) = f (Y ε t ), one has for every p < p ⋆ the bound uniformly over E. (Here we use the convention p ⋆ = ∞ when H > 1 2 .)
Proof. For H ∈ ( 1 3 , 1 2 ), the assertion follows immediately from Kolmogorov's continuity test, using Assumption . and E ⊂ L p . For H ≥ 1 2 , it suffices to note that if f ∈ L p (Y, µ), then for any fixed ε > 0 the mapf :
We now show how to collect these objects into one "large" Banach space-valued rough path. The process itself will take values in B = C 3 b (R d , R d ), with the second order processes taking values in B Our aim is then to define a B ⊕ B 2 -valued rough path (Z ε , Z ε ) which is the canonical lift (in the sense of Proposition . for any finite collection of x's) of Let {f ε i,x } i≤d be the collection of maps from R + to R m determined by With this notation at hand and recalling the construction of Z f and Z f,g as in the proof of Proposition . , it is then natural to look for a B ⊕ B 2 -valued rough path (Z ε , Z ε ) such that, for every x,x ∈ R d , the identities hold almost surely. (Provided that F (x, ·) ∈ E for every x, the right-hand sides make sense by combining Lemma . with Proposition . .) We claim that this does indeed define a bona fide infinite-dimensional rough path Recall also that a rough path Z = (Z, Z) is weakly geometric if the identity holds, where the transposition map (·) ⊤ : B ⊗ B → B ⊗ B swapping the two factors is continuously extended to B 2 . With these notations, we have the following.
Proof. Since the Chen relations and ( . ) are obviously satisfied for smooth approximations as in ( . ), we only need to show that the analytic constraints hold. In other words, for any fixed T > 0 and ε > 0, we look for an almost surely finite random variable C ε such that holds uniformly over all 0 ≤ s < t ≤ T . By the Kolmogorov criterion for rough paths [FH , Thm . ], it suffices to show that, for some β > 0 and p ≥ 1 such that γ − 1 p > α, one has the bounds By Lemma A. below, it suffices to show that for k + ℓ ≤ 4 and some p such that p > (4d/κ) ∨ (γ − α) −1 .
(1 + |x|) −κ by Assumption . , it follows immediately from Proposition . combined with Lemma . that the bound ( . a) holds for γ = H and p ≤ p ⋆ when H < 1 2 and for any γ < H and p ≥ 1 when H ≥ 1 2 . The bound ( . b) follows in the same way. (These arguments are somewhat formal, but can readily be justified by taking limits of smooth approximations.) In order to prove Proposition . , we make use of the following variant "in probability" of the usual tightness criterion for convergence in law.
Proposition . Let (Z, d) be a complete separable metric space and let {L k : k ∈ N} be a countable collection of continuous maps L k : Z → R that separate elements of Z in the sense that, for every x, y ∈ Z with x ≠ y there exists k such that L k (x) ≠ L k (y).
Let {Z n } n≥0 and Z ∞ be Z-valued random variables such that the collection of their laws is tight and such that L k (Z n ) → L k (Z ∞ ) in probability for every k. Then, Z n → Z ∞ in probability.
It follows that, for every n ≥ N one has which implies the claim.
Regarding tightness itself, the following lemma is a slight variation of well known results.

Proof. Write G for the metric space given by B ⊕ B 2 endowed with the metric Recall then that C α ([0, T ], B ⊕ B 2 ) can be identified with the usual space of α-Hölder functions with values in G by identifying Z = (Z, Z) with the function t → Z t def = Z 0,t ⊕ Z 0,t and noting that, thanks to Chen's relations, (See [FV b, Sec. . ] for more details and motivation.) Since d generates the same topology on G as that given by the Banach space structure of B ⊕ B 2 , balls ofB ⊕B 2 are compact in G. The claim then follows at once from Kolmogorov's continuity test, combined with the fact that, given a compact metric space (X , d) and a compact subset K of a Polish space (Y,d), the set C β (X , K) is compact in C α (X , Y) for any β > α.
Proof of Proposition . . We apply Proposition . with the metric space Z given by C α ([0, T ], B⊕B 2 ), Z n = Z ε,δn for any given sequence δ n → 0, and Z ∞ = Z ε as constructed in Proposition . . The continuous maps L k appearing in the statement are given by the collection of maps (Z, Z) → Z t (x) and (Z, Z) → Z s,t (x,x) for a countable dense set of times s and t and elements x,x ∈ R d . It follows from ( . ) and Lemma A. that the bound ( . ) holds for Z ε,δ , uniformly in δ (but with ε fixed), so that the required tightness condition holds by Lemma . . For any fixed ε > 0, the convergences in probability Z ε,δ t (x) i → Z ε t (x) i and Z ε,δ s,t (x,x) ij → Z ε s,t (x,x) ij were shown in Proposition . . (It suffices to apply it with the choices f = f ε i,x and g = f ε j,x .)

. Formulation of the main technical result
The main technical result of this article can then be formulated in the following way.

Theorem .
Let H ∈ ( 1 3 , 1), let Assumptions . -. , and . hold, and let α and Z ε be as in Proposition . . Then, as ε → 0, Z ε converges weakly in the space of α-Hölder continuous (B, B 2 )-valued rough paths to a limit Z. Furthermore, there is a Gaussian random field W as in ( . ) such that , interpreted as an Itô integral and Σ is as in ( . ).
The proof of this result will be given in Sections and below, see Proposition . which is just a slight reformulation of Theorem . . We first show in Section that the family {Z ε } ε≤1 is tight in a suitable space of rough paths and then identify its limit in Section .

Corollary .
Under the assumptions of Theorem . , the solution flow of ( . ) converges weakly to that of the Kunita-type SDE ( . ).
Proof. Define the B-valued process It follows from Assumption . that, for k ≤ 4 and p ≤ p ⋆ , uniformly over ε. This shows that the family {Z 0,ε } ε≤1 is tight in C β ([0, 1], B) for every β < 1. Furthermore, by the ergodic theorem which holds under Assumption . , for every x, almost surely. Since we can choose β and α such that α + β > 1 and 2β > 1, it follows that there is no need to control any cross terms between Z 0,ε and either Z ε or Z 0,ε itself in order to be able to solve equations driven by both [Lyo , Gyu ]. Furthermore, since the limit of Z 0,ε is deterministic, one deduces joint convergence from Theorem . . By the continuity theorem for rough differential equations, the solutions of ( . ) written in the form ( . ) converge weakly to those of the rough differential equation It remains to identify solutions to this equation with those of ( . ). This is straightforward and follows as in [FH , Sec . ] for example. Since the Gubinelli derivative x ′ of the solution X = (x, x ′ ) to ( . ) is given by δ(x), the integral t 0 δ(X s )dZ s is obtained as limit of the compensated Riemann sum [u,v] where P is a partition on [0, t] and Dδ · δ is as in ( . ). Since x is continuous and adapted to the filtration generated by W , the first term converges to The last term on the other hand converges to 0 in probability since it is a discrete martingale and its summands are centred random variables of variance O(|v − u| 2 ).

Tightness of the rough driver as ε → 0
The content of this section is the proof of the following tightness result. Let {Z ε } ε≤1 be given as in Proposition . . Let Assumptions . -. , and . hold. , 1), if in addition F (x, y) µ(dy) = 0 for every x, then the family of rough paths Z ε is tight in C α for every α ∈ ( 1 3 , 1 2 ).

Proposition .
It will be convenient to introduce the following notation. Given f, g ∈ E, we use the shorthand We then have the following tightness criterion.
Proof of Proposition . . The arguments are quite different for the different ranges of H, but they will always reduce to verifying the assumptions of Lemma . . First let H ∈ ( 1 3 , 1 2 ). The first assumption of Lemma . follows from Proposition . below with α 0 = H and from the trivial bound while the second assumption follows from Proposition . below. Both hold for any p ≤ p ⋆ /4 where $p_\star > \max\{4d, 6/(3H-1)\}$, and the proofs of the propositions are the content of Section . .
The ingredients for showing tightness of Z ε where H ∈ ( 1 2 , 1) are given in Section . , starting with a bound on J analogous to that of Proposition . . Unlike in the proof of that statement though, we do not show this by bounding the conditional variance E(|J ε s,t (f )| 2 | F Y ). This is because, as a consequence of the lack of integrability at infinity of η ′′ when H > 1 2 , it appears difficult to obtain a sufficiently good bound on it, especially for H close to 1. (In particular, the best bounds one can expect to obtain from a quantitative law of large numbers don't appear to be sufficient when H > 3 4 .) The required bounds are collected in Corollary . which yields the assumptions of Lemma . with α 0 = 1 2 and arbitrary p.
Finally, when H = 1 2 we take α 0 = 1 2 . Then $\|J^\varepsilon_{s,t}(f)\|_{L^p} \le C \bigl\|\int_s^t |f(Y^\varepsilon_r)|^2\,dr\bigr\|_{L^{p/2}}^{1/2} \le C \|f\|_E\, |t-s|^{1/2}$ by the Burkholder-Davis-Gundy inequality, and similarly the second-order process satisfies the bound: allowing us to again apply Lemma . , which concludes the proof.

. The low regularity case
This section consists of a number of a priori moment bounds, which we then combine at the end to provide the proof of Proposition . . These moment bounds, which are uniform in ε, follow from the Hölder continuity of Y in a subspace of L p⋆ for a sufficiently large p ⋆ ; in particular, ergodicity of Y does not play any role. We will make repeated use of the following simple calculation, where we recall ( . ) for the definition of the distribution η.
Lemma . Given t > 0 and H < 1 2 , let Ψ : [0, t] 2 → R be a continuous function such that for some numbers ε > 0, β > −2H, γ, ζ > 1 − 2H, and C,Ĉ,C ≥ 0 it holds that
Proof. Let I be the double integral appearing in ( . ). As a consequence of Lemma . , we can rewrite it as We then have Here we used the conditions imposed on β, γ and ζ. Regarding I 2 , we have the bound and the claim follows.
Remark . The proof of Lemma . works mutatis mutandis for Ψ taking values in a Banach space, for example L p . We also see that if Ψ is upper bounded by a finite sum of terms of the type ( . ) with different exponents β and γ, then the bound ( . ) still holds with the corresponding sum in the right-hand side.
We perform a number of preliminary calculations. For this, it will be notationally convenient to introduce the shortcuts
Proposition . Let H ∈ ( 1 3 , 1 2 ) and let Assumptions . and . hold for some p ⋆ ≥ 2. Then there exists a constant C such that, uniformly over s ≥ 0, t ≥ 0, and
Proof. Let F Y denote the σ-algebra generated by all point evaluations of the process Y , and F Y t the corresponding filtration. Write p for p ⋆ for brevity. Since B is independent of F Y and the L p norm of an element of a Wiener chaos of fixed degree is controlled by its L 2 norm, we have for some universal constant c depending only on p and on the degree of the Wiener chaos, so that Since f ∈ E is in L p by Assumption . , it follows from Assumption . that We can therefore apply Lemma . with γ = H and β = 0 so that, for f E ≤ 1, one has whence the desired bound follows. (The condition γ > 1 − 2H is satisfied since H > 1 3 by assumption.) We now consider the second-order process J given by and bound it in a similar way. Recalling that f, g µ = Y f, g dµ, we first obtain a bound on its expectation.
Proposition . Let H ∈ ( 1 3 , 1 2 ), let Assumptions . and . hold for some p ⋆ ≥ 2, and let f, g ∈ E. One has
Proof. It follows from ( . ) that we have the identity and we conclude from Lemma . and the bound g(
Proposition . Let H ∈ ( 1 3 , 1 2 ), let Assumptions . and . hold for some p ⋆ ≥ 2, let f, g ∈ E, and let 2p ≤ p ⋆ . Then there exists a constant C such that

If g is a constant, one obtains a stronger upper bound of the form
Proof. By Proposition . , it suffices to obtain a bound on As a consequence of Proposition . and ( . ), we have the bound for a fractional Brownian motionB independent of B (and Y ). We furthermore restrict ourselves to the case s = 0 and m = 1 without loss of generality. At this point we note that for every H > 1 3 , one has the identity where we have set As a consequence of ( . ), we deduce from ( . ) the bound We now bound ϕ ε in such a way that Lemma . (combined with Remark . ) applies with Ψ = ϕ ε . In order to apply this result, we first verify that the first bound in ( . ) holds. Applying Hölder's inequality we obtain for 2p ≤ p ⋆ , Since the last factor is the same expression as the right-hand side of ( . ), it is bounded as in ( . ), thus yielding Regarding the second bound in ( . ), we note that, for s ′ ≥ s and α > 1 3 , one has Since 2p ≤ p ⋆ and s 0 s ′ s |r − r ′ | 2H−2 dr dr ′ |s ′ − s| 2H , the L p/2 norm of the first term is of order ε 2−4H f 2 E g 2 E |s ′ −s| 2H . By Hölder's inequality, the second term is bounded similarly to before by By Assumptions . and . , the factors involving g are bounded by while the remaining factor is the same as in ( . ), thus yielding a bound of the order Applying Lemma . (and Remark . ) and inserting the resulting bound into ( . ) eventually yields the bound as desired. (Note that the second term is bounded by the first and the last one, which is why it was omitted in the statement.) In case g is a constant, the second term in the expression for ϕ ε (s, s ′ ) − ϕ ε (s, s) vanishes identically. Since this is the term responsible for the summand proportional to |t| 2 in ( . ), the claim follows.
. The regular case H ∈ ( 1 2 , 1)
For the case where the slow variables are driven by a fractional Brownian motion of higher regularity, H > 1 2 , we exploit the ergodicity of the fast motion even for proving tightness for the first order processes.
To prove the tightness of the processes Z ε t , we take a different strategy and estimate higher order moments of the Z ε s,t and Z ε s,t . This requires us to estimate the expectation of multiple integrals of the form For the second order processes, half of the upper limits of the integrals are given by one of the t i 's, but since we will not need to exploit any cancellations these integrals are controlled by the bounds on the hypercube. For p = 1, it is easy to see that this integral is of order ε 2H−1 t, but the case p = 2 is already more complicated: If we look at the regime t 1 < t 2 < t 3 < t 4 say and write P ε t = P t/ε , the first factor is given by where now the first term is bounded by exp(−c|t 4 − t 2 |/ε) and the second term is bounded by exp(−c|t 4 − t 3 |/ε − c|t 2 − t 1 |/ε). This is still not optimal: we note this time that we can recenter f 2,3,4 = f 2 P ε t 3 −t 2 (f 3,4 −f 3,4 ) "for free" since f 1 has mean zero, so the first term is actually of order exp(−c|t 4 − t 1 |/ε). It is then not too difficult to see that, the contribution of the second term of ( . ) to the integral ( . ) is of order ε 4H−2 t 2 , while the contribution of the first term is ε 4H−1 t, which is of lower order for t ≥ ε. Our aim is to generalise such considerations to arbitrarily high moments.
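The order ε^{2H−1} t claimed for p = 1 can be illustrated on a model integrand. The sketch below is not the integral of the text (whose display is not available here); it assumes, purely for illustration, that the two-point correlation of the fast process is exactly exp(−|s − r|/ε) and evaluates the double integral of exp(−|s − r|/ε) |s − r|^{2H−2} over [0, t]². The function names are hypothetical. After reducing to a single integral in u = |s − r| and substituting w = u^{2H−1} to remove the integrable singularity, a midpoint rule suffices.

```python
import math


def double_integral(eps, t, H, n=20000):
    # Midpoint-rule value of
    #   int_0^t int_0^t exp(-|s - r| / eps) * |s - r|^(2H - 2) ds dr,   H in (1/2, 1),
    # which equals 2 * int_0^t (t - u) * exp(-u / eps) * u^(2H - 2) du.
    # The substitution w = u^(2H - 1) turns u^(2H - 2) du into dw / (2H - 1).
    a = 2.0 * H - 1.0
    upper = t ** a
    h = upper / n
    total = 0.0
    for k in range(n):
        w = (k + 0.5) * h
        u = w ** (1.0 / a)
        total += (t - u) * math.exp(-u / eps)
    return 2.0 / a * total * h


def leading_order(eps, t, H):
    # Leading behaviour for eps << t: 2 * Gamma(2H - 1) * eps^(2H - 1) * t.
    return 2.0 * math.gamma(2.0 * H - 1.0) * eps ** (2.0 * H - 1.0) * t


if __name__ == "__main__":
    H, t = 0.75, 1.0
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, double_integral(eps, t, H) / leading_order(eps, t, H))
```

In this toy model the ratio of the two quantities tends to 1 as ε/t → 0, with a relative correction of order ε/t, which is consistent with the scaling ε^{2H−1} t stated above.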
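In particular, the "correct" way of rewriting the factor $\mathbf{E}\bigl(\prod_{i=1}^{2p} f_i(Y^\varepsilon_{t_i})\bigr)$ so that it yields usable bounds is in terms of its cumulants. Given a collection {X i } i∈I of random variables and a subset A ⊂ I, we write $X_A$ as a shorthand for the collection $\{X_i\}_{i\in A}$ and $X^A$ as a shorthand for the product $\prod_{i\in A} X_i$. Given a finite set A, we write P(A) for the set of partitions of A. We also write $\mathbf{E}_c X_A$ for the joint cumulant, so that one has the identities $$\mathbf{E} X^A = \sum_{\Delta \in \mathcal{P}(A)} \prod_{B \in \Delta} \mathbf{E}_c X_B\;, \qquad \mathbf{E}_c X_A = \sum_{\Delta \in \mathcal{P}(A)} C_\Delta \prod_{B \in \Delta} \mathbf{E} X^B\;,$$ where $C_\Delta = (|\Delta| - 1)!\,(-1)^{|\Delta|-1}$.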
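The standard moment-cumulant relations, $\mathbf{E} X^A = \sum_{\Delta} \prod_{B\in\Delta} \mathbf{E}_c X_B$ and $\mathbf{E}_c X_A = \sum_{\Delta} C_\Delta \prod_{B\in\Delta} \mathbf{E} X^B$ with sums over partitions ∆ of A, can be checked by brute force on a small example. The sketch below uses exact rational arithmetic on a toy law with three random variables, the third being independent of the first two; the law and all function names are illustrative choices and do not come from the text.

```python
from fractions import Fraction
from math import factorial


def set_partitions(items):
    # Yield every partition of the list `items` as a list of blocks (lists).
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition


def moment(law, block):
    # E prod_{i in block} X_i for a finite law {outcome tuple: probability}.
    total = Fraction(0)
    for outcome, prob in law.items():
        term = prob
        for i in block:
            term *= outcome[i]
        total += term
    return total


def cumulant(law, block):
    # Joint cumulant as a sum over partitions with weight (|D| - 1)! * (-1)^(|D| - 1).
    total = Fraction(0)
    for partition in set_partitions(list(block)):
        k = len(partition)
        term = Fraction(factorial(k - 1) * (-1) ** (k - 1))
        for b in partition:
            term *= moment(law, b)
        total += term
    return total


def moment_from_cumulants(law, block):
    # Recover the moment as a sum over partitions of products of joint cumulants.
    total = Fraction(0)
    for partition in set_partitions(list(block)):
        term = Fraction(1)
        for b in partition:
            term *= cumulant(law, b)
        total += term
    return total


# Toy law: (X0, X1) are dependent, X2 is independent of the pair.
pair = {(0, 0): Fraction(1, 2), (1, 2): Fraction(1, 4), (1, 0): Fraction(1, 4)}
third = {1: Fraction(1, 3), 3: Fraction(2, 3)}
law = {(a, b, c): p * q for (a, b), p in pair.items() for c, q in third.items()}

if __name__ == "__main__":
    print(cumulant(law, [0, 1]), cumulant(law, [0, 1, 2]))
```

The vanishing of the joint cumulant of (X0, X1, X2) in this example is an instance of the independence property used in the proof that follows: a joint cumulant vanishes as soon as the collection splits into two independent sub-collections.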
Then, one has the bound
Proof. Note first that since c is allowed to depend on k, it actually suffices to show that E c X . From now on we fix i ⋆ ∈ {1, . . . , k} to be the index which realises that supremum. Let Ỹ be an independent copy of Y and set X̃ The most important property of the joint cumulant of a collection of random variables is that if it can be broken into two independent sub-collections, then the joint cumulant vanishes. As a consequence, we have We now put a total order on the elements of a partition ∆ by postulating that A 1 ≤ A 2 whenever inf{a ∈ A 1 } ≤ inf{a ∈ A 2 } (this is just for definiteness, the actual choice of order is unimportant). We can then write the above as a telescoping sum, yielding ( . ) We fix A ⊂ [k] such that E X A ≠ E X̃ A and write A = {a 1 , . . . , a ℓ } with ℓ = |A| and i → a i increasing. We also write j ⋆ < ℓ for the index such that a j⋆ ≤ i ⋆ and a j⋆+1 > i ⋆ . (This necessarily exists since otherwise E X A = E X̃ A .) For i < ℓ and n ≥ 1, we also write T i : E n → E n+1 for the operator given by whose norm, as an operator from E n to E n+1 , is bounded by a (possibly n-dependent) multiple of f a i E , since it is of order f a i E e −ct i from E × E n to E n+1 , when restricted to functions of vanishing mean, by Assumption . . Here we used the continuity of multiplication of functions. It then follows from the Markov property that (this is easily shown by induction over ℓ) while we similarly have by the definition of X̃ This is because, setting A 1 = {a 1 , . . . , a i⋆ } and A 2 = A \ A 1 , one has E X̃ A = E X A 1 E X A 2 by the definition of X̃, so that ( . ) follows from ( . ). Writing The spectral gap assumption ( . ) and the definition of i ⋆ then imply that Combining this with ( . ) immediately leads to the claimed bound on the corresponding cumulant.
The first identity of ( . ) combined with Wick's formula for the moments of Gaussians now suggests that we should rewrite the expectation of ( . ) as a sum over terms indexed by pairs (∆, π) where ∆ is a partition of [2p] arising from ( . ) and representing a product of cumulants of the f (Y t i ) and π is a pairing of [2p] arising from Wick's formula.
Each pairing (i, j) yields a factor |s i − s j | 2H−2 while each element B of the partition yields an exponential factor of the form i,j∈B exp ( − c|s i − s j |) thanks to Proposition . . Since we consider the case H > 1 2 , this yields a locally integrable function in the expression for the expectation of ( . ), so our analysis mainly focuses on the large-scale behaviour. We will then show that the terms with ∆ = π yield a contribution of order ε (2H−1)p t p which dominates our bound, while all other terms are of higher order in ε/t. We now proceed to formalise this.
Let G = (V, E) be a graph with vertex set V and edge multiset E (multiple edges are allowed). Edges e ∈ E are oriented from e − to e + and we only consider graphs with e + ≠ e − . We also label the edges by two exponents α ± : E → R − . Finally, we assume that we have a "kernel assignment", i.e. a collection of functions K e : R → R (with e ∈ E) such that
$$|K_e(t)| \le C\bigl(|t|^{\alpha_-(e)}\,\mathbf{1}_{|t|\le 1} + |t|^{\alpha_+(e)}\,\mathbf{1}_{|t|\ge 1}\bigr)\;. \qquad ( . )$$
We denote by $\|K_e\|_e$ the smallest possible constant C appearing in the above expression. For those who are not familiar with such graph representations, such a graph will be used to encode expressions of the type Each of its nodes u represents an integration variable s u , each edge e represents a factor K e , and the resulting expression is integrated against some bounded function ϕ. The exponents α − (e) and α + (e) indicate the singularity of K e at 0 and at infinity respectively.

Definition .
We say that such a labelled graph is "regular" if, for every subset The significance of this condition (also called Weinberg's condition) is that it guarantees that the function K G on R V given by is locally integrable [Wei ] where s e ± is the e ± component of s ∈ R V . (See also [Hai , Prop. . ] for a formulation closer to the one given here.) We will be mainly interested in the large-scale behaviour here. To describe this, consider a partition P of V. We say that such a partition is tight if there exists A ∈ P such that A ∩ V i ≠ ∅ for every connected component V i of G. Given P, we then also write u ∼ v if there exists A ∈ P with {u, v} ⊂ A.
Definition . We then say that a labelled graph as above is "integrable" if for every tight partition P. (Note the similarity with Weinberg's condition.)
The following is then an immediate consequence of [Hai , Thm . ].
Proposition . Let G be a regular and integrable graph with m connected components. Then, there exists C depending on G such that uniformly over L ≥ 1, with proportionality constant depending only on the labelled graph G, where K e e is as defined by ( . ).
Remark . Our bound on the large-scale behaviour of the kernels K t is weaker than the bound [Hai , Eq. . ] since we assume no bounds on the derivatives. The reason why the result still holds is that we assume local integrability, which avoids all renormalisation issues and therefore gets rid of regularity requirements.
An immediate, but very useful, corollary is the following.

Corollary .
Let G be a regular graph with m connected components and let L ≥ 1. Let β : E → R + be such that for every tight partition P. Then, there exists C depending on G and β such that Proof. It suffices to note that we can assume that the kernels K e vanish outside of [−2L, 2L] since this does not affect the value of the integral. If we then consider the graph identical to G, but with its labels replaced by (α − , α + − β), then ( . ) implies integrability for the new graph by ( . ). The local integrability condition (regularity) still holds, so Proposition . applies. It remains to note that since 1 ≤ (L/|t|) β 1 1≤|t|≤L , decreasing α + (e) by β(e) in ( . ) has the effect of increasing the norm · e by (at most) a factor (2L) β(e) , provided that we do consider functions supported in [−2L, 2L].
We will make use of the following property.
Lemma . LetG be a graph obtained by deleting some of the edges of G but without changing its connected components. IfG is integrable, then so is G itself.
Proof. This is immediate from Definition . , combined with the fact that the α + (e) are negative by assumption.
The following simple result will also be useful.
Proof. Let P be a tight partition of the vertex set V of G and let G P = (V P , E P ) denote the graph obtained by removing self-loops from G/∼, with ∼ obtained from P as in ( . ). Then G P is connected by the definition of tightness so that |E P | ≥ |V P | − 1, which translates into |{e ∈ E : e − ≁ e + }| ≥ |P| − 1. Since α + (e) < −1 for every edge, the bound ( . ), and therefore the desired claim, then follow at once.
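The counting step in this proof (for a tight partition P, at least |P| − 1 edges join distinct blocks of P) can be verified exhaustively on a small multigraph. The sketch below enumerates all partitions of the vertex set of an example graph with two connected components; the graph and all function names are illustrative choices and do not come from the text.

```python
def set_partitions(items):
    # Yield every partition of the list `items` as a list of blocks (lists).
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition


def connected_components(vertices, edges):
    # Union-find over the vertices; returns the components as a list of sets.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())


def is_tight(partition, comps):
    # Tight: some block of the partition meets every connected component.
    return any(all(set(block) & comp for comp in comps) for block in partition)


def crossing_edges(partition, edges):
    # Number of edges (with multiplicity) whose endpoints lie in different blocks.
    block_of = {v: i for i, block in enumerate(partition) for v in block}
    return sum(1 for a, b in edges if block_of[a] != block_of[b])


def check_bound(vertices, edges):
    # True if every tight partition P has at least |P| - 1 crossing edges.
    comps = connected_components(vertices, edges)
    return all(
        crossing_edges(p, edges) >= len(p) - 1
        for p in set_partitions(list(vertices))
        if is_tight(p, comps)
    )


# Example multigraph with components {0, 1, 2} and {3, 4}; the edge (0, 1) is doubled.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 1), (1, 2), (3, 4)]

if __name__ == "__main__":
    print(check_bound(vertices, edges))
```

Tightness cannot be dropped: the partition {{0, 1, 2}, {3, 4}} of this example is not tight and has no crossing edge at all, so the inequality fails for it.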
We now use these preliminary results both to bound J and J and to determine their limits in the case $H > \frac 12$. Our main technical result is the following bound.
Proposition . Let Assumptions . and . hold for $H > \frac 12$ and let $\kappa \in (0, 2 - 2H)$. For $f, g \in E$ with $\int f\,d\mu = \int g\,d\mu = 0$, set Then, for every $p \ge 1$ and $f \in E^{2p}$ with $\int f_i\,d\mu = 0$ for every $i$, setting there exists a constant $K > 0$ such that

Proof. We fix $p$ and write $I$ as a shorthand for $I_{2p}(f)$. The properties of cumulants show that, setting $X_i = f_i(Y_{t_i})$ as previously, $I$ is given by Note first that since the $f_i$ are centred, we have $I_\Delta = 0$ unless $|A| \ge 2$ for every $A \in \Delta$. There is furthermore one special partition, namely $\Delta_\star = \{\{2k-1, 2k\} : k \in [p]\}$. For the summand generated by this 'base' partition we have $I_{\Delta_\star} = \prod_{k=1}^{p} I(2k-1, 2k)$, where we set We then note that, for $a < b$ and $f, g \in E$ centred, it follows from the spectral gap assumption and the fact that $E, E_1 \subset L^2(\mu)$ by Assumption . , that for some fixed constant $c$. It follows from ( . ) that and that $I_{\Delta_\star}$ differs from the desired expression in the statement by an error of at most $O(L^{p-1})$.
Since $I = \sum_{\Delta \in \mathcal P([2p])} I_\Delta$ and we already obtained ( . ) for $I$ replaced by $I_{\Delta_\star}$, it remains to show that $|I_\Delta| \lesssim L^{p-\kappa}$ for every partition $\Delta \neq \Delta_\star$ with $\kappa$ as in the statement. Fix such a partition $\Delta$ from now on and write again ∼ for the equivalence relation induced by $\Delta$ on $[2p]$. We then define a graph $G_\Delta$ with vertex We furthermore assign kernels to the edges of $G_\Delta$ by so that Proposition . yields the bound The kernel assignment ( . ) is consistent with the exponents given by whence it immediately follows that $G_\Delta$ is regular. It now remains to find a function $\beta : E \to \mathbf R_+$ allowing us to apply Corollary . to $G_\Delta$. For this, we construct a set $T$ and set $\beta(e) = 1 - \kappa$ for $e \in T$ and $0$ otherwise. To construct $T$, consider the graph $\hat G_\Delta$ which has $\hat V := \Delta$ as its vertex set and such that its edge set $\hat E$ is given by where $\pi_\Delta : [2p] \to \Delta$ maps an element to the unique element of the partition $\Delta$ that contains it. In other words, $\hat G_\Delta$ is obtained by quotienting $G_\Delta$ by the partition $\Delta$ and then removing self-loops. Let now $T \subset E_B$ be such that $\hat T = \pi_\Delta T$ is a maximal spanning forest for $\hat G_\Delta$. In the case of ( . ) for example, one could take $T = \{(1, 2), (5, 6)\}$. With $\kappa$ as in the statement of the proposition, we now set $\beta(e) = 1 - \kappa$ for $e \in T$ and $0$ otherwise. The reason why this choice of $\beta$ satisfies ( . ) for the graph $G_\Delta$ is that by construction the labelling $\gamma = \alpha_+ - \beta$ is such that $G_\Delta$ contains a spanning forest $\hat T$ consisting of edges $e$ with $\gamma(e) = 2H - 3 + \kappa < -1$.
(To build a reduced set of edges from $E = E_B \cup E_\Delta$, we start with $T$ and then connect its components using edges in $E_\Delta$.) It then remains to first apply Lemma . to reduce ourselves to considering $G_\Delta$ and then apply Lemma . .
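As a sanity check on the admissible range of $\kappa$, the short computation below spells out why the labelling $\gamma$ stays below $-1$ on the edges of $T$. It assumes that the upper exponent on the edges of $E_B$ is $\alpha_+(e) = 2H - 2$, the decay rate of the covariance of fBm increments recalled in the introduction; the display fixing the exponents is not reproduced above, so this value is an assumption of the sketch rather than a quotation.

```latex
% Assumption (not quoted from the text): \alpha_+(e) = 2H - 2 for e \in E_B.
\begin{align*}
\gamma(e) &= \alpha_+(e) - \beta(e) = (2H - 2) - (1 - \kappa) = 2H - 3 + \kappa ,
  \qquad e \in T, \\
\gamma(e) < -1 &\iff \kappa < 2 - 2H .
\end{align*}
```

Under this assumption, the condition is exactly the range $\kappa \in (0, 2 - 2H)$ imposed in the statement of the proposition.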
Denote now by $m$ the number of connected components of $\hat G_\Delta$ and note that since every element of $\Delta$ is of size at least 2, $\hat G_\Delta$ has at most $p$ vertices. It follows that the number of elements in $T$ equals at most $p - m$, so that Corollary . yields the bound $I_\Delta \lesssim L^{m} L^{(1-\kappa)(p-m)} = L^{p - \kappa(p-m)}$, which is bounded by $L^{p-\kappa}$ unless $m = p$. Since the only partition $\Delta$ yielding $m = p$ is the complete pairing $\Delta_\star$, the claim follows at once.
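The combinatorial claim closing this argument, that $m = p$ forces $\Delta = \Delta_\star$, can be checked by brute force for small $p$. The sketch below is only an illustration: it assumes that $E_B = \{(2k-1, 2k) : k \in [p]\}$ (inferred from the example $T = \{(1,2),(5,6)\}$, since the display defining the edge sets is not reproduced above), and the function names are hypothetical.

```python
def set_partitions(elements):
    """Yield every partition of the list `elements` into non-empty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Insert `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or let it open a new block of its own.
        yield [[first]] + partition


def quotient_components(partition, p):
    """Count connected components of the quotient graph.

    Vertices are the blocks of `partition`; for each k the edge (2k-1, 2k)
    (assumed form of E_B) joins the blocks containing its endpoints.
    Self-loops do not affect connectivity, so they are simply skipped.
    """
    block_of = {x: i for i, block in enumerate(partition) for x in block}
    parent = list(range(len(partition)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for k in range(1, p + 1):
        a, b = find(block_of[2 * k - 1]), find(block_of[2 * k])
        if a != b:
            parent[a] = b
    return len({find(i) for i in range(len(partition))})


def partitions_with_m_equal_p(p):
    """List partitions of [2p] with all blocks of size >= 2 and m = p."""
    result = []
    for partition in set_partitions(list(range(1, 2 * p + 1))):
        if any(len(block) < 2 for block in partition):
            continue  # centred f_i: such partitions contribute I_Delta = 0
        if quotient_components(partition, p) == p:
            result.append(sorted(sorted(block) for block in partition))
    return result
```

For instance, `partitions_with_m_equal_p(3)` enumerates all partitions of $\{1,\dots,6\}$ and keeps only those whose quotient graph has three components. For every other partition one has $p - m \ge 1$, so the exponent $p - \kappa(p - m)$ is at most $p - \kappa$.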
Proof. For integer p ≥ 1, we note that as a consequence of Wick's theorem and the fact that Y is independent of B, one has the identity, η ′′ (s 2k − s 2k−1 ) ds .

Identification of the limit
In this section we complete the proof of Theorem . by identifying the limit (in law) of Z ε as ε → 0. The proof proceeds in two steps. First, in Proposition . , we show that the first-order process Z itself converges in law to a limit W with covariance given as in ( . ). In a second step, we then exploit martingale techniques, and in particular [JS ], to obtain convergence of the second-order process Z to the limit described in Theorem . . Recall that, by ( . ), (Z ε s,t (x)) i = ε

Proposition .
In the setting of Proposition . , the family of random rough paths Z ε converges in law, as ε → 0, to the unique (in law) random rough path Z such that the following hold. The process Z is a B-valued Wiener process with covariance given by with Σ as defined in ( . ). The "second-order" process Z is the B 2 -valued process such that for any x, where the integral is interpreted in the Itô sense.
Proof. The convergence in distribution of any finite collection of the stochastic processes follows from Proposition . below. By Proposition . , (Z ε , Z ε ) is tight in C α ([0, T ], B ⊕ B 2 ) for suitable $\alpha \in (\frac 13, H)$, so the weak convergence holds with respect to the rough path norm on C α ([0, T ], B ⊕ B 2 ).

Law of large numbers
We will need the following quantitative version of the law of large numbers. Let E ⊂ E 1 ⊂ E 2 be Banach spaces of functions Y → R containing constants and such that pointwise multiplication from E × E 1 into E 2 is continuous.

Lemma .
Let $E \subset L^4$ and $E_2 \subset L^2$, let the spectral gap condition ( . ) hold for $n = 1, 2$, and let $f, g \in E$. Then, the bound $\big\| \int_0^T f(Y_s) g(Y_{s+t})\,ds - T \langle f, P_t g\rangle_\mu \big\|_{L^2} \lesssim \sqrt{(1+t)T}\, \|f\|_E \|g\|_E$ holds uniformly over $t, T \in \mathbf R_+$ with a proportionality constant depending only on the constants appearing in the two assumptions.
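To illustrate what this lemma quantifies, here is a minimal numerical sketch in a discrete-time analogue. It is not the setting of the lemma: the symmetric two-state chain, the flip probability `q`, the observable `f = g = ±1` and the function name are all assumptions chosen so that $\langle f, P^t g\rangle_\mu = (1-2q)^t$ is known in closed form. The time average of $f(Y_s) g(Y_{s+t})$ should then be close to this value once the number of steps is large.

```python
import random


def time_average_vs_stationary(q=0.3, lag=2, steps=200_000, seed=0):
    """Compare a time average along one path with its stationary value.

    Toy model (an assumption of this sketch): a chain on {0, 1} that flips
    with probability q at each step; its invariant measure is uniform.
    The observable f takes the values +1 / -1, so it is centred and
    P f = (1 - 2q) f, which gives <f, P^lag f>_mu = (1 - 2q)**lag.
    """
    rng = random.Random(seed)
    f = {0: 1.0, 1: -1.0}

    state = rng.randrange(2)  # start from the invariant (uniform) measure
    path = []
    for _ in range(steps + lag):
        path.append(state)
        if rng.random() < q:
            state = 1 - state

    # Discrete analogue of (1/T) * integral_0^T f(Y_s) g(Y_{s+t}) ds with g = f.
    empirical = sum(f[path[s]] * f[path[s + lag]] for s in range(steps)) / steps
    exact = (1 - 2 * q) ** lag
    return empirical, exact
```

Calling `time_average_vs_stationary()` returns the empirical average together with the closed-form value; the gap between the two is the quantity whose $L^2$ norm the lemma controls (after multiplying by the time horizon).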
Proof. Writing $f_s$ as a shorthand for $f(Y_s)$ and similarly for $g$, the square of the left-hand side of ( . ) is given by $2\int_0^T \int_0^r \big(\mathbf E(f_s g_{s+t} f_r g_{r+t}) - \mathbf E(f_s g_{s+t})\,\mathbf E(f_r g_{r+t})\big)\,ds\,dr$.
Since $E \subset L^4$, Hölder's inequality shows that the integrand is bounded by some multiple of $\|f\|_E^2 \|g\|_E^2$. It thus follows from the triangle inequality that the required bound holds for $t \ge T$, so we assume $t \le T$ from now on. Using the same bound on the integrand, we can further restrict the inner integral to impose $s + t \le r$ at an additive cost of order at most $tT \|f\|_E^2 \|g\|_E^2$. On that smaller domain, we can then rewrite the integrand as $\mathbf E\big(f_s g_{s+t}\,(P_{r-s-t}(f P_t g) - \langle f, P_t g\rangle_\mu)(Y_{s+t})\big)$.
which does indeed converge to 0 as ε → 0 as desired.
It remains to consider the case $H < \frac 12$, so we restrict ourselves to this case from now on. Regarding $\delta\phi_\varepsilon(s, s') = \phi_\varepsilon(s, s') - \phi_\varepsilon(s, s)$ for $s' > s$ (the case $s' < s$ is analogous), we write $\delta\phi_\varepsilon = \sum_i \delta\phi^{(i)}_\varepsilon$ with We obtain the bound $\|\delta\phi^{(1)}_\varepsilon(s, s')\|_{L^1} \lesssim \varepsilon^{2-4H}$ In view of ( . ) and Assumption . , we obtain for $\delta\phi^{(5)}_\varepsilon$ the bound $\|\delta\phi^{(5)}_\varepsilon(s, s')\|_{L^1} \lesssim \varepsilon^{2-3H} |s' - s|^{H}$, using $\|f^\varepsilon_{s'} - f^\varepsilon_s\|_{L^p} \lesssim (|s - s'|/\varepsilon)^{H}$ to obtain the increment in time. In order to bound $\delta\phi^{(6)}_\varepsilon$, we note that one has the bound As a consequence of Assumption . , we thus obtain the bound

Similarly, we define $\bar B_2$ as the space of functions on $\mathbf R^d \times \mathbf R^d$ such that ) < ∞ .
We then have the following.
Lemma A. The embeddings $\bar B \subset B$ and $\bar B_2 \subset B_2$ are compact for any $\zeta \in (0, 1)$ and $\kappa > 0$. Furthermore, there exists a constant $C$ such that for any random $C^3$ functions $Z$ and $\bar Z$ one has the bound provided that $p \ge d$, $\zeta < 1 - d/p$, and $\kappa > 4d/p$.
Proof. The compactness statement is a routine modification of Arzelà-Ascoli. Regarding the first bound, it follows from Kolmogorov's continuity criterion [RY , Thm . , p ] that, writing $K$ for the right-hand side of (A. ), there is $C > 0$ such that $\mathbf E \|Z\|^p_{C^{3+\zeta}_x} \le C(1 + |x|)^{-\kappa p} K$, provided that $0 < \zeta < 1 - d/p$. We then cover $\mathbf R^d$ with balls of diameter 1 and note that a norm equivalent to that of $\bar B$ is obtained by restricting the supremum in (A. ) over the centres of these balls. Since $\kappa > 2d/p$, we can then simply bound the supremum by the sum, yielding as claimed. The second bound is identical, except that $d$ is replaced by $2d$.
[BGS ] S. Bourguin, S. Gailus, and K. Spiliopoulos. Typical dynamics and fluctuation analysis of slow-fast systems driven by fractional Brownian motion, 2019. arXiv:1906.02131.