Optimal rate of convergence for stochastic Burgers-type equations

Recently, a solution theory for one-dimensional stochastic PDEs of Burgers type driven by space-time white noise was developed. In particular, it was shown that natural numerical approximations of these equations converge and that their convergence rate in the uniform topology is arbitrarily close to $\frac{1}{6}$. In the present article we improve this result in the case of additive noise by proving that the optimal rate of convergence is arbitrarily close to $\frac{1}{2}$.


Introduction
The goal of this article is to study numerical approximations of stochastic PDEs of Burgers type on the circle $\mathbb{T} = \mathbb{R}/(2\pi\mathbb{Z})$ given by
$$du = \big[\nu\Delta u + F(u) + G(u)\,\partial_x u\big]\,dt + \sigma\,dW(t)\,, \qquad u(0) = u_0\,. \tag{1.1}$$
Here, $u : \mathbb{R}_+ \times \mathbb{T} \times \Omega \to \mathbb{R}^n$, where $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space, $\Delta = \partial_x^2$ is the Laplace operator on the circle $\mathbb{T}$, the derivative $\partial_x$ is understood in the sense of distributions, the function $F : \mathbb{R}^n \to \mathbb{R}^n$ is of class $C^1$, the function $G : \mathbb{R}^n \to \mathbb{R}^{n\times n}$ is of class $C^\infty$, and $\nu, \sigma > 0$ are constants. Finally, $W$ is an $L^2$-cylindrical Wiener process [DPZ02], i.e. equation (1.1) is driven by space-time white noise. The product appearing in the term $G(u)\partial_x u$ is matrix-vector multiplication.
The difficulty in dealing with (1.1) comes from the nonlinearity $G(u)\partial_x u$ and is caused by the low space-time regularity of the driving noise. Indeed, it is well known that the pairing $\int_{\mathbb{T}} f\,\partial_x g\,dx$ of $f \in C^\alpha$ with the derivative of $g \in C^\beta$ is well defined if and only if $\alpha + \beta > 1$ (see Appendix A and [BCD11]). On the other hand, one expects solutions to (1.1) to have the spatial regularity of the solution of the linearised equation
$$dX(t) = \nu\Delta X\,dt + \sigma\,dW(t)\,. \tag{1.2}$$

For any fixed time $t > 0$, the solution to the stochastic heat equation (1.2) is $\alpha$-Hölder continuous for every $\alpha < \frac12$, but it is not $\frac12$-Hölder continuous (see [Wal86, DPZ02, Hai09]). In particular, the product $G(X)\partial_x X$ is not well defined in this case, and it is not a priori clear how to define a solution to equation (1.1).
In the case $G \equiv 0$ this problem does not occur. Equations of this type and their numerical approximations have been well studied; see [Gyö98b, Gyö99]. Moreover, it was shown in [DG01] that in this case the optimal rate of uniform convergence, as the spatial discretisation tends to zero, is $\frac12 - \kappa$ for every $\kappa > 0$.
For non-zero $G$, the difficulty can easily be overcome in the gradient case, i.e. when $G = \nabla\mathcal{G}$ for some smooth function $\mathcal{G} : \mathbb{R}^n \to \mathbb{R}^n$. In this case, postulating the chain rule, the nonlinear term can be rewritten as
$$G(u)\,\partial_x u = \partial_x \mathcal{G}(u)\,, \tag{1.3}$$
which is a well-defined distribution as soon as $u$ is continuous. Existence and uniqueness results in the gradient case can be found in [Gyö98a, DPDT94]. In [AG06], the finite difference scheme was studied for the case $G(u) = u$, and $L^2$-convergence was shown with rate $\gamma$ for every $\gamma < \frac12$. The same rate of convergence was obtained in [BJ13] in the $L^\infty$ topology for Galerkin approximations.
For a general, sufficiently smooth function $G$, a notion of solution was given in [Hai11]. The key idea of the approach was to test the nonlinearity with a smooth test function $\varphi$ and to formally rewrite it as
$$\int_{-\pi}^{\pi} \varphi(x)\, G(u(t,x))\,\partial_x u(t,x)\,dx = \int_{-\pi}^{\pi} \varphi(x)\, G(u(t,x))\,d_x u(t,x)\,. \tag{1.4}$$
As stated above, we expect $u$ to behave locally like the solution to the linearised equation (1.2). It was shown in [Hai11] that the latter can be viewed in a canonical way as a process with values in a space of rough paths. This suggests that the theory of controlled rough paths [Gub04, Gub10] can be used to define the integral (1.4) pathwise. The quantity (1.4) is uniquely defined up to the choice of the iterated integral which represents the integral of $u$ with respect to itself, so different choices of the iterated integral yield different solutions. This is similar to the choice between Itô and Stratonovich stochastic integrals in the theory of SDEs. In the present situation, however, there is a unique choice of the iterated integral which respects the symmetry of the linearised equation under the substitution $x \mapsto -x$, and it corresponds to the "Stratonovich solution". This natural choice is also the one for which the chain rule (1.3) holds in the particular case when $G$ is a gradient. Using the rough path approach, numerical approximations to (1.1) in the gradient case, without using the chain rule, were studied in [HM12]. It was shown that the corresponding approximate solutions converge in suitable Sobolev spaces to a limit which solves (1.1) with an additional, explicitly computable correction term. This term is an analogue of the Itô–Stratonovich correction term in the classical theory of SDEs.
In [HW13], the solution theory was extended to Burgers-type equations with multiplicative noise (i.e. when the noise term is multiplied by a nonlinear local function $\theta(u)$ of the solution). Numerical schemes approximating the equation in the multiplicative case were analysed in [HMW14], where a correction term was again observed and the rate of convergence in the uniform topology was shown to be $\frac16 - \kappa$ for every $\kappa > 0$. In this article, we prove that in the case of additive noise the rate of convergence in the supremum norm is $\frac12 - \kappa$ for every $\kappa > 0$. It turns out to be technically advantageous to consider convergence in Hölder spaces with Hölder exponent very close to zero. The main difference from [HMW14] is that, to approximate the rough integral (1.4), we cannot use the classical theory of controlled rough paths, which applies only in Hölder spaces of regularity in $(\frac13, \frac12]$. To show convergence in Hölder spaces of lower regularity, we use the results of [Gub10], which generalise the theory of controlled rough paths to functions of arbitrary positive regularity.

Assumptions and statement of the main result
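As before, we assume that $F \in C^1$ and $G \in C^\infty$ in (1.1). For $\varepsilon > 0$ we consider the approximating stochastic PDEs on the circle $\mathbb{T}$ given by
$$du_\varepsilon = \big[\nu\Delta_\varepsilon u_\varepsilon + F(u_\varepsilon) + G(u_\varepsilon)\,D_\varepsilon u_\varepsilon\big]\,dt + \sigma H_\varepsilon\,dW(t)\,, \qquad u_\varepsilon(0) = u_0^\varepsilon\,. \tag{1.5}$$
Here, the operators $\Delta_\varepsilon$, $D_\varepsilon$ and $H_\varepsilon$ are Fourier multipliers approximating $\Delta$, $\partial_x$ and the identity operator respectively, and are given by
$$\widehat{\Delta_\varepsilon u}(k) = -k^2 f(\varepsilon k)\,\hat u(k)\,, \qquad \widehat{D_\varepsilon u}(k) = ik\, g(\varepsilon k)\,\hat u(k)\,, \qquad \widehat{H_\varepsilon u}(k) = h(\varepsilon k)\,\hat u(k)\,.$$
Below we state the assumptions on the functions $f$, $g$ and $h$, starting with $f$.
Assumption 1. The function $f : \mathbb{R} \to (0, \infty]$ is even, satisfies $f(0) = 1$, is continuously differentiable on the interval $[-\delta, \delta]$ for some $\delta > 0$, and there exists $c_f \in (0, 1)$ such that $f \geq c_f$. Furthermore, the functions $b_t$ given by $b_t(x) := \exp(-x^2 f(x)\,t)$ are uniformly bounded in $t > 0$ in the bounded variation norm, i.e. $\sup_{t>0} |b_t|_{BV} < \infty$.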
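For illustration, here is a minimal check of the Fourier-multiplier description of $\Delta_\varepsilon$. It assumes the convention $\widehat{\Delta_\varepsilon u}(k) = -k^2 f(\varepsilon k)\,\hat u(k)$ and the finite-difference choice $f(x) = (\sin(x/2)/(x/2))^2$; both are assumptions of this sketch, and the helper names are hypothetical. On a grid of mesh $\varepsilon = 2\pi/N$ the multiplier reproduces the usual three-point stencil, because $-k^2 f(\varepsilon k) = (2\cos(\varepsilon k) - 2)/\varepsilon^2$. A naive DFT keeps the example dependency-free.

```python
import cmath
import math

def dft(u):
    """Naive discrete Fourier transform (O(N^2)); sufficient for a small check."""
    n = len(u)
    return [sum(u[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]

def idft(uh):
    """Inverse of dft."""
    n = len(uh)
    return [sum(uh[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]

def wavenumber(k, n):
    """Map the DFT index k in {0, ..., n-1} to an integer frequency in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n

def f_fd(x):
    """Hypothetical symbol f(x) = (sin(x/2) / (x/2))^2 of the finite-difference Laplacian."""
    if x == 0:
        return 1.0
    return (math.sin(x / 2) / (x / 2)) ** 2

def apply_multiplier(u, symbol):
    """Apply the Fourier multiplier k -> symbol(k) to grid values u of a function on T."""
    n = len(u)
    uh = dft(u)
    return [z.real for z in idft([symbol(wavenumber(k, n)) * uh[k] for k in range(n)])]

N = 16
eps = 2 * math.pi / N                      # grid x_j = j * eps on T = R / (2 pi Z)
u = [math.exp(math.cos(j * eps)) for j in range(N)]

# Delta_eps acting through its symbol -k^2 f(eps k) ...
lap_eps = apply_multiplier(u, lambda k: -k**2 * f_fd(eps * k))
# ... coincides with the periodic three-point stencil (u[-1] wraps around).
stencil = [(u[(j + 1) % N] - 2 * u[j] + u[j - 1]) / eps**2 for j in range(N)]
err = max(abs(a - b) for a, b in zip(lap_eps, stencil))
print(err)   # round-off level
```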
Our next assumption concerns g, which defines the approximation to the spatial derivative.

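In particular, the approximate derivative can be expressed as
$$D_\varepsilon u(x) = \frac{1}{\varepsilon}\int_{\mathbb{R}} \big(u(x + \varepsilon z) - u(x)\big)\,\mu(dz)\,,$$
where we identify $u : \mathbb{T} \to \mathbb{R}$ with its periodic extension to all of $\mathbb{R}$. Our last assumption concerns the function $h$, which defines the approximation of the noise.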
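As a sketch of how a measure $\mu$ determines $D_\varepsilon$, the code below takes the hypothetical choice $\mu = (\delta_1 - \delta_{-1})/2$, which has first moment $\int z\,\mu(dz) = 1$ and gives central differences. Applying the integral representation to $e^{ikx}$ yields the multiplier $\frac1\varepsilon\int(e^{ik\varepsilon z}-1)\,\mu(dz)$, here $i\sin(\varepsilon k)/\varepsilon$; in the (assumed) notation $ik\,g(\varepsilon k)$ this corresponds to $g(x) = \sin(x)/x$. All names are illustrative.

```python
import cmath
import math

# Hypothetical discrete signed measure mu = sum_m c_m * delta_{z_m}, stored as
# (z_m, c_m) pairs; here mu = (delta_1 - delta_{-1}) / 2, i.e. central differences.
MU_CENTRAL = [(1.0, 0.5), (-1.0, -0.5)]

def first_moment(mu):
    """int z mu(dz); it should equal 1 for D_eps to approximate d/dx."""
    return sum(c * z for z, c in mu)

def d_eps(u, x, eps, mu):
    """D_eps u(x) = (1/eps) * int (u(x + eps z) - u(x)) mu(dz)."""
    return sum(c * (u(x + eps * z) - u(x)) for z, c in mu) / eps

def symbol(k, eps, mu):
    """Multiplier m(k) with D_eps e^{ikx} = m(k) e^{ikx}."""
    return sum(c * (cmath.exp(1j * k * eps * z) - 1) for z, c in mu) / eps

eps, k, x0 = 0.1, 3, 0.7

def plane_wave(x):
    return cmath.exp(1j * k * x)

lhs = d_eps(plane_wave, x0, eps, MU_CENTRAL)
m = symbol(k, eps, MU_CENTRAL)
rhs = m * plane_wave(x0)
print(first_moment(MU_CENTRAL), abs(lhs - rhs), m, 1j * math.sin(eps * k) / eps)
```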
Assumption 3. The function $h$ is even, bounded, and such that $h^2/f$ and $h/(f+1)$ are of bounded variation. Furthermore, $h$ is twice differentiable at the origin with $h(0) = 1$ and $h'(0) = 0$.
The difference from the assumptions in [HMW14] is that in Assumption 2 we require all moments of the measure $\mu$ to be finite, and in Assumption 3 we require the function $h/(f+1)$ to be of bounded variation. The latter assumption is used in Lemma 4.1 in order to apply the bounds on lifted rough paths obtained in [FGGR13]. All the examples of approximations provided in [HM12] (including finite difference schemes) still satisfy our assumptions.
Let $\bar u$ be the solution to the modified equation (1.6), which is obtained from (1.1) by replacing $F$ with a modified reaction term $\bar F$; for $i = 1, \ldots, n$, this term is given by … Here, we denote by $G_i$ the $i$th row of the matrix-valued function $G$, and the correction constant $\Lambda$ is defined by … It follows from the assumptions that $\Lambda$ is well defined. Indeed, Assumption 3 implies that $h^2/f$ is bounded, and by Assumption 2 the measure $\mu$ has a finite second moment, which yields the existence of $\Lambda$.
As we do not assume that the functions $F$ and $G$ and their derivatives are bounded, the solution can blow up in finite time. To overcome this difficulty, we consider solutions only up to suitable stopping times. More precisely, for any $K > 0$ we define the stopping times
$$\tau^*_K := \inf\{t > 0 : \|\bar u(t)\|_{C^0} \geq K\}\,,$$
where $\|\cdot\|_{C^0}$ is the supremum norm. The blow-up time of $\bar u$ is then defined as $\tau^* := \lim_{K \uparrow \infty} \tau^*_K$ in probability. Our main theorem gives the rate of convergence of the solutions of the approximating equations (1.5) to the solution of the modified equation (1.6).
Theorem 1.1. Let the initial values satisfy, for every $0 < \eta < \frac12$, … Moreover, assume that for every $\alpha > 0$ small enough the estimate … holds, where the proportionality constant can depend on $\alpha$. Then, for every such $\alpha > 0$, there exists a sequence of stopping times $\tau_\varepsilon$ satisfying $\lim_{\varepsilon \downarrow 0} \tau_\varepsilon = \tau^*$ in probability, such that the following convergence holds: …
Remark 1.2. The rate of convergence obtained in [HMW14] was "almost" $\frac16$, in the sense that it is $\frac16 - \kappa$ for any $\kappa > 0$. To improve this result, we consider convergence of the solutions in Hölder spaces of regularity close to zero. This approach creates difficulties when working with the rough integrals (1.4). Indeed, the bounds on the rough integrals, in particular [HMW14, Lemma 5.3], hold only in the Hölder spaces $C^\alpha$ with $\alpha \in (\frac13, \frac12]$, and the norms explode as $\alpha$ approaches $\frac13$. To obtain reasonable bounds in Hölder spaces of lower regularity, we have to include higher-order iterated integrals of the controlling process $X$ in the definition of the rough integrals, whereas in [HMW14] it was enough to consider iterated integrals of order two. In particular, the smaller $\alpha$ is in Theorem 1.1, the more iterated integrals we have to consider to define the rough integral (1.4) (see Section 2 for more details). If the function $G$ is only of class $C^p$ for some $p \geq 3$, we can consider the iterated integrals of $X$ only up to order $p - 1$ (see Subsection 4.1).
As a consequence, the argument in the proof of Theorem 1.1 gives a rate of convergence of only "almost" $\frac12 - \frac1p$. Since $\frac12 - \frac13 = \frac16$, for $p = 3$ this is precisely the rate of convergence obtained in [HMW14]. Remark 1.3. By rescaling time and multiplying the functions in (1.1) by suitable constants, we can obtain an equivalent equation with $\nu = 1$. Moreover, we can assume $\sigma = 1$. In what follows, we only consider these values of the constants.

Structure of the article
In Section 2 we review the theory of rough paths and controlled rough paths. Section 3 is devoted to the results obtained in [Hai11]; in particular, we recall the notion of solution and the existence and uniqueness results for Burgers-type equations with additive noise. In Section 4 we define the rough integrals and write the mild solution to the approximating equation (1.5) in a form suitable for working in Hölder spaces of low regularity. The proof of Theorem 1.1 is given in Section 5. The subsequent sections provide bounds on the corresponding terms in equations (1.6) and (1.5): Sections 6 and 7 treat the reaction terms, and Section 8 is devoted to the terms involving the rough integrals. In Appendix A we prove a Kolmogorov-type criterion for distribution-valued processes. Appendix B provides regularity properties of the heat semigroup and of its approximate counterpart on Hölder spaces.

Spaces, norms and notation
Throughout this article, we denote by C 0 the space of continuous functions on the circle T endowed with the supremum norm.
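For functions $X : \mathbb{R} \to \mathbb{R}^n$ (or $\mathbb{R}^{n\times n}$) and $R : \mathbb{R}^2 \to \mathbb{R}^n$ (or $\mathbb{R}^{n\times n}$) such that $R$ vanishes on the diagonal, we define, for a given parameter $\alpha \in (0,1)$, the Hölder seminorms
$$|X|_{C^\alpha} := \sup_{s \neq t} \frac{|X(t) - X(s)|}{|t - s|^\alpha}\,, \qquad \|R\|_{B^\alpha} := \sup_{s \neq t} \frac{|R(s,t)|}{|t - s|^\alpha}\,,$$
respectively. We denote by $C^\alpha$ and $B^\alpha$ the spaces of functions for which these seminorms are finite. Then $C^\alpha$ endowed with the norm $\|X\|_{C^\alpha} := \|X\|_{C^0} + |X|_{C^\alpha}$ is a Banach space. For $\alpha \geq 1$, the Hölder space $C^\alpha$ consists of $\lfloor\alpha\rfloor$ times continuously differentiable functions whose $\lfloor\alpha\rfloor$-th derivative is $(\alpha - \lfloor\alpha\rfloor)$-Hölder continuous. For $\alpha < 0$ we denote by $C^\alpha$ the Besov space $B^\alpha_{\infty,\infty}$ (see Appendix A for the definition).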
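The first seminorm above can be evaluated on sampled data. The following minimal sketch (the helper name `holder_seminorm` is hypothetical) restricts the supremum to pairs of sample points, so it only yields a lower bound for the seminorm of the underlying function.

```python
import math

def holder_seminorm(ts, xs, alpha):
    """Discrete analogue of sup_{s != t} |X(t) - X(s)| / |t - s|^alpha.

    The supremum runs over sample points only, so this is a lower bound for the
    seminorm of the underlying function. The times ts must be distinct.
    """
    best = 0.0
    for i in range(len(ts)):
        for j in range(i + 1, len(ts)):
            best = max(best, abs(xs[j] - xs[i]) / abs(ts[j] - ts[i]) ** alpha)
    return best

ts = [j / 100 for j in range(101)]
print(holder_seminorm(ts, ts, 1.0))                          # X(t) = t
print(holder_seminorm(ts, [math.sqrt(t) for t in ts], 0.5))  # X(t) = sqrt(t)
```

Both printed values equal $1$ up to round-off: for $\sqrt{t}$ the supremum is attained at pairs involving $t = 0$.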
We also define space-time Hölder norms: for $T > 0$, functions $X : [0,T] \times \mathbb{T} \to \mathbb{R}^n$ (or $\mathbb{R}^{n\times n}$) and $R : [0,T] \times \mathbb{T}^2 \to \mathbb{R}^n$ (or $\mathbb{R}^{n\times n}$), any $\alpha \in \mathbb{R}$ and any $\beta > 0$, we define … (1.7) We denote by $C^\alpha_T$ and $B^\alpha_T$, respectively, the spaces of functions or distributions for which the norms (1.7) are finite. Furthermore, in order to deal with functions $X$ exhibiting a blow-up with rate $\eta > 0$ near $t = 0$, we define the norm … Similarly to above, we denote by $C^\alpha_{\eta,T}$ the space of functions or distributions for which this norm is finite.
By $\|\cdot\|_{C^\alpha \to C^\beta}$ we denote the operator norm of a linear map from $C^\alpha$ to $C^\beta$. We write $x \lesssim y$ if there is a constant $C$, independent of the relevant quantities, such that $x \leq Cy$.

Elements of rough path theory
In this section we give an overview of the theory of rough paths and controlled rough paths. For more information on rough path theory we refer to the original article [Lyo98] and to the monographs [LQ02, LCL07, FV10, FH14].
One of the aims of rough path theory is to provide a consistent and robust way of defining the integral
$$\int_s^t Y(r) \otimes dX(r) \tag{2.1}$$
for processes $Y, X \in C^\alpha$ with Hölder exponent $\alpha \in (0, \frac12]$. Recall that if $\alpha > \frac12$, the integral can be defined in Young's sense [You36] as the limit of Riemann sums. If $\alpha \leq \frac12$, however, the Riemann sums may diverge (or fail to converge to a limit independent of the partition), so the integral cannot be defined in this way. Given $X \in C^\alpha$ with $\alpha \in (0, \frac12]$, the theory of (controlled) rough paths allows one to define (2.1) consistently for a certain class of integrands $Y$. To this end, however, one has to consider not only the processes $X$ and $Y$, but also suitable additional "higher-order" information.
We fix $0 < \alpha \leq \frac12$ and let $p = \lfloor 1/\alpha \rfloor$ be the largest integer such that $p\alpha \leq 1$. We then define the $p$-step truncated tensor algebra
$$T^{(p)}(\mathbb{R}^n) := \bigoplus_{k=0}^{p} (\mathbb{R}^n)^{\otimes k}\,,$$
whose basis elements can be labelled by words of length not exceeding $p$ (including the empty word) over the alphabet $A = \{1, \ldots, n\}$. We denote this set of words by $A_p$. The correspondence $A_p \to T^{(p)}(\mathbb{R}^n)$ is given by $w \mapsto e_w$ with $e_w = e_{w_1} \otimes \cdots \otimes e_{w_k}$ for $w = w_1 \cdots w_k$, and $e_\emptyset = 1 \in (\mathbb{R}^n)^{\otimes 0} \approx \mathbb{R}$, where $\{e_i\}_{i \in A}$ is the canonical basis of $\mathbb{R}^n$.
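For concreteness, a small sketch of this bookkeeping (function names are illustrative): $p = \lfloor 1/\alpha \rfloor$ is computed in exact rational arithmetic to avoid floating-point surprises at $\alpha = 1/p$, and $A_p$ is enumerated as tuples of letters. Its size $\sum_{k=0}^{p} n^k$ is the dimension of $T^{(p)}(\mathbb{R}^n)$.

```python
from fractions import Fraction
from itertools import product
import math

def depth(alpha):
    """p = floor(1 / alpha), the largest integer p with p * alpha <= 1 (exact arithmetic)."""
    return math.floor(1 / Fraction(alpha))

def words(n, p):
    """All words of length <= p over the alphabet {1, ..., n}, as tuples (incl. the empty word)."""
    return [w for k in range(p + 1) for w in product(range(1, n + 1), repeat=k)]

p = depth(Fraction(1, 3))
A_p = words(2, p)
print(p, len(A_p))   # prints: 3 15  (1 + 2 + 4 + 8 words)
```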
There is an operation $\sqcup\!\sqcup$, called the shuffle product [Reu93], defined on the free algebra generated by $A$. For any two words, the shuffle product is the sum, counted with multiplicity, of all ways of interleaving them that preserve the original order of the letters of each word. For example, if $a$, $b$ and $c$ are letters from $A$, then
$$ab \sqcup\!\sqcup ac = abac + 2\,aabc + 2\,aacb + acab\,.$$
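The shuffle product can be computed with the standard recursion on last letters, $ua \sqcup\!\sqcup vb = (u \sqcup\!\sqcup vb)\,a + (ua \sqcup\!\sqcup v)\,b$, together with $u \sqcup\!\sqcup \emptyset = \emptyset \sqcup\!\sqcup u = u$. The following sketch (words as strings, multiplicities in a `Counter`; the function name is ours) reproduces the example above.

```python
from collections import Counter

def shuffle(u, v):
    """Shuffle product of two words (strings) as a Counter: word -> multiplicity."""
    if not u:
        return Counter({v: 1})
    if not v:
        return Counter({u: 1})
    result = Counter()
    # ua shuffle vb = (u shuffle vb) a + (ua shuffle v) b
    for w, c in shuffle(u[:-1], v).items():
        result[w + u[-1]] += c
    for w, c in shuffle(u, v[:-1]).items():
        result[w + v[-1]] += c
    return result

print(sorted(shuffle("ab", "ac").items()))
# [('aabc', 2), ('aacb', 2), ('abac', 1), ('acab', 1)]
```

The multiplicities always add up to the binomial coefficient $\binom{|u|+|v|}{|u|}$, the number of interleavings.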
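We also define both the shuffle and the concatenation product of two elements of $T^{(p)}(\mathbb{R}^n)$: for any two words $w, \tilde w \in A_p$ we set $e_w \sqcup\!\sqcup e_{\tilde w} := e_{w \sqcup\!\sqcup \tilde w}$ and $e_w \otimes e_{\tilde w} := e_{w\tilde w}$ if the sum of the lengths of the two words does not exceed $p$, and $e_w \sqcup\!\sqcup e_{\tilde w} = e_w \otimes e_{\tilde w} = 0$ otherwise. This is extended to all of $T^{(p)}(\mathbb{R}^n)$ by linearity. With this notation at hand, we give the following definition.
Definition 2.1. A geometric rough path of regularity $\alpha \in (0, \frac12]$ is a map $\mathbf{X} : \mathbb{R}^2 \to T^{(p)}(\mathbb{R}^n)$, where as above $p = \lfloor 1/\alpha \rfloor$, such that
1. $\langle \mathbf{X}(s,t), e_w \sqcup\!\sqcup e_{\tilde w}\rangle = \langle \mathbf{X}(s,t), e_w\rangle\, \langle \mathbf{X}(s,t), e_{\tilde w}\rangle$ for any $w, \tilde w \in A_p$ with $|w| + |\tilde w| \leq p$,
2. $\mathbf{X}(s,t) = \mathbf{X}(s,u) \otimes \mathbf{X}(u,t)$ for any $s, u, t \in \mathbb{R}$,
3. $\|\langle \mathbf{X}, e_w\rangle\|_{B^{\alpha|w|}} < \infty$ for any word $w \in A_p$ of length $|w|$.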
If we define $X^i(t) := \langle \mathbf{X}(0,t), e_i\rangle$ for any $i \in A$, then the higher-order components of $\mathbf{X}(s,t)$ should be thought of as defining the iterated integrals
$$\langle \mathbf{X}(s,t), e_w\rangle = \int_{s < r_1 < \cdots < r_k < t} dX^{w_1}(r_1) \cdots dX^{w_k}(r_k)\,, \qquad w = w_1 \cdots w_k\,. \tag{2.2}$$
Of course, the integrals on the right-hand side of (2.2) are in general not defined, as mentioned at the start of this section. Hence, for a given rough path $\mathbf{X}$, the left-hand side of (2.2) serves as the definition of the right-hand side. The conditions in Definition 2.1 ensure that the quantities (2.2) behave like iterated integrals. Indeed, if $X$ is a smooth function and we define $\mathbf{X}$ by (2.2) in Young's sense, then $\mathbf{X}$ satisfies the conditions of Definition 2.1, as was shown in [Che54]. For instance, if $w = i$ and $\tilde w = j$ for two letters $i, j \in A$, then, since $e_i \sqcup\!\sqcup e_j = e_{ij} + e_{ji}$, the first property gives
$$\langle \mathbf{X}(s,t), e_{ij}\rangle + \langle \mathbf{X}(s,t), e_{ji}\rangle = X^i(s,t)\, X^j(s,t)\,,$$
where we write $X^i(s,t) := X^i(t) - X^i(s)$. This is the usual integration by parts formula. The second condition of Definition 2.1 expresses the additivity of the integral over consecutive intervals.
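To see the first two conditions of Definition 2.1 at work for a smooth (here piecewise linear) path, one can compute the level-two iterated integrals (2.2) exactly, segment by segment, and glue them with Chen's relation. The sketch below is restricted to $p = 2$, uses hypothetical helper names and random sample points as illustrative data, and checks the integration by parts formula and the additivity property numerically.

```python
import random

def segment_signature(dx):
    """Level-1 and level-2 iterated integrals (2.2) of a straight segment with increment dx."""
    n = len(dx)
    return list(dx), [[dx[i] * dx[j] / 2.0 for j in range(n)] for i in range(n)]

def chen(a, b):
    """Truncated tensor product (levels <= 2): combines X(s,u) and X(u,t) into X(s,t)."""
    (a1, a2), (b1, b2) = a, b
    n = len(a1)
    level1 = [a1[i] + b1[i] for i in range(n)]
    level2 = [[a2[i][j] + b2[i][j] + a1[i] * b1[j] for j in range(n)] for i in range(n)]
    return level1, level2

def signature(points):
    """Levels <= 2 of the iterated integrals of the piecewise linear path through `points`."""
    n = len(points[0])
    sig = ([0.0] * n, [[0.0] * n for _ in range(n)])    # unit element of T^(2)
    for p, q in zip(points, points[1:]):
        sig = chen(sig, segment_signature([q[i] - p[i] for i in range(n)]))
    return sig

rng = random.Random(0)
pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(50)]
X1, X2 = signature(pts)            # indices 0, 1 stand for the letters 1, 2

# Property 1 with shuffle(e_1, e_2) = e_12 + e_21: integration by parts.
print(X2[0][1] + X2[1][0], X1[0] * X1[1])
# Property 2 (Chen): splitting at the vertex with index 19 gives the same result.
lv1, lv2 = chen(signature(pts[:20]), signature(pts[19:]))
gap = max([abs(lv1[i] - X1[i]) for i in range(2)] +
          [abs(lv2[i][j] - X2[i][j]) for i in range(2) for j in range(2)])
print(gap)   # round-off level
```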
Given an α-regular rough path X, we define the following quantity

Controlled rough paths
The theory of controlled rough paths was introduced in [Gub04] for geometric rough paths of Hölder regularity in $(\frac13, \frac12]$. In [Gub10], the theory was generalised to rough paths of arbitrary positive regularity. Definition 2.2. Given $\alpha \in (0, \frac12]$, $p = \lfloor 1/\alpha \rfloor$, a geometric rough path $\mathbf{X}$ of regularity $\alpha$, and a function $Y : \mathbb{R} \to (T^{(p-1)}(\mathbb{R}^n))^*$ (the dual of the truncated tensor algebra), we say that $Y$ is controlled by $\mathbf{X}$ if, for every word $w \in A_{p-1}$, one has the bound … for some constant $C > 0$.

An alternative statement of Definition 2.2 is that for every word
Given an $\alpha$-regular geometric rough path $\mathbf{X}$, we endow the space of all paths $Y$ controlled by $\mathbf{X}$ with the seminorm … Given a path $Y$ controlled by $\mathbf{X}$, one can define the integral (2.1) as the limit (2.5) of Riemann-like sums that include higher-order correction terms. Here, the limit is taken over a sequence of partitions $\mathcal{P}$ of the interval $[s,t]$ whose mesh $|\mathcal{P}|$ tends to $0$. It was proved in [Gub10, Theorem 8.5] that the rough integral (2.5) is well defined, i.e. the limit in (2.5) exists and is independent of the choice of partitions $\mathcal{P}$.
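The following sketch illustrates the level-two case ($p = 2$, i.e. $\alpha \in (\frac13, \frac12]$) in the standard form of [Gub04], which we assume here since (2.5) is stated for general $p$. For $Y = X^1$, whose Gubinelli derivative is $e_1$, the integral $\int X^1\,dX^2$ is approximated by $\sum_{[u,v] \in \mathcal{P}} \big(X^1(u)\,X^2(u,v) + \langle \mathbf{X}(u,v), e_{12}\rangle\big)$. The path is a hypothetical piecewise linear random walk with Brownian scaling, lifted by computing its iterated integrals exactly; the names `sig2`, `riemann` and `compensated` are ours. Plain left-point sums over two coarse partitions generally differ, while the compensated sums coincide: a (trivially exact) instance of partition independence.

```python
import random

def sig2(points):
    """Increment and level-2 iterated integrals X^{ij} of a piecewise linear 2-d path.

    Segments are appended one at a time using Chen's relation with the exact
    iterated integrals of a straight segment.
    """
    x1 = [0.0, 0.0]
    x2 = [[0.0, 0.0], [0.0, 0.0]]
    for p, q in zip(points, points[1:]):
        d = (q[0] - p[0], q[1] - p[1])
        for i in range(2):
            for j in range(2):
                x2[i][j] += x1[i] * d[j] + d[i] * d[j] / 2.0
        x1 = [x1[0] + d[0], x1[1] + d[1]]
    return x1, x2

def riemann(points, part):
    """Left-point Riemann sum for int X^1 dX^2 over the partition given by vertex indices."""
    return sum(points[a][0] * (points[b][1] - points[a][1]) for a, b in zip(part, part[1:]))

def compensated(points, part):
    """Left-point sum plus the correction Y'(u) X^{12}(u, v), with Y = X^1 and Y' = e_1."""
    total = 0.0
    for a, b in zip(part, part[1:]):
        inc, area = sig2(points[a:b + 1])
        total += points[a][0] * inc[1] + area[0][1]
    return total

rng = random.Random(1)
n = 1000
pts = [(0.0, 0.0)]
for _ in range(n):
    # random walk with Brownian scaling: each increment has variance 1/n
    x, y = pts[-1]
    pts.append((x + rng.gauss(0.0, n ** -0.5), y + rng.gauss(0.0, n ** -0.5)))

part_a = list(range(0, n + 1, 100))
part_b = list(range(0, n + 1, 125))
print(riemann(pts, part_a), riemann(pts, part_b))          # generally differ
print(compensated(pts, part_a), compensated(pts, part_b))  # agree up to round-off
```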
If every coordinate $Y^j$ of the process $Y$ is controlled by $\mathbf{X}$, we denote the rough integral of $Y$ with respect to $X$ by … We use the symbol $\fint$ for the rough integral in (2.5) to remind the reader of the abuse of notation: the integral depends not only on $X^i$ and $Y^j$, but on much more information contained in $\mathbf{X}$ and $Y$. In the following proposition we provide several bounds on the rough integrals.

Proposition 2.3. Let $Y$ be controlled by a geometric rough path $\mathbf{X}$ of regularity …
Moreover, if $\bar Y$ is controlled by another rough path $\bar{\mathbf{X}}$ of regularity $\alpha$, then there is a constant $C$, independent of $\mathbf{X}$, $\bar{\mathbf{X}}$, $Y$ and $\bar Y$, such that … where we have used the quantity … Proof. The bounds follow from [Gub10, Theorem 8.5, Proposition 6.1].
Remark 2.4. The notation $|||\mathbf{X} - \bar{\mathbf{X}}|||_\alpha$ is a slight abuse of notation, since $\mathbf{X} - \bar{\mathbf{X}}$ is in general not a rough path. The definition (2.3), however, makes perfect sense for the difference.
In fact, the article [Gub10] gives more precise bounds on the rough integrals than those provided in Proposition 2.3, but we prefer to have them in this form for the sake of conciseness.

Definition and well-posedness of the solution
Let us now briefly discuss what we mean by "solutions" to (1.1), as introduced in [Hai11]. The idea is to find a process $X$ such that $v = u - X$ is of class $C^1$ in space, so that the definition of the integral (1.4) reduces to defining the integral … If we have a canonical way of lifting $X$ to a rough path $\mathbf{X}$, this integral can be interpreted in the sense of rough paths.
A natural choice for $X$ is the solution to the linear stochastic heat equation. In order to obtain convenient properties for this process, we construct it slightly differently from [Hai11]. First, we define $Y$ as the stationary solution to the modified SPDE on the circle $\mathbb{T}$
$$dY = \Delta Y\,dt + \Pi\,dW(t)\,, \tag{3.1}$$
where $\Pi$ denotes the orthogonal projection in $L^2$ onto the space of functions with zero mean. Second, we define the process … (3.2) where $w_0$ is the zeroth Fourier mode of $W$.
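Assuming (3.1) reads $dY = \Delta Y\,dt + \Pi\,dW$ (with $\nu = \sigma = 1$ as in Remark 1.3), its stationary solution is easy to describe mode by mode. In the orthonormal real Fourier basis $\{\cos(kx)/\sqrt{\pi}, \sin(kx)/\sqrt{\pi}\}_{k \geq 1}$ of mean-zero functions in $L^2(\mathbb{T})$, each coefficient solves $dY_k = -k^2 Y_k\,dt + dw_k$, an Ornstein–Uhlenbeck process with stationary variance $1/(2k^2)$, while $\Pi$ removes the zeroth mode. The following minimal sketch samples a truncated field from its stationary law and advances it with the exact OU transition; the truncation level, time step and all names are illustrative choices.

```python
import math
import random

def ou_step(y, lam, dt, rng):
    """Exact transition of dY = -lam * Y dt + dw over a time step dt."""
    decay = math.exp(-lam * dt)
    std = math.sqrt((1.0 - decay**2) / (2.0 * lam))
    return decay * y + std * rng.gauss(0.0, 1.0)

def stationary_variance(lam):
    """Variance of the invariant Gaussian law of dY = -lam * Y dt + dw."""
    return 1.0 / (2.0 * lam)

def variance_after_step(var, lam, dt):
    """Variance of ou_step(Y, ...) when Var(Y) = var."""
    decay = math.exp(-lam * dt)
    return decay**2 * var + (1.0 - decay**2) / (2.0 * lam)

def evaluate(modes, x):
    """Truncated field sum_k Y_k e_k(x) in the basis cos(kx)/sqrt(pi), sin(kx)/sqrt(pi)."""
    total = 0.0
    for (k, kind), y in modes.items():
        basis = math.cos(k * x) if kind == "cos" else math.sin(k * x)
        total += y * basis / math.sqrt(math.pi)
    return total

rng = random.Random(0)
K, dt = 8, 0.01                      # truncation level and time step (illustrative)
# Mode k = 0 is absent: it is removed by the projection Pi.
modes = {(k, kind): rng.gauss(0.0, math.sqrt(stationary_variance(k**2)))
         for k in range(1, K + 1) for kind in ("cos", "sin")}
for _ in range(100):
    modes = {(k, kind): ou_step(y, k**2, dt, rng) for (k, kind), y in modes.items()}
print(evaluate(modes, 0.0))
```

The exact transition preserves the stationary variance, which is what makes the sampled field stationary in time.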
Remark 3.1. We need to use $\Pi$ in (3.1) in order to obtain a stationary solution. In [Hai11], the author instead used the stationary solution to $dX = \Delta X\,dt - X\,dt + dW$ as a reference path. Our choice of $X$ was used in [HMW14] and does not change the results of [Hai11].
The following lemma shows that there is a natural way to extend X to a rough path.
Lemma 3.2. For every $\frac13 < \alpha < \frac12$, the stochastic process $X$ can be canonically lifted to a process $\mathbf{X} : \mathbb{R} \times \mathbb{T}^2 \to T^{(2)}(\mathbb{R}^n)$ such that, for every fixed $t \in \mathbb{R}$, $\mathbf{X}(t)$ is a geometric $\alpha$-rough path.
Here, "canonically" means that for a large class of natural approximations of the process $X$ by smooth Gaussian processes $X_\varepsilon$, the iterated integrals of $X_\varepsilon$, defined by (2.2), converge in $L^2$ to the corresponding components of $\mathbf{X}$ (see [FV10] for a precise definition and the proof). Denote by $S_t = e^{t\Delta}$ the heat semigroup, which is given by convolution on the circle with the heat kernel
$$p_t(x) = \frac{1}{2\pi}\sum_{k \in \mathbb{Z}} e^{-k^2 t + ikx}\,.$$
Assuming that the rough path-valued process $\mathbf{X}$ is given, we then define solutions to (1.1) as follows. Definition 3.3. … (3.4) holds almost surely. Here, we write for brevity $u(t) = v(t) + X(t) + U(t)$, and the process $Z(s,x)$ is a rough integral whose derivative we consider in the sense of distributions.
Remark 3.4. In [Hai11], the last integral in (3.4) was defined by … but, as noted in [HMW14], the notion of solution in Definition 3.3 is more convenient, as it simplifies the treatment of the rough integral. This change does not affect the existence and uniqueness results of [Hai11], and the resulting solutions are the same.
For convenience, we rewrite the mild formulation of (1.6) as … (3.6) where we have set … (3.7) and, as before, $\bar u = \bar v + X + U$. Although the two terms $\Phi^{\bar v}$ and $\Upsilon^{\bar v}$ are of the same type, we give them different names, since they arise in completely different ways from the approximation.

Existence and uniqueness results
The next theorem provides the well-posedness result for a mild solution to the equation (1.1).
Theorem 3.5. Assume that $u_0 \in C^\beta$ for some $\frac13 < \beta < \frac12$. Furthermore, let $F \in C^1$ and $G \in$ …
Proof. The proof can be carried out by a classical Picard iteration for $v$ given by (3.4) in the space $C^1_T$ for some $T \leq 1$; see [Hai11].
Remark 3.6. The argument of [Hai11, Theorem 3.7] also works in the space $C^{1+\alpha}_{\alpha/2,T}$ for any $\alpha \in (0, \frac12)$. Hence, the actual regularity of $v(t)$ is $1 + \alpha$ rather than $1$. This fact is used in Section 6 to estimate how close the approximate derivative of $v$ is to $\partial_x v$.

Solutions of the approximate equations
In this section we rewrite the mild solution to the approximate equation (1.5) in a way convenient for working in Hölder spaces of low regularity. In particular, we define the iterated integrals of higher order of the controlling process.
Similarly to (3.1) and (3.2), we define the stationary process $Y_\varepsilon$ and the process $X_\varepsilon$ by … (4.1) where $w_0$ is the zeroth Fourier mode of $W$. Moreover, we define the approximate semigroup $S^{(\varepsilon)}_t = e^{t\Delta_\varepsilon}$ generated by the approximate Laplacian; it is given by convolution on the circle $\mathbb{T}$ with the approximate heat kernel
$$p^{(\varepsilon)}_t(x) = \frac{1}{2\pi}\sum_{k \in \mathbb{Z}} e^{-k^2 f(\varepsilon k)\,t + ikx}\,.$$
Then the mild version of the approximate equation (1.5) can be rewritten as … (4.3) where we write for brevity $u_\varepsilon = v_\varepsilon + X_\varepsilon + U_\varepsilon$, and set … (4.4)
As already mentioned in Section 2, the rough integrals are approximated by Riemann-like sums, but these include additional higher-order correction terms. Hence, we cannot expect in general that $Z(s,x)$, defined in (3.5), is approximated by
$$\int_{-\pi}^{x} G(u_\varepsilon(s,y))\, D_\varepsilon X_\varepsilon(s,y)\,dy \tag{4.5}$$
as $\varepsilon \downarrow 0$. In order to approximate $Z(s,x)$, we have to add some extra terms to (4.5). These extra terms give rise to the correction term in the limiting equation mentioned in the introduction. In the rest of this section we construct these missing terms.

Iterated integrals
In order to use the theory of rough paths with regularities close to zero, we need to construct the iterated integrals of arbitrarily high order of $X$ and $X_\varepsilon$ with respect to themselves. The expansion of $X_\varepsilon$, defined in (4.1), in the Fourier basis is given by … (4.6) Here, $w_k$ are $\mathbb{C}^n$-valued standard Brownian motions (i.e. the real and imaginary parts of every component are independent real-valued Brownian motions, normalised so that $\mathbb{E}|w^i_k(t)|^2 = t$), which are independent up to the constraint $w_k = \bar w_{-k}$ ensuring that $X_\varepsilon$ is real-valued. Furthermore, for every fixed $t \geq 0$, $\eta^{(\varepsilon)}_k(t)$ are independent $\mathbb{R}^n$-valued standard Gaussian random vectors such that … and the coefficients $q^{(\varepsilon)}_k$ are defined by … (4.7) Similarly, the Fourier expansion of the process $X$ is … (4.8) where $\eta_k(t)$ are independent $\mathbb{R}^n$-valued standard Gaussian random vectors such that … The following lemma provides bounds on the canonical lifts of $X(t)$ and $X_\varepsilon(t)$ to Gaussian rough paths.
Furthermore, for any $\lambda < \frac12 - \alpha$ and any $T > 0$, the following bounds hold: … Moreover, for any word $w \in A_p$ with $|w| \geq 2$, we have … where we use the notation $X^w = \langle \mathbf{X}, e_w\rangle$.
Proof. The proof of (4.9) is given in [HMW14, Lemma 3.3]. We only have to show that the claimed lifts exist and satisfy the estimates (4.10). To this end, we define, for some $\kappa > 0$, the sequences … where $k \geq 1$. First, the increments of $\beta^{(\varepsilon,\kappa)}_k$ satisfy … for some constant $C > 0$, where $q^{(\varepsilon)}_k$ is defined in (4.7). To get the last inequality we have used the bounds on the functions $f$ and $h$ provided in Assumptions 1 and 3, and the estimate $|(q^{(\varepsilon)}_{k+1})^2 - (q^{(\varepsilon)}_k)^2| \leq C k^{-1}$, which follows from the bound on the total variation of the function $h^2/f$ provided by Assumption 3. Second, $\beta^{(\varepsilon,\kappa)}_k \log k \to 0$ as $k \to \infty$. Using these properties of $\beta^{(\varepsilon,\kappa)}_k$, we obtain from [Tel73, Theorem 4] that the series $\sum_{k=1}^N \beta^{(\varepsilon,\kappa)}_k \cos kx$ converges in $L^1$ as $N \to \infty$ and that the $L^1$-norm of the limit is bounded independently of $\varepsilon$. This proves that, for any $\kappa > 0$, the parametrised sequence $\beta^{(\varepsilon,\kappa)}_k$ is uniformly negligible in $\varepsilon \in (0,1)$ in the sense of [FGGR13, Definition 3.6].
Similarly, using the bound on the total variation of $h/(f+1)$ stated in Assumption 3, we obtain that for any $\kappa > 0$ the sequence $\rho^{(\varepsilon,\kappa)}_k$ is uniformly negligible in $\varepsilon \in (0,1)$ as well.
Noticing that the coefficients of the Fourier expansions (4.6) and (4.8) satisfy …, we can apply [FGGR13, Theorem 3.14] and obtain that for every $t$ and $\alpha < \frac12$ the processes $X(t)$ and $X_\varepsilon(t)$ can indeed be lifted to $\alpha$-regular rough paths $\mathbf{X}(t)$ and $\mathbf{X}_\varepsilon(t)$ respectively, such that for any word $w \in A_p$ with $|w| \geq 2$ the bounds … hold uniformly in $t \in [0,T]$. Furthermore, by [FGGR13, Theorem 3.15] we obtain that for all $\gamma < \frac12 - \alpha$ and $\kappa > 0$ small enough, … uniformly in $t \in [0,T]$. The last bound can be shown almost identically to [HMW14, (3.16d)], but taking $\theta \equiv 1$ and the time interval starting at $-\infty$. We now investigate the temporal regularity of $\mathbf{X}_\varepsilon$. Our aim is to apply [FGGR13, Theorem 3.15] to the processes $\mathbf{X}_\varepsilon(s)$ and $\mathbf{X}_\varepsilon(t)$ with $s, t \in [0,T]$. To this end, let us define $\tau = |t - s|$ and the parametrised sequence $\mu^{(\tau,\varepsilon)}_k = e^{-k^2 f(\varepsilon k)\tau}$. Then, in the same way as at the beginning of the proof and using Assumptions 1 and 3, we obtain that for any $\kappa > 0$ the sequence $\beta^{(\varepsilon,\kappa)}_k \mu^{(\tau,\varepsilon)}_k$ is uniformly negligible in $\tau > 0$ and $\varepsilon \in (0,1)$, and by [FGGR13, Theorem 3.15] we obtain, for any word $w \in A_p$ with $|w| \geq 2$, … for all $\gamma < \frac12 - \alpha$. Here, the last bound can be derived similarly to [HMW14, (3.16a)], but with $\theta \equiv 1$ and the time interval starting at $-\infty$. In the same way, we get … (4.14) Applying the Kolmogorov criterion [Kal02] together with the bounds (4.11) and (4.14), we get the first estimate in (4.10). Now let us take any word $w \in A_p$ with $|w| \geq 2$. On the one hand, the estimate (4.12) gives … On the other hand, from (4.14) and (4.13) the following estimate follows: … Combining these two bounds, we obtain … for any $\delta > 0$ small enough and uniformly in $s, t \in [0,T]$. From this bound, estimate (4.12) and the Kolmogorov criterion [Kal02], we obtain the second bound in (4.10).

Approximation of the rough integral
Having defined the iterated integrals of $\mathbf{X}_\varepsilon$, we can now construct an approximation of the process $Z$ defined in (3.5). The idea comes from the fact that if $u(t)$ is controlled by $\mathbf{X}(t)$, then the process $G(u(t))$ is controlled by $\mathbf{X}(t)$ as well. Taylor expansion gives an approximation of $G_{ij}(u(t))$, … Here, $\tilde C_w$ are combinatorial factors which can be calculated explicitly. Furthermore, we use the following notation: for $w = w_1 \cdots w_k \in A_{p-1}$ with $k \geq 1$ we write $D_w = D_{w_1} \cdots D_{w_k}$ and $u(t,x)^w = u^{w_1}(t,x) \cdots u^{w_k}(t,x)$.
Recalling that we look for solutions such that $u(t) - X(t) \in C^1$, we obtain an approximation of $G_{ij}(u(t))$ in terms of $X(t)$, … Symmetrising this expression and using Definition 2.1, it can be rewritten as … for some slightly different constants $C_w$. This expansion motivates our choice of the terms in the approximation of the rough integral.
In view of Assumption 2, it is natural to define the process $D_\varepsilon \mathbf{X}_\varepsilon : \mathbb{R}_+ \times \mathbb{T} \to T^{(p)}(\mathbb{R}^n)$ as follows: for any word $w \in A_p$ we set
$$\langle D_\varepsilon \mathbf{X}_\varepsilon(t; y), e_w\rangle := \frac{1}{\varepsilon}\int_{\mathbb{R}} \langle \mathbf{X}_\varepsilon(t; y, y + \varepsilon z), e_w\rangle\,\mu(dz)\,. \tag{4.16}$$
Combining the expansion (4.15) with the definition (2.6), it appears plausible that a good approximation of $Z$ is given by … Here, to simplify the notation, we have omitted the sum over $j$. We can now rewrite the mild solution (4.3) as … (4.18) where the functions $\Phi^{v_\varepsilon}_\varepsilon$ and $\Psi^{v_\varepsilon}_\varepsilon$ are defined in (4.4). The term involving the rough integral is denoted by …, and the additional terms in (4.18), which we used to approximate the rough integral, are denoted by … In the next sections we show that the term $\bar\Upsilon^{v_\varepsilon}_\varepsilon$ tends to $0$ and that the other terms in (4.18) converge to the corresponding terms in (3.6) in the space $C^1_T$.

Convergence of the solutions of the approximate equations
In this section we prove Theorem 1.1. In what follows, we use the constant $\alpha_\star = \frac12 - \alpha$ for some fixed small $\alpha > 0$. This constant represents the effective spatial regularity of the process $X$ defined in (3.2). To obtain better bounds, we work in spaces of regularity $\alpha$, which is close to $0$. The constants $\alpha$ and $\alpha_\star$ are kept fixed throughout the article.
To shorten notation we define the norm … (see (2.3) for the definition of the norm of a rough path). For any $K > 0$ we define the stopping time … Note that, in view of Remark 3.6, the condition on the norm $\|v\|_{C^{1+\alpha_\star}_{\alpha_\star/2,t}}$ is reasonable.
For any two letters $i, j \in A$ we define the process … where $\delta$ is the Kronecker delta. To have a priori bounds on the corresponding $\varepsilon$-quantities, we introduce the stopping time … The blow-up of the norm $\|v(t) - v_\varepsilon(t)\|_{C^1}$ comes from the regularising property of the heat semigroup and the fact that we work in $\alpha$-regular spaces, i.e. we use the bound … See Appendix B for the properties of the heat semigroup. Finally, we define the stopping time $\varrho_{K,\varepsilon} := \sigma_K \wedge \sigma_{K,\varepsilon}$ and write in what follows … Remark 5.1. Throughout the article we only consider time intervals up to the stopping time $\varrho_{K,\varepsilon}$. Therefore, all the quantities involved in the definition of $\varrho_{K,\varepsilon}$ are bounded by $K + 1$, and all the proportionality constants may depend on $K$.
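The blow-up rate $t^{-(1-\alpha)/2}$ of the $C^1$ norm can be anticipated from a single Fourier mode: $e^{ikx}$ has $C^\alpha$ norm of order $k^\alpha$, while $\partial_x S_t e^{ikx}$ has size $k e^{-k^2 t}$, and $\sup_{k>0} k^{a} e^{-k^2 t} = (a/(2t))^{a/2} e^{-a/2}$ with $a = 1 - \alpha$. The following heuristic check (not a proof of the Hölder-space bound; see Appendix B for that) confirms numerically that $t^{(1-\alpha)/2}\max_{k \geq 1} k^{1-\alpha} e^{-k^2 t}$ approaches the constant $(a/2)^{a/2} e^{-a/2}$ from below as $t \downarrow 0$. The value of $\alpha$ and the function name are illustrative.

```python
import math

def mode_gain(t, a):
    """max over integers k >= 1 of k**a * exp(-k**2 * t).

    The map k -> a*log(k) - k**2*t is concave, so the integer maximiser is the
    floor or the ceiling of the real maximiser k* = sqrt(a / (2 t)).
    """
    k_star = math.sqrt(a / (2.0 * t))
    candidates = {max(1, math.floor(k_star)), max(1, math.ceil(k_star))}
    return max(k**a * math.exp(-k * k * t) for k in candidates)

alpha = 0.1
a = 1.0 - alpha                                   # one derivative, minus alpha
limit = (a / 2.0) ** (a / 2.0) * math.exp(-a / 2.0)
scaled = {t: t ** (a / 2.0) * mode_gain(t, a) for t in (1e-2, 1e-4, 1e-6)}
for t, value in scaled.items():
    print(t, value, limit)                        # value approaches limit from below
```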
Proof of Theorem 1.1. For $\alpha > 0$ as at the beginning of this section, we define $p = \lfloor 1/\alpha \rfloor$. The derivation of the bounds below shows how small $\alpha$ must be. To shorten the notation, we introduce the norm $\|\cdot\|_{\alpha,t} := \|\cdot\|_{C^\alpha_t} + \|\cdot\|_{C^1}$ … For $t \leq \varrho_{K,\varepsilon}$, we obtain from (3.6) and (4.18) … (5.3) We consider only times $t < 1$; for larger times the claim can easily be obtained by iteration. To bound the first term in (5.3) we use the results of Section 6. Applying Proposition 6.1 with a small constant $\kappa = \alpha$, we get … In order to bound the second term in (5.3), we use Proposition 6.2 with $\kappa = \alpha$, … Applying Proposition 7.2 with the parameter $\kappa = \alpha$, we bound the expectation of the third term in (5.3) by … (5.6) A bound on the fourth term in (5.3) is a straightforward application of Proposition 6.4, … $\bar\Upsilon^{v_\varepsilon}_\varepsilon$ … Using Proposition 8.2 with the small parameter $\kappa = \alpha/2$, we can bound the last term in (5.3) by … where $D_\varepsilon$ is defined in (8.1). Combining the bounds (5.3)–(5.8), we obtain … where we have used $\alpha_\star = \frac12 - \alpha$. By Lemma 4.1 we can bound the norms of the controlling processes, … Furthermore, by choosing $t = t_*$ small enough, we can absorb the first term on the right-hand side of (5.9) into the left-hand side and obtain … From the definition of $\bar u$ via $\bar v$ and (5.10), we conclude … Here, we have also used Lemma 4.1 and the bound …, which can be derived similarly to (6.7). The rest of the proof is almost identical to the proof of [HMW14, Theorem 1.5].

Estimates on the reaction term
In this section we prove convergence of the reaction terms of the approximating equation (4.18) to the corresponding terms of (3.6). Recall the notation (5.2) and Remark 5.1, according to which all the quantities involved in the definition of the stopping time $\varrho_{K,\varepsilon}$ are bounded on the interval $(0, t_\varepsilon]$ by the constant $K + 1$, and all the proportionality constants below may depend on $K$. The next proposition gives a bound on the terms $\Psi^{\bar v}$ and $\Psi^{v_\varepsilon}_\varepsilon$ defined in (3.7) and (4.4), respectively.
Proposition 6.1. For any $\gamma \in (0, 1]$, $t > 0$ and $\kappa > 0$ small enough the following bound holds (1−α)/2,tε (6.1) Proof. For any $t > 0$, using the notation (5.2), we can rewrite To bound the term $J_1$, we first investigate how well the operator $D_\varepsilon$ approximates $\partial_x$. Let us take a function $\varphi \in C^{1+\alpha_\star}(\mathbb{T})$. Then by Assumption 2 we can rewrite Using the fact that the Hölder regularity of $\varphi$ is $1 + \alpha_\star$, we obtain This yields the estimate where we have used the boundedness of the $(1 + \alpha_\star)$th moment of $\mu$.
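Assumption 2 and the precise definition of $D_\varepsilon$ are not restated in this section. Assuming, for illustration only, that $D_\varepsilon$ is the finite-difference operator $D_\varepsilon\varphi(x) = \varepsilon^{-1}\int (\varphi(x+\varepsilon y) - \varphi(x))\,\mu(dy)$ with $\int y\,\mu(dy) = 1$, the Python sketch below checks the rate $\|D_\varepsilon\varphi - \partial_x\varphi\|_{C^0} \lesssim \varepsilon^{\alpha_\star}$ numerically. The two-atom measure `MU`, the value `BETA` standing in for $\alpha_\star$ and the test function $\varphi(x) = |\sin x|^{1+\beta}$, whose derivative is exactly $\beta$-Hölder near $x = 0$, are all hypothetical choices.

```python
import math

BETA = 0.3                        # hypothetical stand-in for alpha_star
MU = [(-1.0, 0.5), (3.0, 0.5)]    # hypothetical mu: (atom y, weight), first moment = 1

def phi(x):
    # C^{1+BETA} test function; its derivative is only BETA-Hölder at x = 0 and x = pi
    return abs(math.sin(x)) ** (1 + BETA)

def dphi(x):
    s = math.sin(x)
    return (1 + BETA) * abs(s) ** BETA * math.copysign(1.0, s) * math.cos(x)

def d_eps(f, x, eps):
    # assumed finite-difference form: D_eps f(x) = eps^{-1} * sum_y w_y (f(x + eps*y) - f(x))
    return sum(w * (f(x + eps * y) - f(x)) for y, w in MU) / eps

def sup_error(eps, m=2000):
    # sup over a uniform grid on the circle (the grid contains x = 0 and x = pi)
    grid = (2 * math.pi * j / m for j in range(m))
    return max(abs(d_eps(phi, x, eps) - dphi(x)) for x in grid)

for eps in (1e-2, 1e-3, 1e-4):
    print(f"eps = {eps:.0e}: sup error / eps^BETA = {sup_error(eps) / eps ** BETA:.3f}")
```

At $x = 0$ the error equals $\varepsilon^{\beta}\cdot\frac12(1 + 3^{1+\beta})$ up to higher-order terms, so for this $\varphi$ the rate $\varepsilon^{\beta}$ cannot be improved. The constant involves $\int |y|^{1+\beta}\,|\mu|(dy)$, which mirrors the role of the $(1+\alpha_\star)$th moment of $\mu$.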
Using this estimate we derive where we have used the boundedness of $\|\bar u\|_{C^0_{t_\varepsilon}}$ and $\|\bar v\|_{C^{1+\alpha_\star}}$ To derive a bound on $J_2$, we notice that which follows from Lemma B.1. Hence, using the estimate (6.2) for $U$, we obtain (6.4) Note that for any function $\varphi \in C^1(\mathbb{T})$ we have, by Assumption 2, Using this bound we obtain where we have used the boundedness of $\|\bar u\|_{C^0_{t_\varepsilon}}$.
To bound $J_4$ we note that for any $\kappa > 0$ sufficiently small. Here, in the last estimate we used Lemma B.2 with $\lambda = \alpha_\star - \kappa$. Using this estimate and (6.5) we obtain Exploiting the continuous differentiability of the function $G$ we get where in the second line we have used a bound similar to (6.7), (6.10) Moreover, in the estimate (6.9) we have used the bound which is obtained in a way similar to (6.7). Using Lemma B.2, the integral $J_6$ can be bounded by where we have used the bound (6.10).
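Lemma B.2 itself is not reproduced in this section. Time integrals of the form $\int_0^t (t-s)^{-a}s^{-b}\,ds$ with $a, b < 1$ are the generic way in which a singular heat-semigroup factor combines with a singular weight in $s$, and they satisfy $\int_0^t (t-s)^{-a}s^{-b}\,ds = t^{1-a-b}B(1-a,1-b)$. The Python sketch below checks this identity numerically for exponents in $[0,\frac12]$ via the substitution $s = t\sin^2\theta$, which removes both endpoint singularities. The exponents are hypothetical; $a = 0.375$ corresponds to $(1-\alpha)/2$ with $\alpha = 0.25$, purely as an example.

```python
import math

def beta(x, y):
    return math.gamma(x) * math.gamma(y) / math.gamma(x + y)

def singular_time_integral(t, a, b, n=100_000):
    """Midpoint rule for int_0^t (t - s)^(-a) s^(-b) ds, for 0 <= a, b <= 1/2.

    With s = t sin^2(theta) the integral becomes
    2 t^(1-a-b) int_0^{pi/2} sin(theta)^(1-2b) cos(theta)^(1-2a) dtheta,
    whose integrand is bounded and continuous on [0, pi/2].
    """
    h = (math.pi / 2) / n
    total = sum(math.sin((k + 0.5) * h) ** (1 - 2 * b) * math.cos((k + 0.5) * h) ** (1 - 2 * a)
                for k in range(n))
    return 2 * t ** (1 - a - b) * total * h

a, b, t = 0.375, 0.3, 0.2        # hypothetical exponents and time
approx = singular_time_integral(t, a, b)
exact = t ** (1 - a - b) * beta(1 - a, 1 - b)
print(approx, exact)
```

The factor $t^{1-a-b}$ is the kind of small power of $t$ that makes the absorption step in the proof of Theorem 1.1 possible.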
In the following proposition we provide a bound on the terms $\Phi^{\bar v}$ and $\Phi^{v_\varepsilon}_\varepsilon$ defined in (3.7) and (4.4) respectively.
Proposition 6.2. For any $\gamma \in (0, 1]$ and $\kappa > 0$ small enough the following bound holds Proof. Using the continuous differentiability of the function $F$, Lemma B.2, and recalling that $\bar u = \bar v + X + U$, we get Here, we have used the boundedness of $\|u_\varepsilon\|_{C^0_{t_\varepsilon}}$ and the estimate (6.11).
The following lemma shows how the processes (4.16) behave in the supremum norm. In particular, it shows that they converge to 0 as soon as $|w| > 2$.
Lemma 6.3. The bound $\|\langle D_\varepsilon \mathbf{X}_\varepsilon(s;\cdot), e_w\rangle\|_{C^0} \lesssim \varepsilon^{|w|\alpha_\star - 1}$ holds uniformly in $\varepsilon$ and $t$.
Proof. Since $\mathbf{X}_\varepsilon(s)$ is a rough path of regularity $\alpha_\star$, we can use the third property in Definition 2.1 to get Here, we have used the assumption on the moments of $|\mu|$.
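The one-line estimate behind Lemma 6.3 can be written out under two assumptions that are not restated in this section. First, $D_\varepsilon$ is assumed to act on rough path increments as $\langle D_\varepsilon \mathbf{X}_\varepsilon(s;x), e_w\rangle = \varepsilon^{-1}\int \langle \mathbf{X}_\varepsilon(s;x,x+\varepsilon y), e_w\rangle\,\mu(dy)$. Second, the $\alpha_\star$-regularity of the rough path is assumed to give $|\langle \mathbf{X}_\varepsilon(s;x,y), e_w\rangle| \lesssim |y-x|^{|w|\alpha_\star}$. The last line records why the exponent is positive for words of length at least 3, using $\alpha_\star = \frac12 - \alpha$.

```latex
% Sketch under the two stated assumptions (not the authors' text)
|\langle D_\varepsilon \mathbf{X}_\varepsilon(s;x), e_w\rangle|
  \le \varepsilon^{-1} \int |\langle \mathbf{X}_\varepsilon(s;x,x+\varepsilon y), e_w\rangle|\,|\mu|(dy)
  \lesssim \varepsilon^{-1} \int |\varepsilon y|^{|w|\alpha_\star}\,|\mu|(dy)
  = \varepsilon^{|w|\alpha_\star - 1} \int |y|^{|w|\alpha_\star}\,|\mu|(dy) .
% For |w| >= 3 and alpha_star = 1/2 - alpha:
|w|\alpha_\star - 1 \ge 3\alpha_\star - 1 = \tfrac12 - 3\alpha > 0
  \quad\text{whenever } \alpha < \tfrac16 .
```

The remaining integral is finite by the moment assumption on $|\mu|$. This is one concrete instance of how small $\alpha$ has to be taken.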
In the following proposition we obtain a bound on the term $\bar\Upsilon^{v_\varepsilon}_\varepsilon$ defined in (4.20).
Proof. We use Lemma B.3 to estimate the approximate heat semigroup, and Lemma 6.3: for $\kappa > 0$ small enough. This is the claimed bound.

Convergence of the correction term
In this section we show that the term $\Upsilon^{v_\varepsilon}_\varepsilon$, defined in (4.20), converges to the correction term $\Upsilon^{\bar v}$ from (3.7). In view of Remark 5.1, we only consider time intervals up to the stopping time $\varrho_{K,\varepsilon}$ and use the notation (5.2).
To shorten the notation, we define $\mathbb{X}_\varepsilon(t)$ to be the projection of the rough path $\mathbf{X}_\varepsilon(t)$ onto the second level of the tensor algebra. The following lemma is similar to [HMW14, Proposition 4.1], but the bound is in a Hölder norm rather than a Sobolev norm.
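For a smooth path, the second level of the tensor algebra is given by the iterated integrals $\int_s^t (x^i(r) - x^i(s))\,dx^j(r)$. The rough path $\mathbf{X}_\varepsilon$ is not constructed this way in this section, so the following Python sketch only illustrates the object on a hypothetical piecewise-linear path in $\mathbb{R}^d$. It builds the level-2 component segment by segment with Chen's relation, using that a straight segment with increment $\delta$ contributes $\frac12\,\delta\otimes\delta$. The symmetric part of the result equals $\frac12$ times the tensor square of the total increment, which is the algebraic constraint satisfied by geometric rough paths.

```python
def level2(points):
    """Level-1 and level-2 components of a piecewise-linear path through `points`."""
    d = len(points[0])
    inc = [0.0] * d                          # level 1: increment accumulated so far
    lev2 = [[0.0] * d for _ in range(d)]     # level 2: iterated integrals so far
    for p, q in zip(points, points[1:]):
        delta = [q[i] - p[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Chen's relation: old level 2 + (old level 1) x delta + 1/2 delta x delta
                lev2[i][j] += inc[i] * delta[j] + 0.5 * delta[i] * delta[j]
        inc = [inc[i] + delta[i] for i in range(d)]
    return inc, lev2

inc, lev2 = level2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
levy_area = 0.5 * (lev2[0][1] - lev2[1][0])   # antisymmetric part (signed area)
print(inc, lev2, levy_area)
```

Only the antisymmetric part (the Lévy area) carries information beyond the increments, which is why the second level has to be supplied separately for rough drivers.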
Lemma 7.1. For any $\gamma \in (0, \frac12)$, any $t > 0$ and any $\kappa > 0$ small enough we have Proof. The proof is almost identical to that of [HMW14, Proposition 4.1], but we use Lemma A.3 to reduce the problem to moment bounds on the Paley-Littlewood blocks of $D_\varepsilon \mathbb{X}_\varepsilon$ instead of using pointwise bounds.
A bound on $\Upsilon^{\bar v}$ and $\Upsilon^{v_\varepsilon}_\varepsilon$, defined in (3.7) and (4.20) respectively, is given in the next proposition.
Proposition 7.2. For any $\gamma \in (0, 1]$ and any $\kappa > 0$ sufficiently small we have Proof. Let us define the functions F(u) i = Λ divG i (u) and where, as usual, the summation sign over $j$ is omitted. Then we can write To bound $J_1$ we note that we can rewrite Therefore, applying Lemma B.1 with $\eta \in (0, \alpha_\star)$ and Lemma A.4, we obtain That gives us, using the boundedness of $\|u_\varepsilon\|_{C^{\alpha_\star}_{t_\varepsilon}}$ and Lemma 7.1, A bound on $J_2$ follows from Lemma B.1 and the regularity of $G$, Here, we have used the representation of $\bar u$ via $\bar v$ and the bound (6.11).
For the third term we use Lemma B.2 with $\lambda = \frac12 - \kappa$, where we have used the boundedness of the second-order iterated integral $\mathbb{X}_\varepsilon$ and of $\|u_\varepsilon\|_{C^\alpha_{t_\varepsilon}}$. Combining the estimates (7.1), (7.2) and (7.3) we obtain the claimed bound.

Estimates on rough terms
In this section we obtain bounds on the terms involving rough integrals. As usual, we use the notation (5.2), which in view of Remark 5.1 means that all the quantities involved in the definition of $\varrho_{K,\varepsilon}$ are bounded. Furthermore, let us define the quantity (1−α)/2,tε where the norm $|||\cdot|||_{\alpha,t_\varepsilon}$ was introduced in (5.1). The next lemma provides bounds on the rough integrals $Z$ and $Z_\varepsilon$ defined in (3.5) and (4.17) respectively.
where, for $\kappa > 0$ small enough, the bounds Proof. Since $\bar u(s) - X(s) \in C^1$ for $s \le t_\varepsilon$, the process $Y_{ij}(s) = G_{ij}(\bar u(s))$ is controlled by the $\alpha_\star$-regular rough path $\mathbf{X}(s)$ with the rough path derivative $Y'_{ij}(s) = DG_{ij}(\bar u(s))$ and the remainder $R_{Y_{ij}}(s; x, y) = DG_{ij}(\bar u(s, x))\,(\bar v(s; x, y) + U(s; x, y))$, where we use the notation $\bar v(s; x, y) = \bar v(s, y) - \bar v(s, x)$, and similarly for $U$ and $\bar u$. Here, by the rough path derivative we mean the projection of the controlled rough path onto $(\mathbb{R}^n)^*$ in Definition 2.2, and the remainder is the collection of all the processes $R^w_Y$ from (2.4). From the regularity assumptions on the function $G$ and the processes $\bar u$ and $\bar v$, we obtain the bounds The power of $s$ in the last estimate comes from the bound $\|U(s)\|_{2\alpha_\star} \lesssim s^{-\alpha_\star/2}$, which is a consequence of Lemma B.1. The estimate (8.2) follows from (2.8) and (8.4).
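The decomposition used above, in which $G(\bar u)$ is controlled by $\mathbf{X}$ with derivative $DG(\bar u)$, can be illustrated in a scalar toy model. In the Python sketch below, a deterministic Weierstrass-type function of Hölder regularity `ALPHA` stands in for the rough driver $X$, the smooth function $\sin x$ stands in for $\bar v + U$, and $G = \sin$; all of these choices are hypothetical. The sketch computes $\sup_x |R(x,x+h)|/h^{2\alpha}$ for the remainder $R(x,y) = G(u(y)) - G(u(x)) - G'(u(x))\,(X(y)-X(x))$ with $u = X + \sin$. By Taylor's formula $R$ is of order $|y-x| + |y-x|^{2\alpha}$, so this ratio stays bounded as $h \to 0$.

```python
import math

ALPHA = 0.3                       # Hölder exponent of the toy driver (hypothetical)
N_TERMS = 21                      # truncation level of the Weierstrass-type series
COEFFS = [(2.0 ** (-n * ALPHA), 2.0 ** n) for n in range(N_TERMS)]

def X(x):
    # Weierstrass-type function sum_n 2^{-n ALPHA} cos(2^n x): ALPHA-Hölder, 2pi-periodic
    return sum(c * math.cos(f * x) for c, f in COEFFS)

def v(x):
    return math.sin(x)            # smooth part, playing the role of v-bar + U

G, DG = math.sin, math.cos        # smooth nonlinearity and its derivative

def remainder_ratio(h, m=2000):
    """sup over a grid of |R(x, x+h)| / h^{2 ALPHA} with
    R(x, y) = G(u(y)) - G(u(x)) - DG(u(x)) * (X(y) - X(x)) and u = X + v."""
    worst = 0.0
    for j in range(m):
        x = 2 * math.pi * j / m
        xx, xy = X(x), X(x + h)
        ux, uy = xx + v(x), xy + v(x + h)
        r = G(uy) - G(ux) - DG(ux) * (xy - xx)
        worst = max(worst, abs(r) / h ** (2 * ALPHA))
    return worst

for k in (6, 8, 10):
    print(f"h = 2^-{k}: sup |R| / h^(2*alpha) = {remainder_ratio(2.0 ** -k):.3f}")
```

Replacing the remainder by the plain increment $G(u(y)) - G(u(x))$ in the same ratio would typically make it grow like $h^{-\alpha}$. This is why it is the controlled decomposition, rather than the increment itself, that enters the rough-integral bounds.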
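Similarly, for $s \le t_\varepsilon$, the process $Y_{\varepsilon,ij}(s) = G_{ij}(u_\varepsilon(s))$ is controlled by the $\alpha_\star$-regular rough path $\mathbf{X}_\varepsilon(s)$ with the rough path derivative $Y'_{\varepsilon,ij}(s) = DG_{ij}(u_\varepsilon(s))$ and the remainder $R_{Y_{\varepsilon,ij}}(s)$, such that the following bounds hold To prove the bound (8.3), we regard the processes $\bar u(s)$ and $u_\varepsilon(s)$ as being of Hölder regularity $\alpha$. Then they are controlled by the $\alpha$-regular rough paths $\mathbf{X}(s)$ and $\mathbf{X}_\varepsilon(s)$ respectively. Hence, we can extend $G_{ij}(\bar u(s))$ to a process $\mathbf{G}_{ij}(s) : \mathbb{T} \to (T^{(p-1)}(\mathbb{R}^n))^*$ which is also controlled by $\mathbf{X}(s)$ and satisfies $\langle \mathbf{G}_{ij}(s, x), e_w\rangle = D^w G_{ij}(\bar u(s, x))$ for $w \in \mathcal{A}_{p-1}$. Then, as noted in Subsection 4.2, for every $w \in \mathcal{A}_{p-1}$ the following expansion holds:
$$\langle \mathbf{G}_{ij}(s, y), e_w\rangle - \langle \mathbf{G}_{ij}(s, x), e_w\rangle = \sum_{\bar w \in \mathcal{A}_{p-|w|-1} \setminus \{\emptyset\}} C_{\bar w}\, \langle \mathbf{G}_{ij}(s, x), e_{\bar w} \otimes e_w\rangle\, \langle \mathbf{X}(s; x, y), e_{\bar w}\rangle + R^w_{\mathbf{G}_{ij}}(s; x, y).$$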
For any word $w \in \mathcal{A}_{p-1}$, the assumptions on $G$ and $\bar u$ imply $\|\langle \mathbf{G}_{ij}(s), e_w\rangle\|_{C^\alpha} \lesssim 1$. Furthermore, from the argument of Subsection 4.2, it is not difficult to obtain the estimate on the remainder. The latter bound follows from $|\bar u(s; x, y)^{\bar w}| \lesssim |y - x|^{(p-|w|)\alpha}$ for any word $\bar w$ such that $|\bar w| = p - |w|$, and $|\bar u(s; x, y)^{\bar w} - X(s; x, y)^{\bar w}| \lesssim |\bar u(s; x, y) - X(s; x, y)|$ for any word $\bar w \in \mathcal{A}_{p-|w|-1} \setminus \{\emptyset\}$. Here, in the last line we have used the bound which follows from Lemma B.1.
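In the same way, the process $G_{ij}(u_\varepsilon(s))$ can be extended to a process $\mathbf{G}^\varepsilon_{ij}(s) : \mathbb{T} \to (T^{(p-1)}(\mathbb{R}^n))^*$ which is controlled by $\mathbf{X}_\varepsilon(s)$. We denote its remainders by $R^w_{\mathbf{G}^\varepsilon_{ij}}$. Furthermore, the corresponding bounds hold for any word $w \in \mathcal{A}_{p-1}$.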
For $m \ge 1$ we define the Dirichlet kernel
Proof. In the case $p = \infty$, the function can be bounded by its value at 0, which gives $|D_m(x)| \le 2^{m+1}$. If $1 < p < \infty$, then we can rewrite The latter integral is bounded by a constant $C(p)$, since the integrand can be estimated, up to a constant multiplier, by $1 \wedge |x|^{-p}$. This gives the claimed estimate.
Now we provide a Kolmogorov-like criterion for distribution-valued processes.
Proof. Note that $\delta_m \psi(t, x) = (D_m * \delta_m \psi(t))(x)$, where the convolution is taken over the variable $x \in \mathbb{T}$. Therefore, Hölder's inequality yields
$$|\delta_m \psi(t, x)| \le \|D_m\|_{L^{p'}(\mathbb{T})}\, \|\delta_m \psi(t)\|_{L^p(\mathbb{T})}, \tag{A.2}$$
for any $p \ge 1$, where as usual $p'$ is the exponent conjugate to $p$. Since $\psi(t)$ belongs to a fixed Wiener chaos, the same is true for the Paley-Littlewood block $\delta_m \psi(t)$, and we can apply Nelson's lemma [Nel73], which states that all moments of $\delta_m \psi(t)$ are bounded, up to a constant multiplier, by its second moment. Therefore, Since for $\gamma < 1$, $\gamma \ne 0$, the space $C^\gamma$ coincides with the Besov space $B^\gamma_{\infty,\infty}$, we obtain which is finite if $\gamma < \alpha - \frac1p$. Finally, for any $\gamma < \alpha$ we can choose $p \ge 1$ large enough such that $\gamma < \alpha - \frac1p$, so that $\mathbb{E}\|\psi(t)\|_{C^\gamma} \le C(\alpha, \gamma)A$ We finish the proof by applying the Banach space-valued version of the Kolmogorov continuity criterion [Kal02], which gives the estimate (A.1) from (A.4) and (A.5).
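The definition of $D_m$ is not reproduced in this section. Assuming, for illustration, the normalisation $D_m(x) = \frac{1}{2\pi}\sum_{|k|\le 2^m} e^{ikx}$, for which indeed $|D_m(x)| \le D_m(0) \le 2^{m+1}$, the Python sketch below computes $\|D_m\|_{L^p(\mathbb{T})}$ with the midpoint rule and compares it with $2^{m(1-1/p)}$. This normalisation is a hypothetical choice. With $p$ replaced by $p'$, this growth rate becomes $2^{m/p}$, which is presumably the factor behind the condition $\gamma < \alpha - \frac1p$ above. For trigonometric polynomials of degree below the number of grid points, the midpoint sums for $p = 2$ and $p = 4$ are exact.

```python
import math

def dirichlet(m, x):
    """Hypothetical normalisation: D_m(x) = (1/2pi) * sum_{|k| <= 2^m} e^{ikx}."""
    n = 2 ** m
    return math.sin((n + 0.5) * x) / (2 * math.pi * math.sin(0.5 * x))

def lp_norm(m, p, grid=4096):
    """L^p(T) norm of D_m via the midpoint rule; the shifted grid never hits x = 0."""
    if math.isinf(p):
        return (2 ** (m + 1) + 1) / (2 * math.pi)   # |D_m| is maximal at x = 0
    h = 2 * math.pi / grid
    s = sum(abs(dirichlet(m, -math.pi + (j + 0.5) * h)) ** p for j in range(grid))
    return (s * h) ** (1.0 / p)

for p in (2.0, 4.0, math.inf):
    ratios = [lp_norm(m, p) / 2 ** (m * (1 - 1 / p)) for m in range(1, 7)]
    print(p, [round(r, 3) for r in ratios])
```

The same bookkeeping explains the last step of the proof: for $\gamma < \alpha$, any integer $p > 1/(\alpha - \gamma)$ satisfies $\gamma < \alpha - \frac1p$.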
The following lemma provides a bound on the product of two distributions from certain Hölder spaces.