Approximation of SDEs: a stochastic sewing approach

We give a new take on the error analysis of approximations of stochastic differential equations (SDEs), utilizing and developing the stochastic sewing lemma of Lê (Electron. J. Probab. 25:55, 2020. 10.1214/20-EJP442). This approach allows one to exploit regularization by noise effects in obtaining convergence rates. In our first application we show convergence (to our knowledge for the first time) of the Euler–Maruyama scheme for SDEs driven by fractional Brownian motions with non-regular drift. When the Hurst parameter is H ∈ (0, 1) and the drift is C^α, with α ∈ [0, 1] and α > 1 − 1/(2H), we show the strong L_p and almost sure rates of convergence to be ((1/2 + αH) ∧ 1) − ε, for any ε > 0. Our conditions on the regularity of the drift are optimal in the sense that they coincide with the conditions needed for the strong uniqueness of solutions from Catellier and Gubinelli (Stoch. Process. Appl. 126(8):2323–2366, 2016. 10.1016/j.spa.2016.02.002). In a second application we consider the approximation of SDEs driven by multiplicative standard Brownian noise, where we derive the almost optimal rate of convergence 1/2 − ε of the Euler–Maruyama scheme for C^α drift, for any ε, α > 0.


Introduction
Since the 1970s, it has been observed that the addition of a random forcing to an ill-posed deterministic system can make it well-posed. This phenomenon is called regularization by noise. One of the prime examples concerns differential equations of the form

dX_t = b(X_t) dt, X_0 = x_0, t ≥ 0, (1.1)

where b is a bounded vector field. While Eq. (1.1) might have infinitely many solutions when b fails to be Lipschitz continuous and might possess no solution when b fails to be continuous, Zvonkin [39] and Veretennikov [38] (see also the paper of Davie [9]) showed that the stochastic differential equation (SDE)

dX_t = b(X_t) dt + dB_t, X_0 = x_0, t ≥ 0, (1.2)

driven by a Brownian motion B, has a unique strong solution when b is merely bounded and measurable. This result was extended to the case of fractional Brownian noise in [4,8,27,32,33]. These papers study the equation

dX_t = b(X_t) dt + dB^H_t, X_0 = x_0, t ≥ 0, (1.3)

where B^H is a d-dimensional fractional Brownian motion with Hurst parameter H ∈ (0, 1). It is known [8, Theorem 1.9] that this equation has a unique strong solution if b belongs to the Hölder–Besov space C^α and α > 1 − 1/(2H). Thus, the presence of the noise not only produces solutions in situations where there were none, but also singles out a unique physical solution in situations where there were multiple. However, to the best of our knowledge, no construction of this solution through discrete approximations has been known (unless H = 1/2). In this article, we develop a new approach which allows us to construct this solution and even obtain a rate of convergence of the discrete approximations. Before the formal setup of Sect. 2, let us informally overview the results.
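To make the non-uniqueness concrete, here is a small numerical check (our own illustrative example, not taken from the paper): for the non-Lipschitz vector field b(x) = √|x|, both the zero path and x(t) = t²/4 solve (1.1) from x_0 = 0.

```python
import numpy as np

# b(x) = sqrt(|x|): continuous, but not Lipschitz at 0.
def b(x):
    return np.sqrt(np.abs(x))

t = np.linspace(0.0, 1.0, 1001)
x_zero = np.zeros_like(t)   # the trivial solution of x' = b(x), x(0) = 0
x_para = t**2 / 4.0         # a second solution from the same initial condition

# Check that both paths satisfy x'(t) = sqrt(|x(t)|) up to discretization error.
for x in (x_zero, x_para):
    dx = np.gradient(x, t)
    assert np.max(np.abs(dx - b(x))) < 1e-3
```

Adding an additive Brownian forcing to this equation restores uniqueness; that is precisely the effect the results below quantify.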
First, let us recall that in the standard Brownian case (H = 1/2) the seminal work of Gyöngy and Krylov [18] established the convergence in probability of the Euler–Maruyama scheme

dX^n_t = b(X^n_{κ_n(t)}) dt + dB^H_t, X^n_0 = x^n_0, t ≥ 0, (1.4)

to the solution of (1.3). Here b is a bounded measurable function and

κ_n(t) := ⌊nt⌋/n, n ∈ N. (1.5)

In the present paper, we significantly extend these results by (a) establishing the convergence of the Euler–Maruyama scheme for all H ∈ (0, 1); (b) showing that the convergence takes place in a stronger (L_p(Ω) and almost sure) sense; (c) obtaining an explicit rate of convergence. More precisely, in Theorem 2.1 we show that if b is bounded and Hölder continuous with exponent α > 1 − 1/(2H), then the Euler–Maruyama scheme converges with rate ((1/2 + αH) ∧ 1) − ε for any ε > 0. Thus, the approximation results are obtained under the minimal assumption on the drift b that is needed for strong uniqueness of solutions [8,32] and for the well-posedness of the scheme (1.4). Let us also point out that, in particular, for H < 1/2 one does not need to require any continuity of b to obtain the convergence rate 1/2 − ε. Concerning approximations of SDEs driven by fractional Brownian motions with regular coefficients, we refer the reader to the recent works [15,22] and references therein. Concerning the implementation of such schemes and in particular the simulation of increments of fractional Brownian motions we refer to [37, Section 6] and its references.
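As a concrete illustration of scheme (1.4), the following sketch samples a fractional Brownian path exactly on the grid, via a Cholesky factorization of its covariance (feasible for moderate n), and runs the drift-frozen Euler–Maruyama recursion. The drift b below is our own illustrative Hölder-continuous choice, not one from the paper.

```python
import numpy as np

def fbm_grid(n, H, rng):
    """Exact sample of B^H at t_k = k/n via Cholesky of the fBm covariance
    E[B^H_s B^H_t] = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2."""
    t = np.arange(1, n + 1) / n
    cov = 0.5 * (t[:, None]**(2 * H) + t[None, :]**(2 * H)
                 - np.abs(t[:, None] - t[None, :])**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter for stability
    return np.concatenate([[0.0], L @ rng.standard_normal(n)])

def euler_maruyama_fbm(b, x0, n, H, rng):
    """Scheme (1.4): X^n_{(k+1)/n} = X^n_{k/n} + b(X^n_{k/n})/n + fBm increment."""
    B = fbm_grid(n, H, rng)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] + b(x[k]) / n + (B[k + 1] - B[k])
    return x

# Bounded, Hölder(1/2)-continuous drift (illustrative choice):
b = lambda x: np.sign(x) * min(np.sqrt(abs(x)), 1.0)
path = euler_maruyama_fbm(b, 0.0, 256, H=0.3, rng=np.random.default_rng(0))
```

For H = 1/2 the covariance above reduces to min(s, t) and the sampler produces standard Brownian motion, so the same code covers the classical Gyöngy–Krylov setting.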
Our second application is to study equations with multiplicative noise in the standard Brownian case,

dX_t = b(X_t) dt + σ(X_t) dB_t, X_0 = x_0, t ≥ 0, (1.6)

and their discretisations

dX^n_t = b(X^n_{κ_n(t)}) dt + σ(X^n_{κ_n(t)}) dB_t, X^n_0 = x^n_0, t ≥ 0. (1.7)

Here b, σ are measurable functions, B is a d-dimensional Brownian motion, and κ_n is defined in (1.5). To ensure well-posedness, a nondegeneracy assumption on σ has to be imposed. In the standard Brownian case the rate of convergence for irregular b has recently been actively studied, see among many others [2, 28–30, 36] and their references. However, the obtained rate deteriorates as b becomes more irregular: in the setting of (1.6)–(1.7), the best known rate is only proven to be (at least) α/2 for b ∈ C^α, α > 0 in [2]. It was first shown in [10] that, at least for additive noise, the strong rate does not vanish as the regularity α approaches 0, and one in fact recovers the rate 1/2 − ε for arbitrary ε > 0, for all α > 0. In the present paper we establish the same for multiplicative noise, in which case the rate 1/2 is well known to be optimal. Our proof offers several other improvements over earlier results: all moments of the error can be treated in the same way, the scalar and multidimensional cases are not distinguished, and the main error bound (2.9) is uniform in time, showing that X_· and X^n_· are close as paths. The topology (in time) in which the error is measured is in fact even stronger, see Remark 2.3.
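A minimal sketch of the discretisation (1.7) in dimension one, with illustrative coefficients of our own choosing (b bounded and Hölder continuous, σ ∈ C² with σ ≥ 1/2, so an ellipticity condition of the type (2.8) holds):

```python
import numpy as np

def em_multiplicative(b, sigma, x0, B, n):
    """Scheme (1.7) on [0,1], d = 1: both coefficients are frozen at the last
    gridpoint kappa_n(t) = floor(nt)/n; B holds the Brownian path at k/n."""
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] + b(x[k]) / n + sigma(x[k]) * (B[k + 1] - B[k])
    return x

b = lambda x: np.sign(x) * min(abs(x), 1.0) ** 0.5   # bounded, Hölder-1/2 drift
sigma = lambda x: 1.0 + 0.5 * np.sin(x)              # C^2, sigma >= 1/2 (elliptic)

rng = np.random.default_rng(1)
n = 128
dB = rng.standard_normal(n) / np.sqrt(n)
B = np.concatenate([[0.0], np.cumsum(dB)])
path = em_multiplicative(b, sigma, 1.0, B, n)
```

Comparing endpoints of nested grids driven by the same Brownian path gives an empirical handle on the strong rate 1/2 − ε discussed above.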
To obtain these results we develop a new strategy which utilizes the stochastic sewing lemma (SSL) of Lê [27] as well as some other specially developed tools. We believe that these tools might also be of independent interest; let us briefly describe them here.
First, we obtain a new stochastic sewing-type lemma, see Theorem 3.3. It provides bounds on the L_p-norm of the increments of a process, with the correct dependence on p. This improves the corresponding bounds from the SSL of Lê (although under more restrictive conditions). This improved bound is used for proving stretched exponential moment bounds that play a key role in the convergence analysis of the Euler–Maruyama scheme for (1.3), see Sect. 4.3. In particular, using this new sewing-type lemma, we are able to extend the key bound of Davie [9, Proposition 2.1] (this bound was pivotal in his paper for establishing uniqueness of solutions to (1.2) when the driving noise is a standard Brownian motion) to the case of fractional Brownian noise, see Lemma 4.3. Second, in Sect. 5 we derive density estimates for (a drift-free version of) the solution of (1.7) via Malliavin calculus. Classical results in this direction include those of Gyöngy and Krylov [18] and of Bally and Talay [5,6]: the former gives sharp short-time asymptotics but no smoothness of the density, and the latter vice versa (see Remark 5.1 below). Since our approach requires both properties at the same time, we give a self-contained proof of such an estimate (5.2).
Finally, let us mention that, as in [10,11,34], efficient quadrature bounds play a crucial role in the analysis. These are interesting approximation problems in their own right, see, e.g., [25] and the references therein. Such questions in the non-Markovian setting of fractional Brownian motion have only been addressed recently in [1]. However, there are a few key differences between the quadrature bounds of [1] and our Lemma 4.1. First, we derive bounds in L_p(Ω) for all p, which by Proposition 2.9 also imply the corresponding almost sure rate (as opposed to L_2(Ω) rates only in [1]). Second, unlike the standard fractional Brownian motions considered here, [1] requires starting them at time 0 from a random variable with a density, which provides a strong smoothing effect. Third, when approximating functionals of the form Γ_T := ∫_0^T f(B^H_r) dr, our results not only imply pointwise error estimates on |Γ_T − Γ^n_T|, but also bounds on the error ‖Γ_· − Γ^n_·‖_{C^β} of the whole path, measured in a Hölder norm C^β with some β > 1/2. This is an immediate consequence of the bounds (4.1) in combination with Kolmogorov's continuity theorem.
The rest of the article is structured as follows. Our main results are presented in Sect. 2. In Sect. 3 we outline the main strategy and collect the necessary auxiliary results, including the new sewing-type bound, Theorem 3.3. Section 4 is devoted to the error analysis in the additive fractional noise case. In Sect. 5 we prove an auxiliary bound on the probability distribution of the Euler–Maruyama approximation of certain sufficiently nice SDEs. The proofs of convergence in the multiplicative standard Brownian noise case are given in Sect. 6.

Main results
We begin by introducing the basic notation. Consider a probability space (Ω, F, P). Recall that the components of B^H are independent and each component is a Gaussian process with zero mean and the fractional Brownian covariance, in which c_H denotes a certain positive constant, see [35, (5.1)]. For α ∈ (0, 1] and a function f we define the C^α norms in the usual way. If the multi-index is (0, . . . , 0), then, as usual, we use the convention ∂f = f. In particular, the C^α norm always includes the supremum of the function. We also set C^0(Q, V) to be the space of bounded measurable functions with the supremum norm. We emphasize that in our notation elements of C^0 need not be continuous! If α < 0, then by C^α(R^d, R) we denote the space of all distributions f ∈ D′(R^d) such that the corresponding heat-kernel norm is finite, where P_ε f is the convolution of f with the d-dimensional Gaussian heat kernel at time ε.
In some cases we use shorthands: if Q = R^d, or V = R^d, or V = R^{d×d}, they are omitted from the notation. For instance, the reader understands that requiring the diffusion coefficient σ of (1.6) to be of class C^α is to require it to have finite ‖σ‖_{C^α} norm. Convention on constants: throughout the paper N denotes a positive constant whose value may change from line to line; its dependence is always specified in the corresponding statement.

Additive fractional noise
Our first main result establishes the convergence of the numerical scheme (1.4) to the solution of Eq. (1.3). Fix H ∈ (0, 1). It is known [8, Theorem 1.9] that if the drift b ∈ C^α with α ∈ [0, 1] satisfies α > 1 − 1/(2H), then for any fixed x_0 ∈ R^d, Eq. (1.3) admits a unique strong solution, which we denote by X. For any n ∈ N we take x^n_0 ∈ R^d and denote the solution of (1.4) by X^n. For given α ∈ [0, 1] and H ∈ (0, 1), we define the rate γ in (2.4). Now we are ready to present our first main result. Its proof is given in Sect. 4; a brief outline is provided in Sect. 3.1.

Remark 2.2 An interesting question
left open is whether one can reach α = 0 in the H = 1/2 case. In dimension 1, this was answered positively in [10] using PDE methods, but the sewing approach at the moment does not seem to handle such endpoint situations. For H ≠ 1/2, even weak existence or uniqueness is not known in the endpoint case α = 1 − 1/(2H).

Remark 2.3
From (2.6), Kolmogorov's continuity theorem, and Jensen's inequality, one gets the corresponding almost sure bound for any ε > 0 (with N also depending on ε). In the literature it is more common to derive error estimates in the supremum norm; such estimates of course follow, since the Hölder-type bound above is quite a bit stronger.

Remark 2.4
A trivial lower bound on the rate of convergence of the solutions is the rate of convergence of the initial conditions. In (1.4) we lose δ compared to this rate, but δ > 0 can be chosen arbitrarily small. This becomes even less of an issue if one simply chooses x n 0 = x 0 .

Remark 2.5
The fact that the error is well controlled even between the gridpoints is related to the choice of how we extend X^n to continuous time from the points X^n_0, X^n_{1/n}, . . .. For other types of extensions and their limitations we refer the reader to [31].

Corollary 2.6
Assume α ∈ [0, 1] satisfies (2.5) and suppose b ∈ C^α. Take x^n_0 = x_0 for all n ∈ N. Then for a sufficiently small θ > 0 and any ε > 0 there exists an almost surely finite random variable η such that for all n ∈ N and ω ∈ Ω the following bound holds, where γ was defined in (2.4).

Multiplicative Brownian noise
In the multiplicative case we work under the ellipticity and regularity conditions

σ ∈ C^2, σσ^T ≥ λI, (2.8)

in the sense of positive definite matrices, for some λ > 0. This, together with b ∈ C^0, guarantees the strong well-posedness of equations (1.6) and (1.7) [38, Theorem 1], whose solutions we denote by X and X^n, respectively. The second main result then reads as follows; its proof is the content of Sect. 6.
Theorem 2.7 Suppose b ∈ C^α for some α > 0 and σ satisfies (2.8). Then for all n ∈ N the following bound holds.
Let us conclude by invoking a simple fact used in the proof of Corollaries 2.6 and 2.8, which goes back to at least [20, proof of Theorem 2.3] (see also [13,Lemma 2]).

Proposition 2.9
Let ρ > 0 and let (Z_n)_{n∈N} be a sequence of random variables such that for all p > 0 and all n ∈ N one has the bound ‖Z_n‖_{L_p(Ω)} ≤ N n^{−ρ} for some N = N(p). Then for all ε > 0 there exists an almost surely finite random variable η such that for all n ∈ N and ω ∈ Ω one has |Z_n(ω)| ≤ η(ω) n^{−ρ+ε}.
Proof Notice that by Chebyshev's inequality, for any q > 0,

Σ_{n∈N} P(|Z_n| > n^{−ρ+ε}) ≤ Σ_{n∈N} n^{q(ρ−ε)} E|Z_n|^q ≤ N(q)^q Σ_{n∈N} n^{−qε}.

Choosing q = 2/ε, the above sum is finite, so by the Borel–Cantelli lemma there exists an almost surely finite N-valued random variable n_0 such that |Z_n| ≤ n^{−ρ+ε} for all n > n_0. This yields the claim by setting η := max(1, max_{n ≤ n_0} n^{ρ−ε} |Z_n|).

The outline of the strategy
The purpose of this section is to outline the main steps in a simple example. Hopefully this gives the reader a clear picture of the strategy, which otherwise may be blurred by the complications arising in the proofs of Theorems 2.1 and 2.7.
The 'simple example' will be the setting of (1.3) and (1.4) with H = 1/2 and b ∈ C^α for some α > 0. We furthermore assume x_0 = x^n_0 and that the time horizon is [0, T_0] instead of [0, 1], with some small T_0 ∈ (0, 1] to be chosen later. Finally, we will only aim to prove (2.6) with τ = 1/2.
Step 1 ("Quadrature bounds"). Our first goal is to bound the quantity

A_t := ∫_0^t [b(B_r) − b(B_{κ_n(r)})] dr. (3.1)

From the Hölder continuity of b, one would have the trivial bound of order n^{−α/2} in any L_p(Ω) norm, but in fact one can do much better, as follows. Fix ε ∈ (0, 1/2) and define (recall that by E_s we denote the conditional expectation given F_s)

A_{s,t} := E_s ∫_s^t [b(B_r) − b(B_{κ_n(r)})] dr.

The stochastic sewing lemma, Proposition 3.2 below, allows one to bound A through bounds on A_{s,t}. Given the preceding field A_{s,t}, provided that the conditions (3.8) and (3.9) are satisfied, it is easy to check that the unique adapted process A constructed in Proposition 3.2 coincides with the one in (3.1). Indeed, the process in (3.1) satisfies (3.10) and (3.11) with ε_1 = ε, ε_2 = 1, K_1 = ‖b‖_{C^0} and K_2 = 0. Therefore it remains to find C_1 and C_2. In fact, it is immediate that one can choose C_2 = 0, since E_s δA_{s,u,t} = 0. We now claim that one can take C_1 = N n^{−1/2−α/2+ε}.

If |t − s| ≤ 2n^{−1}, then one easily gets the required bound by the conditional Jensen's inequality. If |t − s| > 2n^{−1}, let s′ = κ_n(s) + 2n^{−1} be the second gridpoint to the right of s. In particular, r ≥ s′ implies κ_n(r) ≥ s′. Let us furthermore notice that for any u ≥ v and any bounded measurable function f one has E_v f(B_u) = P_{u−v} f(B_v), where P is the standard heat kernel (see (3.22) below for a precise definition). One can then estimate A_{s,t}, using in the third line of the computation a well-known estimate for heat kernels, see Proposition 3.7 (ii) with exponents β = 0, δ = 1/2 + α/2 − ε, and time points κ_n(r) − s in place of s, r − s′ in place of t. We also used that for r ≥ s′ one has κ_n(r) − s ≥ r − s′. By (3.2) and (3.3) we indeed get (3.8) with C_1 = N n^{−1/2−α/2+ε}. Applying the stochastic sewing lemma, (3.12) yields the claimed quadrature bound.

Step 1.5 (Girsanov transform). An easy application of Girsanov's theorem yields the same bound with the scheme in place of B. In general (for example, for fractional Brownian motions) the Girsanov transformation can become involved, but for our present example this is completely straightforward.
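The improvement over the trivial bound in Step 1 can be seen numerically. The following Monte Carlo sketch (our own check, for H = 1/2 and the merely measurable drift b = sign, i.e. α = 0) estimates the L_2(Ω) quadrature error and exhibits decay close to n^{−1/2}, whereas the trivial bound gives no decay at all:

```python
import numpy as np

rng = np.random.default_rng(2)
Nf, M = 2048, 1000                 # fine time grid, number of Monte Carlo paths
t = np.arange(Nf) / Nf
b = np.sign                        # bounded measurable, alpha = 0

dB = rng.standard_normal((M, Nf)) / np.sqrt(Nf)
B = np.hstack([np.zeros((M, 1)), np.cumsum(dB, axis=1)])[:, :-1]  # B at the t_j

def l2_quadrature_error(n):
    """L_2(Omega) norm of int_0^1 [b(B_r) - b(B_{kappa_n(r)})] dr, with both
    integrals approximated on the fine grid (Nf must be divisible by n)."""
    idx = np.floor(n * t).astype(int) * (Nf // n)  # fine-grid index of kappa_n(t_j)
    err = np.mean(b(B) - b(B[:, idx]), axis=1)
    return np.sqrt(np.mean(err ** 2))

errors = {n: l2_quadrature_error(n) for n in (8, 32, 128)}
```

With the seed fixed above, the three errors decrease by roughly a factor of two per fourfold refinement, consistent with the rate n^{−1/2−α/2+ε} claimed in Step 1 for α = 0.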
Step 2 ("Regularization bound"). Next, we estimate the quantity

∫_s^t [b(ψ_r + B_r) − b(ϕ_r + B_r)] dr

for some adapted processes ψ, ϕ whose Lipschitz norms are bounded by some constant K. As suggested by the above notation, we use the stochastic sewing lemma again, with A_{s,t} defined by freezing the processes at time s:

A_{s,t} := E_s ∫_s^t [b(ψ_s + B_r) − b(ϕ_s + B_r)] dr.

We do not give the details of the calculations at this point. It is an instructive exercise for the interested reader to verify that (3.8) and (3.9) are satisfied with ε_1 = α/2 and suitable constants C_1, C_2; here N depends on p, α, d, K, ‖b‖_{C^α}, but not on T_0. The bound (3.10) is straightforward, with K_1 = ‖b‖_{C^0}. Concerning (3.11), one can write out the increment and see that K_2 = 2K‖b‖_{C^α} does the job. Therefore, by (3.12), we get the regularization bound (3.4). We will only apply the following simple corollary of this bound: if ψ_0 = ϕ_0, then (3.5) holds.

Step 3 ("Buckling"). Let ψ and ψ^n be the drift components of X and X^n, respectively:

ψ_t := ∫_0^t b(X_r) dr, ψ^n_t := ∫_0^t b(X^n_{κ_n(r)}) dr.

We apply (3.4) and (3.5) with ϕ = ψ^n. Dividing by |t − s|^{1/2} and taking the supremum over 0 ≤ s ≤ t ≤ T_0, one gets a closed inequality for the Hölder-type norm of ψ − ψ^n. Since so far N does not depend on T_0, one can choose T_0 sufficiently small so that N T_0^{α/2} ≤ 1/2. This yields the desired bound. Let us point out that the rate of convergence is determined only by the first step. Also, the second step is similar in spirit to the 'averaging bounds' appearing in sewing-based uniqueness proofs for SDEs (see e.g. [8,27]).
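In display form, the buckling step can be sketched as follows (a hedged reconstruction consistent with the narrative above; N denotes a generic constant, and the displays omitted in the text are summarized rather than quoted):

```latex
% D_n measures the error of the drift parts in a 1/2-Hölder-type norm on [0, T_0]:
D_n := \sup_{0\le s\le t\le T_0}
       \frac{\|\psi_t-\psi^n_t-(\psi_s-\psi^n_s)\|_{L_p(\Omega)}}{|t-s|^{1/2}} .
% Steps 1 and 2 combined give, for all 0 \le s \le t \le T_0,
\|\psi_t-\psi^n_t-(\psi_s-\psi^n_s)\|_{L_p(\Omega)}
  \le N\,n^{-1/2-\alpha/2+\varepsilon}\,|t-s|^{1/2}
    + N\,T_0^{\alpha/2}\,|t-s|^{1/2}\,D_n ,
% so that, after dividing by |t-s|^{1/2} and taking suprema,
D_n \le N\,n^{-1/2-\alpha/2+\varepsilon} + N\,T_0^{\alpha/2} D_n .
% Choosing T_0 so small that N T_0^{\alpha/2} \le 1/2 buckles the estimate:
D_n \le 2N\,n^{-1/2-\alpha/2+\varepsilon} .
```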
In the proof of Theorem 2.1, the more difficult part will be the regularization bound. Applying only the stochastic sewing lemma of Lê apparently does not lead to an optimal result for H > 1/2. Therefore at some point one has to move from almost sure bounds (which are similar to [8]) to L_p bounds. This requires an extension of Davie's moment bound [9, Proposition 2.1] to the case of fractional Brownian motion. This is done in Lemma 4.3 using the new stochastic sewing lemma (Theorem 3.3).
In contrast, for Theorem 2.7 establishing the quadrature bound will be more difficult. In the above arguments, the heat kernel bounds have to be replaced by estimates on the transition densities of the Euler–Maruyama scheme. These bounds are established via Malliavin calculus; this is the content of Sect. 5.

Sewing lemmas
As mentioned above, the proof strategy relies on the sewing and stochastic sewing lemmas. For the convenience of the reader, we recall them here. The first two lemmas are well known; the third one is new. The first statement is the sewing lemma of Gubinelli.
Moreover, there exists a constant K 0 depending only on ε, d such that A in fact satisfies the above bound with K ≤ K 0 C.
Moreover, there exists a constant K depending only on ε_1, ε_2, d such that A satisfies the bound (3.12).
The final statement of this section is new. It provides bounds on ‖A_t − A_s‖_{L_p(Ω)} with the correct dependence on p: namely, these bounds are of order √p, rather than p as in (3.12). This will be crucial for the proof of the stretched exponential moment bounds in Sect. 4.3.

Remark 3.4
Note that the right-hand side of bound (3.17) does not depend on C 1 .

Remark 3.5
Let us recall that the proof of the stochastic sewing lemma in [27] requires applying the BDG inequality infinitely many times, but each time to a discrete-time martingale, thus yielding a constant of order p on the right-hand side of the bound (3.12). In our proof we apply the BDG inequality only once, but to a continuous-time martingale. This allows us to obtain a better constant (namely √p instead of p), since the constant in the BDG inequality for continuous-time martingales is better than in the BDG inequality for general martingales.

Proof of Theorem 3.3 This proof is inspired by the ideas of [3, proof of Proposition 3.2] and [8, proof of Theorem 4.3].
For the sake of brevity, in this proof we will write L p for L p ( ). Fix s, t ∈ [S, T ] ≤ and for i ∈ {1, . . . , d} consider a martingale We will frequently use the following inequality.
We begin by observing that The first term in (3.19) is easy to bound. By assumption (3.15) we have To estimate I i 2 we first observe that for each i = 1, . . . , d the martingale M i is continuous. Indeed, for any s ≤ u ≤ v ≤ t we have using (3.18), (3.13), and (3.14) Therefore, the Kolmogorov continuity theorem implies that the martingale M i is continuous.

Some useful estimates
In this section we establish a number of useful technical bounds related to Gaussian kernels. Their proofs are mostly standard; however, we were not able to find them in the literature. Therefore, for the sake of completeness, we provide the proofs of these results in "Appendix A". Fix an arbitrary H ∈ (0, 1) and define the function v = v(s, u, t) by (3.23). Let p_t, t > 0, be the density of a d-dimensional vector with independent Gaussian components, each of mean zero and variance t:

p_t(x) = (2πt)^{−d/2} e^{−|x|²/(2t)}, x ∈ R^d.

For a measurable function f : R^d → R we write P_t f := p_t * f, and occasionally we denote by p_0 the Dirac delta function.
Our first statement provides a number of technical bounds related to fractional Brownian motion. Its proof is placed in "Appendix A".

Proposition 3.6 Let p ≥ 1. The process B^H has the following properties: the random variable E_u B^H_r − E_s B^H_r is independent of F_s; furthermore, this random variable is Gaussian with mean 0 and variance v(s, u, t).

The next statement gives the heat kernel bounds which are necessary for the proofs of the main results. Its proof is also placed in "Appendix A". Recall the definition of the function v in (3.23).
Proposition 3.7 The following hold for all x ∈ R^d, 0 < s ≤ u ≤ t ≤ 1, and all random vectors ξ whose components are independent N(0, v(s, u, t)) random variables.
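The heat-kernel estimates of Proposition 3.7 can be sanity-checked in closed form. For instance, a gradient bound of the type ‖∇P_t f‖ ≤ N t^{−1/2} ‖f‖_{C^0} is saturated by f = sign in d = 1, where P_t f(x) = erf(x/√(2t)) exactly (a standalone check of ours, not code from the paper):

```python
import math

def grad_Pt_sign_at_0(t, h=1e-6):
    """Central difference at x = 0 (where the gradient is maximal) of
    P_t f(x) = E[sign(x + sqrt(t) Z)] = erf(x / sqrt(2 t)), Z ~ N(0, 1)."""
    f = lambda x: math.erf(x / math.sqrt(2.0 * t))
    return (f(h) - f(-h)) / (2.0 * h)

# The exact value of the gradient at 0 is sqrt(2 / (pi * t)),
# i.e. exactly of order t^{-1/2}, matching the claimed blow-up rate.
for t in (1.0, 0.1, 0.01):
    g = grad_Pt_sign_at_0(t)
    assert abs(g * math.sqrt(t) - math.sqrt(2.0 / math.pi)) < 1e-4
```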
Our next statement relates to the properties of Hölder norms. Its proof can be found in "Appendix A".
Finally, we will also need the following integral bounds. They follow immediately from a direct calculation.

Girsanov theorem for fractional Brownian motion
One of the tools which are important for the proof of Theorem 2.1 is the Girsanov theorem for fractional Brownian motion [12,Theorem 4.9], [32, Theorem 2]. We will frequently use the following technical corollary of this theorem. For the convenience of the reader we put its proof into "Appendix B".
Assume that the process u satisfies (3.26), i.e. it is bounded by a deterministic constant M almost surely. Further, assume that one of the following holds: (i) H ≤ 1/2; or (ii) H > 1/2 and there exists a random variable ξ controlling the quantity appearing in (3.27), with E exp(λξ) < ∞ for any λ > 0.
Then there exists a probability measure P̃ equivalent to P such that the process B̃^H := B^H + ∫_0^· u_s ds is a fractional Brownian motion with Hurst parameter H under P̃. Furthermore, for any λ > 0 we have the corresponding exponential moment bound. In order to simplify the calculation of the integral in (3.27), we provide the following technical but useful lemma; since the proof is purely technical, we defer it to "Appendix B".

Lemma 3.11 There exists a constant N = N(H, ρ) such that for any function f ∈ C^ρ([0, 1], R^d) and any n ∈ N one has the corresponding bound.

Additive fractional noise
In this section we provide the proof of Theorem 2.1. We follow the strategy outlined in Sect. 3.1: in Sects. 4.1 and 4.2 we prove the quadrature bound and the regularization bound, respectively. Based on these bounds, the proof of the theorem is given in Sect. 4.3.

Quadrature estimates
The goal of this subsection is to prove the quadrature bound (4.7). The proof consists of two steps. First, in Lemma 4.1 we prove this bound for the case of fractional Brownian motion; then we extend this result to the process X by applying the Girsanov theorem.
Recall the definition of functions κ n in (1.5) and γ in (2.4).
Proof It suffices to prove the bound for p ≥ 2.
Then, clearly, for any 0 ≤ s ≤ t ≤ 1 the process admits the decomposition above. Let us check that all the conditions of the stochastic sewing lemma (Proposition 3.2) are satisfied. Note that E_s δA_{s,u,t} = 0, and so condition (3.9) trivially holds, with C_2 = 0. To establish (3.8), let s ∈ [k/n, (k + 1)/n) for some k ∈ {0, . . . , n − 1}. Suppose first that t ∈ [(k + 4)/n, 1]. We write A_{s,t} = I_1 + I_2. The bound for I_1 is straightforward: by the conditional Jensen's inequality, the definition of the C^α norm, and Proposition 3.6 (i) we have the required estimate, where the last inequality follows from the fact that n^{−1} ≤ |t − s|. Now let us estimate I_2. Using Proposition 3.6 (iii), we split it further into I_21 and I_22. To bound I_21, we apply Proposition 3.7 (ii) with β = 0, δ = 1 and Proposition 3.6 (iv). To deal with I_22, we use Proposition 3.7 (i) with β = 1 and Proposition 3.6 (v), where in the second inequality we have also used that κ_n(r) − s ≥ (r − s)/2. Combining (4.5) and (4.6), and taking again into account that n^{−1} ≤ |t − s|, we get the claimed bound.

Recalling (4.3), we finally conclude the required estimate in this case. It remains to show the same bound for t ∈ (s, (k + 4)/n]. However, this is almost straightforward, using that in this case |t − s| ≤ 4n^{−1}. Thus (3.8) holds with the stated C_1, and all the conditions of the stochastic sewing lemma are satisfied. The process Ã is also F-adapted, satisfies (3.11) trivially (the left-hand side is 0), and satisfies (3.10) as well. Therefore, by uniqueness, A_t = Ã_t. The bound (3.12) then yields precisely (4.1).
Let b ∈ C α and X n be the solution of (1.4).
Proof Without loss of generality, we assume α < 1. Let us apply the Girsanov theorem (Theorem 3.10) to the function u(t) = b(X^n_{κ_n(t)}). First let us check that all the conditions of this theorem hold.
First, we obviously have |u(t)| ≤ ‖b‖_{C^0}, and thus (3.26) holds with M = ‖b‖_{C^0}. Second, let us check condition (3.27) in the case H > 1/2. Fix λ > 0 and small δ > 0 such that α(H − δ) > H − 1/2; such δ exists thanks to the assumption α > 1 − 1/(2H). We apply Lemma 3.11 to the function f := b(X^n) with ρ := α(H − δ), where we use the fact that the relevant Hölder constant has the required exponential moments. Hence all the conditions of Theorem 3.10 hold. Thus, there exists a probability measure P̃ equivalent to P such that the process B̃^H is a fractional Brownian motion under P̃. Now we can derive the desired bound (4.7). Taking into account (4.8), we deduce by Theorem 3.10 the corresponding change-of-measure estimate. Hence, using (4.1), we can continue (4.9) accordingly, which implies the statement of the theorem.

A regularization lemma
The goal of this subsection is to establish the regularization bound (4.26).

Proof of Lemma 4.3
Fix p ≥ 2. We will apply Theorem 3.3 to the process defined above. As usual, we write A_{s,t} := A_t − A_s. Let us check that all the conditions of that theorem hold with m = 4. It is very easy to see that (3.13) holds. By Proposition 3.6 (iii) and Proposition 3.7 (i) we have the required estimate for some N_1, and condition (3.15) is met. We want to stress that the constant N_1 here does not depend on p (this happens thanks to the a.s. bound (4.11); it will be crucial later in the proof). Thus, it remains to check conditions (3.14) and (3.16). Fix 0 ≤ s ≤ u ≤ t ≤ 1. Using Proposition 3.6 (iii), we get (4.12). Note that by Proposition 3.6 (ii), the random vector E_u B^H_r − E_s B^H_r is independent of F_s. Taking this into account and applying the conditional Minkowski inequality, we get (4.13), where for x ∈ R^d, r ∈ [u, t] we introduced the auxiliary function appearing there. By Proposition 3.6 (ii), the random vector E_u B^H_r − E_s B^H_r is Gaussian and consists of d independent components, each of mean 0 and variance v(s, u, t) (recall its definition in (3.23)). Hence Proposition 3.7 (iv) now yields the required estimate. Substituting this into (4.13), we finally get (4.14) for some N_3 = N_3(d, α, H), where we used that, by assumption, Hα − 1/2 > −1.
In a similar manner we check (3.16): the corresponding bound holds, and the constant 2(N_3² + N_4²) does not depend on p. Therefore condition (3.16) holds. Thus all the conditions of Theorem 3.3 hold, and the statement of the theorem follows from (3.17).
To establish the regularization bound we need the following simple corollary of the above lemma: there exists a constant N = N(d, α, H, δ) such that for any p ≥ 2 and s, t ∈ [0, 1] the corresponding bound holds. The corollary follows immediately from Proposition 3.8.
The next lemma provides a pathwise version of bound (4.17). It also allows one to replace fractional Brownian motion by fractional Brownian motion with a drift.

(4.26)
Proof We begin by additionally assuming (4.27). Let us apply the deterministic sewing lemma (Proposition 3.1) to the process defined above, and check that all the conditions of that lemma are satisfied. First, the process A is clearly continuous, since f is bounded. Then, using Lemma 4.6 with M := 4R, we derive that for any S ≤ s ≤ u ≤ T there exists a random variable controlling the relevant increment. Since, by (4.27), the remaining condition also holds, all the conditions of Proposition 3.1 are met. Defining now the process Ã accordingly, Ã satisfies (3.7) and therefore coincides with A. Proposition 3.1 now implies the required bound for any s, t, where the bound on |A_{s,t}| follows again from Lemma 4.6. Putting s = S and t = T in the above bound, using the bound on |ψ|, and combining this with the following trivial bound, we obtain (4.21). By the Chebyshev inequality and (4.21), we finally get (4.26) in the case of smooth f.

Now we are ready to remove the extra assumption of smoothness of f. Let us set f_n := P_{1/n} f ∈ C^∞ and apply the statement of the lemma to f_n, using (4.28). If α > 0, then f_n(x) → f(x) for all x ∈ R^d and the claim follows by Fatou's lemma. So we only have to consider the case α = 0. Clearly, it suffices to show that for each r > 0 the corresponding convergence holds almost surely as n → ∞. Notice that almost surely f_n(B^H_r) → f(B^H_r) as n → ∞, since the law of B^H_r is absolutely continuous (for r > 0). Moreover, since α = 0, we have by assumption that H < 1/2. By Theorem 3.10 (recall that ϕ satisfies (4.18), and is therefore Lipschitz), there exists a measure equivalent to P under which B^H + ϕ is a fractional Brownian motion. Consequently, for all r > 0, f_n(B^H_r + ϕ_r) → f(B^H_r + ϕ_r) almost surely as n → ∞. By the same reasoning we obtain that almost surely f_n(B^H_r + ψ_r) → f(B^H_r + ψ_r). The lemma is now proved.

Proof of Theorem 2.1
Proof Without loss of generality we assume α < 1. Let us denote the drift components as above. Fix ε > 0 satisfying (4.29); by assumption (2.5) such ε exists. Fix now large enough p ≥ 2 such that (4.30) holds. Fix 0 ≤ S ≤ T ≤ 1. Then, taking into account (4.7), for any S ≤ s ≤ t ≤ T we have the bound (4.31). Let M ≥ 1 be a parameter to be fixed later. We wish to apply Lemma 4.7 with ψ^n in place of ϕ, 1/2 + H(α − 1) − ε in place of ε, and τ := 1/2 + ε/2. Let us check that all the conditions of this lemma are satisfied. First, we note that by (4.29) we have 1/2 + H(α − 1) − ε > 0, which is required by the assumptions of the lemma. Second, we note that 1/2 + ε/2 > H(1 − α) thanks to (2.5), thus this choice of τ is allowed. Next, it is clear that ψ_0 and ψ^n_0 are deterministic. Further, since the function b is bounded, ψ and ψ^n satisfy (4.18). Finally, let us verify that ψ satisfies (4.19). If H < 1/2, this condition holds automatically thanks to the boundedness of b. If H ≥ 1/2, then pick H′ ∈ (0, H) such that (4.32) holds; such H′ exists thanks to assumption (2.5). Then, by the definition of ψ, the corresponding Hölder bound clearly holds, and therefore so does the required exponential moment bound for any λ > 0. Taking ρ := 1 + αH′ and recalling (4.32), we see that ρ > H + 1/2 and thus condition (4.19) holds. Therefore all conditions of Lemma 4.7 are met. Applying this lemma, we get a bound whose last step follows from the Kolmogorov continuity theorem and (4.30). Using this in (4.31), dividing by |t − s|^{1/2+ε}, and taking the supremum over S ≤ s ≤ t ≤ T, we get (4.33) for some N_1. Fix now m to be the smallest integer such that N_1 M m^{−1/2−ε/2} ≤ 1/2 (we stress that m does not depend on n). One gets from (4.33) the bound (4.34), and thus, starting from S = 0 and applying the above bound k times, k = 1, . . . , m, one can conclude.

Substituting back into (4.34), we get
It follows from the definition of m that m ≤ 2N_1^2 M^{2−ε}. At this point we choose ε_0 = ε/2 and note that for some universal constant N_2 one has the corresponding bound. Thus, we can continue (4.35) as follows.
Now fix δ > 0 and choose N_4 = N_4(δ) such that the required inequality holds for all M > 0. It remains to notice that by choosing M > 1 appropriately one obtains the desired estimate. Substituting back into (4.36) and using that X − X^n = ψ − ψ^n, we arrive at the required bound (2.6).
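To accompany Theorem 2.1, the following numerical sketch is our own illustration (not taken from the paper): it simulates the Euler–Maruyama scheme for dX_t = b(X_t) dt + dB^H_t on [0, 1] with the arbitrary smooth bounded drift b(x) = cos(x) and H = 0.3, sampling the fractional Brownian path exactly on a fine grid via a Cholesky factorisation of its covariance, so that all discretisations are driven by the same path.

```python
import numpy as np

rng = np.random.default_rng(1)
H, N, paths = 0.3, 256, 200
t = np.arange(1, N + 1) / N
# fBm covariance: R(s, t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2
cov = 0.5 * (t[:, None] ** (2 * H) + t[None, :] ** (2 * H)
             - np.abs(t[:, None] - t[None, :]) ** (2 * H))
L = np.linalg.cholesky(cov + 1e-12 * np.eye(N))
# B[j] holds B^H_{j/N} across all sample paths; B[0] = 0
B = np.vstack([np.zeros(paths), L @ rng.standard_normal((N, paths))])

def euler(n):
    """Euler-Maruyama with n steps, using increments of the same fBm path."""
    x = np.zeros(paths)
    step = N // n
    for k in range(n):
        x = x + np.cos(x) / n + (B[(k + 1) * step] - B[k * step])
    return x

ref = euler(N)                                   # finest available discretisation
err = lambda n: np.mean(np.abs(euler(n) - ref))  # strong L_1 error at time 1
```

With these (arbitrary) choices one observes the error shrinking as n grows, in line with the strong rate (1/2 + H) ∧ 1 that the theorem predicts for regular drift.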

Malliavin calculus for the Euler-Maruyama scheme
In the multiplicative standard Brownian case, we first consider Euler–Maruyama schemes without drift: for any y ∈ R^d define the process X̄^n(y) by dX̄^n_t(y) = σ(X̄^n_{κ_n(t)}(y)) dB_t, X̄^n_0 = y. given F_s, which can be obtained from bounds on the density of X̄^n_t. A trivial induction argument yields that for t > 0 the random variable X̄^n_t indeed admits a density, but to our knowledge such an inductive argument cannot be used to obtain useful quantitative information.
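For concreteness, here is a minimal one-dimensional sketch of the driftless scheme above, in our own notation and with an arbitrary choice of σ: the coefficient is frozen at the last gridpoint, so each step is conditionally Gaussian given the past.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = lambda x: 1.0 + 0.5 * np.sin(x)  # smooth, bounded, bounded away from zero

def scheme(y, n, paths=10000):
    """Simulate X^n_1(y); sigma is evaluated at the last gridpoint kappa_n(t)."""
    x = np.full(paths, float(y))
    for _ in range(n):
        dB = rng.normal(scale=np.sqrt(1.0 / n), size=paths)
        x = x + sigma(x) * dB
    return x

X = scheme(0.0, n=64)
```

Conditionally on F_{k/n} each increment is Gaussian with variance σ(X^n_{k/n})^2/n, which is exactly the observation behind the induction argument mentioned above.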

Remark 5.1
While the densities of Euler–Maruyama approximations have been studied in the literature, see e.g. [5,6,18], none of the available estimates is well suited to our purposes. In [18], under weaker regularity assumptions on σ, L_p bounds on the density (but not on its derivatives) are derived. In [5,6], smoothness of the density is obtained even in a hypoelliptic setting, but without sharp control on the short-time behaviour of the norms.
We will prove Theorem 5.2 via Malliavin calculus. In our discrete situation this could of course be translated into finite-dimensional standard calculus, but we find it more instructive to follow the basic terminology of [35], which we base on the lecture notes [21].

Definitions
One can obtain a scalar product from ‖·‖_H, which we denote by ⟨·, ·⟩_H. Let us also denote I = {1, . . . , n} × {1, . . . , d}. One can of course view H as a copy of R^I, with a rescaled version of the usual ℓ_2 norm. We denote by e_{(i,k)} the element of H whose entries are zero apart from the i-th one, which is the k-th unit vector of R^d. Then for any R-valued random variable X of the form X = F(W), where F is a differentiable function with at most polynomially growing derivative, the Malliavin derivative of X is defined as the corresponding H-valued random variable. For multidimensional random variables we define D coordinatewise. In the sequel we also use the matrix norm on R^{d×d} defined in the usual way: ‖M‖ := sup_{x∈R^d, |x|=1} |Mx|. Recall that if M is positive semidefinite, then one has ‖M‖ = sup_{x∈R^d, |x|=1} x^* M x. It follows that ‖·‖ is monotone increasing with respect to the usual order on positive semidefinite matrices.
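In this finite-dimensional setting the Malliavin derivative is, up to the normalisation built into H, simply the gradient with respect to the Gaussian increments. The following toy check is our own example (d = 1, plain gradient convention), verified against finite differences.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=np.sqrt(1.0 / n), size=n)  # increments of variance 1/n

F = lambda w: np.sin(w.sum())        # a smooth cylindrical functional X = F(W)
DX = np.cos(W.sum()) * np.ones(n)    # hand-computed gradient, one entry per e_(i)

h = 1e-6
for i in range(n):
    e = np.zeros(n); e[i] = h
    fd = (F(W + e) - F(W - e)) / (2 * h)   # central finite difference
    assert abs(fd - DX[i]) < 1e-8
```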
The following few properties hold in far greater generality; for the proofs we refer to [21]. One easily sees that the derivative D satisfies the chain rule: namely, for any differentiable G : R^d → R, one has DG(X) = ∇G(X) · DX. The operator D is closable, and its closure will also be denoted by D; its domain is denoted by W ⊂ L_2(Ω). The adjoint of D is denoted by δ. One then has that the domain of δ is included in W(H), and the following identity holds:

Stochastic difference equations
First let us remark that Eq. (5.1) does not define an invertible stochastic flow: indeed, for any t > 0, the map y ↦ X̄^n_t(y) may not even be one-to-one. Therefore, in order to invoke arguments from the Malliavin calculus for diffusion processes, we consider a modified equation that does define an invertible flow. Unfortunately, this new process will not have a density, but its singular part (as well as its difference from the original process) is exponentially small. Take a smooth function ℓ : R → R such that |ℓ(r)| ≤ |r| for all r ∈ R, ℓ(r) = r for |r| ≤ (4‖σ‖_{C^1} d^2)^{−1}, ℓ(r) = 0 for |r| ≥ (2‖σ‖_{C^1} d^2)^{−1}, and such that |∂^k ℓ| ≤ N for k = 0, . . . , 3 with some N = N(d, ‖σ‖_{C^1}). Define the recursion, for x ∈ R^d and j = 1, . . . , n, k = 1, . . . , d, accordingly. By our definition of ℓ, for any j the matrix-valued process indexed by k, m = 1, . . . , d and j = 1, . . . , n satisfies the corresponding recursion. It is also clear that D^m_i X^k_j = 0 for j < i, while for j > i we have the analogous recursion driven by the increments W^{(i,m)}.
From now on we will usually suppress the dependence on x in the notation. Save for the initial conditions, the two recursions coincide for the matrix-valued processes J_· and D_i X_·. Since the recursion is furthermore linear, j ↦ J_j^{−1} D_i X_j is constant in time for j ≥ i ≥ 1. In particular, D_i X_j can be expressed through the increments (W^{(i,m)})_{k,m=1,...,d}, or, with the notation J_{i,j} = J_j J_i^{−1}, through the corresponding formula.
Let us now define the event Ω̂ ⊂ Ω, as well as the (matrix-valued) random variables D_{i,j}, by the corresponding formulas. Clearly, on Ω̂ one has D_{i,j} = D_i X_j. Note that for fixed j, m one may view D^{·,m}_{·,j} as an element of H, while for fixed i, j one may view D_{i,j} as a d × d matrix. One furthermore has the following exponential bound on Ω̂.

Proposition 5.3
There exist N and c > 0, depending only on d and ‖σ‖_{C^1}, such that one has the stated bound. Proof For each (i, k) ∈ I, since W^{(i,k)} is a zero-mean Gaussian with variance n^{−1}, one has the corresponding tail estimate with some N and c > 0 depending only on d and ‖σ‖_{C^1}, by standard properties of the Gaussian distribution. The claim then follows by an elementary inequality. We now fix (j, k) ∈ I and G ∈ C^1, and we aim to bound |E ∂_k G(X_j)| in terms of t := j/n and ‖G‖_{C^0}, up to an additional exponentially small error term. To this end, we define the Malliavin matrix M ∈ R^{d×d}, with entries indexed by m, q = 1, . . . , d, as we will momentarily see (see (5.21)). One then has by the chain rule that, on Ω̂, ∂_k G(X_j) = ⟨DG(X_j), Y⟩_H. Therefore, (5.6)

Recalling (5.3), one has
Theorem 5.2 will then follow easily once we have the appropriate moment bounds of the objects above. Recall the notation t = j/n.
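The mechanism behind (5.6) is Gaussian integration by parts, E ∂_k G(X_j) = E⟨DG(X_j), Y⟩_H = E G(X_j) δ(Y). In the simplest scalar case X = W ~ N(0, 1) with a constant direction, this reduces to E G′(W) = E W G(W), which the following sketch (our own check, with the arbitrary smooth test function G = tanh) verifies deterministically via Gauss–Hermite quadrature.

```python
import numpy as np

# Gauss-Hermite nodes/weights for integrals against exp(-x^2); rescaling by
# sqrt(2) turns them into expectations over a standard Gaussian.
nodes, weights = np.polynomial.hermite.hermgauss(80)
E = lambda f: (weights * f(np.sqrt(2) * nodes)).sum() / np.sqrt(np.pi)

lhs = E(lambda w: 1.0 - np.tanh(w) ** 2)   # E[G'(W)] with G = tanh
rhs = E(lambda w: w * np.tanh(w))          # E[G(W) W], the divergence side
```

The two quadratures agree to high precision, reflecting the exact identity.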

Lemma 5.4
Assume the above notation and let σ satisfy (2.8). Then for any p > 0 one has the stated bounds. Proof As before, we omit the dependence on x ∈ R^d in order to ease notation. We first bound the moments of sup_j ‖J_j‖. Recall that we have the recursion J_j = J_{j−1}(I + Θ_{j/n}), (5.12) where the matrix Θ_t = (Θ^{q,k}_t)_{q,k=1}^d is given by the corresponding formula. By Itô's formula it follows that the associated identity holds. Consequently, for j = 0, . . . , n we have that J_j = Z_{j/n}, where the matrix-valued process Z_t satisfies (5.14). Notice that there exists a constant N = N(‖σ‖_{C^1}, ‖σ‖_{C^2}) such that, almost surely, the corresponding bound holds. This bound, combined with the fact that Z_t satisfies (5.14), implies the stated estimates. We now bound the moments of sup_j ‖J_j^{−1}‖. By (5.12) we get the corresponding identity. Recall the representation for t ∈ [(j − 1)/n, j/n], and that by the definition of Θ and (5.13), for all t ∈ [0, T] the matrix I + Θ_t is invertible. Hence, by Itô's formula, we have the corresponding identity for t ∈ [(j − 1)/n, j/n]. Moreover, by the definition of Θ, almost surely, for all (t, x) ∈ [0, T] × R^d one has the required bound. By (5.17) and (5.18), for j = 1, . . . , n we have that J_j^{−1} = Z̄_{j/n}, where the matrix-valued process Z̄_t is defined by the analogous equation with Z̄_0 = I.

By this and the bounds (5.19) we obtain (5.20).
Finally, from (5.16) and (5.20) we obtain (5.8).
Next, we show (5.10). On the set of positive definite matrices, matrix inversion is a convex mapping on the one hand, and on the other hand the function ‖·‖^p is convex and increasing for p ≥ 1. It is also an elementary fact that if B ⪰ λI, then ‖(ABA^*)^{−1}‖ ≤ λ^{−1}‖(AA^*)^{−1}‖. One then writes the corresponding chain of inequalities, and therefore (5.10) follows from (5.8). We now move to the proof of (5.11). First of all, notice that the above argument yields the analogous moment bound for all p > 0. Indeed, the proof is identical to that of (5.16), since (D_i X_j)_{j≥i} has the same dynamics as (J_j)_{j≥0}, with a different initial condition. We start with a bound for sup_r ‖D_i D_{r,j}‖. By the definition of D_{i,j} we have the corresponding identity, where for A ∈ (R^d)^{⊗2}, B ∈ (R^d)^{⊗3}, the product AB or BA is an element of (R^d)^{⊗3} that arises by considering B as a d × d matrix whose entries are elements of R^d. We estimate the term D_i J_j. As before, we have that D_i J_j = D_i Z_{j/n}, where Z is given by (5.14). We have that D_i Z_t = 0 for t < i/n, while for t ≥ i/n the process D_i Z_t satisfies the corresponding equation. By the chain rule and (5.22) it follows that for p > 0 there exists a constant N such that the associated bound holds. This, combined with (5.16), shows the required estimate for the 'free terms' of (5.25). This, along with (5.15) and (5.16), implies the intermediate bound, which in turn, combined with (5.20) and the boundedness of σ, implies the desired control. Next, by the chain rule we have the corresponding expression, and by (5.16), (5.20), (5.27), and the boundedness of σ, we obtain its moment bound. We proceed by obtaining a similar bound for the second term on the right-hand side of (5.23). First, let us derive a bound for D_i M. For each entry M^{m,q} of the matrix M we have the corresponding formula. Then, notice that on Ω̂, for ℓ > j we have D_{ℓ,j} = D_ℓ X_j = 0. Hence, taking into account (5.9) and (5.28), we get the entrywise bound. Summation over m, q gives the full bound, which by virtue of (5.9), (5.10), and (5.30) gives the required estimate. This, combined with (5.29) and (5.23), proves (5.11) and finishes the proof.

Proof of Theorem 5.2
Proof Recalling that Y_i = 0 for i > j, we can write, using (5.9) and (5.10), the corresponding bound. One also has an analogous estimate. Therefore, by (5.7), we have the stated bound on the main (first) term on the right-hand side of (5.6). As for the other two terms, Proposition 5.3 immediately yields the required bound, while for I_2 we can write the analogous estimate. Therefore, by (5.6), we obtain the desired inequality, and since on Ω̂ one has X_j = X̄^n_{j/n} = X̄^n_t, the bound (5.2) follows.

Quadrature estimates
Lemma 6.1
Suppose that σ satisfies (2.8) and that X̄^n := X̄^n(y) is the solution of (5.1). Then for all f ∈ C^α, 0 ≤ s ≤ t ≤ 1, n ∈ N, one has the bound ‖∫_s^t ( f(X̄^n_r) − f(X̄^n_{κ_n(r)}) ) dr‖_{L_p(Ω)} ≤ N ‖f‖_{C^α} n^{−1/2+2ε_1} |t − s|^{1/2+ε_1}, (6.1) Proof It clearly suffices to prove the bound for p ≥ 2 and, as in [10], for f ∈ C^∞. For 0 ≤ s ≤ t ≤ T we define A_{s,t} accordingly. Then, clearly, condition (3.9) trivially holds with C_2 = 0. As for (3.8), let s ∈ [k/n, (k+1)/n) for some k ∈ N_0. Suppose first that t ∈ [(k + 4)/n, 1]. We write |A_{s,t}| = |I_1 + I_2| with the corresponding decomposition. For I_2 we write the associated estimate. Next, denote by p_Γ the density of a Gaussian vector in R^d with covariance matrix Γ, and let P_Γ f = p_Γ * f (recall that for θ ≥ 0 we denote p_θ := p_{θI}, where I is the d × d identity matrix). With this notation we have the corresponding representation. Moreover, notice that by (2.8) we have, for a constant N = N(‖σ‖_{C^1}, α), the associated bound. Let us use the shorthand δ = r − κ_n(r) ≤ n^{−1}. We can then write the expansion, with summation over i implied. It is well known that (6.5) holds. Furthermore, with the notation Γ(z) := σσ^*(x − z), we have the next identity, where for the last inequality we have used (2.8). Therefore, by (6.4), (6.5), and (6.6) we obtain the required bound. One also has the trivial estimate ‖P_ε g‖_{C^0} ≤ 2‖f‖_{C^0}, and combining these two bounds yields the interpolated estimate for all β ∈ [−1, 0). Note that the restriction of X̄^n(·) to the gridpoints t = 0, 1/n, . . . , 1 is a Markov process with state space R^d. Therefore we can write the corresponding identity. Since g ∈ C^{α/2}, we have that u solves the associated resolvent equation, with u ∈ C^{2+(α/2)} and the corresponding bound. Hence, by combining (6.8), (5.2), (6.9), (6.7), and (6.3), we get the estimate. Putting this back into (6.2) one obtains the bound, where we have used that n^{−1} ≤ |t − s|. The bound for I_1 is straightforward. Therefore, the desired estimate holds for such t. It remains to show the same bound for t ∈ (s, (k + 4)/n]. Similarly to the above we write the analogous estimate, using that |t − s| ≤ 4n^{−1} and ε_1 < 1/2. Thus, (3.8) holds with C_1 = N ‖f‖_{C^α} n^{−1/2+2ε_1}. From here we conclude the bound (6.1) exactly as in Lemma 4.1.
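A quick Monte Carlo experiment of our own illustrates the flavour of (6.1), with X̄^n replaced by a standard Brownian path in d = 1 and the non-regular choice f = sign (the α = 0 regime): the L_2 size of the quadrature error over [0, 1] shrinks as the mesh is refined.

```python
import numpy as np

rng = np.random.default_rng(3)
N, paths = 4096, 2000
dB = rng.normal(scale=np.sqrt(1.0 / N), size=(paths, N))
B = np.hstack([np.zeros((paths, 1)), np.cumsum(dB, axis=1)])  # B[:, j] = B_{j/N}
f = np.sign  # bounded but discontinuous test function

def quad_error(n):
    """L_2 size of int_0^1 (f(B_r) - f(B_{kappa_n(r)})) dr, left-point sums."""
    step = N // n
    frozen = (np.arange(N) // step) * step   # fine-grid index of kappa_n(j/N)
    err = np.mean(f(B[:, :N]) - f(B[:, frozen]), axis=1)
    return np.sqrt(np.mean(err ** 2))
```

Comparing, say, quad_error(8) with quad_error(128) shows the expected decay in n.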

Lemma 6.2
Let α ∈ [0, 1], take ε_1 ∈ (0, 1/2). Let b ∈ C^0, let σ satisfy (2.8), and let X^n be the solution of (1.7). Then for all f ∈ C^α, 0 ≤ s ≤ t ≤ 1, n ∈ N, and p > 0, one has the stated bound. Proof Let us set ρ accordingly and define the measure P̃ by dP̃ = ρ dP. By Girsanov's theorem, X^n solves (5.1) with a P̃-Wiener process B̃ in place of B. Since Lemma 6.1 only depends on the distribution of X̄^n, we can apply it to X^n to bound the desired moments with respect to the measure P̃. Going back to the measure P can then be done precisely as in [10]: the only property needed is that ρ has finite moments of every order, which follows easily from the boundedness of b and (2.8).

A regularization lemma
The replacement for the heat kernel bounds from Proposition 3.7 is the following estimate on the transition kernel P̄ of (1.6). As before, X(x) denotes the solution of (1.6) with initial condition X_0(x) = x.
Then, taking into account (6.10), for any S ≤ s < t ≤ T we have the corresponding bound. We wish to apply Lemma 6.4 with ϕ = ϕ^n + Q^n + R^n. It is clear that for sufficiently small ε_2 > 0, τ = 1/2 − ε_2 satisfies (6.12). Therefore, by (6.14), for sufficiently large p we have the required estimates. Putting these into the above expression, and using τ < 1/2 repeatedly, one gets a bound with some ε_5 > 0. Combining with (6.15), dividing by |t − s|^τ and taking the supremum over s < t ∈ [S, T], we get (6.16). Fix an m ∈ N (not depending on n) such that N m^{−ε_5} ≤ 1/2. Whenever |S − T| ≤ m^{−1}, the second term on the right-hand side of (6.16) can therefore be discarded, and so one in particular gets (6.17), and thus also ‖ϕ^n_T‖_{L_p(Ω)} ≤ N ‖ϕ^n_S‖_{L_p(Ω)} + N ‖X − X^n‖_{L_p(Ω×[0,T])} + N n^{−1/2+ε}.
Iterating this inequality at most m times, one therefore gets the global bound. We can then write, invoking again the usual estimates for the stochastic integrals Q^n, the corresponding inequality. Gronwall's lemma then yields the claimed estimate, where the function C was defined in (2.2). This implies the statement of the proposition.
(ii). We have that the quantity in question is a Gaussian random variable independent of F_s, of mean 0 and variance c^2(s, t) − c^2(u, t). This implies the statement of the lemma.
(iii). It suffices to notice that the random vector B^H_t − E_s B^H_t is Gaussian, independent of F_s, consists of d independent components, and each of its components has zero mean and the corresponding variance. (iv). One can simply write, by the Newton–Leibniz formula, the required representation, since by our assumption on s, u, t, for all r ∈ [u, t] one has r − s ≤ t − s ≤ 2(r − s).
(v). It follows from (2.1) that the corresponding representation holds. Therefore, by the Burkholder–Davis–Gundy inequality one has the stated bound, where the last inequality follows from the assumption u − s ≥ (t − s)/2.

Proof of Proposition 3.7 (i). Case β ≤ α:
There is nothing to prove in this case. Case β = 0, α < 0: The bound follows immediately from the definition of the norm.
Case α = 0, β ∈ (0, 1]: By differentiating the Gaussian density we obtain the kernel bound; consequently, the gradient estimate follows. This, combined with the trivial estimate ‖P_t f‖_{C^0} ≤ ‖f‖_{L_∞}, gives the desired estimate. Case 0 < α < β < 1: We refer the reader to [17, Lemma A.7], where the estimate is proved in the Besov scale. The desired estimate then follows from the equivalence B^γ_{∞,∞} ∼ C^γ for γ ∈ (0, 1). Case α ∈ (0, 1), β = 1: We have the corresponding bound, which again combined with ‖P_t f‖_{C^0} ≤ ‖f‖_{C^0} proves the claim. Case α < 0, β ∈ [0, 1]: (ii). Fix δ ∈ (0, 1] such that δ ≥ α/2 − β/2. Then we have the chain of inequalities, where the last inequality follows from the facts that r ≥ s and r ≥ r − s, and that both of the exponents in the penultimate inequality are nonpositive thanks to the conditions on δ. This yields the statement of (ii). (iii). First let us deal with the case H ≤ 1/2. Then the bound follows easily by applying part (ii) of the proposition with δ = 1/2: indeed, for any 0 ≤ s ≤ u ≤ t we have the required estimate, where we also used the corresponding elementary fact. Note that by convexity of the function z ↦ z^{2H} one has, for any 0 ≤ z_1 ≤ z_2, the associated inequality; hence for 0 ≤ s ≤ u ≤ t we have (A.3). Now we are ready to obtain the desired bound. We have ‖P_{c^2(s,t)} f − P_{c^2(u,t)} f‖_{C^β} ≤ ‖P_{c^2(s,t)} f − P_{k(s,u,t)} f‖_{C^β} + ‖P_{k(s,u,t)} f − P_{c^2(u,t)} f‖_{C^β} =: I_1 + I_2. (A.4) We bound I_1 and I_2 using part (ii) of the proposition, but with different δ. First, we apply part (ii) with δ = 1/(4H) ∨ (α/2 − β/2). Recalling (A.3), we deduce (A.5). Applying now part (ii) with δ = 1/2, we obtain the bound on I_2. This, combined with (A.4) and (A.5), implies the desired bound for the case H > 1/2. (iv). We begin with the case H ≤ 1/2. Then, applying part (i) of the proposition with β = 1, we deduce for 0 ≤ s ≤ u ≤ t ≤ 1 the corresponding estimate. Hence for any p ≥ 2 we have the required bound, where the last inequality follows from (A.1) and the definition of the random variable ξ. This completes the proof for the case H ≤ 1/2. Now let us deal with the case H ∈ (1/2, 1). Fix 0 ≤ s ≤ u ≤ t ≤ 1. Let η and ρ be independent Gaussian random vectors, each consisting of d independent identically distributed components. Suppose that for any i = 1, . . . , d we have Eη_i = Eρ_i = 0 and the prescribed variances. It is clear that the quantity of interest is bounded by the sum of ‖P_{c^2(s,t)} f(x) − P_{c^2(u,t)} f(x + η)‖_{L_p(Ω)} and ‖P_{c^2(u,t)} f(x + η) − P_{c^2(u,t)} f(x + η + ρ)‖_{L_p(Ω)} =: I_1 + I_2.
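The gradient bound in the case α = 0 above, ‖∇P_t f‖_{C^0} ≲ t^{−1/2} ‖f‖_{C^0}, can be seen numerically; the following sanity check is ours, not the paper's. For f = sign one has sup|∇P_t f| = 2 p_t(0), so √t · sup|∇P_t f| should equal the constant 2/√(2π) uniformly in t.

```python
import numpy as np

y = np.linspace(-20.0, 20.0, 40001)     # integration grid, spacing 1e-3
dy = y[1] - y[0]
f_vals = np.sign(y)                     # bounded, merely measurable test function
x = np.linspace(-2.0, 2.0, 401)

def heat(t):
    """P_t f on the grid x, with p_t the centred Gaussian density of variance t."""
    out = np.empty_like(x)
    for i, xi in enumerate(x):
        kern = np.exp(-(xi - y) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
        out[i] = (kern * f_vals).sum() * dy   # Riemann-sum convolution
    return out

ratios = [np.abs(np.gradient(heat(t), x)).max() * np.sqrt(t)
          for t in (0.01, 0.04, 0.16)]
# each ratio should be close to 2 / sqrt(2 * pi) ~ 0.798, uniformly in t
```

The near-constancy of the ratios across the three values of t reflects the exact t^{−1/2} blow-up of the gradient.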
(iii). First let us deal with the case H ≤ 1/2. Then the bound follows easily by applying part (ii) of the proposition with δ = 1/2. Indeed, for any 0 ≤ s ≤ u ≤ t we have where we also used the fact that Note that by convexity of the function z → z 2H one has for any 0 ≤ z 1 ≤ z 2 Hence for 0 ≤ s ≤ u ≤ t we have Now we are ready to obtain the desired bound. We have P c 2 (s,t) f − P c 2 (u,t) f C β ≤ P c 2 (s,t) f − P k(s,u,t) f C β + P k(s,u,t) f − P c 2 (u,t) f C β ≤ I 1 + I 2 . (A.4) We bound I 1 and I 2 using part (ii) of the proposition but with different δ. First, we apply part (ii) with δ = 1 4H ∨ (α/2 − β/2). Recalling (A.3), we deduce Applying now part (ii) with δ = 1/2, we obtain This, combined with (A.4) and (A.5) implies the desired bound for the case H > 1/2. (iv). We begin with the case H ≤ 1/2. Then, applying part (i) of the theorem with β = 1, we deduce for 0 ≤ s ≤ u ≤ t ≤ 1 Hence for any p ≥ 2 we have where the last inequality follows from the bound (A.1) and the definition of the random variable ξ . This completes the proof for the case H ≤ 1/2. Now let us deal with the case H ∈ (1/2, 1). Fix 0 ≤ s ≤ u ≤ t ≤ 1. Let η and ρ be independent Gaussian random vectors consisting of d independent identically distributed components each. Suppose that for any i = 1, . . . , d we have Eη i = Eρ i = 0 and It is clear that 2 (u,t) f (x + η) L p ( ) + P c 2 (u,t) f (x + η) − P c 2 (u,t) f (x + η + ρ) L p ( ) =: I 1 + I 2 .