Quasilinear SPDEs via rough paths

We are interested in (uniformly) parabolic PDEs with a nonlinear dependence of the leading-order coefficients on the solution, driven by a rough right-hand side. For simplicity, we consider a space-time periodic setting with a single spatial variable: \begin{equation*} \partial_2u -P( a(u)\partial_1^2u - \sigma(u)f ) =0 \end{equation*} where $P$ is the projection on mean-zero functions, and $f$ is a distribution that is only controlled in the low-regularity norm of $C^{\alpha-2}$ for $\alpha>\frac{2}{3}$ on the parabolic H\"older scale. The example we have in mind is a random forcing $f$, and our assumptions allow, for example, for an $f$ which is white in the time variable $x_2$ and only mildly coloured in the space variable $x_1$; any spatial covariance operator $(1 + |\partial_1|)^{-\lambda_1 }$ with $\lambda_1>\frac13$ is admissible. On the deterministic side we obtain a $C^\alpha$-estimate for $u$, assuming that we control products of the form $v\partial_1^2v$ and $vf$ with $v$ solving the constant-coefficient equation $\partial_2 v-a_0\partial_1^2v=f$. As a consequence, we obtain existence, uniqueness and stability with respect to $(f, vf, v \partial_1^2v)$ of small space-time periodic solutions for small data. We then demonstrate how the required products can be bounded in the case of a random forcing $f$ using stochastic arguments. For this we extend the treatment of the singular product $\sigma(u)f$ via a space-time version of Gubinelli's notion of controlled rough paths to the product $a(u)\partial_1^2u$, which has the same degree of singularity but is more nonlinear since the solution $u$ appears in both factors. The PDE ingredient mimics the (kernel-free) Krylov-Safonov approach to ordinary Schauder theory.


Introduction
We are interested in the parabolic PDE \begin{equation} \partial_2 u - P( a(u)\partial_1^2 u - \sigma(u) f ) = 0 \tag{1} \end{equation} for a rough driver $f$. The coefficients $a$, $\sigma$ are assumed to be regular and uniformly elliptic, see (20) below for precise assumptions, and $P$ is the projection on mean-zero functions. For the right-hand side $f$ we only assume control on the low-regularity norm of $C^{\alpha-2}$ in the parabolic Hölder scale for $\alpha\in(\frac23,1)$ (see (19) for a precise statement). The optimal control on $u$ one could aim to obtain under these assumptions is in the $C^\alpha$ norm, but in this regularity class there is no classical functional analytic definition of the singular products $a(u)\partial_1^2u$ and $\sigma(u)f$. In this article we assume that we have an "off-line" interpretation for several products such as $v\partial_1^2v$, $vf$ (see (111)), where $v$ solves the constant-coefficient equation $\partial_2 v - a_0\partial_1^2 v = f$, and show that these bounds allow us to control $u$. We are ultimately interested in a stochastic forcing $f$, and in this case the required control of products can be obtained using explicit moment calculations to capture stochastic cancellations.
Our method is similar in spirit to Lyons' rough path theory [14,13,15]. This theory is based on the observation that the analysis of stochastic integrals \begin{equation} \int_0^t u(s)\,dv(s) \tag{2} \end{equation} for irregular $v$, such as Brownian motion or even lower regularity stochastic processes, can be conducted efficiently by splitting it into a stochastic and a deterministic step. In the stochastic step the integral (2) is defined for a single well-chosen function $\bar u$, e.g. $v$ itself. In the case where $v=\bar u$ is a (multidimensional) Brownian motion there is a one-parameter family of canonical definitions for these integrals, with the Itô and the Stratonovich notions being the most prominent ones. Information on this single integral suffices to give a subordinate sense to integrals for a whole class of functions $u$ with similar small-scale behaviour. This line of thought is expressed precisely in Gubinelli's notion of a controlled path [4, Definition 1]. There, a function $u$ in the usual Hölder space $C^\alpha$, $\alpha\in(\frac13,\frac12)$, is said to be controlled by $\bar u\in C^\alpha$ if there exists a third function $\sigma\in C^\alpha$ such that for all $s,t\in\mathbb{R}$ \begin{equation} |u(t)-u(s)-\sigma(s)(\bar u(t)-\bar u(s))| \lesssim |t-s|^{2\alpha}. \tag{3} \end{equation} Loosely speaking, this means that the increments $u(t)-u(s)$ of the function $u$ can be approximated by those of $\bar u$, provided the latter are locally modulated by the amplitudes $\sigma$. In [4, Theorem 1] it is then shown that this assumption, together with a bound of the form \begin{equation} \Big|\int_s^t \bar u(r)\,dv(r)-\bar u(s)(v(t)-v(s))\Big| \lesssim |t-s|^{2\alpha}, \tag{4} \end{equation} suffices to define the integral $\int u(r)\,dv(r)$ and to obtain the bound \begin{equation} \Big|\int_s^t u(r)\,dv(r)-u(s)(v(t)-v(s))-\sigma(s)\int_s^t(\bar u(r)-\bar u(s))\,dv(r)\Big| \lesssim |t-s|^{3\alpha}. \tag{5} \end{equation}
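To make the notion of a controlled path concrete, the following minimal Python sketch checks the defining estimate numerically for the composition $u=g(\bar u)$ with modulating amplitude $\sigma=g'(\bar u)$. The reference path (a truncated Weierstrass-type sum), the choice $g=\sin$ and all function names are illustrative assumptions and do not come from the cited works. The sketch relies only on Taylor's formula, which bounds the remainder by $\frac12\sup|g''|\,|\bar u(t)-\bar u(s)|^2$, and hence by a multiple of $|t-s|^{2\alpha}$ whenever $\bar u$ is $\alpha$-Hölder continuous.

```python
import math

def ubar(t, alpha=0.4, terms=12):
    """Truncated Weierstrass-type sum; an illustrative stand-in for a rough reference path."""
    return sum(2.0 ** (-alpha * n) * math.cos(2.0 ** n * t) for n in range(terms))

def u(t):
    """Path controlled by ubar: u = g(ubar) with the assumed choice g = sin."""
    return math.sin(ubar(t))

def sigma(t):
    """Modulating amplitude, i.e. the 'derivative' of u with respect to ubar: g'(ubar)."""
    return math.cos(ubar(t))

def remainder(s, t):
    """u(t) - u(s) - sigma(s) * (ubar(t) - ubar(s)), the quantity bounded for a controlled path."""
    return u(t) - u(s) - sigma(s) * (ubar(t) - ubar(s))

def taylor_bound(s, t):
    """Taylor bound 0.5 * sup|g''| * increment^2, using sup|sin''| = 1."""
    return 0.5 * (ubar(t) - ubar(s)) ** 2

def check(points):
    """True if the remainder obeys the Taylor bound for all pairs of sample points."""
    return all(abs(remainder(s, t)) <= taylor_bound(s, t) + 1e-12
               for s in points for t in points)

sample = [0.01 * k for k in range(50)]
print(check(sample))
```

The same Taylor argument is what makes controlledness stable under smooth pointwise nonlinearities: the remainder is quadratic in the increment of the reference path, regardless of how rough that path is.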
The construction of the integrals (4) for the specific function $\bar u$ can be accomplished under a less restrictive set of assumptions than required for the classical Itô theory. In many applications this construction can be carried out using Gaussian calculus without making reference to an underlying martingale structure. The construction makes very little use of the linear order of time and lends itself well to extensions to higher-dimensional index sets. This last point was the starting point for Hairer's work on singular stochastic PDE: the observation that the variable $t$ in rough path theory could represent "space" rather than "time" was the key insight that made it possible to define stochastic PDEs with nonlinearities of Burgers type [6] and the KPZ equation [7]. The notion of controlled path was also the starting point for his definition of regularity structures [8], which permits the treatment of semilinear stochastic PDEs with an extremely irregular right-hand side, possibly involving a renormalisation procedure. Parallel to that, Gubinelli, Imkeller and Perkowski put forward a notion of paracontrolled rough paths [5], a Fourier-analytic variant of (3) which has also been used to treat singular stochastic PDE.
In this article we propose yet another higher-dimensional generalisation of the notion of controlled path, see Definition 1 below, and use it to provide a solution and stability theory for (1). This definition is an immediate generalisation of Gubinelli's definition (3) and is also closely related to Hairer's notion [8, Definition 3.1] of a modelled distribution in a certain regularity structure. However, the definition comes with a twist because the quasilinear nature of (1) forces us to allow the realisation of the model, $v(\cdot,a_0)$ in our notation, to depend on a parameter $a_0$, which ultimately corresponds to the variable diffusion coefficient $a(u)$. In our theory the "off-line products" $vf$ and $v\partial_1^2v$ play the role of the "off-line integral" $\int\bar u\,dv$ above, and the regularity assumption (4) is translated into a control on the commutators $[v,(\cdot)_T]\diamond\{\partial_1^2v,f\} := v(\{\partial_1^2v,f\})_T-(v\diamond\{\partial_1^2v,f\})_T$, where $(\cdot)_T$ denotes the convolution with a smooth kernel at scale $T$ (see (17) and the discussion that follows it) and where we use the notation $\diamond$ to indicate that products are not classically defined and that their interpretations have to be specified. Furthermore, here and below we use the abbreviated notation $[v,(\cdot)_T]\diamond\{\partial_1^2v,f\}$ when we speak about $[v,(\cdot)_T]\diamond\partial_1^2v$ and $[v,(\cdot)_T]\diamond f$ simultaneously. Based on these assumptions we derive bounds in the spirit of (5) on the singular products $a(u)\diamond\partial_1^2u$ and $\sigma(u)\diamond f$ (see Lemmas 2 and 4), which can also be seen as a (simpler) variant of Hairer's Reconstruction Theorem [8, Theorem 3.10]. We want to point out that our method completely avoids the use of wavelet analysis, which features prominently in Hairer's proof of the Reconstruction Theorem. On the PDE side, in Lemma 5, we obtain an optimal regularity result on the solution $u$ of (1) based on a control of the commutators $[a,(\cdot)_T]\diamond\partial_1^2u$ and $[\sigma,(\cdot)_T]\diamond f$. This result is similar in spirit to Hairer's Integration Theorem [8, Theorem 5.12].
Our proof mimics the Krylov-Safonov approach to Schauder theory [12] and therefore does not make reference to a parabolic heat kernel. The main deterministic results, Proposition 1 and Theorem 2, combine these ingredients to obtain existence and uniqueness results for the linear version of (1) (i.e. $a$ and $\sigma$ do not depend on $u$, Proposition 1) and for (1) under a small data assumption (Theorem 2). We want to point out that the deterministic analysis does not depend on the assumption of a $1+1$ dimensional space and would go through completely unchanged if $\partial_2-a(u)\partial_1^2$ were replaced by a uniformly parabolic operator over $\mathbb{R}^n\times\mathbb{R}$.
On the stochastic side, we consider a class of stationary Gaussian distributions f of class C α−2 . This class includes, for example, the case where f is "white" in the time-like variable x 2 and has covariance operator (1 + |∂ 1 |) −λ 1 for λ 1 > 1 3 in the x 1 variable, or the case where the noise is constant in the time-like variable x 2 and has covariance operator (1 + |∂ 1 |) −λ 1 for λ 1 > − 5 3 for the x 1 variable (see the end of Section 3 for a more detailed discussion of admissible f ). For such f we construct the generalized products v ⋄ ∂ 2 1 v and v ⋄ f as limits of renormalized smooth approximations: More precisely, let ψ ′ be an arbitrary Schwartz function with ψ ′ = 1 and set ψ ε (x 1 , x 2 ) = 1 ).
Then we set see Proposition 2 below. In many of the examples we consider, the expectations of the regularized products c (1) (ε, a 0 ) = v ε (·, a 0 )f ε and c (2) (ε, a 0 , a ′ 0 ) = v ε (·, a 0 )∂ 2 1 v ε (·, a ′ 0 ) diverge as ε goes to zero (the precise form of these constants is given in Lemma 7). The renormalization procedure can be avoided if f satisfies the additional stronger regularity assumption (144) which holds, for example, if f is "white" in x 1 and "trace-class" in x 2 .
(i) The renormalized commutators [v, (·) T ]⋄f and [v, (·) T ]⋄∂ 2 1 v defined via the limits (6) exist and satisfy the bounds as well as analogous bounds if v is replaced by its derivatives with respect to a 0 , a ′ 0 . The convergences of the renormalized products take place almost surely and in every stochastic L p space with respect to the norm sup λ<a 0 ≤1 sup λ<a ′ 0 ≤1 · C α−2 . Here f = sup x∈R 2 |f (x)| denotes the supremum norm and a norm for C α−2 is defined in (19) below.
Proof of Theorem 1. The bound (7) is proved as (130) in Lemma 6, and the bounds (8) and (9) as well as the convergence almost surely and in every stochastic L p space are proved in Proposition 2. For (ii) we set then ηf ε , η 2 [v ε , (·) T ]f ε etc. satisfy the smallness condition (110) and (111) uniformly in ε. The bound (10) is a consequence of Proposition 2 and the conclusion of (ii) is contained in part (i) of Theorem 2. For part (iii), we have already seen that ηf ε , η 2 [v ε , (·) T ]f ε etc. satisfy the smallness assumptions (110) and (111) uniformly in ε, and the convergence of the u ε to u follows from a combination of Lemma 6, Proposition 2 and Theorem 2 (ii). The form of (14) follows from Corollary 3. Finally, part (iv) follows in the same way only replacing Proposition 2 by Corollary 5.
We finally mention that one week before posting this second version of our result, the article [2] was posted on the arXiv. In this article Furlan and Gubinelli study the equation where u = u(t, x) for x taking values in the two-dimensional torus, and ξ = ξ(x) is a white noise over the two-dimensional torus, which is constant in the time variable t. This noise term ξ is of class C −1− and therefore essentially behaves like our term f . They also define a notion of solution and prove short time existence and uniqueness of solutions for the initial value problem, as well as convergence for renormalized approximations similar to (14). Similar to the approach we present here, they locally approximate the solutions u by a family of solutions to constant coefficient problems. Their approach then deviates from ours and they implement their theory in the framework of paracontrolled distributions.
2. Deterministic analysis
2.1. Setup.
Metric. The parabolic operator $\partial_2-a_0\partial_1^2$ and its mapping properties on the scale of Hölder spaces (i.e. Schauder theory) impose their intrinsic (Carnot-Carathéodory) metric, which is given by \begin{equation} d(x,y) = |x_1-y_1|+\sqrt{|x_2-y_2|}, \tag{15} \end{equation} see for instance [12, Section 8.5]. The Hölder semi-norm $[\cdot]_\alpha$ is defined based on (15): \begin{equation} [u]_\alpha := \sup_{x\neq y}\frac{|u(x)-u(y)|}{d^\alpha(x,y)}. \tag{16} \end{equation}
Convolution. In order to define negative norms of distributions in the intrinsic way, it is convenient to have a family $\{(\cdot)_T\}_{T>0}$ of mollification operators $(\cdot)_T$ consistent with the relative scaling $(x_1,x_2)=(\ell\hat x_1,\ell^2\hat x_2)$ of the two variables dictated by (15). It will turn out to be extremely convenient to have in addition the semi-group property \begin{equation} (\cdot)_T(\cdot)_t = (\cdot)_{T+t}. \tag{17} \end{equation} All this is achieved by convolution with the semi-group $\exp(-T(\partial_1^4-\partial_2^2))$ of the elliptic operator $\partial_1^4-\partial_2^2$, which is the simplest positive operator that displays the same relative scaling between the variables as $\partial_2-\partial_1^2$ and that is symmetric in $x_2$ as well as in $x_1$. We note that the corresponding convolution kernel $\psi_T$ is easily characterized by its Fourier transform $\hat\psi_T(k)=\exp(-T(k_1^4+k_2^2))$; since the latter is a Schwartz function, $\psi_T$ is a Schwartz function as well. The only two (minor) inconveniences are that 1) the role of the $x_1$-scale is played by $T^{1/4}$ (in line with (15)) and that 2) $\psi_1$ (and thus $\psi_T$) does not have a sign. The only properties of the kernel we need are moments of derivatives: \begin{equation} \int dx\,|\partial_1^k\psi_T(x)|\,d^\alpha(x,0)\lesssim(T^{1/4})^{-k+\alpha} \quad\text{and}\quad \int dx\,|\partial_2^k\psi_T(x)|\,d^\alpha(x,0)\lesssim(T^{1/4})^{-2k+\alpha} \tag{18} \end{equation} for all orders of derivative $k=0,1,\dots$ and moment exponents $\alpha\ge0$, as well as the fact that $\int\psi(x)x_1\,dx=0$. Estimates (18) follow immediately from the scaling and the fact that $\psi_1$ is a Schwartz function. In Lemma 11 we show, however, that our main regularity assumption (19) on $f$ as well as the bounds on the commutators do not depend on the specific choice of Schwartz kernel $\psi$. In particular, the statements ultimately do not depend on the semi-group property, although this property plays an important part in the proofs.
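The scaling and semi-group structure described above can be checked in a few lines. The following Python sketch is only an illustration under stated assumptions: the distance is taken in the standard parabolic form $|x_1-y_1|+\sqrt{|x_2-y_2|}$ (an assumed form of the intrinsic metric), periodicity is ignored, and all function names are hypothetical. The only ingredient taken directly from the text is the Fourier multiplier $\exp(-T(k_1^4+k_2^2))$.

```python
import math

def parabolic_distance(x, y):
    """Assumed form of the intrinsic metric: |x1 - y1| + sqrt(|x2 - y2|)."""
    return abs(x[0] - y[0]) + math.sqrt(abs(x[1] - y[1]))

def rescale(x, ell):
    """Parabolic rescaling (x1, x2) -> (ell * x1, ell**2 * x2)."""
    return (ell * x[0], ell ** 2 * x[1])

def psi_hat(T, k):
    """Fourier multiplier exp(-T (k1^4 + k2^2)) of the convolution operator (.)_T."""
    return math.exp(-T * (k[0] ** 4 + k[1] ** 2))

x, y, ell = (0.1, 0.2), (0.4, 0.7), 3.0
# The distance is homogeneous of degree one under parabolic rescaling.
print(parabolic_distance(rescale(x, ell), rescale(y, ell)), ell * parabolic_distance(x, y))

k, T, t = (1.5, -2.0), 0.03, 0.07
# Semi-group property on the Fourier side: multipliers at scales T and t multiply to scale T + t.
print(psi_hat(T, k) * psi_hat(t, k), psi_hat(T + t, k))
# Scaling: the multiplier at scale T is the scale-one multiplier at (T^(1/4) k1, T^(1/2) k2).
print(psi_hat(T, k), psi_hat(1.0, (T ** 0.25 * k[0], T ** 0.5 * k[1])))
```

Each printed pair should agree up to rounding; the last identity is the reason why $T^{1/4}$, rather than $T$, plays the role of the length scale in the $x_1$ direction.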
Finite domain. We mimic a finite domain by imposing periodicity in both directions; w.l.o.g. we may set the period equal to one. We will typically measure the size of the distribution $f$ by the expression \begin{equation} \sup_{T\le1}(T^{1/4})^{2-\alpha}\|f_T\|, \tag{19} \end{equation} where the restriction $T\le1$ reflects the unit period. By Lemma 9, cf. Step 1 of its proof, this expression is equivalent to the standard definition of the norm of $C^{\alpha-2}$.
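As a hedged illustration of how a weighted supremum over scales $T\le1$ encodes negative regularity, the following Python sketch evaluates $\sup_{T\le1}(T^{1/4})^{2-\alpha}\|f_T\|$ for a single Fourier mode $f(x)=\cos(\kappa x_1)$. For this $f$, applying the semi-group $\exp(-T(\partial_1^4-\partial_2^2))$ gives $f_T=\exp(-T\kappa^4)f$, so that $\|f_T\|=\exp(-T\kappa^4)$ in the supremum norm. The weight is the one appearing in the dictionary of norms stated later in the text; the function names, the logarithmic grid and the choice of mode are illustrative assumptions.

```python
import math

def weighted_sup(kappa, alpha, grid_points=4000):
    """Sup over T in [1e-12, 1] of (T^(1/4))^(2 - alpha) * exp(-T * kappa^4), on a log grid."""
    p = (2.0 - alpha) / 4.0
    best = 0.0
    for j in range(grid_points + 1):
        T = 10.0 ** (-12.0 * j / grid_points)  # T runs from 1 down to 1e-12
        best = max(best, T ** p * math.exp(-T * kappa ** 4))
    return best

def closed_form(kappa, alpha):
    """Maximum of T^p * exp(-T * kappa^4), attained at T* = p / kappa^4 (assumes T* <= 1)."""
    p = (2.0 - alpha) / 4.0
    return (p / kappa ** 4) ** p * math.exp(-p)

alpha = 0.7
for m in (1, 2, 4, 8):
    kappa = 2.0 * math.pi * m  # wave numbers compatible with period one
    print(m, weighted_sup(kappa, alpha), closed_form(kappa, alpha))
```

The closed form is proportional to $\kappa^{-(2-\alpha)}$: doubling the frequency reduces the norm by the factor $2^{-(2-\alpha)}$, as one expects from a norm of regularity $\alpha-2$.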
Standing assumptions on the nonlinearities. There exists a constant $\lambda>0$ such that We express the bound on the various norms of $a$ and $\sigma$ by the ellipticity contrast $\lambda$ in order to have a single constant that measures the quality of the data. Note that the assumption $\sigma\in[-1,1]$ is only seemingly stronger than $\|\sigma\|\le\frac1\lambda$, since that constant can always be absorbed into the right-hand side $f$ of the equation. These fairly high regularity assumptions intervene in the proof of Lemma 1; they could be slightly weakened in the sense of [4, Proposition 4], at the expense of a more complicated notation. Here and in the entire deterministic section, $\lesssim$ means $\le C$ with a constant $C$ depending only on $\lambda$ and the exponent $\alpha$.

2.2. Definitions and results.
The following central definition is a straightforward generalization of Gubinelli's definition [4, Definition 1] of a "controlled path": a generalization from the time variable $x_2$ to multiple variables $x$, and to a "model" $(v_1,\dots,v_I)$ (in the language of Hairer [8]) that here may depend on an additional parameter $a_0$. It states that the increments $u(y)-u(x)$ of the function $u$ can be approximated by those of several functions $v_i$, if the latter are locally modulated by the amplitudes $\sigma_i$ and the functions $a_i$ that locally determine the value of the parameter $a_0$. The functions $\sigma_i$ can therefore be interpreted as "derivatives" of $u$ with respect to $v_i$. The increments of the linear function $x_1$ also have to be included because of $\alpha>\frac12$. In fact, since $2\alpha>1$, given the model $(v_1,\dots,v_I)$ (as modulated by the functions $a_i$), the "derivatives" $(\sigma_1,\dots,\sigma_I)$ and $\nu$ determine $u$ up to a constant. In our situation, we expect $u$ and $(v_1,\dots,v_I)$ to be Hölder continuous with exponent not (much) larger than $\alpha$, so that imposing closeness of the increments to order $2\alpha$ contains valuable additional information.
Definition 1. Let $\frac12<\alpha<1$ and $I\in\mathbb{N}$. We say that a function $u$ is modelled after the functions $(v_1,\dots,v_I)$ of $(x,a_0)$ according to the functions $(a_1,\dots,a_I)$ and $(\sigma_1,\dots,\sigma_I)$ provided there exists a function $\nu$ (which because of $2\alpha>1$ is easily seen to be unique) such that \begin{equation} M := \sup_{x\neq y}\frac{1}{d^{2\alpha}(y,x)}\big|u(y)-u(x)-\sigma_i(x)\big(v_i(y,a_i(x))-v_i(x,a_i(x))\big)-\nu(x)(y-x)_1\big| \tag{21} \end{equation} is finite. Here and in the sequel we use Einstein's convention of summation over repeated indices.
Note that imposing (21) also for distant points $x$ and $y$ is consistent with periodicity despite the non-periodic term $(y-x)_1$, since by $\alpha\ge\frac12$ the latter is dominated by $d^{2\alpha}(x,y)$ for $d(x,y)\ge1$. Note also that (21) is reminiscent of a Hölder norm: in case of $(\sigma_1,\dots,\sigma_I)=0$, the finiteness of (21) implies that $u$ is continuously differentiable in $x_1$ and that $\nu(x)=\partial_1u(x)$, so that $M$ turns into the parabolic $C^{2\alpha}$-norm of $u$. In this spirit, Step 1 in the proof of Lemma 2 shows that the modelledness constant $M$ in (21) controls the $(2\alpha-1)$-Hölder norm of $\nu$, provided $x\mapsto\sigma_i(x)v_i(\cdot,a_i(x))$ is $\alpha$-Hölder continuous with values in $C^\alpha$. In addition, in the presence of periodicity, $M$ also controls the $\alpha$-Hölder norm of $u$ and the supremum norm of $\nu$, which are of lower order, cf. Step 2 in the proof of Lemma 2.
The following lemma shows that the notion of modelledness in Definition 1 is well-behaved under sufficiently smooth nonlinear pointwise transformation; it will be used in the proof of Theorem 2. It is essentially identical to [4,Proposition 4], which in turn is a consequence of Taylor's formula; because of the minor modifications due to the presence of a more general model, we reproduce the proof. Lemma 1. i) Suppose that u is modelled after v according to a and σ with constant M. Let the function b be twice differentiable. Then b(u) is modelled after v according to a and µ : ii) Suppose that for i = 0, 1, u i is modelled after v i according to a i and σ i with constant M i . Suppose further that u 1 − u 0 is modelled after (v 1 , v 0 ) according to (a 1 , a 0 ) and (σ 1 , −σ 0 ) with constant δM. Let the function b be three times differentiable.
As discussed in the introduction, the main challenge in solving stochastic ordinary differential equations is to give a sense to integrals of the form (2). In the spirit of Hairer [8] we interpret this problem as giving a meaning to the product $u\partial_tv$, which does not have a canonical functional analytic definition because both $u$ and $v$ are only Hölder continuous in the time variable $t$ with exponent less than $\frac12$, since they behave like Brownian motion. In view of the parabolic scaling, we encounter the same difficulty when giving a distributional sense to $b\diamond\partial_1^2u$ when $b$ and $u$ are only Hölder continuous with exponent $\alpha<1$ (from now on we use the non-standard notation $b\diamond\partial_1^2u$ instead of $b\,\partial_1^2u$ to indicate that the definition of this product is non-standard).
As discussed in the introduction a main insight of Lyons' theory of rough paths, was the observation that such products can be defined provided u is controlled byū and the off-line productū∂ t v satisfies the bound (4), which can be rewritten as t ] ⋄ f )(s), that is, the expression on both sides of (4) amount to a commutator [ū, t ] of multiplication withū and integration, applied to a distribution ∂ r v. In our multidimensional framework, we replace integration 1 t−s t s by (smooth) averaging: In our set up, the role of the crucial "algebraic relationship" [4, (24)] from rough path theory is played by the following straightforward consequence of the semi-group property (17) cf (263) in the proof of Lemma 2. Note that it is (only the control of) [v, (·) T ] ⋄ f that relates the distribution v ⋄ f to the function v and the distribution f . For our quasilinear SPDE, we need to give a sense to the two singular products σ(u) ⋄ f and a(u) ⋄ ∂ 2 1 u, so in particular to products of the form u ⋄ f and b ⋄ ∂ 2 1 u, where u and b behave like the solution v of (∂ 2 − a 0 ∂ 2 1 )v = f . Hence we will need the two off-line products v ⋄ f and v⋄∂ 2 1 v. For simplicity, we split the argument into Lemma 3 dealing with the first and Lemma 4 with the second factor in the singular products. We will use Lemma 3, or rather Corollary 1, in order to pass from the definition of v ⋄ f and v ⋄ ∂ 2 1 v to the definition of u ⋄ f and b⋄∂ 2 1 v, respectively (since the distribution ∂ 2 1 v plays a role very similar to f , the lemma and the corollary are formulated in the notation of the former case). We will then use Lemma 4 to pass from b⋄∂ 2 1 v to b⋄∂ 2 1 u. 
Lemmas 2, 3, and 4 reveal a clear hierarchy of norms and measures of size:
• Functions $u$ are measured in terms of the Hölder semi-norm $[u]_\alpha$ (the supremum norm $\|\sigma\|$ of a function $\sigma$ only intervenes in scaling-wise suboptimal estimates like (60) that rely on the periodicity or the constraint $T\le1$ providing a large-scale cutoff, otherwise just as part of the product $\|\sigma\|[a]_\alpha$ with the Hölder norm of $a$),
• distributions are measured in the $C^{\alpha-2}$-norm $\sup_{T\le1}(T^{1/4})^{2-\alpha}\|f_T\|$, see Step 1 in the proof of Lemma 9 for this equivalence of norms.
Equipped with this dictionary, Lemmas 3 and 4 can be seen to be very close to [4, Theorem 1]; in particular, (32) in Lemma 2 is very close to (28) in [4, Corollary 3]. The major difference is the multi-dimensional extension through (26). A minor difference coming from the parabolic nature is the appearance of the commutator $[x_1,(\cdot)_T]f$, which however is regular, cf. Lemma 10. A further minor difference arises from the $a_0$-dependence of the model $v$ and the related appearance of the function $a$, which necessitates control of $\frac{\partial}{\partial a_0}$-derivatives of the functions and the commutators and manifests itself via the evaluation operator $E$. However, this minor difference can be embedded into the more general form of the upcoming Lemma 2.
Lemma 2. Let 2 3 < α < 1. Suppose we have a family of functions {v(·, x)} x of class C α , parameterized by points x, a distribution f , and a family of distributions {v(·, x) ⋄ f } x , both of class C α−2 , satisfying for all pairs of points x, x ′ for some constants N, N 1 . Suppose we are given a function u such that for all pairs of points y, x for some constant M and some function ν. Then there exists a unique distribution u ⋄ f such that where E stands for the evaluation of the continuous function (x, y) → If moreover all functions and distributions are 1-periodic and we use the constant N to also estimate the lower-order expressions for all points x then also Equipped with Lemma 2, the upcoming Lemma 3 is more of a corollary that specifies the form of the model. The general form of Lemma 2 is in particular convenient for part ii) of Lemma 3, where the Lipschitz continuity of the product σ⋄f in terms of the off-line product v⋄f and the modulating property (both constant and modulating functions) is established.
for some constant N 1 . i) We consider a family of functions {v(·, a 0 )} a 0 and a family of distributions {v(·, a 0 ) ⋄ f } a 0 satisfying for some constant N 0 . We are given a function u modelled after v according to the α-Hölder functions a and σ with constant M and ν as in (21). Then there exists a unique distribution u ⋄ f such that where E evaluates a function of (x, a 0 ) at (x, a(x)). Furthermore, in case of and when all functions are 1-periodic we have the sub-optimal estimate ii) We consider two families of functions {v i (·, a 0 )} a 0 , i = 0, 1, and two families of for some constants N 0 and δN 0 . Suppose the function δu is modelled after (v 1 , v 0 ) according to the α-Hölder functions (a 1 , a 0 ) and (σ 1 , −σ 0 ) with δM and δν in analogy to (21). Then there exists a unique distribution δu ⋄ f such that where E i denotes the operator that evaluates a function of (x, a 0 ) at a 0 = a i (x). Furthermore, in case of (40) and when all functions are 1-periodic we have the sub-optimal estimate For the use in the proof of Theorem 2 it is very convenient to bring Lemma 3 into the form of Corollary 1. The difference between part i) and part iii) of the corollary on the one hand and part i) and part ii), respectively, of the lemma on the other hand is that the corollary allows for a distribution f that depends on an additional parameter a ′ 0 and establishes estimates on the a ′ 0 -derivatives. Part ii) of the corollary extends part i) of the lemma to two distributions f 1 and f 0 .
be two families of distributions satisfying (37) and for some constant N 1 . If u is modelled after v according to a and σ, satisfying (40), with constant M we have , j = 0, 1, be as in i) and suppose in addition for some constant δN 1 . Then for u as in i) we have iii) Let the two families of functions {v i (·, a 0 )} a 0 , i = 0, 1, and the three ,a ′ 0 be as in i) and satisfy in addition (42), (43). Suppose we have in addition Let u i be two functions like in part i) and suppose that u 1 − u 0 is modelled after (v 1 , v 0 ) according to (a 1 , a 0 ) and (σ 1 , −σ 0 ) with constant δM. Then we have We now turn to Lemma 4 that deals with the second factor in a ⋄ ∂ 2 1 u. The reason why we consider several functions v 1 , · · · , v I in Lemma 4 instead of a single one for our scalar PDE is that this seems necessary when establishing the contraction property for Proposition 1; because of the a 0 -dependence, it turns out that we need not just I = 2 but in fact I = 3, cf Corollary 2.
Lemma 4. Let $\frac23<\alpha<1$. We are given a function $b$, $I$ families of functions $\{v_1(\cdot,a_0),\dots,v_I(\cdot,a_0)\}_{a_0}$, and $I$ families of distributions for some constants $N_0,\dots,N_I$. Let the function $u$ be modelled after $(v_1,\dots,v_I)$ according to the $\alpha$-Hölder functions $a$ and $(\sigma_1,\dots,\sigma_I)$ with constant $M$, cf. Definition 1. Then there exists a unique distribution $b\diamond\partial_1^2u$ such that on the level of the commutators lim where $E$ denotes the operator that evaluates a function in two variables $(x,a_0)$ at $(x,a(x))$. Moreover, provided $[a]_\alpha\le1$, we have the suboptimal estimate
The following lemma is the only place where we use the PDE. It might be seen as an extension of Schauder theory in the sense that it compares, on the level of $C^{2\alpha}$, the solution $u$ of a variable-coefficient equation $\partial_2u-a\diamond\partial_1^2u=\sigma\diamond f$ to the solutions of the corresponding constant-coefficient equation (62), by saying that $u$ is modelled after $v$ according to $a$ and $\sigma$. To this purpose we apply $(\cdot)_T$ to the equation and rearrange to Since the previous lemmas estimate the commutators on the right-hand side, we will right away assume that the left-hand side is estimated accordingly, cf. (63). Working with the commutator of multiplication with a coefficient $a$ and convolution is reminiscent of the DiPerna-Lions theory, which however deals with a transport equation instead of a parabolic equation with a rough coefficient, that is, with $\partial_2u-a\partial_1u$ instead of $\partial_2u-a\partial_1^2u$. In our proof, we follow the approach to classical Schauder theory of Krylov and Safonov, see [12], in particular Section 8.6. This approach avoids the use of kernels.
Lemma 5. Let $\frac12<\alpha<1$ and suppose all functions and distributions are periodic. We are given $I$ families of distributions $\{f_1(\cdot,a_0),\dots,f_I(\cdot,a_0)\}_{a_0}$ with for some constants $N_1,\dots,N_I$. For $a_0\in[\lambda,\frac1\lambda]$ we denote by $v_i(\cdot,a_0)$ the function of vanishing mean solving We are also given a function $u$, modelled after $(v_1,\dots,v_I)$ according to some functions $a\in[\lambda,\frac1\lambda]$ and $(\sigma_1,\dots,\sigma_I)$. We assume that $u$ approximately satisfies the PDE $\partial_2u-Pa\partial_1^2u=P\sigma_iEf_i$ in the sense of sup for some constant $N$, where $E$ is defined as in Lemma 4. Then we have for the modelling and the Hölder constant of $u$
In Corollary 2, we will combine Lemma 4 on the product $a\diamond\partial_1^2u$ and Lemma 5 to obtain an a priori estimate on the modelling and Hölder constants. The use of the "infinitesimal" part ii) of this corollary will be explained in the discussion of Proposition 1.
Corollary 2. Let 2 3 < α < 1. i) Suppose we are given two functions σ and a, two distributions f and σ ⋄ f , and a family of distributions where v(·, a 0 ) denotes the mean-free solution of and satisfying the constraints Then if a function u is modelled after v according to a and σ with we have for the modelling and Hölder constants [u] α N 0 (N + 1).
ii) In addition, suppose we are given two functions δσ and δa, three distributions δf , σ ⋄ δf , and δσ ⋄ f , and two families of distributions for some constants δN 0 , δN and where δv is the mean-free solution of Then if a function δu is modelled after (v, ∂v ∂a 0 , δv) according to a and (δσ, σδa, σ) with then we have for the modelling and Hölder constants The following Proposition 1 may be seen as the main contribution of this paper. It establishes a solution theory for the linear equation ∂ 2 u − P (a ⋄ ∂ 2 1 u + σ ⋄ f ) = 0 for given driver f (a distribution) and functions σ and a. Because of the roughness of f , it does not only require an definition of σ ⋄ f but also of a ⋄ ∂ 2 1 v = P f , so that when u is modelled after v according to a and σ, also a⋄∂ 2 1 u may be given a sense by Lemma 4. The most subtle point is to establish Lipschitz continuity of u in the data (a, a ⋄ ∂ 2 1 v). This involves considering differences of solutions and quantifying u 1 − u 0 is modelled after (v 1 , v 0 ) according to (a 1 , a 0 ) and (σ 1 , −σ 0 ).
When quantifying differences of solutions, variable coefficients require a somewhat different strategy compared to constant coefficients, as we shall explain now. The modelledness (85) has to come from the PDE, that is, Lemma 5. The naive approach is to consider the difference of the PDE for two given pairs of data (σ i , a i , f i ), i = 0, 1, (plus the products), and to rearrange as follows which already means breaking the permutation symmetry in i = 0, 1 and therefore does not bode well. By the modelledness of u 1 we expect that for the purpose of Lemma 5, we may replace u 1 by v 1 on the rhs of (86), leading to In view of Lemma 5 and the discussion preceding it, this suggests that we obtain which is not the desired (85) unless a 1 = a 0 . Instead, our strategy will be to construct a curve {u s } s∈[0,1] interpolating between u 0 and u 1 . For this, we interpolate the data linearly, that is, Provided we interpolate the products bi-linearly, that is, and the same definition for a s ⋄∂ 2 1 v s , Leibniz' rule for σ s ⋄f s holds, and we expect it to hold for a s ⋄ ∂ 2 1 u s so that differentiation of (89) gives , which in view of (89) we approximate by according to a s and (∂ s σ, σ s ∂ s a, σ s ), which should be compared with (88). Using Leibniz' rule once more, but this time in the classical form of and integrating (91) in s ∈ [0, 1] yields the desired (85). We note that this strategy differs from [4] even in case when a is constant: When passing from the modelledness of u 1 − u 0 to the modelledness of σ(u 1 ) − σ(u 0 ), the argument in [4,Proposition 4] uses the linear interpolation u s = su 1 + (1 − s)u 0 (as we do in Lemma 1), which implicitly amounts to the interpolation σ s ⋄f s = sσ 1 ⋄f 1 + (1 − s)σ 0 ⋄f 0 , as opposed to (90).
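The role of the bilinear interpolation of products can be illustrated with a toy computation. In the Python sketch below, the four numbers `P[i][j]` are hypothetical placeholders for the products $\sigma_i\diamond f_j$, $i,j\in\{0,1\}$, including the cross terms; in the text these are distributions, and the precise interpolation formula is not reproduced here, so the form $\sum_{i,j}w_i(s)w_j(s)\,\sigma_i\diamond f_j$ with weights $w_0=1-s$, $w_1=s$ is an assumption about what "bi-linearly" means. The sketch checks that this interpolation satisfies Leibniz' rule in $s$, and that it differs from the linear interpolation $s\,\sigma_1\diamond f_1+(1-s)\,\sigma_0\diamond f_0$ away from the endpoints.

```python
def weights(s):
    """Linear interpolation weights (w_0, w_1) = (1 - s, s)."""
    return (1.0 - s, s)

def bilinear_product(P, s):
    """Assumed bilinear interpolation: sum over i, j of w_i(s) * w_j(s) * P[i][j]."""
    w = weights(s)
    return sum(w[i] * w[j] * P[i][j] for i in range(2) for j in range(2))

def leibniz_rhs(P, s):
    """(d_s sigma) <> f_s + sigma_s <> (d_s f), both read off bilinearly from P."""
    w = weights(s)
    dsigma_f = sum(w[j] * (P[1][j] - P[0][j]) for j in range(2))
    sigma_df = sum(w[i] * (P[i][1] - P[i][0]) for i in range(2))
    return dsigma_f + sigma_df

def central_difference(fun, s, h=1e-4):
    """Symmetric difference quotient; exact up to rounding for quadratic polynomials."""
    return (fun(s + h) - fun(s - h)) / (2.0 * h)

def diagonal_product(P, s):
    """Linear interpolation that uses only the diagonal products P[0][0] and P[1][1]."""
    return (1.0 - s) * P[0][0] + s * P[1][1]

# Hypothetical placeholder values; P[i][j] is deliberately NOT of the form x_i * y_j.
P = [[0.3, -1.2], [2.0, 0.7]]
s = 0.25
print(central_difference(lambda r: bilinear_product(P, r), s), leibniz_rhs(P, s))
print(bilinear_product(P, s), diagonal_product(P, s))
```

The first printed pair agrees up to rounding, which is Leibniz' rule for the bilinearly interpolated product; the second pair differs, although both interpolations coincide at $s=0$ and $s=1$.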
Proposition 1. Let 2 3 < α < 1. i) Suppose we are given two functions σ and a, two distributions f and σ ⋄ f , and a family of distributions Then there exists a unique mean-free function u modelled after v according to a and σ and such that The modelling and Hölder constants are estimated as follows ii) Suppose we are given four functions σ i and a i , satisfying the assumption (92), (93), (94), and (95), the two latter with cross terms, that is, and (96). We suppose in addition that for some constants δN 0 , δN. Let u i denote the corresponding solutions ensured by part i). Then u 1 − u 0 is modelled after (v 1 , v 0 ) according to (a 1 , a 0 ) and (σ 1 , −σ 0 ) with modelling constant and Hölder norm estimated as follows We now proceed to Theorem 2, the main deterministic result of this paper. It can be seen as a PDE version of the ODE result in [4,Section 5]. Part i) of the theorem provides existence and uniqueness by a contraction mapping argument, corresponding to [4, Proposition 7]; part ii) provides continuity of the fixed point in the model, the analogue of the Lyons' sense of continuity for the Itô map and corresponding to [4,Proposition 8].
for some constant N 0 ≪ 1; denote by v(·, a 0 ) the mean-free solution of (∂ 2 − a 0 ∂ 2 1 )v = P f . Suppose further that we are given a one-parameter family of distributions v(·, a ′ 0 )⋄f and a two-parameter family of distri- ). Then there exists a unique mean-free function u with the properties u is modelled after v according to a(u) and σ(u), (112) under the smallness condition This unique u satisfies the estimate where M denotes the modelling constant in (112).
ii) Now suppose we have two distributions f j , j = 0, 1, with We measure the distance of f 1 to f 0 in terms of a constant δN 0 with and (σ(u 1 ), −σ(u 0 )) with modelling constant δM estimated by
It remains to establish a link between the solution theory presented in Theorem 2 and the classical solution theory in the case where f is smooth, e.g. f ∈ C β for some 0 < β < 1. In this case by classical In the language of Hairer [8,Sec. 8.2], this corresponds to the canonical model built from a smooth noise term. The only assumptions on the products v(·, a ′ 0 ) ⋄ {f, ∂ 2 1 v(·, a 0 )} entering the definition of the singular products are the regularity bounds (111) expressed in terms of commutators, and they are easily seen to be satisfied in this case. For example we have which is much more than needed. However, the canonical definition (123) is by no means the only possible choice of product. In fact, as for a one-parameter family of distributions g (1) indexed by a 0 and a two-parameter family g (2) indexed by a 0 , a ′ 0 . For this choice of "products" ⋄ the commutators turn into This mild assumption leaves a lot of freedom to choose g (i) (any distribution of order 2α − 2 that is smooth in the parameter would do) but we are mostly interested in the case where they are constant in x, depending only on a 0 and a ′ 0 . The following corollary provides a link between solutions of (113) and classical solutions in the case where the products ⋄ are defined by (125).
Corollary 3. Let f be a function in C β for some 0 < β < 1 and let the products v(·, a ′ 0 ) ⋄ {f, ∂ 2 1 v(·, a 0 )} be defined by (125) for g (1) , g (2) which are of class C β in x and smooth in a 0 , a ′ 0 . Then the following are equivalent: i) u is modelled after v according to a(u) and σ(u) and solves

Stochastic bounds
We now present the stochastic bounds which are necessary as input into our deterministic theory. We consider a random distribution f , construct (renormalized) commutators, and show that the bounds (19) and (111) hold for these objects. The calculations in this section are inspired by similar reasoning (in a more complicated situation) in [11,Sec. 5], [8,Sec. 10]; for the reader's convenience we provide self-contained proofs.
Let f be a centered Gaussian distribution which is 1-periodic in both the x 1 and the x 2 direction. Such a distribution is most conveniently represented in terms of its Fourier series given by which converges in a suitable topology on distributions. Here the Z k are complex-valued centered Gaussians which are independent except for the symmetry constraint $Z_k = \overline{Z_{-k}}$ and which are normalised so that the expectation of Z k Z −ℓ equals δ k,ℓ . The coefficients Ĉ are assumed to be real-valued, non-negative, and symmetric, Ĉ(k) = Ĉ(−k). This notation is chosen because in the case where realisations of f are (say, smooth) functions, the coefficients in (127) coincide with the square root of the Fourier transform of the covariance function, as we now demonstrate. If, using the stationarity of f , we define the covariance function then stationarity also implies that Hence the (discrete) Fourier transform is real-valued and symmetric. For k, ℓ ∈ (2πZ) 2 we have which implies in particular that Ĉ is non-negative, and exactly corresponds to (127).
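The representation above can be illustrated numerically. The following is a minimal sketch, assuming the series has the form f(x) = Σ_k √Ĉ(k) Z_k e^{ik·x} with k ∈ (2πZ)^2 truncated to finitely many modes; the sign convention in the exponent, the function names `sample_field` and `C_hat_example`, and the particular choice of Ĉ are illustrative assumptions and not taken from the text.

```python
import numpy as np

def sample_field(C_hat, n=15, seed=0):
    """Sample a truncated series sum_k sqrt(C_hat(k)) Z_k exp(i k.x) on an n x n grid.

    The Z_k are built from the discrete Fourier transform of real white noise,
    which enforces Z_k = conj(Z_{-k}) and E[Z_k conj(Z_l)] = delta_{kl}.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so that every mode k has a distinct partner -k")
    rng = np.random.default_rng(seed)
    m = np.fft.fftfreq(n, d=1.0 / n)            # integer mode numbers
    k1, k2 = np.meshgrid(2 * np.pi * m, 2 * np.pi * m, indexing="ij")
    w = rng.standard_normal((n, n))             # real white noise on the grid
    Z = np.fft.fft2(w) / n                      # Gaussian coefficients with the symmetry
    coeff = np.sqrt(C_hat(k1, k2)) * Z
    f = n**2 * np.fft.ifft2(coeff)              # evaluates the truncated Fourier series
    return f, Z

def C_hat_example(k1, k2):
    """Hypothetical covariance symbol: symmetric, non-negative, zero at k = 0."""
    c = (1.0 + np.abs(k1)) ** (-0.5)
    return np.where((k1 == 0) & (k2 == 0), 0.0, c)

f, Z = sample_field(C_hat_example)
```

Because Ĉ is symmetric and the coefficients satisfy the conjugation constraint, the sampled field is real up to rounding, and the choice Ĉ(0) = 0 makes its grid average vanish.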
The construction of non-linear functionals of f involves regularisation. For this, let ψ ′ be an arbitrary Schwartz function with ∫ R 2 ψ ′ = 1. As in the deterministic part we define the rescaling ψ ′ ε (x 1 , Of course, ψ ′ = ψ 1 for ψ 1 as in the deterministic analysis constitutes an admissible choice, but in the following analysis of stochastic moments the semi-group property for ψ ′ is not needed and we therefore do not need to restrict ourselves to this particular choice. Throughout this section we assume that Ĉ(0) = 0, i.e. f has vanishing average. Our quantitative assumptions on the regularity of f are expressed in terms of Ĉ: We assume that there exist λ 1 , λ 2 ∈ R and α ∈ (0, 1) such that The second condition may seem confusing, because larger values of λ, corresponding to more smoothness for f , should help our theory. The point here is that decay in one of the directions beyond summability cannot compensate for a lack of decay in the other direction. In order to use the bounds presented in Lemma 6 and Proposition 2 as input for the deterministic theory in Section 2 we need α > 2 3 , but this condition does not play a role in the proof of these bounds. The following lemma shows that assumption (129) corresponds to the regularity assumption (19) on f .
Lemma 6. Let f be a stationary centered Gaussian distribution given by (127) and for ε > 0 set f ε = f * ψ ′ ε . If the assumption (129) holds then we have for any p < ∞ and α ′ < α Here and in the proof the implicit constant in ≲ depends only on p and α ′ .
Because of (f ε ) T = f T * ψ ′ ε and because the operators ψ ′ ε * are bounded with respect to · uniformly in ε the bound (130) immediately implies a bound which holds uniformly in the regularisation ε For a 0 ∈ [λ, 1] let G(·, a 0 ) be the (periodic) Green function of (∂ 2 − a 0 ∂ 2 1 ), where the heat operator is endowed with periodic and zero average time-space boundary conditions. Its (discrete) Fourier transform is given by With these notations in place, the periodic zero-mean solutions of ( We aim at giving a meaning to the products v(·, a 0 )⋄f , v(·, a 0 )⋄∂ 2 1 v(·, a ′ 0 ) and obtaining bounds for the families of commutators [v(·, a 0 ), (·) T ]⋄f , is obtained from f through a regularity-preserving transformation, as can be expressed in terms of the Fourier transform and noting that Lemma 9). Therefore, the proofs for v(·, a 0 )⋄f and v(·, a 0 )⋄∂ 2 1 v(·, a ′ 0 ) are essentially identical. The list of commutators needed for the deterministic analysis also includes various derivatives with respect to a 0 and a ′ 0 , but these derivatives do not change the regularity either. For example we have for any n ≥ 1 and for every n the symbol (−1) n n!k 2n 1 (a 0 k 2 1 −ik 2 ) n is also bounded. As the regularities of v(·, a 0 ), f , ∂ 2 1 v(·, a 0 ) are not sufficient to give a deterministic functional analytic interpretation to these products, we proceed by approximation and study the convergence of v ε (·, a 0 )f ε , v ε (·, a 0 )∂ 2 1 v ε (·, a ′ 0 ) as ε goes to zero by bounding stochastic moments. As a first step, in the following lemma we calculate the expectations of v ε (·, a 0 )f ε and v ε (·, a 0 )∂ 2 1 v ε (·, a ′ 0 ), which by stationarity do not depend on the point x ∈ [0, 1) 2 they are evaluated at.
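The claim that ∂ 2 1 v is obtained from f by a regularity-preserving transformation can be checked on the level of Fourier symbols. The sketch below assumes the symbol Ĝ(k, a 0 ) = 1/(a 0 k 1 ^2 − i k 2 ) for k ≠ 0, in line with the expression quoted above; the sign of the imaginary part depends on the Fourier convention and does not affect the modulus. The helper names `green_symbol` and `max_multiplier` are hypothetical.

```python
import numpy as np

def green_symbol(k1, k2, a0):
    """Assumed Fourier symbol of the inverse of (d_2 - a0 d_1^2) on mean-free functions."""
    return 1.0 / (a0 * k1**2 - 1j * k2)

def max_multiplier(a0, n_modes=50):
    """Largest modulus of k1^2 * G_hat(k, a0) over nonzero lattice modes with |m_j| <= n_modes."""
    m = np.arange(-n_modes, n_modes + 1)
    k1, k2 = np.meshgrid(2 * np.pi * m, 2 * np.pi * m, indexing="ij")
    mask = (k1 != 0) | (k2 != 0)                 # exclude the zero mode
    mult = np.zeros(k1.shape)
    mult[mask] = np.abs(k1[mask] ** 2 * green_symbol(k1[mask], k2[mask], a0))
    return mult.max()

for a0 in (0.5, 0.75, 1.0):
    print(a0, max_multiplier(a0))
```

Since |k 1 ^2 / (a 0 k 1 ^2 − i k 2 )| = k 1 ^2 / (a 0 ^2 k 1 ^4 + k 2 ^2)^{1/2} ≤ 1/a 0 , with equality on the modes with k 2 = 0, the multiplier is bounded uniformly for a 0 bounded away from zero.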
The regularity assumption (129) does not imply that the constants c (1) (ε, a 0 ) and c (2) (ε, a 0 , a ′ 0 ) converge to a finite limit as ε tends to zero, although there are interesting cases in which they do converge. This is discussed below, but for the moment we study the convergence of the renormalized products as well as the corresponding commutators, Observe that while the singular products appearing in this expression, v ε (·, a 0 )f ε and v ε (·, a 0 )∂ 2 1 v ε (·, a ′ 0 ), are renormalized by subtracting the expectation, the products v ε (·, a 0 )(f ε ) T and v ε (·, a 0 )(∂ 2 1 v ε (·, a ′ 0 )) T are not changed. In particular, unlike the renormalized products in (136) the renormalized commutators in (137) do not have vanishing expectation.
The key result of this section is the following proposition which shows the convergence of the renormalized products and provides a control for stochastic moments of the renormalized commutators in (137) as well as their derivatives with respect to a 0 , a ′ 0 . Proposition 2. Let f be a stationary centered Gaussian distribution given by (127) and assume that (129) is satisfied.
This convergence takes place almost surely uniformly over a 0 , a ′ 0 and with respect to any C α ′ −2 norm for α ′ < α. We denote the limits by ∂ n where ≲ means up to a constant that may depend on n, n ′ , α ′ and p, as well as where ≲ means up to a constant depending only on n, n ′ , α ′ , κ and p.
Proposition 2 follows from the following estimate on the second moments of commutators.
In the proof of Proposition 2 this Lemma is used in the form of the following immediate corollary: (137).
Finally, we come back to the products and commutators without renormalization. According to Lemma 7 the constants c (1) (ε, a 0 ) converge to a non-trivial limit if and only if Furthermore, given that the ratio of the kernels appearing in (134) and is bounded away from 0 and ∞ the convergence of the c (2) (ε, a 0 , a ′ 0 ) as ε goes to zero is also equivalent to (144). The condition (144) also implies the convergence for arbitrary derivatives of c (1) , c (2) with respect to a 0 , a ′ 0 . For example, recalling (133) and the fact that the term is nothing but the real part R ofĜ(k, a 0 ) we can write Given that for any n ≥ 1 the absolute value of the quantity under the real part R is the convergence as ε → 0 under (144) follows.
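As a worked check of the derivative bound used here, assume (as the symbol quoted earlier in this section suggests) that $\hat G(k,a_0) = (a_0k_1^2 - ik_2)^{-1}$ for $k \neq 0$. Under this assumption the real part and the $a_0$-derivatives can be written out explicitly:

```latex
\begin{align*}
\mathcal{R}\,\hat G(k,a_0) &= \frac{a_0 k_1^2}{a_0^2 k_1^4 + k_2^2},\\
\frac{\partial^n}{\partial a_0^n}\hat G(k,a_0)
  &= \frac{(-1)^n\, n!\, k_1^{2n}}{(a_0 k_1^2 - i k_2)^{n}}\,\hat G(k,a_0),\\
\left|\frac{(-1)^n\, n!\, k_1^{2n}}{(a_0 k_1^2 - i k_2)^{n}}\right|
  &= \frac{n!\, k_1^{2n}}{(a_0^2 k_1^4 + k_2^2)^{n/2}} \le \frac{n!}{a_0^{n}} .
\end{align*}
```

In this form each derivative in $a_0$ multiplies $\hat G$ by a factor that is bounded uniformly in $k$, which is why differentiation does not worsen the summability.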
A similar argument works for c (2) . We summarise this discussion in the following corollary.
Corollary 5. Assume that both (129) and (144) hold. Then the statements of Proposition 2 remain true if all of the renormalized products are replaced by products without renormalisation.
The limits which exist under the assumptions of this corollary will be denoted by We finish this section by comparing the assumptions (129) and (144) in particular cases. First consider the case For this choice of Ĉ the regularity assumption (129) is equivalent to (note that equality is not necessary in the first condition, because in the case of strict inequality one can find λ ′ 1 ≤ λ 1 and λ ′ 2 ≤ λ 2 that satisfy (129) with equality; however, λ 1 ≤ −3 + 2α or λ 2 ≤ −2 + 2α can never be compensated without violating the second condition in (129)). The condition (144), on the other hand, is equivalent to For any α ∈ (0, 1) the first requirement in (146) is weaker than the corresponding assumption in (147). An interesting case in which both assumptions are satisfied, and for which our theory can therefore be applied without renormalisation, is the case where λ 1 > 1 and λ 2 = 0; this corresponds to noise which is white in the time-like variable x 2 but "trace-class" in x 1 . However, if we are willing to accept renormalisation, the regularity requirement in the x 1 direction reduces to λ 1 > 1 3 (recall that the deterministic analysis is applicable if α > 2 3 ). Another interesting case is the covariance which corresponds to a noise term which only depends on the space-like x 1 variable. Parabolic equations with constant diffusion coefficients driven by such a noise term have recently been studied as the parabolic Anderson model in two and three spatial dimensions [5,10,9,1]. Our theory applies without renormalisation for all λ 1 > −1, which covers in particular the case of one-dimensional spatial white noise, λ 1 = 0. If we admit renormalisation we can go all the way to λ 1 > − 5 3 by choosing λ 2 < 2 and α > 2 3 as close to 2 and 2 3 , respectively, as we please.
This covers the case λ 1 = −1 for which the noise f has the same scaling behaviour as spatial white noise in two dimensions (both are distributions of regularity C −1− ) but it does not cover the case λ 1 = −2 for which the noise scales like spatial white noise in three dimensions.
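The thresholds quoted in this discussion can be reproduced by simple arithmetic. The sketch below assumes that the equality case of (129) reads λ 1 + λ 2 = −1 + 2α; this relation is inferred from the numbers in the text (it is not displayed in this section), and the function name `lambda1_for` is hypothetical.

```python
from fractions import Fraction

def lambda1_for(alpha, lambda2):
    """Value of lambda_1 under the assumed equality lambda_1 + lambda_2 = -1 + 2*alpha."""
    return -1 + 2 * alpha - lambda2

alpha_min = Fraction(2, 3)   # borderline exponent of the deterministic theory

# Noise white in x_2 (lambda_2 = 0): borderline spatial decay.
white_in_time = lambda1_for(alpha_min, 0)

# Noise depending only on x_1: lambda_2 taken at its borderline value 2.
space_only = lambda1_for(alpha_min, 2)

print(white_in_time, space_only)
```

Both borderline values are excluded because α > 2/3 and λ 2 < 2 are strict; the admissible ranges are the open intervals above these values.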

Proofs for the deterministic analysis
Proof of Theorem 2.
We write for abbreviation [·] = [·] α . We consider the map defined through where u is the solution provided by Proposition 1, the map of which we seek to characterize the fixed point.
Step 1. Pointwise nonlinear transformation, application of Lemma 1. We work under the assumptions of part ii) of the theorem on the distributions f j and the off-line products where Mū i denotes the constant in the modelledness ofū i after v i according toā i andσ i , and where Mū 1 −ū 0 denotes the constant in the modelledness ofū 1 −ū 0 after (v 1 , v 0 ) according to (ā 1 ,ā 0 ) and (σ 1 , −σ 0 ).
We now consider σ i := σ(ū i ) and a i := a(ū i ). We claim where we define in analogy with (150) and (151): with the understanding that σ i is modelled after v i according toā i and ω i := σ ′ (ū i )σ i and constant M σ i ; that a i is modelled after v i according toā i and µ i := a ′ (ū i )σ i and constant M a i ; that σ 1 − σ 0 is modelled after (v 1 , v 0 ) according to (ā 1 ,ā 0 ) and (ω 1 , −ω 0 ) and a constant we name M σ 1 −σ 0 ; and that a 1 − a 0 is modelled after (v 1 , v 0 ) according to (ā 1 ,ā 0 ) and (µ 1 , −µ 0 ) and a constant we name M a 1 −a 0 .
It is obvious from (20) that (149) turns into (152) under the assumption max i [ū i ] ≪ 1. Estimate (153) follows from part i) of Lemma 1 with u replaced by ū i and the generic nonlinearity b replaced by σ and by a, respectively (using our assumptions (20)). More precisely, (153) follows from (22) by [ū i ] ≤ 1. We now turn to (154), which by definitions (151) of δM and (156) of δM and because of N 0 ≤ 1 we may split into the four statements where we also used the definition (150) of M̄ . This is a consequence of part ii) of Lemma 1 with (ū i ,σ i ,ā i ) playing the role of (u i , σ i , a i ). The first two estimates follow from replacing the generic nonlinearity b by σ, the last two estimates from replacing it by a. The first and the third estimate are a consequence of (24), the second and fourth one of (25), in which we use (152). It is in all four estimates that we use our full assumptions (20) on the nonlinearities σ and a.
Step 2. Using the off-line products, application of Corollary 1. We claim that under the hypothesis of part ii) of the theorem on the distributions f j and the off-line products v i ⋄ f j and v i ⋄ ∂ 2 1 v j we have the commutator estimates This is an application of Corollary 1 with (N 1 , δN 1 ) = (N 0 , δN 0 ). Estimate (157) is an application of Corollary 1 i) with u replaced by σ i ; the hypotheses (48) and (49) The arguments for (160), (161), and (162) follow the same lines of those for (48), (158), and (159), respectively. The only difference is that in all instances, the distribution f j is replaced by the family of distributions ∂ 2 1 v j (·, a 0 ) (and a i plays the role of u in Corollary 1). Hence the hypotheses (48) and (51) in Corollary 1 turn into This follows from Step 1 in the proof of Corollary 2 via (18).
Step 3. Application of Proposition 1. We claim that under the hypothesis of part ii) of the theorem regarding the distributions f j and where we define in consistency with (150) and (151) Indeed, (163) and (164) whereas the outcome (109) of the proposition assumes the form By definition (156) of δM we have The combination of the last three statements yields (165) in view of definition (168).
Step 4. Under the assumptions of part ii) of the theorem on the distributions f j and the off-line products v i ⋄ f j and v i ⋄ ∂ 2 1 v j , Step 1 and Step 3 obviously combine to
Step 5. Contraction mapping argument. We work under the assumptions of part ii) of the theorem on the distributions f j and the off-line In this step, we specify to the case of a single model f 1 = f 0 =: f with the corresponding constant-coefficient solution v; this means that we may set δN 0 = 0.
We consider the space of all triplets (ū,ā,σ), whereū is modelled after v according toā andσ, which fulfill the constraints (149), and which satisfyM cf. (150), for some constant N to be fixed presently. We apply Step 4 to (f i ,ā i ,σ i ) = (f,ā,σ). From (173) and the definition (150) ofM we learn that the proviso of (169) is fulfilled provided the constant N is sufficiently small, which we now fix accordingly. We thus learn from (169), which by (173) assumes the form of M N 0 , that the map defined through (148) sends the set defined through (173) into itself, provided N 0 ≪ 1.
Hence the map (148) is a contraction for N 0 ≪ 1. We further note that the space of above triplets (u, a, σ) endowed with the distance function (174) is complete; and that the subset defined through the constraints (149) and (173) is closed. Hence by the contraction mapping principle the map (148) admits a unique fixed point on the set defined through (149) and (173).
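The abstract principle invoked here can be illustrated on a toy example. The following sketch iterates a scalar map whose Lipschitz constant is proportional to a small parameter; the map `toy_map`, the parameter value and the helper `fixed_point` are purely illustrative stand-ins and are not the map (148) or the distance (174).

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Banach fixed-point iteration x_{n+1} = phi(x_n), stopped once successive iterates agree."""
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge")

N0 = 0.1  # plays the role of a small constant; hypothetical value

def toy_map(x):
    """Toy contraction: Lipschitz constant at most N0, maps [-N0, N0] into itself."""
    return N0 * math.cos(x)

x_star = fixed_point(toy_map, 0.0)
print(x_star)
```

The two ingredients are the same as in the proof: the map sends a closed set into itself, and its Lipschitz constant is strictly below one once the small parameter is small enough.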
Step 6. Conclusion on part i) of the theorem. Let u now be as in part i) of the theorem. We note that the assumptions of part i) on the distribution f and the off-line products v ⋄ f, v ⋄ ∂ 2 1 v turn into the assumptions of part ii) with δN 0 = 0. We claim that (u, a(u), σ(u)) =: (u, a, σ) is a fixed point of the map (148), which is obvious, and which lies in the set defined through the constraints (149) and (173), and therefore is unique. (20) and (114), the constraints (149) are satisfied. The constraint (173) would be an immediate consequence of the stronger statement (115) (provided N 0 is sufficiently small). We thus turn to this a priori estimate (115). We apply Step 4 to (f i ,ā i ,σ i ) = (f, a(u), σ(u)). Since we are dealing with fixed points, we haveM = M. By the theorem's assumption [u] ≪ 1, the provisos of (169) and (170) are satisfied so that because of N 0 ≪ 1, their application yields By definition (163) this turns into (115).
Step 7. Conclusion on part ii) of the theorem. Let u i , i = 0, 1, now be as in part ii) of the theorem. By Step 6, the two triplets (u i , a(u i ), σ(u i )) =: (u i , a i , σ i ) satisfy the constraints (149) and (173) and each triplet is a fixed point of "its own" map (148) (which depends on i through the model f i ). We apply Step 4 to (f i ,ā i ,σ i ) = (f i , a(u i ), σ(u i )). Since we are dealing with fixed points, we have M̄ = M and δM̄ = δM. By the a priori estimate (115) and N 0 ≪ 1, the two provisos of Step 4 are satisfied. Because of N 0 ≪ 1, (171) and (172)
Proof of Proposition 1. We write for abbreviation [·] = [·] α . When a function v depends on a 0 next to x, we continue to write ‖v‖ when we mean sup a 0 ‖v(·, a 0 )‖ and [v] for sup a 0 [v(·, a 0 )]. When we speak of a function u, we automatically mean that it is Hölder continuous with exponent α, that is, [u] < ∞; when we speak of a distribution f , we imply that it is of order α − 2 in the sense of sup T ≤1 (T 1/4 ) 2−α ‖f T ‖ < ∞. When a distribution depends on the additional parameter a 0 , we imply that the above bound is uniform in a 0 .
Step 1. Uniqueness. Under the assumptions of part i) of the proposition we claim that there is at most one mean-free u modelled after v according to a and σ satisfying the equation (97). Indeed, let u ′ be another function with these properties; we trivially have by Definition 1 that u − u ′ is modelled after v according to a and to 0 playing the role of σ. We now apply Lemma 4 with b replaced by a. We apply it three times, namely to u, to u ′ , and to u − u ′ . We obtain from these three versions of (59) and the triangle inequality that Hence we obtain from taking the difference of the equations: We may also say that u − u ′ is modelled after 0 playing the role of v and 0 playing the role of σ; we call δM the corresponding modelling constant. Hence we may apply Corollary 2 i) with f = 0 and thus N 0 = 0. We apply it with u replaced by u − u ′ (and thus M by δM), which we may thanks to (176). In this context, the output (73) of Corollary 2 assumes the form δM = 0. Since u − u ′ is periodic, we first infer δν = 0 and then u − u ′ = const. Since u − u ′ has vanishing average, we obtain as desired u − u ′ = 0.
Step 2. A special regularization. Under the assumptions of Lemma 4 and for τ > 0 and i = 1, · · · , I we consider the convolution v iτ of v i and define Then, we claim that for any function u of class C α+2 , which is modelled after (v 1τ , · · · , v Iτ ) according to a and (σ 1 , · · · , σ I ), we have Indeed, by Lemma 4 (with b replaced by a) we understand the distribution a ⋄ ∂ 2 1 u as defined by lim We note that (177) implies which ensures that [a, (·) T ] ⋄ ∂ 2 1 v iτ → [a, (·) τ ] ⋄ ∂ 2 1 v i as T ↓ 0 uniformly in x for fixed a 0 . Thanks to the bound on the ∂ ∂a 0 -derivative in (58), this convergence is even uniform in (x, a 0 ), so that (179) turns into Since u is of class C α+2 , this further simplifies to lim T ↓0 from which we learn that the distribution a⋄∂ 2 1 u is actually the function given by (178).
Step 3. Existence in the regularized case. Under the assumptions of part i) of this proposition and in line with Step 2, for τ > 0 we consider the mollification f τ of f , so that v τ satisfies (∂ 2 − a 0 ∂ 2 1 )v τ = P f τ , and complement definition (177) (without the index i) by Then we claim that there exists a mean-free u τ of class C α+2 modelled after v τ according to a and σ such that and at the same time We first turn to the existence of (183) and start by noting that the rhs −σE[a, for a 0 = a(0). Using the invertibility of the constant-coefficient operator ∂ 2 − a 0 ∂ 2 1 on periodic mean-free functions, and equipped with the corresponding Schauder estimates, see for instance [12,Theorem 8.6.1] lifted to the torus, we see that a solution of class C α+2 exists, using a contraction mapping argument based on a − a 0 ≪ 1. Since both u τ and v τ (·, a 0 ) are in particular of class C α+1 , u τ is modelled after v τ according to a and σ (in fact, according to any a and σ). By Step 2 and definition (181) we see that (183) may be rewritten as (182).
Step 4. Basic construction. We now work under the assumptions of part ii) of the proposition. We interpolate the functions σ i , a i , and v i as well as the distribution f i linearly: σ s := sσ 1 + (1 − s)σ 0 and same for a, f , and v.
We note that this preserves (96). We interpolate the products bilinearly and same for a s ⋄ ∂ 2 1 v s , ∂ s a ⋄ ∂ 2 1 v s and a s ⋄ ∂ 2 1 ∂ s v.
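To see why bilinear interpolation of the products is compatible with Leibniz' rule, one can test the identity on a toy model in which the four off-line products σ i ⋄ f j are replaced by arbitrary numbers. The sketch below assumes the bilinear interpolation has the form σ s ⋄ f s = s^2 σ 1 ⋄ f 1 + s(1 − s)(σ 1 ⋄ f 0 + σ 0 ⋄ f 1 ) + (1 − s)^2 σ 0 ⋄ f 0 , with ∂ s σ ⋄ f s and σ s ⋄ ∂ s f interpolated in the same way; the table `P` and all function names are hypothetical.

```python
from fractions import Fraction

def bilinear_product(P, s):
    """sigma_s <> f_s, where P[i][j] stands in for the off-line product sigma_i <> f_j."""
    return (s * s * P[1][1]
            + s * (1 - s) * (P[1][0] + P[0][1])
            + (1 - s) * (1 - s) * P[0][0])

def leibniz_rhs(P, s):
    """(d_s sigma) <> f_s + sigma_s <> (d_s f), each defined by bilinearity."""
    d_sigma_f = s * (P[1][1] - P[0][1]) + (1 - s) * (P[1][0] - P[0][0])
    sigma_d_f = s * (P[1][1] - P[1][0]) + (1 - s) * (P[0][1] - P[0][0])
    return d_sigma_f + sigma_d_f

def s_derivative(g, s, h=Fraction(1, 8)):
    """Central difference quotient; exact for polynomials of degree at most two."""
    return (g(s + h) - g(s - h)) / (2 * h)

# Arbitrary stand-in values for the four products, including the cross terms.
P = [[Fraction(3), Fraction(-2)],
     [Fraction(5), Fraction(7)]]
s = Fraction(1, 3)
lhs = s_derivative(lambda t: bilinear_product(P, t), s)
rhs = leibniz_rhs(P, s)
print(lhs, rhs)
```

The identity holds for any choice of the four entries, which reflects the fact that the cross terms σ 1 ⋄ f 0 and σ 0 ⋄ f 1 are exactly what the s-derivative produces; an interpolation using only the diagonal products would not satisfy it in general.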
Thanks to the estimate (101), which is preserved under bilinear interpolation, the family of distributions {a s ⋄ ∂ 2 1 v s (·, a 0 )} a 0 is continuously differentiable in a 0 so that we may define For given 0 < τ ≤ 1, we define the singular products with the regularized distributions as in Step 2, namely We claim that there exists a curve u τ s of mean-free functions continuously differentiable in s wrt to the class C α+2 such that u τ s is modelled after v sτ according to a s and σ s (188) and satisfies Furthermore, we claim that according to a s and (∂ s σ, σ s ∂ s a, σ s ) (190) and satisfies By Steps 3 and 1 and our definitions of σ s ⋄f sτ and a s ⋄∂ 2 1 v sτ by convolution, cf (187), there exists a unique mean-free u τ s of class C α+2 such that (188) and (189) hold. Furthermore by Step 2 u τ s is characterized as the classical solution of In preparation of taking the s-derivative of (192) we note that the definition (185) of σ s ⋄f s and a s ⋄∂ 2 1 v s by (bi-)linear interpolation ensures that Leibniz' rule holds: We recall that E s denotes the evaluation operator that evaluates a function of (x, a 0 ) at (x, a s (x)); with the obvious commutation rule [∂ s , E s ] = (∂ s a)E s ∂ ∂a 0 we obtain from (194) and (186) which in conjunction with the classical differentiation rules extends to the commutator: Equipped with (193) and (195) we learn from (192) that u τ s is differentiable in s with values in the class C α+2 and Moreover, like in Step 3, (190) holds automatically because of the regularity of ∂ s u τ and of (v sτ , ∂vsτ ∂a 0 , ∂ s v τ ). In view of the definition (187) of ∂ s a ⋄ ∂ 2 1 v sτ we have by Step 2 applied to u τ s modelled according to (188) In view of the similar definition of a s ⋄∂ 2 1 ∂ s v, a s ⋄∂ 2 1 ∂vsτ ∂a 0 , and a s ⋄∂ 2 1 ∂ s v τ we have by Step 2 applied to ∂ s u τ modelled according to (190) Plugging these two formulas and the definition (187) of ∂ s σ ⋄ f τ and σ s ⋄ ∂ s f τ into (196), we obtain (191).
Step 5. We now work under the assumptions of part ii) of the proposition. We claim It remains to pass from τ = 0 to 0 < τ ≤ 1 in the eight estimates of this step, based on our definition (187) of singular products. This is done with the help of the next step.
Step 6. Let the (generic) function u and the (generic) distributions f and u ⋄ f be such that for some constants N 0 and N 1 . Then we claim that for τ ≤ 1 the distributions f τ and u ⋄ f τ := (u ⋄ f ) τ satisfy the same estimates: For this, we appeal to the semi-group property giving us so that by the boundedness of (·) T in ‖·‖ indeed (208) entails (210), appealing to (268) and using in addition that by (207)
Step 7. Application of Corollary 2. We claim for the modelling and Hölder constants of u τ s and ∂ s u τ : For the remaining estimates (213) and (214), we apply Corollary 2 ii) with (δf, δv, δσ, δa, σ ⋄ δf, δσ ⋄ f, a ⋄ ∂ 2
Step 8. Integration. We claim that u τ 1 − u τ 0 is modelled after (v τ 1 , v τ 0 ) according to (a 1 , a 0 ) and (σ 1 , −σ 0 ) with the modelling constant and Hölder constant estimated as follows and on defining ν := ∫ 1 0 ν s ds, where ν belongs to u τ 1 − u τ 0 and ν s to ∂ s u τ in the sense of Definition 1. This provides the link between (213) and (215) by integration.
Step 9. Passage to the limit. We claim that we may pass to the limit τ ↓ 0 in (211) and (212) with s = 0, 1, recovering (98) and (99) in part i) of this proposition, and in (215) and (216), recovering (108) and (109) in part ii) of the proposition. Clearly, from the uniform-in-τ estimate (212) (in conjunction with the vanishing mean of u τ i which provides the same bound on the supremum norm) we learn by Arzelà-Ascoli that there exists a subsequence τ ↓ 0 (unchanged notation) and a continuous mean-free function u i to which u τ i converges uniformly. Hence we may pass to the limit in the Hölder estimates (212) and (216). Since also the convolution v iτ converges to v i uniformly, we may pass to the limit in the estimates (211) and (215) of the modelling constants. By uniqueness, cf. Step 1, it thus remains to argue that u i solves (97) (with (f, σ, a) replaced by (f i , σ i , a i )). In order to pass from (189) to (97) it remains to establish the distributional convergences The convergence (217) is built in by the definition (187) through convolution. One of the ingredients for the convergence (218) is the analogue of (217) a 0 ), which in conjunction with the pointwise convergence of v iτ extends to the commutator a 0 ) is uniformly bounded, cf. (101) and (187) in conjunction with a formula of type (180), we even have In order to relate this to (218) we appeal to the modelledness of u i with respect to v i according to a i and σ i which by (59) in Lemma 4 yields Likewise, the uniform modelledness of u τ i , cf. (213), in conjunction with the uniform commutator bounds (95) and the uniform bounds on v iτ , we have, again by (59) in Lemma 4, the uniform convergence The combination of the three last statements implies which by the convergence of u τ i yields lim Now the next step shows that this implies (218).
Step 10. Suppose that the sequence {f n } n↑∞ of uniformly bounded distributions satisfies $\lim_{\tau\downarrow 0}\limsup_{n\uparrow\infty}\|f_{n\tau}\| = 0$. We claim that this implies distributional convergence: $f_n \rightharpoonup 0$. Indeed, we have for fixed T > 0 and any τ ≤ T that $\|f_{nT}\| \lesssim \|f_{n\tau}\|$ and therefore $\limsup_{n\uparrow\infty}\|f_{nT}\| \lesssim \limsup_{n\uparrow\infty}\|f_{n\tau}\|$ and $\limsup_{n\uparrow\infty}\|f_{nT}\| \lesssim \lim_{\tau\downarrow 0}\limsup_{n\uparrow\infty}\|f_{n\tau}\|$. The latter is equal to zero by assumption.
Hence we have f nT → 0 for every T > 0, which yields the claim by the uniform boundedness of f n in the sense of sup T ≤1 (T 1 4 ) 2−α f T , and then also in the more classical C α−2 -norm, cf (336) in Step 1 of Lemma 9.
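The comparison of f nT with f nτ for τ ≤ T rests on the semi-group property of the mollification (·) T . A minimal sketch on the Fourier side, assuming that (·) T acts as the multiplier exp(−T(k 1 ^4 + k 2 ^2)), which is consistent with the parabolic length scale T^{1/4} used above but is not displayed in this section; the function name `mollify` and the test data are hypothetical.

```python
import numpy as np

def mollify(f_hat, k1, k2, T):
    """Apply the assumed mollification (.)_T as a Fourier multiplier exp(-T (k1^4 + k2^2))."""
    return np.exp(-T * (k1**4 + k2**2)) * f_hat

# A few lattice modes k in (2*pi*Z)^2 and arbitrary Fourier coefficients.
m = np.arange(-2, 3)
k1, k2 = np.meshgrid(2 * np.pi * m, 2 * np.pi * m, indexing="ij")
rng = np.random.default_rng(1)
f_hat = rng.standard_normal(k1.shape) + 1j * rng.standard_normal(k1.shape)

T, tau = 1e-3, 4e-4
direct = mollify(f_hat, k1, k2, T)                           # f_T
two_step = mollify(mollify(f_hat, k1, k2, tau), k1, k2, T - tau)  # (f_tau)_{T - tau}
print(np.max(np.abs(direct - two_step)))
```

The identity f T = (f τ ) T−τ is what allows a bound on f τ to be transferred to f T for every T ≥ τ, using only the boundedness of the convolution operator in the supremum norm.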
Step 1. Application of Lemma 9. We claim The estimate (220) is based on the two identities following from differentiating (70) twice wrt a 0 We now see that (220) follows by an iterated application of Lemma 9: From (67) we first obtain the bound on v by Lemma 9, then the bound on ∂ 2 1 v T by (18), then via (222) the bound on ∂v ∂a 0 by Lemma 9, then the bound on ∂ 2 1 ∂v T ∂a 0 by (18), then via (222) finally the bound on by Lemma 9. The argument for (221) is identical, just with (f, v) replaced by (δf, δv), cf (81), and starting from (76) instead of (67) and thus with N 0 replaced by δN 0 .
Step 2. Application of Lemma 4. We claim that Here comes the argument: Estimate (223) follows from Lemma 4 with b replaced by a, I = 1 and v i=1 = v, so that the hypothesis (57) is satisfied by (220) in Step 1 with N 0 playing the role of N i=1 . Hypothesis (58) is satisfied by our assumption (69) with N playing the role of N 0 . In view of (71), the outcome (60) of Lemma 4 turns into (223).
Estimate (224) follows from applying Lemma 4 with b replaced by δa, still I = 1, v i=1 = v, and N 0 playing the role of N i=1 . Hypothesis (58) is satisfied by our assumption (80) with δN playing the role of N 0 . In view of (71), the outcome (60) of Lemma 4 turns into (224).
Finally, estimate (225) follows from applying Lemma 4 with b again replaced by a, but this time I = 3 and (v 1 , v 2 , v 3 ) = (v, ∂v ∂a 0 , δv). We learn from Step 1 that hypothesis (57) is satisfied with (N 1 , N 2 , N 3 ) = (N 0 , N 0 , δN 0 ). We now turn to the hypothesis (58): For i = 1 it is contained in our assumption (69) with N playing the role of N 0 . In preparation of checking hypothesis (58) for i = 2 we note that our assumption (69) implies in particular that the family of distributions {a ⋄ ∂ 2 1 v(·, a 0 )} a 0 is continuously differentiable in a 0 . This allows us to define the family of distributions {a ⋄ ∂ 2 1 ∂v which extends to the commutator: Hence the hypothesis (58) for i = 2 is also satisfied by (69) (here we use it up to ∂ 2 ∂a 2 0 ). Hypothesis (58) for i = 3 is identical to our assumption (79). We apply Lemma 4 with δu playing the role of u; the triple (δσ, σδa, σ) then plays the role of (σ 1 , σ 2 , σ 3 ) and δM that of M. The outcome (60) of Lemma 4 assumes the form We note that by (71) and (75) we have so that (227) yields (225).
Step 4. Application of Lemma 5 and conclusion. We first apply
Proof of Lemma 5. All functions are 1-periodic if not stated otherwise.
Step 1. Estimate of v i and ∂v i ∂a 0 . We claim This follows immediately from assumption (61) on f i and the definition (62) of v i via Lemma 9 and the argument of Step 1 of Corollary 2.
Step 2. Freezing-in the coefficients. We claim that we have for all points x 0 where the function g T x 0 is estimated as follows with the abbreviatioñ Indeed, making use of P 2 = P we write By definition (62) of v i (·, a 0 ), to which we apply (·) T , which we evaluate for a 0 = a(x 0 ), and which we contract with σ i (x 0 ) we have From the combination of (240) and (242) we obtain (237), so that it remains to estimate g T x 0 , cf (241). Making use of the assumption (63) we obtain so that by (18) and by assumption (61) which can be consolidated into the estimate (238).
Step 3. PDE estimate. Under the outcome of Step 2, we have for all points x 0 and radii R ≪ L where ℓ runs over all functions spanned by 1 and x 1 and · B R (x 0 ) denotes the supremum norm restricted to the ball B R (x 0 ) in the intrinsic metric (15) with center x 0 and radius R. This step mimics the heart of the kernel-free approach of Krylov & Safanov to the classical Schauder theory, see [12,Theorem 8.6.1]. Here comes the argument: Wlog we restrict to x 0 = 0 and write B R = B R (0) and · R := · B R . Let w > be the (non-periodic) solution of so that in view of (237), where we write P g T with the constant c given by c := − [0,1) 2 g T 0 . By standard estimates for the heat equation we have for any function ℓ L ∈ span{1, x 1 }. The interior estimate (247) is slightly non-standard because of the non-vanishing rhs c but can be easily reduced to the case of c = 0: First of all, replacing w by w − ℓ L in (245) and (247) we may reduce to the case of ℓ L = 0. Testing (245) with a cut-off function for B L that is smooth on scale L we learn that |c| L −2 w < L . We then may replace w by w + cx 2 which reduces the further estimate to the standard case of c = 0. We refer to [12,Theorem 8.4.4] for an elementary argument for (247) in case of c = 0 only relying on the maximum's principle via Bernstein's argument. We refer to [12,Exercise 8.4.8] for the statement (246) via the representation through the heat kernel. Since by construction, cf (244), we have u T −σ i (0)v iT (·, a(0)) = w < + w > we obtain by the triangle inequality for a suitably chosen ℓ R ∈ span{1, Inserting (247) for R ≪ L, and another application of the triangle inequality this yield where we recall that ℓ runs over span{1, x 1 }. Dividing by R 2α gives (243).
Step 4. Equivalence of norms. We claim that the modelling constant $M$ of $u$ is estimated by the expression appearing in Step 3: where we have set for abbreviation and where the maximal radius $4$ is chosen such that a ball of half of that radius covers the periodic cell $[0, 1]^2$. In fact, also the reverse estimate holds, highlighting once more that the modulation function $\nu$ in the definition of modelledness (Definition 1) plays a small role compared to $\sigma_i$. The equivalence of (249) and (250) on the level of standard Hölder spaces is the starting point for the approach to Schauder theory by Krylov and Safonov, see [12,Theorem 8.5.2]. We first argue that the $\ell$ in (250) may be chosen to be independent of $R$, that is, Indeed, fix $x_0$, say $x_0 = 0$, and let $\ell_R = \nu_R x_1 + c_R$ be (near) optimal in (250), then we have by definition of $M'$ and by the triangle inequality Hence we may pass from (250) to (251) by the triangle inequality.
It is clear from (251) that necessarily for any $x_0$, say $x_0 = 0$, the optimal $\ell$ must be of the form $\ell(x) = u(0) - \sigma_i(0)v_i(0, a(0)) - \nu(0)x_1$. This establishes the main part of (249), namely the modelledness (21) for any "base" point $x$ and any $y$ of distance at most $4$. Since $B_4(x)$ covers a periodic cell, we may use (21) for $y = x + (1, 0)$ so that by y) for all $y \in B_4(x)$. Hence once again by periodicity of a(x))), (21) holds also for $y \in B_4(x)$.
Step 5. Modelledness implies approximation property. We claim that for any mollification parameter $0 < T \le 1$, radius $L$, and point $x_0$ we have where we recall the definition (239) of $\tilde N$. Wlog we restrict ourselves to $x_0 = 0$, write $v_i(y, x) = v_i(y, a(x))$, and note that the first moment of $\psi_T$ vanishes We split the rhs into three terms: For the first rhs term we appeal to the modelledness assumption (21), which implies that the integrand is estimated by $|\psi_T(x-y)|\,M\,d^{2\alpha}(x, y)$. Hence by (18) the integral is estimated by M (T Using the identity (and dropping the index $i$) $(v(y, a(x)) - v(y, a(0))) - (v(x, a(x)) - v(x, a(0))) = (a(x) - a(0))$ we see that the integrand of the third rhs term is estimated by In view of the definition (239) of $\tilde N^2$, this yields (253).
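The identity used for the third term is an instance of the fundamental theorem of calculus in the parameter $a_0$. A minimal sketch, assuming that $v(\cdot, a_0)$ is continuously differentiable in $a_0$ and writing $a_s := s\,a(x) + (1-s)\,a(0)$ (a notation introduced here only for illustration):
\begin{align*}
&\big(v(y, a(x)) - v(y, a(0))\big) - \big(v(x, a(x)) - v(x, a(0))\big) \\
&\qquad = (a(x) - a(0)) \int_0^1 \Big(\frac{\partial v}{\partial a_0}(y, a_s) - \frac{\partial v}{\partial a_0}(x, a_s)\Big)\,ds .
\end{align*}
The difference is thus a product of an increment of $a$ between the points $x$ and $0$ and an increment of $\frac{\partial v}{\partial a_0}$ between the points $y$ and $x$.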
Step 6. Estimate of $M$. We claim that Indeed, we now may buckle and to this purpose rewrite (243) from Step 3 with help of the triangle inequality as We now insert (253) from Step 5 to obtain Here we have used that Relating the length scales $T^{1/4}$ and $L$ to the given $R \le 4$ in (255) via $T^{1/4} = \epsilon R$ (so that in particular, as required, $T \le 1$ since we think of $\epsilon \ll 1$) and $L = \epsilon^{-1} R$, taking the supremum over $R \le 4$ and $x_0$ yields by definition (250) By (249) in Step 4, this implies Since $0 < \alpha < 1$, we may choose $\epsilon$ sufficiently small such that the first rhs term may be absorbed into the lhs, yielding the desired estimate $M \lesssim \tilde N^2$ (note that $M < \infty$ is part of our assumption).
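The absorption ("buckling") step has the following generic shape. The sketch assumes, for illustration only, that after inserting $T^{1/4} = \epsilon R$ and $L = \epsilon^{-1} R$ the estimate takes the form below, with some function $\theta(\epsilon) \to 0$ as $\epsilon \downarrow 0$ and some constants $C$, $C_\epsilon$; the symbols $\theta$, $C$ and $C_\epsilon$ are not taken from the text.
\begin{align*}
&\frac{T^{1/4}}{R} = \epsilon, \qquad \frac{R}{L} = \epsilon, \\
&M \le C\,\theta(\epsilon)\,M + C_\epsilon \tilde N^2, \quad C\,\theta(\epsilon) \le \tfrac12, \quad M < \infty
\quad \Longrightarrow \quad M \le 2\,C_\epsilon \tilde N^2 .
\end{align*}
The finiteness of $M$ is what permits subtracting $\tfrac12 M$ from both sides.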
Step 7. Conclusion. Clearly, (64) and (65) immediately follow from the combination of The first estimate is identical to (254) in Step 6 into which we plug the definition (239) of $\tilde N$. The second estimate is an application of Step 2 in the proof of Lemma 2 with $v(y, x) := \sigma_i(x)v_i(y, a_i(x))$, so that the hypothesis (33) holds with $N$ replaced by $\sigma_i N_i$, cf (236) in Step 1.
Proof of Lemma 2.
Step 1. We claim Indeed, introducing $\ell_x(y) := \nu(x) y_1$ we see that (31) can be rewritten as so that we obtain by the triangle inequality In combination with (28) this yields by the triangle inequality We now take the difference of this with (257) with $x$ replaced by $x'$ to obtain, once more by the triangle inequality, By definition of $\ell$ and with the choice of $y = x$ and $y' = x + (R, 0)$, this assumes the form With the choice of $R = d(x, x')$ this turns into which amounts to the desired (256).
Step 2. We claim By the triangle inequality on (31) we obtain for all pairs of points y). Choosing $y = x + (1, 0)$, appealing to the 1-periodicity of $u$, taking the supremum over $x$, and appealing to (33), this turns into the $\nu$-part of (258): We now consider pairs of points $(x, y)$ with $d(x, y) \le 2$. By the triangle inequality from (31) we get $\frac{1}{d^\alpha(x, y)} |u(x) - u(y)| \lesssim M + N + \|\nu\|$.
By 1-periodicity, this extends to all pairs so that Inserting (259) into this yields the $u$-part of (258).
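The $\nu$-part of Step 2 can be made explicit as follows. The sketch assumes that the modelledness condition (31) reads $|u(y) - u(x) - (v(y, x) - v(x, x)) - \nu(x)(y - x)_1| \le M d^{2\alpha}(y, x)$ and that (33) bounds the Hölder semi-norm $[v(\cdot, x)]_\alpha$ by $N$; neither formula is displayed in this section, so both forms are assumptions. For $y = x + (1, 0)$ one has $u(y) = u(x)$, $(y - x)_1 = 1$ and $d(y, x) = 1$, so that
\begin{align*}
|\nu(x)| &\le M\,d^{2\alpha}(y, x) + |v(y, x) - v(x, x)| \\
&\le M + N\,d^{\alpha}(y, x) = M + N .
\end{align*}
Taking the supremum over $x$ gives $\|\nu\| \le M + N$.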
Step 3. Dyadic decomposition. For $\tau < T$ (with $T$ a dyadic multiple of $\tau$) we claim that where the sum runs over the dyadic "times" $t = \frac{T}{2}, \frac{T}{4}, \cdots, \tau$. By telescoping based on the semi-group property (17) this reduces to and splits into the three statements Plugging in the definition of the commutator $[\nu, (\cdot)_t]$, the middle statement reduces to By the definition of the commutator $[E, (\cdot)_t]$, the last statement reduces to Now identities (261), (262), and (264) follow immediately from the semi-group property.
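The telescoping mechanism behind the dyadic decomposition is elementary. As a sketch, for a generic family $\{G_t\}_t$ indexed by the dyadic times (the symbol $G_t$ is introduced here for illustration and does not denote a specific object of the proof):
\begin{equation*}
G_\tau - G_T = \sum_{t = \frac{T}{2}, \frac{T}{4}, \ldots, \tau} \big(G_t - G_{2t}\big),
\end{equation*}
since all intermediate terms cancel in pairs. Assuming that the semi-group property (17) reads $((\cdot)_s)_t = (\cdot)_{s+t}$, each summand compares convolution on scale $t$ with convolution on scale $2t = t + t$, which is what produces commutators with $(\cdot)_t$.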
Step 4. For $\tau < T \le 1$ (with $T$ a dyadic multiple of $\tau$) we claim the estimate Indeed, by the dyadic representation (260), the triangle inequality in $\|\cdot\|$ and the fact that $(\cdot)_{T-2t}$ is bounded in that norm, cf (18), it is enough to show that the rhs term of (260) under the parenthesis is estimated by $(M + N)N_1 (t^{1/4})^{3\alpha-2}$ for all $t \le 1$; here we crucially use that by assumption $3\alpha - 2 > 0$ for the convergence of the geometric series. Using Step 1 to control $[\nu]_{2\alpha-1}$ in (266) by $M + N$, this estimate splits into Appealing to our assumptions (29) & (30) and to Lemma 10, these three estimates reduce to $[E, (\cdot)_t]\tilde v$ sup where $\tilde f = \tilde f(y)$ plays the role of $f_t$ or $[x_1, (\cdot)_t]f$, and $\tilde v = \tilde v(x, y)$ plays the role of $([v(\cdot, x), (\cdot)_t] \diamond f)(y)$, but now can be, like $\nu$, generic functions; similarly, $\beta$ plays the role of $2\alpha-1$ but could be any exponent in $[0, 1]$. Using the definition of $E$, we may rewrite these estimates more explicitly as All three estimates rely on the moment bounds (18); the first estimate is then an immediate consequence of (31) and the two last ones are tautological.
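The role of the condition $3\alpha - 2 > 0$ can be seen by summing the bound $(t^{1/4})^{3\alpha-2}$ over the dyadic times $t = T 2^{-k}$, $k \ge 1$. With the abbreviation $\gamma := \frac{3\alpha - 2}{4}$ (introduced here for illustration):
\begin{align*}
\sum_{t = \frac{T}{2}, \frac{T}{4}, \ldots, \tau} (t^{1/4})^{3\alpha-2}
&\le (T^{1/4})^{3\alpha-2} \sum_{k \ge 1} 2^{-k\gamma} \\
&= \frac{(T^{1/4})^{3\alpha-2}}{2^{\gamma} - 1} .
\end{align*}
The geometric series converges precisely when $\gamma > 0$, and the resulting bound is uniform in $\tau$.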
Step 5. For we claim the estimates   (17) and (18)) by the triangle inequality and (29), again making use of T ≤ 1.
Step 6. Conclusion. Indeed by (271) in Step 5, the sequence $\{F_\tau\}_{\tau \downarrow 0}$ of functions is bounded as distributions in $C^{\alpha-2}$. By standard weak compactness based on the equivalence of norms from Step 1 in the proof of Lemma 9, there exists a subsequence $\tau_n \downarrow 0$ and a distribution, to which we give the name $u \diamond f$, such that $F_{\tau_n} \rightharpoonup u \diamond f$. By standard lower semi-continuity, we may pass to the limit in (270) in Step 5 to obtain (35). Likewise, we may pass to the limit in (265) in Step 4 to obtain (32).
Proof of Lemma 4. The proof follows the lines of Steps 3 through 6 of the proof of Lemma 2.
Step 1. For $\tau < T$ (with $T$ a dyadic multiple of $\tau$) we claim the formula where the sum runs over $t = \frac{T}{2}, \frac{T}{4}, \ldots, \tau$. By telescoping based on the semi-group property the formula reduces to and splits into the two statements By definition of the commutator $[\sigma_i, (\cdot)_t]$, the last statement reduces to and by the definition of $[E, (\cdot)_t]$ further to Now (273) and (274) are consequences of the semi-group property.
Step 2. We claim the estimate In view of (272) this estimate splits into Estimate (276) follows from (268) (with σ i playing the role of ν, 1 v i playing the role off , and β playing the role of α) and our assumption (58) (without ∂ ∂a 0 ). Estimate (277) from (269) (with [b, (·) t ] ⋄ ∂ 2 1 v i playing the role ofṽ and our assumptions (57) and (58) For (275) we write and Hence by the modelledness assumption of u, the triangle inequality d(z, x) ≤ d(z, y) + d(y, x), and (18) we obtain Plugging this into (278), we obtain using (18) once more as desired.
The further two steps are as Steps 5 and 6 in Lemma 2.
Proof of Lemma 3.
Step 1. Suppose that $\{v(\cdot, a_0)\}_{a_0}$ and $\{v_i(\cdot, a_0)\}_{a_0}$, $i = 0, 1$, are three families of functions and $|\cdot|$ a semi-norm on functions of $x$ (like $[\cdot]$) such that for some constants $N_0$, $\delta N_0$; this more general framework is useful because we shall also apply it with $v(\cdot, a_0)$ replaced by $[v(\cdot, a_0), (\cdot)_T] \diamond f$, the supremum norm $\|\cdot\|$ playing the role of $|\cdot|$, and with $(N_0, \delta N_0)$ replaced by ( . We claim that this entails Estimate (282) follows immediately from (279). We treat (283), (284), and (285) along the same lines, which is a bit of an overkill for (283) and (284). We start with the two elementary, and purposefully symmetric, formulas and We use the first formula twice. The first application is for $\sigma = \sigma(x)$ and $\sigma' = \sigma(x')$, $v = v(\cdot, a(x))$, and $v' = v(\cdot, a(x'))$ to obtain using the triangle inequality In view of the assumption (279) this yields (283). The second application is for $\sigma = \sigma_1(x)$ and $\sigma' = \sigma_0(x)$, $v = v_1(\cdot, a_1(x))$, and $v' = v_0(\cdot, a_0(x))$. We obtain the inequality In view of the assumption (280), the first rhs term is estimated as desired. For the second rhs term we interpolate linearly in the sense of $v_s := s v_1 + (1 - s)v_0$ and $a_s := s a_1 + (1 - s)a_0$, from which we learn Inserting this into (288) and in view of the assumption (280)&(281) we obtain the remaining part of (284).
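Since the formulas (286) and (289) are not displayed, the following sketch records one symmetric product identity of the type used in Step 1 and the outcome of the linear interpolation. Whether the first line coincides verbatim with (286) is an assumption; both lines are elementary and can be checked directly, the second one by differentiating $s \mapsto v_s(\cdot, a_s)$ with $v_s := s v_1 + (1-s)v_0$ and $a_s := s a_1 + (1-s)a_0$, assuming differentiability in $a_0$.
\begin{align*}
\sigma v - \sigma' v' &= \tfrac12 (\sigma - \sigma')(v + v') + \tfrac12 (\sigma + \sigma')(v - v'), \\
v_1(\cdot, a_1) - v_0(\cdot, a_0) &= \int_0^1 \Big( (v_1 - v_0)(\cdot, a_s) + \frac{\partial v_s}{\partial a_0}(\cdot, a_s)\,(a_1 - a_0) \Big)\,ds .
\end{align*}
The first term under the integral is controlled by a bound on the difference $v_1 - v_0$, the second by a bound on the $a_0$-derivative times $a_1 - a_0$.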
We use the second formula (287) for In order to deduce (285) from this inequality, in view of (290) and of our assumption (280) & (281), it remains to show for the second rhs terms We appeal again to the outcome (289) of the linear interpolation, which immediately yields the first rhs term (291) from the first rhs term in (289). For the second rhs term in (291), we appeal once more to formula (286) (applied to $\sigma = \frac{\partial v_s}{\partial a_0}(\cdot, a_s(x))$, $\sigma' = \frac{\partial v_s}{\partial a_0}(\cdot, a_s(x'))$, $v = (a_1 - a_0)(x)$, and $v' = (a_1 - a_0)(x')$).
Step 2. Argument for part i) of the lemma. We apply Lemma 2 to the family of functions $v(\cdot, x) := \sigma(x)v(\cdot, a(x))$ and the family of distributions $v(\cdot, x) \diamond f := \sigma(x)v(\cdot, a(x)) \diamond f$, both parameterized by $x$. We claim that the hypotheses (28) and (30) of Lemma 2 are satisfied with We also claim that in addition the hypotheses (33) and (34) of Lemma 2 are satisfied provided $N$ is enlarged to Indeed, for (28) this follows from (283) of Step 1 with the Hölder semi-norm $[\cdot]$ playing the role of $|\cdot|$. In the same vein, hypothesis (33) follows from (282). The relevant hypothesis (279) of Step 1 coincides with the assumption (37) of this lemma. For (30) this follows again from (283) but this time with $[v(\cdot, a_0), (\cdot)_T] \diamond f$ playing the role of $v(\cdot, a_0)$, the supremum norm $\|\cdot\|$ playing the role of $|\cdot|$, and with N 1 N 0 (T
Step 3. Argument for part ii) of the lemma. We apply Lemma 2 to the family of functions $v(\cdot, x) := \sigma_1(x)v_1(\cdot, a_1(x)) - \sigma_0(x)v_0(\cdot, a_0(x))$ and the family of distributions $v(\cdot, x) \diamond f := \sigma_1(x)\,v_1(\cdot, a_1(x)) \diamond f - \sigma_0(x)\,v_0(\cdot, a_0(x)) \diamond f$, both parameterized by $x$. We claim that the hypotheses (28) and (30) of Lemma 2 are satisfied with We also claim that in addition the hypotheses (33) and (34) of Lemma 2 are satisfied provided $N$ is enlarged to Indeed, for (28) this follows from (285)
Proof of Corollary 1.
Step 1. Generalization of part i) of Lemma 3. Let $f_j$, $j = 1, \cdots, J$, be finitely many distributions satisfying (36) and in addition for some scalars $\{c_j\}_{j=1,\cdots,J}$. Suppose that next to (38) we have Then we claim in the situation of Lemma 3 i) Indeed, in view of (296) & (297) we may apply part i) of Lemma 3 to thus $N_1$ replaced by $\delta N_1$), yielding the existence of a distribution we It follows from the unique characterization of the product with $u$ by the products with $(v, x_1)$ through (39) in conjunction with our definition $v \diamond (\sum_{j=1}^J c_j f_j) = \sum_{j=1}^J c_j\, v \diamond f_j$ and the linearity of the regular product in form of Hence (299) turns into (298).
Step 2. Conclusion for part i) and ii
Step 3. Conclusion for part iii). From the unique characterization of $u_i \diamond f$ through (39) and $(u_1 - u_0) \diamond f$ through (46), and the fact that by uniqueness of $\nu$ we have $\delta\nu = \nu_1 - \nu_0$, it follows that $(u_1 - u_0) \diamond f = u_1 \diamond f - u_0 \diamond f$. Modulo this identity, the only new element in assumptions and conclusions of part iii) of this corollary over part ii) of Lemma 3 is the appearance of $\frac{\partial}{\partial a_0'}$. The argument proceeds by establishing the linearity property analogous to Step 1, now on the level of part ii) of Lemma 3, and applying it to $\sum_{j=1}^J c_j f_j = f(\cdot, a_0') - f(\cdot, a_0'')$, cf Step 2.
Proof of Lemma 1.
Step 1. Taylor's formulas. We start from two levels of Taylor's formula for the function $b$ of $u$: Substituting $u$ by $s u_1 + (1 - s)u_0$ and $u'$ by $s u_1' + (1 - s)u_0'$, taking the derivative in $s$ and integrating over $s \in [0, 1]$ we obtain where the argument of $b''$ and of $b'''$ is given by $s(r u_1' + (1 - r)u_1) + (1 - s)(r u_0' + (1 - r)u_0)$.
Step 2. Inequalities. We use the formulas from Step 1 in terms of the inequalities By smuggling in a term $b'(u)(v' - v)$, we obtain from (301) by the triangle inequality
Step 3. Application of inequalities. We apply (300) from Step 2 to $u = u(x)$, $u' = u(y)$, which yields and (304) We combine the latter with (306) to obtain (22).
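For orientation, the two levels of Taylor's formula with integral remainder presumably underlying Step 1 read as follows; the sketch assumes that $b$ is twice continuously differentiable and uses $r \in [0, 1]$ as interpolation parameter, consistent with the argument of $b''$ quoted in Step 1.
\begin{align*}
b(u') - b(u) &= (u' - u) \int_0^1 b'\big(r u' + (1 - r)u\big)\,dr, \\
b(u') - b(u) - b'(u)(u' - u) &= (u' - u)^2 \int_0^1 (1 - r)\, b''\big(r u' + (1 - r)u\big)\,dr .
\end{align*}
In particular $|b(u') - b(u)| \le \|b'\|\,|u' - u|$ and $|b(u') - b(u) - b'(u)(u' - u)| \le \tfrac12 \|b''\|\,|u' - u|^2$, where $\|\cdot\|$ denotes the supremum norm.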
Proof of Corollary 3.
Step 1. Proof of (i) ⇒ (ii). As $v$ is a $C^{\beta+2}$ function, the assumption that $u$ is modelled after $v$ according to $a(u)$, $\sigma(u)$ implies that $u$ is of class $C^{2\alpha}$; in particular $\partial_1 u$ is a function of class $C^{2\alpha-1}$ (of course, as we will see below, $u$ is actually of class $C^{\beta+2}$ but we do not have this information at our disposal yet). Together with the regularity assumption on $f$ this implies that there is a classical interpretation of the products $\sigma(u)f$ and $a(u)\partial_1^2 u$, the latter as a distribution. In fact, this is obvious for $\sigma(u)f$ and for $a(u)\partial_1^2 u$ we can set, for example, a(u)∂ 2 The claim then follows from standard parabolic regularity theory as soon as we have established that We first argue that (310) holds. To see this, first by Lemma 1 $\sigma(u)$ is modelled after $v$ according to $a(u)$ and $\sigma'(u)\sigma(u)$. Then, Lemma 3 characterizes $\sigma(u) \diamond f$ as the unique distribution for which By the $C^\beta$ regularity of $f$ as well as the $C^{2\alpha}$ regularity of $\sigma(u)$ one sees immediately that each of the commutators in this expression goes to zero if $\diamond$ is replaced by the classical product T (·, a(u)) = 0.
Since $g(\cdot, a_0) \in C^\beta$ by assumption, this yields (310). In the same way, one can see that for any $a_0'$ we have (the classical definition of $a(u)\partial_1^2 v(\cdot, a_0')$ poses no problem because $v$ is of class $C^{\beta+2}$).
Step 2. Proof of (ii) ⇒ (i). If $u$ as well as all the $v(\cdot, a_0)$ are of class $C^{\beta+2}$, then $u$ is automatically modelled after $v$ according to $a(u)$ and $\sigma(u)$. Thus we can conclude from Step 1 that (310) and (311) hold, which in turn implies that $u$ solves $\partial_2 u - P(a(u) \diamond \partial_1^2 u + \sigma(u) \diamond f) = 0$ distributionally.

Proofs of the stochastic bounds
Proof of Lemma 6.
Step 1. Proof of (130). Assumption (129) and the stationarity and periodicity of $f$ imply that for $T \le 1$ In the last estimate we have used that the sum in the second line is a Riemann sum approximation to the integral $\int e^{-2k_1^4 - 2k_2^2} |k_1|^{-\lambda_1} |k_2|^{-\frac{\lambda_2}{2}}\,dk$, which converges due to $\lambda_1, \frac{\lambda_2}{2} < 1$.
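The convergence of this integral can be checked by factorization; the only issue is local integrability at the origin, since the exponential factor handles large frequencies.
\begin{align*}
\int_{\mathbb{R}^2} e^{-2k_1^4 - 2k_2^2} |k_1|^{-\lambda_1} |k_2|^{-\frac{\lambda_2}{2}}\,dk
&= \Big(\int_{\mathbb{R}} e^{-2k_1^4} |k_1|^{-\lambda_1}\,dk_1\Big) \Big(\int_{\mathbb{R}} e^{-2k_2^2} |k_2|^{-\frac{\lambda_2}{2}}\,dk_2\Big), \\
\int_{-1}^{1} |s|^{-\lambda}\,ds &= \frac{2}{1 - \lambda} < \infty \quad \text{if and only if } \lambda < 1 .
\end{align*}
Applying the second line with $\lambda = \lambda_1$ and $\lambda = \frac{\lambda_2}{2}$ gives finiteness of both factors.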
uniformly in $T$ and $\varepsilon$, it suffices to establish (317) for ε ≤ 1 4 T . We then write and have thereby reduced (317) (and hence (131)) to establishing that By scaling (recalling that ψ T (x 1 , , it suffices to show this bound for T 2 = 1, in which case it turns into , which is immediate for Schwartz kernels $\psi_1$ and $\psi'$.
Proof of Lemma 7. By stationarity and (128) we may write As the left hand side of this expression is real valued, the imaginary part of the sum on the right hand side also has to vanish. As both $\hat C$ and $\hat\psi_\varepsilon$ are real valued this means that we can replace $\hat G(\cdot, a_0)$ by its real part (given in (132)), thereby yielding (134).
Proof of Lemma 8.
Step 2. For the solution of where we have set for abbreviation N 0 := sup T ≤1 (T By approximation through (standard) convolution, which preserves (338) and does not increase $N_0$, we may assume that $f$ and $v$ are smooth. By definition of the convolution $(\cdot)_t$ we have Hence we obtain by (18) for all $T \le 1$ Integrating over $t \in (0, T)$ we obtain (339) by the triangle inequality.
which together with (351) implies that the convergence holds in $L^1$.
The bounds (355) and (357) are provided in the discussion following Equation (295) in [3] (up to the parabolic scaling which can be included in the same way as in the following argument). Here we only present the proofs for (356) and (358), which follow along similar lines. First of all, in order to bound $\int d(0, x)^\alpha |\partial^m\omega - \varphi_T' * \partial^m\omega|\,dx$ we make use of the triangle inequality in the form $|\partial^m\omega - \varphi_T' * \partial^m\omega| \le |\partial^m\omega| + |\varphi_T' * \partial^m\omega|$. The integral resulting from the first term then already has the desired form. For the second term, we write $|\varphi_T' * \partial^m\omega(x)| \le \int |\varphi_T'(x - y)\,\partial^m\omega(y)|\,dy$ and use the triangle inequality once more, this time in the form $d(0, x)^\alpha \le d(0, x - y)^\alpha + d(0, y)^\alpha$. Hence, it remains to bound the two integrals To obtain (358), similar to [3] we obtain the pointwise bound We recall their argument (adjusted to the case of parabolic scaling): First, according to (349) $\varphi'$ integrates non-constant monomials of (parabolic) degree $< p$ to zero, which permits us to write $(\varphi_T' * \omega - \omega)(x) = \int \big(\omega(x + z) - \sum_{|m|_{\mathrm{par}} < p} \frac{1}{m_1!\,m_2!}\,\partial^m\omega(x)\,z^m\big)\,\varphi_T'(-z)\,dz$. At this point we seek to apply Taylor's formula, but unlike [3] we need an anisotropic version of the error term. In order to formulate this we define for $m = (m_1, m_2)$ Then bounding $|z^m| \le d(0, z)^{|m|_{\mathrm{par}}}$, observing that the combinatorial pre-factor satisfies $\frac{1}{(m_1 + m_2 - 1)!}\binom{m_1 + m_2}{m_1} \le 2$, and dropping $(1 - s)^{m_1 + m_2 - 1} \le 1$, the claimed expression (359) follows.
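The splitting of the weighted integral of the convolution term can be spelled out as follows. The sketch assumes $0 < \alpha \le 1$ and that $d$ is a translation-invariant metric, so that $d(0, x)^\alpha \le d(0, x - y)^\alpha + d(0, y)^\alpha$; the substitution $z = x - y$ and Tonelli's theorem then give
\begin{align*}
\int d(0, x)^\alpha\, |\varphi_T' * \partial^m\omega(x)|\,dx
&\le \iint \big(d(0, x - y)^\alpha + d(0, y)^\alpha\big)\, |\varphi_T'(x - y)|\,|\partial^m\omega(y)|\,dy\,dx \\
&= \Big(\int d(0, z)^\alpha |\varphi_T'(z)|\,dz\Big) \Big(\int |\partial^m\omega(y)|\,dy\Big) \\
&\quad + \Big(\int |\varphi_T'(z)|\,dz\Big) \Big(\int d(0, y)^\alpha |\partial^m\omega(y)|\,dy\Big) .
\end{align*}
Each factor is either a moment of the kernel $\varphi_T'$ or a weighted integral of $\partial^m\omega$, which is the structure of the two integrals that remain to be bounded.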