Quasilinear SPDEs in divergence-form

We develop a solution theory in Hölder spaces for a quasilinear stochastic PDE driven by an additive noise. The key ingredients are two deterministic PDE lemmas which establish a priori Hölder bounds for an equation with irregular right-hand side written in divergence form. We apply these deterministic bounds to the case of a noise term which is white in time and trace class in space to obtain stretched exponential bounds for the Hölder semi-norms of the solution for the stochastic equation.


Introduction
We are interested in the quasilinear equation $\partial_t u - \nabla\cdot A(\nabla u) = \xi$ (1) with unknown $u\colon \mathbb{R}_t\times\mathbb{R}^d_x\to\mathbb{R}$ and with a nonlinearity $A\colon\mathbb{R}^d\to\mathbb{R}^d$ which is uniformly elliptic; see (11) and (14) below for the precise assumptions. The right-hand side $\xi$ represents an irregular Gaussian stochastic noise term which is white in time and coloured in space. We show the existence and uniqueness of solutions to (1) as well as a stretched exponential moment bound on a space-time Hölder semi-norm of $\nabla u$ under a suitable condition on the spatial covariance operator of $\xi$.
Quasilinear stochastic PDEs such as (1) have been treated since the 1970s; important contributions include [9,6], see also [10] for a more recent presentation. These works rely on the theory of monotone operators and yield solutions which satisfy $\sup_{0\le t\le T}\int u^2(t,x)\,dx + \int_0^T\int|\nabla u(t,x)|^2\,dx\,dt < \infty$ for all $T<\infty$ almost surely. In fact, these methods allow for much more general equations: generalisations include on the one hand more general nonlinear operators, e.g. degenerate cases such as the porous medium equation, and on the other hand, in particular, the case of multiplicative noise, i.e. the right-hand side $\xi$ is replaced by $\sigma(u)\xi$ and the time integral of this product is interpreted as a stochastic integral. The latter requires introducing stochastic machinery such as filtrations, adapted processes, etc.
In this article we restrict ourselves to the case of non-degenerate $A$ and additive noise and develop a regularity theory in Hölder spaces. Restricting ourselves to additive noise enables us to avoid stochastic integrals and to use purely deterministic arguments. The challenge then becomes to develop a theory for (1) where the right-hand side is only controlled in a low-regularity norm. For this we will only access $\xi$ through the solution $v$ of the linear stochastic heat equation and rewrite (1) in terms of $v$. To stress the divergence form of the right-hand side, we relabel those terms and arrive at equation (4), written for $j = \nabla g$. In Lemmas 1 and 2 below we establish optimal interior a priori Hölder bounds for $\nabla u$ in (4) in terms of the parabolic space-time Hölder norms $[\nabla v]_\alpha$ and $[j]_\alpha$ (parabolic Hölder norms are defined in (10) and (16) below), first in Lemma 1 for a small $\alpha_0$ using the celebrated De Giorgi-Nash theorem, and then in Lemma 2 for arbitrary $\alpha$. We do not assume time regularity for $v$ and thus, although we have the optimal $\alpha$ regularity for $\nabla u$, we do not get the matching $\frac{1+\alpha}{2}$ temporal regularity for $u$. This corresponds exactly to the fact that in our stochastic application the right-hand side $\xi$ is white in time and we cannot expect to get more than the Brownian temporal regularity $\frac12-$ for $u$. It turns out, however, that $u - v$ is better behaved and we are able to control the full parabolic $1+\alpha$ Hölder semi-norm $[u-v]_{1+\alpha}$.
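The rewriting described above can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' displayed formulas: we assume that (1) reads $\partial_t u - \nabla\cdot A(\nabla u) = \xi$ (as suggested by the one-dimensional reduction discussed later in the introduction) and that $v$ solves the heat equation $\partial_t v - \Delta v = \xi$; the sign convention chosen for $j$ is also an assumption.

```latex
\begin{align*}
% Assumed form of (1) and of the linear stochastic heat equation:
\partial_t u - \nabla\cdot A(\nabla u) &= \xi,
\qquad
\partial_t v - \Delta v = \xi . \\
% Eliminating \xi leaves an equation in which the noise enters only through v:
\partial_t u - \nabla\cdot A(\nabla u) &= \partial_t v - \Delta v . \\
% Writing the spatial part as a divergence (with this convention, j = -\nabla v):
\partial_t u - \nabla\cdot A(\nabla u) &= \partial_t v + \nabla\cdot j . \\
% Consequently the difference w = u - v solves
\partial_t w &= \nabla\cdot\bigl(A(\nabla u) + j\bigr).
\end{align*}
```

In this form the only time derivative on the right-hand side acts on $v$, which is consistent with $u - v$ having better temporal regularity than $u$ itself.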
Our main result, Theorem 1, illustrates an application of these bounds to construct solutions in the random case. In order to avoid having to deal with the large-scale behaviour of solutions we restrict ourselves to the simplest possible setting and impose that the noise $\xi$ is 1-periodic in all spatial directions and additionally compactly supported in time, say on the interval $t\in[0,1]$; we then construct solutions $u$ which satisfy $u|_{t\le0} = 0$.
Theorem 1. Let $A$ be uniformly elliptic with ellipticity contrast $\lambda$ and let $DA$ be Lipschitz continuous with constant $\Lambda$ (in the sense of (11) and (14) below). Let $\alpha_0 = \alpha_0(d,\lambda)$ be as in Lemma 1. Let $v$ be given by (23) for a covariance operator $K$ satisfying (24) for some $s>d$. Then for almost all realisations of $v$, there exists a unique $u = u(t,x)$ with the following properties:
• $u$ is continuous, 1-periodic in all spatial directions (i.e. $u(t,x) = u(t,x+k)$ for all $k\in\mathbb{Z}^d$) and $u|_{t\le0} = 0$.
where $\langle\cdot\rangle$ denotes the expectation with respect to the probability distribution of $v$.
Several higher regularity results for quasilinear SPDE were derived in the last years: both [3] and [1] considered parabolic equations with a uniformly elliptic leading term with only measurable coefficients and a gradient-dependent noise coefficient. They derived stochastic L p bounds on the space-time L ∞ norm of solutions as well as a stochastic Harnack inequality. In [4] a stochastic porous medium equation driven by a multiplicative noise of the form N k=1 f k u k • dβ k was analysed and using a transformation which removes the noise term of this particular structure, uniform continuity of solutions was shown. The recent work [2], where a Hölder theory for the quasilinear stochastic PDE is developed, is probably closest to the analysis presented here. The key idea of their analysis is to consider first the auxiliary equation and to then use the De Giorgi-Nash Theorem to get an a priori bound on the remainder w = u − z. We pursue a somewhat similar strategy, and work with the equation for w = u − v. However, (1) is more non-linear than (6) and we cannot work with the equation for w directly. Instead, we linearise it by taking spatial differences, see (17) below.
In a previous version of this work, see [8], we treated a quasilinear equation (7) in which $\bar\xi$ is a space-time white noise over $\mathbb{R}_t\times\mathbb{R}_x$, and derived a stretched exponential moment bound akin to (5) on the Hölder semi-norms $[u]_\alpha$. The results in the present article contain this result, up to the different treatment of large scales. Indeed, specialising (1) to the case $d=1$ and differentiating with respect to $x$ yields for $\bar u = \partial_x u$ the equation $\partial_t\bar u - \partial_x^2 A(\bar u) = \partial_x\xi$, which coincides with (7), noting that our assumptions on $\xi$ cover the case where $\bar\xi = \partial_x\xi$ is a space-time white noise in one spatial dimension, and that in the one-dimensional case our assumptions on $A$ coincide with the assumptions imposed on $\pi$ in [8].

Setting
We are interested in the quasilinear parabolic equation (8) with a rough right-hand side as described by $v$ and $j$. We present a Schauder theory where we estimate the solution $\nabla u$ by the data $(v,j)$ in the Hölder space $C^\alpha$, always with respect to the parabolic distance (9). This is slightly different from the standard Schauder theory in $C^{1,\alpha}$, which cannot be applied due to the right-hand side term $\partial_t v$ that is irregular in time. In fact we shall control the $C^{1,\alpha}$-semi-norm (10) of $w := u - v$, with $[\cdot]_\alpha$ defined in (16). We make two assumptions on the nonlinearity $A\colon\mathbb{R}^d\to\mathbb{R}^d$ in the form of assumptions on the tensor field given by the derivative matrix $DA$:
• $DA$ is uniformly elliptic in the sense that there exists a constant $\lambda>0$ such that $\xi\cdot DA(q)\xi \ge \lambda|\xi|^2$ and $|DA(q)\xi| \le |\xi|$ for all vectors $q,\xi$. (11)
Here, without loss of generality, we normalized the upper bound to unity. We will make use of (11) in the following form: For every spatial shift vector $y\in\mathbb{R}^d$ we will work with the increment operator $\delta_y u(t,x) = u(t,x+y) - u(t,x)$ and use the chain-type rule (12), which expresses $\delta_y(A(\nabla u))$ as $a_y\nabla\delta_y u$. Then (11) ensures that for all $y$ we have uniform ellipticity of $a_y$: $\xi\cdot a_y(t,x)\xi \ge \lambda|\xi|^2$ and $|a_y(t,x)\xi| \le |\xi|$ for all $(t,x)$, $\xi$. (13)
• $DA$ is globally Lipschitz in the sense that there exists a constant $\Lambda<\infty$ such that $|DA(q) - DA(q')| \le \Lambda|q - q'|$ for all vectors $q, q'$. (14)
We will make use of (14) in the following form: For any exponent $\beta\in(0,1]$ we have the following estimate on the level of Hölder norms, $[a_y]_\beta \le \Lambda[\nabla u]_\beta$ (15), where $[\cdot]_\beta$ denotes the (parabolic) Hölder semi-norm on space-time, defined in (16), and $d$ the parabolic distance, cf. (9). We use equation (8) exclusively in the following form: We apply the increment operator $\delta_y$ to it and obtain by (12) $\partial_t\delta_y w - \nabla\cdot(a_y\nabla\delta_y w) = \nabla\cdot(a_y\nabla\delta_y v + \delta_y j)$. (17) We establish our form of $C^{1,\alpha}$-Schauder theory, cf. Corollary 1, in two lemmas. While Lemma 1 relies only on the uniform ellipticity (11) and crucially uses the $C^\alpha$ a priori estimate of De Giorgi and Nash, Lemma 2 also uses the Lipschitz continuity (14) and proceeds by a Schauder-type argument.
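The chain-type rule and its consequences used above can be made explicit as follows. This is a sketch of the standard argument; the specific parametrisation of $a_y$ as an average of $DA$ along the segment between $\nabla u(t,x)$ and $\nabla u(t,x+y)$ is our assumption about the form of (12), and the last line assumes that $w = u - v$ solves $\partial_t w = \nabla\cdot(A(\nabla u) + j)$.

```latex
\begin{align*}
% Fundamental theorem of calculus along q_s = s\,\nabla u(t,x+y) + (1-s)\,\nabla u(t,x):
\delta_y\bigl(A(\nabla u)\bigr)(t,x)
  &= \int_0^1 \frac{d}{ds} A(q_s)\,ds
   = a_y(t,x)\,\nabla\delta_y u(t,x),
\qquad
a_y(t,x) := \int_0^1 DA(q_s)\,ds . \\
% Ellipticity of DA survives averaging in s:
\xi\cdot a_y\xi &= \int_0^1 \xi\cdot DA(q_s)\xi\,ds \ge \lambda|\xi|^2,
\qquad
|a_y\xi| \le \int_0^1 |DA(q_s)\xi|\,ds \le |\xi| . \\
% Lipschitz continuity of DA turns Hoelder control of \nabla u into Hoelder control of a_y:
|a_y(z) - a_y(z')|
  &\le \Lambda\int_0^1 \bigl( s\,|\nabla u(z+(0,y)) - \nabla u(z'+(0,y))|
     + (1-s)\,|\nabla u(z) - \nabla u(z')| \bigr)\,ds
   \le \Lambda\,[\nabla u]_\beta\,d^\beta(z,z') . \\
% Applying \delta_y to the equation for w and using u = w + v:
\partial_t\delta_y w - \nabla\cdot\bigl(a_y\nabla\delta_y w\bigr)
  &= \nabla\cdot\bigl(a_y\nabla\delta_y v + \delta_y j\bigr).
\end{align*}
```

The point of taking increments is that the resulting equation is linear in $\delta_y w$, with a coefficient field $a_y$ that inherits ellipticity from $DA$ and Hölder regularity from $\nabla u$.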
provided we already have the qualitative information that the left hand side is finite.
Lemma 2. Let $\alpha_0$ be as in Lemma 1 and suppose that $L$ is so small that the smallness condition (19) holds, where $P_R := (-R^2,0)\times B_R$ denotes the (centered) parabolic cylinder of size $R$ and $[\cdot]_{\beta,P_R}$ the $\beta$-Hölder semi-norm restricted to this set. Then the estimate (20) holds for any exponent $\alpha\in[\alpha_0,1)$.
Corollary 1. Let $\alpha_0$ be as in Lemma 1. Then for any exponent $\alpha\in(0,1)$ we have a bound on the $C^{1,\alpha}$-semi-norm (10) of $w$ in terms of the data.
The example that we have in mind is the case where the right-hand side is a stochastic noise which is white in time but coloured in space. Such a noise term is described by a Gaussian random distribution $\xi$ over $(t,x)\in\mathbb{R}\times\mathbb{R}^d$, whose probability distribution is characterized by having zero mean and a covariance that is white in time, where the spatial correlation $K$ is given by the kernel of a regularising operator. We denote by $v$ the solution of the constant-coefficient heat equation with right-hand side $\xi$. Under suitable conditions on the kernel $K$ it is known that $\nabla v$ is regular enough, i.e. $\alpha$-Hölder continuous, to apply the deterministic theory. We illustrate this in the simplest possible case, where $\xi$ is assumed to be 1-periodic in all spatial directions and in addition localised to a compact time interval, say the interval $[0,1]$. If we assume in addition that the probability distribution of $\xi$ is translation invariant in the spatial directions, we have the convenient Fourier series representation (22), where the $\beta_k$ are complex-valued standard Brownian motions (i.e. real and imaginary parts are independent, with $\Re\beta_k(t)$ and $\Im\beta_k(t)$ having equal variance proportional to $t$), which are independent up to the constraint $\beta_k = \bar\beta_{-k}$, which ensures that $\xi$ is real-valued, and $\dot\beta_k(t)$ stands for the distributional time derivative.
The almost sure convergence of (22) in the space of distributions can be shown relatively easily, but we adopt the slightly simpler framework of working only with $v$, which we define by its Fourier series representation (23). In order to ensure that the gradient is well behaved we impose that there exists $s>d$ such that the condition (24) on $\hat K(k)$ holds for $k\in(2\pi\mathbb{Z})^d$, where we have set the normalisation equal to 1 without loss of generality. Incidentally, this condition on $s$ says precisely that the spatial covariance operator $K$ is of trace class. Then we have the following lemma.
where $\langle\cdot\rangle$ represents the expectation with respect to the probability distribution of $v$.

Proof of Theorem 1
We prove a slightly stronger statement than announced in the theorem: We assume we are given continuous functions $v$ and $j$ with $[\nabla v]_\alpha, [j]_\alpha<\infty$ for an $\alpha\in(0,1)$, which are 1-periodic in each spatial direction and satisfy $v|_{t\le0} = j|_{t\le0} = 0$. We show that there exists a unique function $u$ which is one-periodic in each spatial direction, satisfies $u|_{t\le0} = 0$ and satisfies the weak formulation (25) for each Schwartz function $\varphi$. In addition, we have the bound (26). The desired statement then follows by applying this to the case where $v$ is given by (23) and $j = \nabla v$, and invoking Lemma 3. From now on, all functions $u, v, w$ etc. appearing in the proof are assumed to be one-periodic in all space directions. Under this periodicity assumption the weak formulation (25) can be restated equivalently by replacing the space integrals over $\mathbb{R}^d$ by integrals over $[0,1]^d$ and assuming that the test functions $\varphi$ are also periodic.
The existence of solutions follows by approximation through regularisation.
$\alpha$ and such that $v_\varepsilon|_{t\le-\varepsilon} = j_\varepsilon|_{t\le-\varepsilon} = 0$. Then by classical theory there exists a unique classical solution $u_\varepsilon$ of the regularised problem which is one-periodic in all spatial directions (see e.g. [7, Thm. 12.14] for a proof in the case of Dirichlet data on a bounded spatial domain; the case of the torus is only simpler). In this situation Corollary 1 applies and yields the uniform estimate (27). This estimate together with the initial datum $u_\varepsilon|_{t=-\varepsilon} = v_\varepsilon|_{t=-\varepsilon} = 0$ yields enough compactness to pass to the limit along a subsequence, with the equation converging in the distributional sense. Setting $u = w + v$, where $w$ denotes the limit of $u_\varepsilon - v_\varepsilon$, we obtain (25), and the estimate (26) follows by passing to the limit in (27) using lower semi-continuity.
It only remains to argue for uniqueness. Assume thus that $u_1$ and $u_2$ are one-periodic in space, satisfy (25) and vanish for $t\le0$. Then the difference $\delta u := u_1 - u_2$ satisfies $\partial_t\delta u = \nabla\cdot(A(\nabla u_1) - A(\nabla u_2))$ (28) in the distributional sense and $\delta u|_{t=0} = 0$. In order to show that $\delta u = 0$ we aim to test equation (28) against $\delta u$ to obtain the identity (29) for all $T\ge0$. Once the identity (29) is justified, we can invoke the uniform ellipticity (13) once more and obtain a corresponding point-wise identity for the integrand, so that (29) yields $\delta u = 0$.
It thus remains to justify (29). For this we convolve (28) with a temporal regularising kernel at scale $\varepsilon$ and then test against $\delta u_\varepsilon$, the temporally regularised version of $\delta u$. This yields a regularised version of (29) for any $T>0$. We can pass to the limit $\varepsilon\to0$ on both sides using the fact that $\delta u = (u_1 - v) - (u_2 - v)$ is $\frac{1+\alpha}{2}$-Hölder in time and that $\nabla u_1$ and $\nabla u_2$ are $\frac{\alpha}{2}$-Hölder in time.
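For orientation, the formal computation behind the uniqueness argument can be sketched as follows. It assumes that the difference satisfies $\partial_t\delta u = \nabla\cdot(A(\nabla u_1) - A(\nabla u_2))$ on the torus $[0,1)^d$ with $\delta u|_{t=0} = 0$; the displayed identity is our reconstruction and need not match (29) verbatim, and the averaged coefficient $a$ is our notation.

```latex
\begin{align*}
% Test the difference equation with \delta u, integrate by parts on the torus, use \delta u(0,\cdot) = 0:
\frac12\int_{[0,1)^d} \delta u^2(T,x)\,dx
  &= -\int_0^T\!\!\int_{[0,1)^d} \bigl(A(\nabla u_1) - A(\nabla u_2)\bigr)\cdot\nabla\delta u\;dx\,dt . \\
% Difference of fluxes via the averaged coefficient a := \int_0^1 DA(s\nabla u_1 + (1-s)\nabla u_2)\,ds:
\bigl(A(\nabla u_1) - A(\nabla u_2)\bigr)\cdot\nabla\delta u
  &= \nabla\delta u\cdot a\,\nabla\delta u \;\ge\; \lambda\,|\nabla\delta u|^2 . \\
% Hence the right-hand side of the first line is nonpositive, which forces
\int_{[0,1)^d} \delta u^2(T,x)\,dx &\le 0
\quad\Longrightarrow\quad \delta u \equiv 0 .
\end{align*}
```

The temporal regularisation in the proof serves only to make the first line rigorous, since $\partial_t\delta u$ is a priori just a distribution.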

Proof of Lemma 1
Based on (17) and (13) we have by a localized version of the Hölder a priori estimate of De Giorgi and Nash that there exists an exponent $\alpha_1 = \alpha_1(d,\lambda)\in(0,1)$ such that the estimate (31) holds for all shift vectors $y$, all length scales $\ell$ and all space-time points $z$, where $\lesssim$ means $\le C(d,\lambda,\alpha_0)$, where $P_\ell(z) = (t-\ell^2,t)\times B_\ell(x)$ denotes the parabolic cylinder centered around $z = (t,x)$, and where $\|\cdot\|_{P_\ell(z)}$ stands for the supremum norm restricted to the set $P_\ell(z)$. The exponents of the $\ell$-factors in (31) are determined by scaling; smuggling in the constant $k$ is possible since (13) is oblivious to changing $\delta_y w$ by an additive constant. We refer to [7, Theorem 6.28] as one possible reference (with $b\equiv0$, $c_0\equiv0$, and $g\equiv0$ so that $k_1 = \sup_{Q(R)}|f|$ in the notation of that reference). We fix an exponent $\alpha_0\in(0,\alpha_1)$ and take the supremum of (31) over all shift vectors $y$ with $|y|\le r$ for some $r\le\ell$; this gives (32). We first estimate the right-hand side terms of (32). We start with the second right-hand side term: From the definition (16) of the Hölder semi-norm and that of the parabolic cylinder, we obtain a bound on $\|a_y\nabla\delta_y v + \delta_y j\|_{P_{2\ell}(z)}$. We now turn to the first right-hand side term of (32): We first note the estimate (34), where the right-hand side infimum ranges over all $c\in\mathbb{R}^d$. Indeed, passing to $\tilde w(t,x) = w(t,x) - c\cdot x$, so that $\nabla w - c = \nabla\tilde w$, and transforming $\tilde k = k - c\cdot y$, so that $\delta_y w - k = \delta_y\tilde w - \tilde k$, we see that (34) reduces to $\|\delta_y\tilde w\|_{P_{2\ell}(z)} \le |y|\,\|\nabla\tilde w\|_{P_{3\ell}(z)}$, which because of $|y|\le r\le\ell$ is a consequence of the mean-value theorem.
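The mean-value step at the end of this argument can be spelled out as follows. This is a sketch in the notation above, with $\tilde w$ denoting $w$ minus a linear function of $x$ (the tilde is our notation for the modified function).

```latex
\begin{align*}
% Subtracting c\cdot x shifts the gradient by c and the increment by the constant c\cdot y:
\tilde w(t,x) &:= w(t,x) - c\cdot x,
\qquad \nabla\tilde w = \nabla w - c,
\qquad \delta_y\tilde w = \delta_y w - c\cdot y . \\
% Fundamental theorem of calculus along the segment from x to x+y:
\delta_y\tilde w(t,x) &= \int_0^1 y\cdot\nabla\tilde w(t,x+sy)\,ds . \\
% For (t,x)\in P_{2\ell}(z) and |y|\le r\le\ell the segment stays inside P_{3\ell}(z):
\|\delta_y\tilde w\|_{P_{2\ell}(z)} &\le |y|\,\|\nabla\tilde w\|_{P_{3\ell}(z)} .
\end{align*}
```

Enlarging the cylinder from $P_{2\ell}$ to $P_{3\ell}$ is exactly what accommodates the shift by $y$.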
Since obviously $\inf_c\|\nabla w - c\|_{P_{3\ell}(z)} \le (3\ell)^{\alpha_0}[\nabla w]_{\alpha_0}$, we obtain a bound on the first right-hand side term. We finally turn to the left-hand side term in (32). We now argue that we are done once we establish the norm equivalence (38). Indeed, choosing $\ell = Mr$ with $M\ge1$ to be fixed presently, we take the supremum of (37) over all radii $r$ and all space-time points $z$. By the triangle inequality in $[\cdot]_{\alpha_0}$ we post-process the resulting estimate. Since $[\nabla u]_{\alpha_0}<\infty$ by our qualitative assumption, and since $\alpha_0<\alpha_1$, we may choose $M = M(d,\lambda,\alpha_0)$ so large that this turns into the desired (18).
We now turn to the norm equivalence (38); the elements of the argument are standard in modern Schauder theory, in the spirit of [5, Theorem 3.3.1]. By rotational symmetry, it is enough to establish (39). Let $k = k(y,r,z)$ denote the optimal constant in the right-hand side of (39). We first argue that for an arbitrary but fixed point $z$, we have for all radii $r$ that $|k(2re_1,2r,z) - 2k(re_1,r,z)| \lesssim N r^{1+\alpha_0}$.

Proof of Lemma 2
Let the two scales $r\le\ell\le\frac L4$ be arbitrary and for the time being fixed. Let $y$ be an arbitrary shift vector with $|y|\le r$. By (15) in the localized form of $[a_y]_{\alpha_0,P_{3\ell}} \le \Lambda[\nabla u]_{\alpha_0,P_{3\ell+r}}$ and (19) we have the coefficient bound (42), where $\lesssim$ stands for $\le C(d,\lambda,\Lambda,\alpha_0,\alpha)$. In conjunction with (13) we see that we may apply standard $C^{1,\alpha_0}$-Schauder theory to the parabolic operator $\partial_t - \nabla\cdot a_y\nabla$ when localized to $P_{3\ell}$. We see from rescaling according to $(t,x) = (\ell^2\hat t,\ell\hat x)$ that (42) is exactly the control on the coefficient needed so that the constant in this localized Schauder theory is of the desired form $C(d,\lambda,\Lambda,\alpha_0,\alpha)$. We refer to [7, Theorem 4.8] for a possible reference (with $b\equiv0$, $c\equiv0$, $g\equiv0$ in the notation of that reference). We apply this to the increment $\delta_y w$, cf. (17), to the effect of (43). We first argue that we may upgrade (43) to (44). The first ingredient in passing from (43) to (44) is the elementary interpolation estimate (45), where $(\cdot)_r$ denotes convolution on scale $r$ in the spatial variable. Here comes the argument for (45), where without loss of generality we may assume $r=1$ and restrict to estimating the first component $\partial_1 w$ of the gradient. Given $(t,x)\in P_1$, this follows from combining two elementary estimates, so that $c$ in (45) is given by $(\delta_{e_1}w)_1(0,0)$. The second ingredient in passing from (43) to (44) is an estimate which we obtain by applying the spatial convolution operator $(\cdot)_r$ to (17), to the effect of $\partial_t(\delta_y w)_r = \nabla\cdot(a_y\nabla\delta_y w + a_y\nabla\delta_y v + \delta_y j)_r$.
We now address the right-hand side terms of (44). In view of (34) (slightly modified) we obtain a bound on the first right-hand side term. We now turn to the second right-hand side term of (44) and note that by (42) and (13) $[a\nabla\delta_y v + \delta_y j]_{\alpha_0,P_{3\ell}} \le [a]_{\alpha_0,P_{3\ell}}\|\nabla\delta_y v\|_{P_{3\ell}} + \|a\|\,[\nabla\delta_y v]_{\alpha_0,P_{3\ell}} + [\delta_y j]_{\alpha_0,P_{3\ell}}$. While the first of the required estimates, (49), is obvious, we need a little argument to see the second, (50). Indeed, let us focus on $j$; given two points $z, z'$ in $P_{3\ell}$ we write $\delta_y j(z) - \delta_y j(z')$ in the two ways $(j(z+(0,y)) - j(z)) - (j(z'+(0,y)) - j(z'))$ and $(j(z+(0,y)) - j(z'+(0,y))) - (j(z) - j(z'))$ to see that (because of $|y|\le r\le\ell$) the difference is controlled both in terms of $|y|$ and in terms of the distance of $z$ and $z'$, and thus (50) holds as desired. Inserting (49) and (50) into (48) we obtain a bound on the second right-hand side term; inserting this together with (47) into (44) yields an estimate on scales $r\le\ell\le\frac L4$. Relabelling $4\ell$ by $\ell$ we obtain a corresponding estimate for all $r\le\ell\le L$. By the triangle inequality in $\|\cdot\|$ and by $\sup_{r\le L} r^{-\alpha}\inf_c\|\nabla v - c\|_{P_r} \le [\nabla v]_{\alpha,P_L}$ this may be upgraded. Slaving $\ell$ to $r$ via $\ell = Mr$ for some $M\ge1$ to be fixed presently, we obtain (52) from distinguishing the ranges $r\le\frac LM$ and $\frac LM\le r\le L$. Clearly, the first right-hand side term of (52) is controlled. Hence fixing an $M = M(d,\lambda,\Lambda,\alpha_0,\alpha)$ sufficiently large, we may absorb the second right-hand side term in (52) into the left-hand side to obtain (53). For this, we do not need to know beforehand that the left-hand side is finite, since (52) also holds when the two suprema are restricted to $\epsilon\le r\le L$ and $\epsilon\le\ell\le L$ for any $\epsilon>0$, in which case the left-hand side is finite since $\nabla u$ is in particular assumed to be continuous. Hence we obtain (53) with the supremum restricted to $\epsilon\le r\le L$, in which we now may let $\epsilon\downarrow0$ to recover the form as stated in (53). By the standard norm equivalence and shifting the origin into an arbitrary $z\in P_L$, we obtain (20) from (53).
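The "two ways" argument for the increment of $j$ can be written out as follows. This is a sketch for $\alpha_0\le\alpha$ in which the constant 2 and the cylinder $P_{4\ell}$ on the right-hand side are our choices and may differ from those in the displayed estimates of the proof.

```latex
\begin{align*}
% Grouping by the shift y (the points z and z+(0,y) have parabolic distance |y|):
|\delta_y j(z) - \delta_y j(z')|
  &\le |j(z+(0,y)) - j(z)| + |j(z'+(0,y)) - j(z')|
   \le 2\,|y|^{\alpha}\,[j]_{\alpha,P_{4\ell}} , \\
% Grouping by the base points z, z':
|\delta_y j(z) - \delta_y j(z')|
  &\le |j(z+(0,y)) - j(z'+(0,y))| + |j(z) - j(z')|
   \le 2\,d^{\alpha}(z,z')\,[j]_{\alpha,P_{4\ell}} . \\
% Interpolating via \min\{a,b\} \le a^{1-\theta} b^{\theta} with \theta = \alpha_0/\alpha \in (0,1]:
|\delta_y j(z) - \delta_y j(z')|
  &\le 2\,|y|^{\alpha-\alpha_0}\,d^{\alpha_0}(z,z')\,[j]_{\alpha,P_{4\ell}} , \\
% so that, using |y| \le r:
[\delta_y j]_{\alpha_0,P_{3\ell}} &\le 2\,r^{\alpha-\alpha_0}\,[j]_{\alpha,P_{4\ell}} .
\end{align*}
```

The gain $r^{\alpha-\alpha_0}$ is what allows the $\alpha_0$-Schauder estimate for the increment to be converted into an $\alpha$-Hölder bound for $\nabla u$.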

Proof of Corollary 1
In view of Lemmas 1 and 2 and the definition (10) of the $C^{1,\alpha}$-semi-norm, it remains to show that for all spatial points $x$ and times $t$ and $t'$, the difference $w := u - v$ satisfies the temporal Hölder estimate (54). To this purpose, we rewrite (8) as $\partial_t w = \nabla\cdot(A(\nabla u) + j)$, to which we apply spatial convolution on scale $r$ to be fixed later. This yields an estimate of the increment $w_r(t,x) - w_r(t',x)$ with a factor $|t-t'|\,r^{\alpha-1}$. We may take the convolution kernel $\phi_r$ to be symmetric, so that in particular $w_r(t,x) = \int\phi_r(x-y)\bigl(w(t,y) - \nabla w(t,x)\cdot(y-x)\bigr)\,dy$, which yields an estimate of $w_r - w$. The last two estimates combine, and optimizing through the choice of $r = \sqrt{|t-t'|}$ yields (54).
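The final optimisation can be sketched as follows, writing $N$ for a quantity that dominates the Hölder semi-norms $[\nabla w]_\alpha$, $[\nabla u]_\alpha$ and $[j]_\alpha$ (the symbol $N$ and the suppressed constants are our shorthand, not necessarily the authors').

```latex
\begin{align*}
% Smooth part: \partial_t w_r = \nabla\cdot(A(\nabla u) + j)_r gains a factor r^{\alpha-1}, since the
% gradient of the kernel has mean zero and thus only sees \alpha-Hoelder oscillations on scale r:
|w_r(t,x) - w_r(t',x)| &\lesssim N\,|t-t'|\,r^{\alpha-1} . \\
% Rough part: by symmetry of the kernel, w_r - w only sees the deviation of w from its tangent plane:
|w_r(t,x) - w(t,x)| &\lesssim r^{1+\alpha}\,[\nabla w]_\alpha \lesssim N\,r^{1+\alpha} . \\
% Triangle inequality, then the balancing choice r = |t-t'|^{1/2}:
|w(t,x) - w(t',x)| &\lesssim N\bigl(|t-t'|\,r^{\alpha-1} + r^{1+\alpha}\bigr)
  = 2N\,|t-t'|^{\frac{1+\alpha}{2}} .
\end{align*}
```

The exponent $\frac{1+\alpha}{2}$ is the temporal regularity matching $1+\alpha$ spatial regularity under parabolic scaling.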

Proof of Lemma 3
Throughout the proof we fix $j\in\{1,\dots,d\}$ and set $h = \partial_j v$. We aim to show the claimed moment bound for $C$ large enough and $\alpha<\min\{\frac{s-d}{2},1\}$. We assume without loss of generality that $\frac{s-d}{2}<1$. First we recall that by definition $v$ and $h$ are 1-periodic in each spatial direction and $v(t,x) = h(t,x) = 0$ for $t\le0$. Furthermore, for $t>1$, $h$ solves the heat equation without forcing. We thus aim to establish (55). The core stochastic ingredient for the proof of (55) is the bound (56) on second moments of increments of $h$. The argument for (56) is based on a Fourier representation for $h$: For $t\in[0,1]$ and $x\in\mathbb{R}^d$ we get a series for $h$ by differentiating (23) with respect to $x_j$, which leads to the expression (57), valid for $t'\le t$. In order to deduce (56), we use the triangle inequality and treat the cases $t=t'$, $x\ne x'$ and $t\ne t'$, $x=x'$ separately. In the first case we specialise (57) to $t=t'$. Now using the simple estimates $\frac{|k_j|^2}{2|k|^2}\le\frac12$, $|1-e^{ik\cdot(x-x')}|\le\min\{2,|k\cdot(x-x')|\}$ as well as $1-e^{-(t+t')|k|^2}\le1$, and recalling condition (24) on $\hat K$, this turns into the desired estimate, where $\lesssim$ means $\le C$. In the same way we get by specialising (57) to $x=x'$ and treating the case

Now using again
and using (24) once more this turns into and thus (56) follows.
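To see where the exponent $s-d$ comes from, the lattice sum behind the spatial increment bound can be estimated as follows. This is a heuristic sketch under the assumption that (24) is a bound of the form $\hat K(k)\le|k|^{-s}$ for $k\ne0$, with $d<s<d+2$; sums over $k\in(2\pi\mathbb{Z})^d\setminus\{0\}$ are compared with radial integrals, and $\rho$, $\kappa$ are our notation.

```latex
\begin{align*}
% Split the sum at the frequency |k| \sim 1/\rho, where \rho := |x - x'| \le 1:
\sum_{k\neq0} \hat K(k)\,\min\{1,|k|^2\rho^2\}
  &\le \rho^2\sum_{0<|k|\le1/\rho} |k|^{2-s} + \sum_{|k|>1/\rho} |k|^{-s} \\
% Low frequencies grow since 2 - s + d > 0; the high-frequency tail converges since s > d:
  &\lesssim \rho^2\int_1^{1/\rho} \kappa^{\,2-s+d-1}\,d\kappa
   + \int_{1/\rho}^\infty \kappa^{\,-s+d-1}\,d\kappa \\
% Both pieces scale the same way:
  &\lesssim \rho^2\,\rho^{-(2-s+d)} + \rho^{\,s-d}
   \lesssim \rho^{\,s-d} .
\end{align*}
```

A second moment of order $\rho^{s-d}$ corresponds to Hölder exponents below $\frac{s-d}{2}$, in line with the restriction on $\alpha$ in the proof.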
We now apply Kolmogorov's continuity theorem to $h$; for the convenience of the reader we give a self-contained argument. We first appeal to Gaussianity to post-process (56), which we rewrite as a bound $\lesssim1$ on suitably rescaled increments for a given scale $R$. By Gaussianity of $h$ we can upgrade this estimate to the Gaussian moment bound (58). Thus proving the desired estimate (55) on Gaussian moments of the local Hölder norm $[h]'_\alpha$ amounts to exchanging the expectation and the supremum over $(t,x),(s,y)$ in (58) at the price of a decreased Hölder exponent $\alpha<\frac{s-d}{2}$. To this purpose, we now argue that for $\alpha>0$, the supremum over a continuum can be replaced by the supremum over a discrete set: For $R<1$ we define a grid of space-time lattice points on scale $R$. By density, we may assume that $(t,x),(s,y)\in r^2\mathbb{Z}\times r\mathbb{Z}^d$ for some dyadic $r = 2^{-N}<1$ (this density argument requires the qualitative a priori information of the continuity of $h$, which can be circumvented by approximating $h$).
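The Gaussian upgrade used here rests on the following elementary identity for a centred real Gaussian random variable $X$ with variance $\sigma^2$ (the symbols $X$, $\sigma$, $a$ and $C$ are ours). It illustrates why a second-moment bound on an increment of $h$ automatically yields a Gaussian moment bound.

```latex
\begin{align*}
% Gaussian integral, valid for 0 \le a < 1/(2\sigma^2):
\Bigl\langle \exp\bigl(aX^2\bigr) \Bigr\rangle
  &= \frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mathbb{R}}
     \exp\Bigl(-\bigl(\tfrac{1}{2\sigma^2} - a\bigr)x^2\Bigr)\,dx
   = \bigl(1 - 2a\sigma^2\bigr)^{-1/2} . \\
% Choosing a = 1/(C\sigma^2) with C > 2 gives a bound independent of \sigma:
\Bigl\langle \exp\Bigl(\frac{X^2}{C\sigma^2}\Bigr) \Bigr\rangle
  &= \Bigl(1 - \frac{2}{C}\Bigr)^{-1/2} < \infty .
\end{align*}
```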
For every dyadic level $n = N, N-1, \dots$ we now recursively construct two sequences $(t_n,x_n)$, $(s_n,y_n)$ of space-time points, starting from $(t_N,x_N) = (t,x)$ and $(s_N,y_N) = (s,y)$, with the following properties:
a) they are in the corresponding lattice of scale $2^{-n}$, i.e. we have $(t_n,x_n),(s_n,y_n)\in(2^{-n})^2\mathbb{Z}\times2^{-n}\mathbb{Z}^d$,
b) they are close to their predecessors in the sense of $|t_n - t_{n+1}|, |s_n - s_{n+1}|\le3(2^{-(n+1)})^2$ and $|x_{n,i} - x_{n+1,i}|, |y_{n,i} - y_{n+1,i}|\le2^{-(n+1)}$, where $x_{n,i}, x_{n+1,i},\dots$ denote the $i$-th components of $x_n, x_{n+1},\dots$, so that by definition of $\Theta$ we have $|h(t_n,x_n) - h(t_{n+1},x_{n+1})|\lesssim\Theta(2^{-(n+1)})^\alpha$ and $|h(s_n,y_n) - h(s_{n+1},y_{n+1})|\lesssim\Theta(2^{-(n+1)})^\alpha$, and
c) such that $|t_n - s_n|$ and $|x_n - y_n|$ are minimized among the points satisfying a) and b).
Because of the latter, we have (59). Equipped with (59), we now may upgrade (58) to (55). Indeed, (59) can be reformulated on the level of characteristic functions, where as in (59) $R$ runs over all $2^{-N}$ for integers $N\ge1$. Replacing the suprema by sums in order to take the expectation, we obtain a sum over scales and lattice points. We now appeal to Chebyshev's inequality in order to make use of (58), where in the second step we have used that the number of pairs $(t,x),(s,y)$ of neighboring lattice points is bounded by $C R^{-(2+d)}$ and in the last step we have used that stretched exponential decay (recall $s-d-2\alpha>0$) beats polynomial growth. The last estimate immediately yields (55).
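The chaining step can be summarised by a telescoping sum. The following sketch assumes, as in the construction above, that $\Theta$ bounds the rescaled increments of $h$ between neighbouring lattice points on every dyadic scale, and that $2^{-M}$ is the dyadic scale comparable to the parabolic distance of $(t,x)$ and $(s,y)$, at which the two chains meet as neighbours (the index $M$ is our notation).

```latex
\begin{align*}
% Telescope from the fine scale 2^{-N} up to the scale 2^{-M}:
|h(t,x) - h(s,y)|
  &\le \sum_{n=M}^{N-1}\bigl( |h(t_{n+1},x_{n+1}) - h(t_n,x_n)|
     + |h(s_{n+1},y_{n+1}) - h(s_n,y_n)| \bigr)
   + |h(t_M,x_M) - h(s_M,y_M)| \\
% Each term is an increment between neighbouring lattice points, hence controlled by \Theta:
  &\lesssim \Theta\sum_{n\ge M} (2^{-n})^{\alpha}
   = \Theta\,\frac{2^{-M\alpha}}{1 - 2^{-\alpha}} .
\end{align*}
```

The geometric series converges precisely because $\alpha>0$, which is why the supremum over the continuum can be replaced by a supremum over lattice points.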