Quasi-linear SPDEs in divergence form

We develop a solution theory in Hölder spaces for a quasi-linear stochastic PDE driven by an additive noise. The key ingredients are two deterministic PDE lemmas which establish a priori Hölder bounds for a parabolic equation in divergence form with irregular right-hand-side term. We apply these bounds to the case of a right-hand-side noise term which is white in time and trace class in space, to obtain stretched exponential bounds for the Hölder semi-norms of the solution.


Introduction
We are interested in the quasi-linear equation $\partial_t u - \nabla \cdot A(\nabla u) = \xi$ (1) with unknown $u : \mathbb{R}_t \times \mathbb{R}^d_x \to \mathbb{R}$, for a nonlinearity $A : \mathbb{R}^d \to \mathbb{R}^d$ that is uniformly elliptic. The right-hand side $\xi$ represents an irregular distribution; the key example we have in mind is a noise term which is "white in time" and "coloured in space". The aim of this article is to develop a priori bounds in Hölder spaces leading to a solution theory for (1).
The regularity of the noise terms appearing in stochastic differential equations is often effectively measured on the Hölder scale. This is well known in the finite-dimensional case, the most classical example being Brownian motion, which has (locally) $\alpha$-Hölder continuous trajectories for any $\alpha < \tfrac{1}{2}$. Statements in other scales of spaces, e.g. in $L^2$-based fractional Sobolev spaces, are possible but weaker: Brownian trajectories almost surely take values in $H^\alpha_{\mathrm{loc}}$ for $\alpha < \tfrac{1}{2}$, but this does not even imply the continuity of trajectories. It thus seems natural to seek a solution theory in Hölder spaces also for stochastic partial differential equations.
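As a side illustration of why the Sobolev statement is weaker, here is a standard one-dimensional example that is not taken from the article: a cut-off step function belongs to $H^\alpha$ for every $\alpha < \tfrac12$ although it is discontinuous. The cutoff $\chi$ and the function $f$ are auxiliary symbols introduced only for this sketch.

```latex
% Illustration (not from the article): H^\alpha with \alpha < 1/2 does not force continuity.
% \chi: smooth, compactly supported, \chi(0) = 1;  f := \chi \, \mathbf{1}_{(0,\infty)}.
\begin{align*}
  |\hat f(\xi)| &\lesssim \frac{1}{1+|\xi|}
    && \text{(one integration by parts; the jump at $0$ prevents faster decay)},\\
  \|f\|_{H^\alpha}^2
    &= \int_{\mathbb{R}} (1+|\xi|^2)^{\alpha}\,|\hat f(\xi)|^2\,d\xi
     \;\lesssim\; \int_{\mathbb{R}} (1+|\xi|)^{2\alpha-2}\,d\xi \;<\; \infty
    && \text{for } \alpha < \tfrac12 .
\end{align*}
```

By contrast, the one-dimensional Sobolev embedding into Hölder spaces, $H^\alpha \subset C^{0,\alpha-1/2}$, is only available for $\alpha > \tfrac12$.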
In the case of semi-linear equations such a theory is by now classical and well developed, see e.g. [1,3,11]. For example, in the case of the stochastic heat equation $\partial_t v - \Delta v = \xi$ (2), the variation-of-constants formula leads to an explicit representation of $v$ in terms of the heat kernel (the so-called "mild solution"), which can be used to deduce optimal Hölder bounds. This approach extends to equations with lower-order non-linearities such as stochastic reaction-diffusion equations or the stochastic Navier-Stokes equation.
In the case of the quasi-linear equations we consider, there is no natural mild formulation of the equation. However, equations such as (1) have been treated since the 1970s (see e.g. the classical works [5,9], or [10] for a more recent presentation) using a "variational formulation", which relies on the theory of monotone operators and yields solutions that satisfy the energy inequality (3) for all $T < \infty$ almost surely. In fact, these methods allow for much more general equations; generalisations include degenerate cases such as the porous medium equation.
The aim of the present article is to demonstrate how purely deterministic PDE arguments can be used to improve on the energy inequality (3) and obtain estimates on space-time Hölder norms of $\nabla u$. Our main deterministic result, Corollary 1, states, roughly speaking, that we can bound the (parabolic) Hölder semi-norm $[\nabla u]_\alpha$ for solutions $u$ of (1) in terms of the corresponding semi-norm $[\nabla v]_\alpha$ for solutions of the linear problem (2). The proof splits into Lemma 1, where this bound is established for a small exponent $\alpha_0$ using the celebrated De Giorgi-Nash Theorem, and Lemma 2, where it is upgraded to arbitrary $\alpha$ by Schauder theory. The techniques employed follow classical PDE arguments, as developed for example in [6,7], but they have to be adjusted to the low-regularity right-hand side.
To illustrate the implications of our deterministic result in the case of random $\xi$, we treat the case where $\xi$ is a Gaussian distribution that is white in time and coloured in space. This type of noise is commonly studied in the literature, often using the "differential" notation, in which $W$ denotes a Wiener process with spatial covariance operator $K$. Our assumption on $\xi$ corresponds to saying that $K$ is a trace-class operator, which is precisely the assumption needed in the variational approach. We restrict ourselves to the case where $\xi$ is periodic in space and compactly supported in time. This assumption is made to yield bounds on $\nabla v$ which hold uniformly over space and time. The only stochastic ingredient of this article is Lemma 3, where Gaussian moments for $[\nabla v]_\alpha$ are established, using the covariance of $\xi$ and its Gaussianity. Theorem 1 combines the main deterministic result, Corollary 1, and Lemma 3 to construct spatially periodic solutions $u$ with zero initial data, i.e. $u_{|t \le 0} = 0$. It establishes existence and uniqueness of solutions to (1), as well as stretched exponential moments for $[\nabla u]_\alpha$.

Our result is closely related to the recent work [2], where a Hölder theory for the quasi-linear stochastic PDE (5) is developed. The first step of that work is to consider an auxiliary equation for a function $v$. The authors use some a priori information in the spirit of the energy estimate (3) as well as martingale inequalities to get a priori control on $\nabla v$.
The key observation in their approach is that this a priori control on $v$ allows one to rewrite the equation for the remainder $w = u - v$ in a form to which the De Giorgi-Nash Theorem applies, and thereby to obtain Hölder regularity for $w$. We pursue a similar strategy, and work with the equation for $w = u - v$. However, (1) is more nonlinear than (5) and the classical PDE results presented in [6,7] do not immediately apply in this low-regularity situation. Our main deterministic result, Corollary 1, provides the necessary bound.

In a previous version of this work, see [8], we treated a quasi-linear equation (6), where $\bar\xi$ is a space-time white noise over $\mathbb{R}_t \times \mathbb{R}_x$, and derived a stretched exponential moment bound akin to (24) on the Hölder semi-norms $[u]_\alpha$. The results in the present article contain this result, up to the different treatment of large scales. Indeed, specialising (1) to the case $d = 1$ and differentiating with respect to $x$ yields an equation for $\partial_x u$ which coincides with (6), noting that our assumptions on $\xi$ cover the case where $\bar\xi = \partial_x \xi$ is a space-time white noise in one spatial dimension, and that in the one-dimensional case our assumptions on $A$ coincide with the assumptions imposed on $\pi$ in [8]. The key difference between the approach proposed in [8] and the approach we present here is that the core arguments are now purely deterministic and the use of log-Sobolev inequalities can be fully avoided.

Setting
For the deterministic part of our paper we rewrite the noise term $\xi$ as $\partial_t v - \Delta v$, where $v$ solves (2). We thus strive to bound solutions of $\partial_t u - \nabla \cdot A(\nabla u) = \partial_t v - \Delta v$ in terms of $v$. Here and throughout the paper we interpret equations in the distributional sense over all of $\mathbb{R}_t \times \mathbb{R}^d_x$. In order to stress the divergence form of the right-hand side, we relabel those terms and write $\partial_t u - \nabla \cdot A(\nabla u) = \partial_t v + \nabla \cdot j$ (8) for $j = -\nabla v$. We present a Schauder theory where we estimate the solution $\nabla u$ by the data $(v, j)$ in the Hölder space $C^\alpha$, always with respect to the parabolic distance (9). This is slightly different from the standard Schauder theory in $C^{1,\alpha}$, which cannot be applied due to the right-hand-side term $\partial_t v$ that is irregular in time. In fact we shall control the $C^{1,\alpha}$-semi-norm (10) of $w := u - v$, where $[\cdot]_\alpha$ denotes the (parabolic) Hölder semi-norm on space-time, see (11).

We make two assumptions on the nonlinearity $A : \mathbb{R}^d \to \mathbb{R}^d$, in the form of assumptions on the tensor field given by the derivative matrix $DA$.

Assumption 1 $DA$ is uniformly elliptic in the sense that there exists a constant $\lambda > 0$ such that $\eta \cdot DA(q)\eta \ge \lambda|\eta|^2$ and $|DA(q)\eta| \le |\eta|$ for all $q, \eta \in \mathbb{R}^d$ (12). Here, without loss of generality, we normalized the upper bound to unity.
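The rewriting described in this section can be spelled out in one chain of identities. This is a reconstruction from the relations stated in the text ($\xi = \partial_t v - \Delta v$, $j = -\nabla v$, $w = u - v$), so signs and numbering should be checked against the original displays.

```latex
% Reconstruction from the relations stated in the text; not a verbatim copy of the article's displays.
\begin{align*}
  \partial_t u - \nabla\cdot A(\nabla u)
    &= \xi \;=\; \partial_t v - \Delta v
    && \text{(noise expressed through the linear problem (2))}\\
    &= \partial_t v + \nabla\cdot j, \qquad j := -\nabla v
    && \text{(right-hand side in divergence form)}\\
  \Longrightarrow\quad \partial_t w
    &= \nabla\cdot\big(A(\nabla u) + j\big), \qquad w := u - v .
\end{align*}
```

The last line shows why $w$ is the natural object: the time-irregular term $\partial_t v$ has been absorbed, and only a divergence of Hölder-continuous fields remains on the right-hand side.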
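We will make use of (12) in the following form: for every spatial shift vector $y \in \mathbb{R}^d$ we work with the increment operator $\delta_y u(t,x) = u(t, x+y) - u(t,x)$ and use the chain-type rule $\delta_y\big(A(\nabla u)\big) = a_y \nabla\delta_y u$ (13), where $a_y := \int_0^1 DA\big(\theta\nabla u(\cdot + y) + (1-\theta)\nabla u\big)\,d\theta$. Then (12) ensures that for all $y$ we have uniform ellipticity of $a_y$: $\eta \cdot a_y(t,x)\eta \ge \lambda|\eta|^2$ and $|a_y(t,x)\eta| \le |\eta|$ for all $(t,x)$, $\eta$ (14).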
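For the reader's convenience, the chain-type rule can be derived from the fundamental theorem of calculus. The explicit formula for $a_y$ used below (an average of $DA$ along the segment between $\nabla u$ and its shift) is the standard choice and is an assumption of this sketch, since the corresponding display is not reproduced above.

```latex
% Sketch: a_y is taken to be the average of DA along the segment from \nabla u to \nabla u(\cdot+y).
\begin{align*}
  \delta_y\big(A(\nabla u)\big)
    &= A\big(\nabla u(\cdot+y)\big) - A(\nabla u)
     = \int_0^1 \frac{d}{d\theta}\,A\big(\theta\,\nabla u(\cdot+y) + (1-\theta)\,\nabla u\big)\,d\theta\\
    &= \underbrace{\int_0^1 DA\big(\theta\,\nabla u(\cdot+y) + (1-\theta)\,\nabla u\big)\,d\theta}_{=:\;a_y}\;
       \nabla\delta_y u ,\\[2pt]
  \eta\cdot a_y\eta
    &= \int_0^1 \eta\cdot DA(\cdots)\,\eta\;d\theta \;\ge\; \lambda|\eta|^2 ,
  \qquad
  |a_y\eta| \;\le\; \int_0^1 |DA(\cdots)\,\eta|\;d\theta \;\le\; |\eta| .
\end{align*}
```

The second line uses that $\nabla$ commutes with $\delta_y$; the last line shows that both bounds of the ellipticity assumption are preserved under averaging.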
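Assumption 2 $DA$ is globally Lipschitz in the sense that there exists a constant $\Lambda < \infty$ such that $|DA(q) - DA(q')| \le \Lambda|q - q'|$ for all $q, q' \in \mathbb{R}^d$ (15).

We will make use of (15) in the following form: for any exponent $\beta \in (0,1]$ we have the following estimate on the level of Hölder norms: $[a_y]_\beta \le \Lambda[\nabla u]_\beta$ (16).

We use Eq. (8) exclusively in the following form: we apply the increment operator $\delta_y$ to it and obtain by (13) $\partial_t\delta_y u - \nabla\cdot a_y\nabla\delta_y u = \partial_t\delta_y v + \nabla\cdot\delta_y j$, which in terms of the difference $w := u - v$ we rewrite as $\partial_t\delta_y w - \nabla\cdot a_y\nabla\delta_y w = \nabla\cdot(a_y\nabla\delta_y v + \delta_y j)$ (17).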
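Both consequences stated here follow from short computations, sketched below. The sketch assumes that $a_y$ is the average of $DA$ along the segment between $\nabla u$ and $\nabla u(\cdot+y)$, and it writes $\Lambda$ for the Lipschitz constant of $DA$ and $z, z'$ for two space-time points; these names are introduced for illustration.

```latex
% Assumptions of this sketch: a_y = \int_0^1 DA(\theta\nabla u(\cdot+y) + (1-\theta)\nabla u)\,d\theta,
% \Lambda = Lipschitz constant of DA, d = parabolic distance (invariant under spatial shifts).
\begin{align*}
  |a_y(z) - a_y(z')|
    &\le \Lambda\int_0^1 \Big(\theta\,\big|\nabla u(z+(0,y)) - \nabla u(z'+(0,y))\big|
         + (1-\theta)\,\big|\nabla u(z) - \nabla u(z')\big|\Big)\,d\theta\\
    &\le \Lambda\,[\nabla u]_\beta\; d(z,z')^{\beta},\\[4pt]
  \partial_t\delta_y w - \nabla\cdot a_y\nabla\delta_y w
    &= \big(\partial_t\delta_y u - \nabla\cdot a_y\nabla\delta_y u\big)
       - \big(\partial_t\delta_y v - \nabla\cdot a_y\nabla\delta_y v\big)\\
    &= \big(\partial_t\delta_y v + \nabla\cdot\delta_y j\big)
       - \partial_t\delta_y v + \nabla\cdot a_y\nabla\delta_y v
     \;=\; \nabla\cdot\big(a_y\nabla\delta_y v + \delta_y j\big).
\end{align*}
```

In the equation for $\delta_y w$ the term $\partial_t\delta_y v$ cancels, so the right-hand side is a pure divergence of bounded fields, which is the form required by the De Giorgi-Nash estimate.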
We establish our form of $C^{1,\alpha}$-Schauder theory, cf. Corollary 1, in two lemmas. While Lemma 1 relies only on the uniform ellipticity (12) and crucially uses the $C^\alpha$ a priori estimate for $\delta_y w$ of De Giorgi and Nash based on (17), Lemma 2 also uses the Lipschitz continuity (15) and proceeds by a more standard Schauder-type argument.

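Lemma 1 There exists an exponent $\alpha_0 = \alpha_0(d, \lambda) \in (0,1)$ such that the estimate (18) holds,
provided we already have the qualitative information that its left-hand side, $[\nabla u]_{\alpha_0}$, is finite.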
The critical point in the proof of Lemma 1 is that we extract control of [∇w] α 0 (and thus [∇u] α 0 ) from (17) without having to pass to the limit in the difference (quotient) δ y , which is not possible due to the low regularity of ∇v.

Lemma 2 Let $\alpha_0$ be as in Lemma 1 and suppose that $L$ is so small that (19) holds,
where $P_R := (-R^2, 0) \times B_R$ denotes the (centered) parabolic cylinder of size $R$ and $[\cdot]_{\beta, P_R}$ the $\beta$-Hölder semi-norm restricted to this set. Then the estimate (20) holds for any exponent $\alpha \in [\alpha_0, 1)$.

Corollary 1 Let $\alpha_0$ be as in Lemma 1. Then for any exponent $\alpha \in (0,1)$ the Hölder semi-norm $[\nabla u]_\alpha$, as well as the $C^{1,\alpha}$-semi-norm (10) of $w$, is bounded in terms of $[\nabla v]_\alpha$ and $[j]_\alpha$.

To illustrate an application of Corollary 1, we treat the case where the right-hand side is a stochastic noise which is white in time but coloured in space. Such a noise term is described by a Gaussian random distribution $\xi$ over $(t,x) \in \mathbb{R} \times \mathbb{R}^d$, the probability distribution of which is characterized by having zero mean and by the covariance of the random variables $(\xi, \varphi)$, where $(\xi, \varphi)$ stands for $\xi$ tested against the Schwartz function $\varphi \in \mathcal{S}(\mathbb{R} \times \mathbb{R}^d)$ and $\langle\cdot\rangle$ is used for the expectation of a random variable. The spatial correlation $K$ can be seen as the kernel of a regularising operator. Such a noise term is standard in the SPDE literature, often written in "differential notation" in terms of an $L^2$-valued Wiener process $W$ with covariance operator $K$, see e.g. [1, Sect. 5]. Denote by $v$ the solution of the constant-coefficient heat equation (2). Under suitable conditions on the kernel $K$ it is known that $\nabla v$ is regular enough, i.e. $\alpha$-Hölder continuous, to apply the above deterministic theory.

As an illustration we treat the case where $\xi$ is periodic in all spatial directions, say of period 1, and in addition localised to a compact time interval, say the interval $[0,1]$. If we assume in addition that the probability distribution of $\xi$ is translation invariant in the spatial directions, so that $K(x,y) = K(x-y)$, we have the convenient Fourier series representation (21). Here the $\beta_k$ are complex-valued standard Brownian motions (i.e. real and imaginary parts are independent real Brownian motions of equal variance) that are independent up to the constraint $\beta_k = \bar\beta_{-k}$, which ensures that $\xi$ is real-valued, and $\dot\beta_k(t)$ stands for the distributional time derivative. The $\hat K(k)$ are real-valued, non-negative and symmetric in the sense that $\hat K(k) = \hat K(-k)$.
The almost sure convergence of (21) in the space of distributions can be easily shown, but we adopt the slightly simpler framework of working only with $v$, which we define by its Fourier series representation (22). In order to ensure that the gradient is well behaved we impose that there exists $s > d$ such that (23) holds, where we have set the normalisation equal to 1 without loss of generality. Incidentally, this condition on $s$ precisely says that the spatial covariance operator $K$ is of trace class. Then we have the following lemma.
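Lemma 3 Let $v$ be given by (22) for a covariance operator $K$ satisfying (23) for some $s > d$. Then for any $\alpha < \min\{\tfrac{s-d}{2}, 1\}$ the semi-norm $[\nabla v]_\alpha$ has Gaussian moments, where $\langle\cdot\rangle$ denotes the expectation with respect to the probability distribution of $v$.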
Combining our main deterministic result, Corollary 1, with the stochastic result in Lemma 3 we arrive at the following theorem.

Theorem 1 Let $A$ be uniformly elliptic with ellipticity contrast $\lambda$ and let $DA$ be Lipschitz continuous with constant $\Lambda$ (in the sense of (12) and (15)). Let $\alpha_0 = \alpha_0(d, \lambda)$ be as in Lemma 1. Let $v$ be given by (22) for a covariance operator $K$ satisfying (23) for some $s > d$. Then for almost all realisations of $v$, there exists a unique $u = u(t,x)$ which is 1-periodic in each spatial direction, vanishes for $t \le 0$ and solves (1) in the distributional sense. Moreover, $[\nabla u]_\alpha$ has the stretched exponential moments (24),
where $\langle\cdot\rangle$ denotes the expectation with respect to the probability distribution of $v$.

Proof of Theorem 1
Throughout this proof we use the symbol $\lesssim$ for $\le C(d, \lambda, \Lambda, \alpha, s)$. All functions $u$, $v$, $w$ etc. appearing in the proof are assumed to be 1-periodic in all space directions.
We assume we are given continuous functions $v$ and $j$ with $[\nabla v]_\alpha, [j]_\alpha < \infty$ for an $\alpha \in (0,1)$, which are 1-periodic in each spatial direction and satisfy $v_{|t\le 0} = j_{|t\le 0} = 0$. We show that there exists a unique function $u$ which is 1-periodic in each spatial direction, satisfies $u_{|t\le 0} = 0$ and satisfies the weak formulation (25) for each Schwartz function $\varphi$. In addition we show the bound (26). The desired existence and uniqueness statement then follows by applying this to the case where $v$ is given by (22) and $j = -\nabla v$. For (24) we combine (26) and Lemma 3, for a suitable $C = C(d, \lambda, \Lambda, \alpha, s)$.

The existence of solutions follows by approximation through regularisation. Let $j_\varepsilon$, $v_\varepsilon$ be space-time regularisations (e.g. by convolution with a suitable smooth kernel) of $j$ and $v$. Then by classical theory there exists a unique classical solution $u_\varepsilon$ of the regularised problem which is 1-periodic in all spatial directions (see e.g. [7, Thm. 12.14] for a proof in the case of Dirichlet data on a bounded spatial domain; the case of the torus is only simpler). In this situation Corollary 1 applies and yields (27). This estimate together with the initial datum $u_{\varepsilon|t=-\varepsilon} = v_{\varepsilon|t=-\varepsilon} = 0$ permits us to apply the Arzelà-Ascoli Theorem and to conclude that, up to choosing a subsequence, $u_\varepsilon - v_\varepsilon \to w$, $\nabla(u_\varepsilon - v_\varepsilon) \to \nabla w$ and $\nabla u_\varepsilon \to \nabla u$ locally uniformly, for functions $u$, $w$ that satisfy the limiting equation in the distributional sense. Setting $u = w + v$ we obtain (25), and the estimate (26) follows by passing to the limit in (27) using lower semi-continuity.

It only remains to argue for (pathwise) uniqueness. Assume thus that $u_1$ and $u_2$ are 1-periodic in space, satisfy (25) and vanish for $t \le 0$. Then the difference $\delta u := u_1 - u_2$ satisfies (28) in the distributional sense and $\delta u_{|t=0} = 0$. In order to show that $\delta u = 0$ we aim to test Eq. (28) against $\delta u$ to obtain the identity (29) for all $T \ge 0$. Once the identity (29) is justified, we can invoke the uniform ellipticity (14) once more and obtain a point-wise lower bound on the integrand, so that (29) yields $\delta u = 0$. It thus remains to justify (29).
For this we convolve (28) with a temporal regularising kernel at scale ε and then test against δu ε , the temporally regularised version of δu.
Here we use the fact that under the periodicity assumption the weak formulation (25) can be restated equivalently by replacing the space integrals over R d by integrals over [0, 1] d and assuming that the test functions are also periodic. This yields for any T > 0 We can pass to the limit ε → 0 on both sides using the fact that δu = (u 1 −v)−(u 2 −v) is 1+α

Proof of Lemma 1
Throughout this proof we write $\lesssim$ for $\le C(d, \lambda, \alpha_0)$. Based on (17) and (14) we have, by a localized version of the Hölder a priori estimate of De Giorgi and Nash, that there exists an exponent $\alpha_1 = \alpha_1(d, \lambda) \in (0,1)$ such that (31) holds for all shift vectors $y$, all length scales $\ell$ and all space-time points $z$, where $P_\ell(z) = (t - \ell^2, t) \times B_\ell(x)$ denotes the parabolic cylinder centered around $z = (t,x)$, and where $\|\cdot\|_{P_\ell(z)}$ stands for the supremum norm restricted to the set $P_\ell(z)$. The exponents of the $\ell$-factors in (31) are determined by scaling; smuggling in the constant $k$ is possible since (17) is oblivious to changing $\delta_y w$ by an additive constant. We refer to [7, Theorem 6.28] as one possible reference (with $b \equiv 0$, $c_0 \equiv 0$, and $g \equiv 0$, so that $k_1 = \sup_{Q(R)}|f|$ in the notation of that reference). We fix an exponent $\alpha_0 \in (0, \alpha_1)$ and take the supremum of (31) over all shift vectors $y$ with $|y| \le r$ for some $r \le \ell$, which yields (32).

We first estimate the right-hand-side terms of (32). We start with the second right-hand-side term: from the definition (11) of the Hölder semi-norm and that of the parabolic cylinder, we obtain the bound (33) on $\|a_y\nabla\delta_y v + \delta_y j\|_{P_{2\ell}(z)}$. We now turn to the first right-hand-side term of (32): we first note (34), where the right-hand-side infimum ranges over all $c \in \mathbb{R}^d$. Indeed, passing to $\tilde w(t,x) = w(t,x) - c \cdot x$, so that $\nabla w - c = \nabla\tilde w$, and transforming $\tilde k = k - c \cdot y$, so that $\delta_y w - k = \delta_y\tilde w - \tilde k$, we see that (34) reduces to $\|\delta_y\tilde w\|_{P_{2\ell}(z)} \le |y|\,\|\nabla\tilde w\|_{P_{3\ell}(z)}$, which because of $|y| \le r \le \ell$ is a consequence of the mean-value theorem.
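The mean-value step at the end of this paragraph can be made explicit as follows. The sketch writes $\ell$ for the length scale of the cylinders and $\tilde w$ for $w$ minus an affine function of $x$; both names are notational assumptions of this illustration.

```latex
% Sketch: \tilde w = w - c\cdot x, cylinders P_\ell(z) = (t_z-\ell^2, t_z)\times B_\ell(x_z), shift |y| \le \ell.
\begin{align*}
  \delta_y\tilde w(t,x)
    &= \int_0^1 y\cdot\nabla\tilde w(t, x+\theta y)\,d\theta ,\\
  (t,x)\in P_{2\ell}(z),\ |y|\le\ell
    &\;\Longrightarrow\; (t, x+\theta y)\in P_{3\ell}(z)\quad\text{for all }\theta\in[0,1],\\
  \Longrightarrow\quad \|\delta_y\tilde w\|_{P_{2\ell}(z)}
    &\le |y|\;\|\nabla\tilde w\|_{P_{3\ell}(z)} .
\end{align*}
```

Only the spatial radius grows, from $2\ell$ to $3\ell$, because the shift acts in space alone; the time interval of $P_{2\ell}(z)$ is already contained in that of $P_{3\ell}(z)$.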
Since obviously inf c ∇w − c P 3 We finally turn to the left hand side term in (32) and note Inserting (33), (35), and (36), into (32) we obtain which we multiply with 1 r 1+α 0 −α 1 to arrive at We now argue that we are done once we establish the norm equivalence Indeed, choosing = Mr with M ≥ 1 to be chosen later, we take the supremum of (37) over all radii r and all space-time points z to arrive at sup z,r By the triangle inequality in [·] α 0 we post-process this to Since by our qualitative assumption of [∇u] α 0 < ∞, and since α 0 < α 1 , we may choose M = M(d, λ, α 0 ) so large that this turns into the desired (18).
We now turn to the norm equivalence (38); the elements of the argument are standard in modern Schauder theory, in the spirit of [4, Theorem 3.3.1]. By rotational symmetry, it is enough to establish Let k = k(y, r , z) denote the optimal constant in the right hand side of (39). We first argue that for arbitrary but fixed point z, we have for all radii r Indeed, based on the telescoping identity δ 2re 1 w = δ re 1 w +δ re 1 w(· + re 1 ) we obtain by the triangle inequality the following additivity of k in the y-variable Likewise, we have that k only mildly depends on the r -variable From the two last estimates, we obtain (40). Since α 0 > 0, we learn from (40) that there exists a constant c 1 (z) such that along a given dyadic sequence of radii r . We insert this into the definition of N to obtain Nr α 0 , from which, since in particular u and thus w is differentiable in the spatial variable, we learn that c 1 (z) = ∂ 1 w(z) so that Since we identified the limit, this now holds for any radius r (and not just the dyadic ones). Given two points z, z we set r := 2d(z, z ), cf. (9), and obtain .

Proof of Lemma 2
Throughout this proof we use $\lesssim$ for $\le C(d, \lambda, \Lambda, \alpha_0, \alpha)$. Let the two scales $r \le \ell \le \tfrac{L}{4}$ be arbitrary and for the time being fixed. Let $y$ be an arbitrary shift vector with $|y| \le r$. By (16) in the localized form $[a_y]_{\alpha_0, P_{3\ell}} \le \Lambda[\nabla u]_{\alpha_0, P_{3\ell+r}}$ and (19) we have (42). In conjunction with (14) we see that we may apply standard $C^{1,\alpha_0}$-Schauder theory to the parabolic operator $\partial_t - \nabla\cdot a_y\nabla$ when localized to $P_{3\ell}$. We learn from rescaling according to $(t,x) = (\ell^2\hat t, \ell\hat x)$ that (42) is exactly the control on the coefficient needed so that the constant in this localized Schauder theory is of the desired form $C(d, \lambda, \Lambda, \alpha_0, \alpha)$. We refer to [7, Theorem 4.8] for a possible reference (with $b \equiv 0$, $c \equiv 0$, $g \equiv 0$ in the notation of that reference). We apply this to the increment $\delta_y w$, cf. (17), to the effect of (43).

We first argue that we may upgrade (43) to (44). The first ingredient in passing from (43) to (44) is the following elementary interpolation estimate: $\inf_c\|\nabla w - c\|_{P_r} \lesssim \sup_{|y|\le r}\big(r\,\|\partial_t(\delta_y w)_r\|_{P_r} + \|\nabla\delta_y w\|_{P_r}\big)$ (45), where $(\cdot)_r$ denotes convolution on scale $r$ in the spatial variable. Here comes the argument for (45), where without loss of generality we may assume $r = 1$ and restrict to estimating the first component $\partial_1 w$ of the gradient. Given $(t,x) \in P_1$, this follows from combining immediate consequences of the mean-value theorem, so that $c$ in (45) is given by $(\delta_{e_1} w)_1(0,0)$. The second ingredient in passing from (43) to (44) is (46). In order to see this we apply the spatial convolution operator $(\cdot)_r$ to (17), to the effect of $\partial_t(\delta_y w)_r = \nabla\cdot(a_y\nabla\delta_y w + a_y\nabla\delta_y v + \delta_y j)_r$.
From this representation and r ≤ we obtain the estimate ∂ t (δ y w) r P r α 0 −1 [a y ∇δ y w + a y ∇δ y v + δ y j] α 0 ,P 2 ≤ r α 0 −1 [a y ] α 0 ,P 2 ∇δ y w P 2 + a y P 2 [∇δ y w] α 0 ,P 2 + [a y ∇δ y v + δ y j] α 0 ,P 2 (42), (14) r −1 r α 0 ∇δ y w P 2 + r α 0 −1 [∇δ y w] α 0 ,P 2 + [a y ∇δ y v + δ y j] α 0 ,P 2 , which yields (46) because of r ≤ . Inserting (43) into (46), and the outcome into (45), we obtain (44). We now address the right-hand-side terms of (44). In view of (34) (slightly modified) we have for the first right-hand-side term We now turn to the second right-hand-side term of (44) and note that While obviously we need a little argument to see Indeed, let us focus on j; given two points z, z in P 3 we write δ y j(z) − δ y j(z ) in the two ways of ( j(z + (0, y)) − j(z)) −( j(z + (0, y)) − j(z )) and ( j(z + (0, y)) − j(z + (0, y))) −( j(z) − j(z )) to see that (because of |y| ≤ r ≤ ) and thus as desired Inserting (49) and (50) into (48) we obtain Inserting (47) By the triangle inequality in · and by sup r ≤L r −α inf c ∇v − c P r ≤ [∇v] α,P L this may be upgraded to Slaving to r via = Mr for some M ≥ 1 to be chosen later, we obtain from distinguishing the ranges r ≤ L M and L M ≤ r ≤ L that Clearly, the first right-hand-side term is controlled as follows Hence fixing an M = M(d, λ, , α 0 , α) sufficiently large, we may absorb the second right-hand-side term in (52) into the left hand side to obtain For this, we do not need to know beforehand that the left hand side side is finite, since (52) also holds when the two suprema are restricted to ≤ r ≤ L and ≤ ≤ L for any > 0, which is finite since ∇u is in particular assumed to be continuous. Hence we obtain (53) with supremum restricted to ≤ r ≤ L, in which we now may let ↓ 0 to recover the form as stated in (53). By the standard norm equivalence and shifting the origin into an arbitrary z ∈ P L , we obtain (20) from (53).
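The "little argument" for the Hölder semi-norm of the increment $\delta_y j$ mentioned in this paragraph can be sketched as below. The sketch uses global semi-norms for simplicity, writes $z'$ for the second space-time point, and assumes that the parabolic distance between $z$ and $z + (0,y)$ equals $|y|$; the constant 2 is not optimised.

```latex
% Sketch with global semi-norms; z, z' are space-time points, |y| \le r, and \alpha_0 \le \alpha.
\begin{align*}
  \delta_y j(z) - \delta_y j(z')
    &= \big(j(z+(0,y)) - j(z)\big) - \big(j(z'+(0,y)) - j(z')\big)
    && \Rightarrow\ |\cdots| \le 2\,[j]_\alpha\,|y|^{\alpha},\\
    &= \big(j(z+(0,y)) - j(z'+(0,y))\big) - \big(j(z) - j(z')\big)
    && \Rightarrow\ |\cdots| \le 2\,[j]_\alpha\,d(z,z')^{\alpha},\\[2pt]
  \Longrightarrow\quad |\delta_y j(z) - \delta_y j(z')|
    &\le 2\,[j]_\alpha\,\min\{|y|,\,d(z,z')\}^{\alpha}
     \;\le\; 2\,[j]_\alpha\,|y|^{\alpha-\alpha_0}\,d(z,z')^{\alpha_0} .
\end{align*}
```

Taking the supremum over $|y| \le r$ gives $[\delta_y j]_{\alpha_0} \le 2\,r^{\alpha-\alpha_0}[j]_\alpha$: the increment trades the surplus regularity $\alpha - \alpha_0$ for a power of the shift size. The same computation applies to $\nabla v$ in place of $j$.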

Proof of Corollary 1
Throughout the proof, we use $\lesssim$ as in Lemma 2. By Lemma 1, the hypothesis (19) of Lemma 2 is satisfied provided we fix $L$ sufficiently small in terms of the Hölder semi-norms of the data. Hence we obtain a bound on $[\nabla u]_{\alpha, P_L}$ from (20). By translation invariance of our deterministic setting, this persists with $P_L$ replaced by the shifted parabolic cylinder $P_L(z) = z + P_L$ for any point $z \in \mathbb{R} \times \mathbb{R}^d$, leading to the same bound on $[\nabla u]_{\alpha, P_L(z)}$. This yields the desired Hölder estimate on $\nabla u$ for points $z$, $z'$ at parabolic distance less than $L$. For those $z$, $z'$ with $d(z, z') \ge L$ we appeal once more to (18), where we use the definition of $L$ in the last step.
It remains to estimate the $C^{1,\alpha}$-semi-norm of $w := u - v$; more precisely, it just remains to estimate the temporal continuity, cf. (10). To this purpose, we rewrite (8) as $\partial_t w = \nabla\cdot(A(\nabla u) + j)$, to which we apply spatial convolution on a scale $r$ to be fixed later. This yields an estimate on $\partial_t w_r$, from which we deduce a bound on the time increments of $w_r$. We may take the convolution kernel $\varphi_r$ to be symmetric, so that in particular $w_r(t,x) = \int\varphi_r(x-y)\big(w(t,y) - \nabla w(t,x)\cdot(y-x)\big)\,dy$, to the effect of a bound on $w_r - w$. The last two estimates combine to a bound on the time increments of $w$ in terms of $t$ and $r$. Optimizing through the choice of $r = \sqrt{t}$ yields (54).
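The optimisation in $r$ at the end of this proof can be sketched as follows. The quantities $M_1$ and $M_2$ are hypothetical shorthands introduced here for the Hölder semi-norms that enter; the precise constants in the article's displays may differ.

```latex
% M_1, M_2: hypothetical shorthands; (\cdot)_r = spatial convolution with a symmetric kernel on scale r; t > 0.
\begin{align*}
  |w_r(t,x) - w_r(0,x)|
    &\le t\,\sup|\partial_t w_r| \;\lesssim\; t\,r^{\alpha-1} M_1 ,
    && M_1 := [A(\nabla u) + j]_\alpha ,\\
  |w(t,x) - w_r(t,x)|
    &\lesssim r^{1+\alpha} M_2 ,
    && M_2 := [\nabla w]_\alpha ,\\
  |w(t,x) - w(0,x)|
    &\lesssim t\,r^{\alpha-1} M_1 + r^{1+\alpha} M_2
     \;\overset{r=\sqrt{t}}{=}\; t^{\frac{1+\alpha}{2}}\,(M_1 + M_2) .
\end{align*}
```

The first line uses $\partial_t w_r = \nabla\cdot(A(\nabla u)+j)_r$ and places the derivative on the kernel, which costs $r^{-1}$ while Hölder continuity gains $r^{\alpha}$. The choice $r = \sqrt{t}$ balances the two terms and produces exactly the exponent $\tfrac{1+\alpha}{2}$ expected from parabolic scaling.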

Proof of Lemma 3
Throughout this proof we use $\lesssim$ for $\le C(\alpha, s, d)$.
Throughout the proof we fix $j \in \{1, \ldots, d\}$ and set $h = \partial_j v$. We aim to show the Gaussian moment bound for $[h]_\alpha$ for $C$ large enough and $\alpha < \min\{\tfrac{s-d}{2}, 1\}$. We assume without loss of generality that $\tfrac{s-d}{2} < 1$.
First we recall that by definition $v$ and $h$ are 1-periodic in each spatial direction and $v(t,x) = h(t,x) = 0$ for $t \le 0$. Furthermore, for $t > 1$, $h$ solves the heat equation without forcing. We thus aim to establish (55). The core stochastic ingredient for the proof of (55) is the bound (56) on second moments of increments of $h$. The argument for (56) is based on a Fourier representation for $h$: for $t \in [0,1]$ and $x \in \mathbb{R}^d$ we obtain this representation by differentiating (22) with respect to $x_j$, which for $t' \le t$ leads to (57). In order to deduce (56), we use the triangle inequality and treat the cases $t = t'$, $x \ne x'$ and $t \ne t'$, $x = x'$ separately. In the first case we use stationarity in $x$ and the symmetry of $\hat K$. Using elementary estimates on the trigonometric factors as well as $1 - e^{-2t|k|^2} \le 1$, and recalling condition (23) on $\hat K$, this turns into the desired estimate, where we have used our assumption that $s - d < 2$. In the same way we specialise (57) to $x = x'$ and treat the case $t \ge t'$. Using again $\frac{k_j^2}{2|k|^2} \le \frac{1}{2}$ as well as $|2 - e^{-2t|k|^2} - e^{-2t'|k|^2} - 2e^{-(t-t')|k|^2} + 2e^{-(t+t')|k|^2}| \le 4\min\{1, |t-t'||k|^2\}$, and using (23) once more, this turns into the corresponding estimate for time increments, and thus (56) follows.
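The way the decay of $\hat K$ turns into a power of the increment can be seen from the following lattice-sum estimate. It is a sketch under the assumption that (23) has the form $\hat K(k) \le (1+|k|)^{-s}$, which is a guess at the display and not confirmed by the text above; $\rho \in (0,1]$ stands for $|x - x'|$ or $\sqrt{|t-t'|}$.

```latex
% Assumption (hypothetical form of (23)): \hat K(k) \le (1+|k|)^{-s}, with d < s < d+2; 0 < \rho \le 1.
\begin{align*}
  \sum_{k\neq 0}\min\{1,\rho^2|k|^2\}\,|k|^{-s}
    &\le \rho^2\!\!\sum_{0<|k|\le\rho^{-1}}\!|k|^{2-s} \;+\; \sum_{|k|>\rho^{-1}}\!|k|^{-s}\\
    &\lesssim \rho^2\,(\rho^{-1})^{2-s+d} \;+\; (\rho^{-1})^{d-s}
     \;=\; 2\,\rho^{\,s-d} .
\end{align*}
```

The first sum is dominated by large frequencies because $s < d+2$, the second converges because $s > d$. The resulting exponent $s - d$ for second moments explains why the Hölder exponent is restricted to $\alpha < \tfrac{s-d}{2}$.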
We now apply Kolmogorov's continuity theorem to $h$; for the convenience of the reader we give a self-contained argument. We first appeal to Gaussianity to post-process (56), which we rewrite as $\frac{1}{R^{s-d}}\langle(h(t,x) - h(s,y))^2\rangle \lesssim 1$ provided $|t - s| \le 3R^2$, $|x - y| \le R$ for a given scale $R$. By Gaussianity of $h$ we can upgrade this estimate to $\big\langle\exp\big(\frac{1}{CR^{s-d}}(h(t,x) - h(s,y))^2\big)\big\rangle \lesssim 1$ (58). Thus proving the desired estimate (55) on Gaussian moments of the local Hölder norm $[h]_\alpha$ amounts to exchanging the expectation and the supremum over $(t,x)$, $(s,y)$ in (58) at the price of a decreased Hölder exponent $\alpha < \tfrac{s-d}{2}$. To this purpose, we now argue that for $\alpha > 0$ the supremum over a continuum can be replaced by the supremum over a discrete set: for $R < 1$ we define a grid at scale $R$ and claim that the continuum supremum is controlled by the suprema over these grids. By density, we may assume that $(t,x), (s,y) \in r^2\mathbb{Z} \times r\mathbb{Z}^d$ for some dyadic $r = 2^{-N} < 1$ (this density argument requires the qualitative a priori information of the continuity of $h$, which can be circumvented by approximating $h$). For every dyadic level $n = N, N-1, \ldots$ we now recursively construct two sequences $(t_n, x_n)$, $(s_n, y_n)$ of space-time points, starting from $(t_N, x_N) = (t,x)$ and $(s_N, y_N) = (s,y)$, with the following properties: a) they are in the corresponding lattice of scale $2^{-n}$, i.e. we have $(t_n, x_n), (s_n, y_n) \in (2^{-n})^2\mathbb{Z} \times 2^{-n}\mathbb{Z}^d$; b) they are close to their predecessors in the sense of $|t_n - t_{n+1}|, |s_n - s_{n+1}| \le 3(2^{-(n+1)})^2$ and $|x_{n,i} - x_{n+1,i}|, |y_{n,i} - y_{n+1,i}| \le 2^{-(n+1)}$, where $x_{n,i}, x_{n+1,i}, \ldots$ denote the $i$-th component of $x_n, x_{n+1}, \ldots$, so that by definition of the grid supremum we have $|h(t_n, x_n) - h(t_{n+1}, x_{n+1})| \lesssim (2^{-(n+1)})^\alpha$ and $|h(s_n, y_n) - h(s_{n+1}, y_{n+1})| \lesssim (2^{-(n+1)})^\alpha$; and c) $|t_n - s_n|$ and $|x_n - y_n|$ are minimized among the points satisfying a) and b).
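The Gaussian upgrade of the second-moment bound, used at the start of this step, rests on an explicit one-dimensional computation. In the sketch below $X$ is a generic centered Gaussian random variable with variance $\sigma^2$, a placeholder for the increment $h(t,x) - h(s,y)$, and $C > 2$ is a numerical constant.

```latex
% X: centered Gaussian with variance \sigma^2 (placeholder for an increment of h); C > 2.
\begin{align*}
  \Big\langle \exp\Big(\frac{X^2}{C\sigma^2}\Big)\Big\rangle
    = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{\mathbb{R}}
        \exp\Big(-\frac{x^2}{2\sigma^2}\Big(1-\frac{2}{C}\Big)\Big)\,dx
    = \Big(1-\frac{2}{C}\Big)^{-1/2} .
\end{align*}
```

Since the second-moment bound gives $\sigma^2 \lesssim R^{s-d}$, replacing $\sigma^2$ by a sufficiently large multiple of $R^{s-d}$ in the exponent keeps the expectation bounded uniformly in $R$, which is the content of the exponential estimate.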