Weak error analysis for the stochastic Allen-Cahn equation

We prove strong, resp. weak, rate ${\mathcal O}(\tau)$ for a structure-preserving temporal discretization (with $\tau$ the step size) of the stochastic Allen-Cahn equation with additive, resp. multiplicative, colored noise in $d=1,2,3$ dimensions. In the first setting, direct variational arguments exploit the one-sided Lipschitz property of the cubic nonlinearity to settle first-order strong rate. The same property yields uniform bounds for the derivatives of the solution of the related Kolmogorov equation, which then leads to weak rate ${\mathcal O}(\tau)$ in the presence of multiplicative noise. Hence, we obtain twice the rate of convergence known for the strong error in the presence of multiplicative noise.


Introduction
The stochastic Allen-Cahn equation
$$du=\big[\Delta u-f(u)\big]\,dt+\Phi(u)\,dW\quad\text{in }(0,T)\times\mathbb T^d,\qquad u(0)=u_0,\tag{1.1}$$
is defined on a given filtered probability space $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$, and $u_0$ is a given initial datum. Here $W$ denotes a cylindrical Wiener process and $\Phi$ takes values in the space of Hilbert-Schmidt operators; see Section 2.1 for details. The deterministic version of (1.1) is the well-known Allen-Cahn equation, a phase-field model that approximates the dynamics of a (material) interface by a diffuse interface; see e.g. [20] for a recent review of (deterministic) phase-field models. The related mathematical analysis, the underlying modeling, and the physical conclusions are usually based on the Helmholtz free energy functional $E:W^{1,2}(\mathbb T^d)\to\mathbb R$, where
$$E(u)=\frac12\int_{\mathbb T^d}|\nabla u|^2\,dx+\int_{\mathbb T^d}F(u)\,dx,\qquad F(x)=\frac14\big(x^2-1\big)^2.\tag{1.2}$$
Here, in particular, the latter energy part accounts for the interfacial/mixing energy, and is related to $f(x)=x^3-x$ in (1.1) by $F'=f$. As a consequence, the Allen-Cahn equation is the gradient flow of (1.2), i.e.,
$$\partial_tu=-DE(u)=\Delta u-f(u),\tag{1.3}$$
where $DE(u)$ denotes the Fréchet derivative of $E$ at $u$. Multiplication of (1.3) with $DE(u)$, integration in space, and the chain rule then lead to the energy identity
$$E\big(u(t)\big)+\int_0^t\big\|DE\big(u(s)\big)\big\|_{L^2_x}^2\,ds=E(u_0)\qquad\text{for all }t\ge0.\tag{1.4}$$
For $E(u_0)<\infty$, this identity may serve in the mathematical analysis to deduce (a priori) bounds for solutions in physically relevant norms, reflecting the fact that the Helmholtz energy $E$ is the proper functional to explain the dynamics of (1.3). This energy-driven approach also guides the construction of the numerical scheme (1.5) below, which properly addresses the specific nature of (1.3). Before that, we mention 'general-purpose' operator-splitting methods, where the nonlinearity in $DE(u)$ is treated explicitly in a time-marching context to avoid nonlinear numerical solvers. Since such schemes violate the dissipative energy law (1.4) on the discrete level (see also (1.6) below), a related stability/convergence analysis for splitting schemes usually only gives a priori bounds in non-physical norms, and its derivation is based on a discrete Gronwall-type estimate rather than on identity (1.4), which heavily affects discrete long-time stability. In simulations, this drawback can usually only be compensated by (comparatively) much smaller step sizes than those admissible for the structure-preserving discretization below. We also mention that the usual applications of model (1.3) in the sciences (e.g., multiphase flow in complicated, or even moving, domains [20, Ch. 2]) or in geometric PDEs (e.g., approximation of mean curvature flow [19]) involve a small scaling factor $\varepsilon>0$ under which realistic (diffuse) interface motion holds, which notably worsens the inherent discrete instabilities of non-structure-preserving schemes even further; see also [20, Chs. 4 and 5] for a related discussion.
For these reasons we favor, as a starting point for a temporal discretisation of (1.1), one which inherits the (gradient-flow) structure of the original problem, such as the following implicit Euler discretisation of (1.3), governing iterates $(u^m)_{m=0}^M$ on an equidistant mesh $(t_m)_{m=0}^M\subset[0,T]$ of size $0<\tau<1$,
$$u^m-u^{m-1}-\tau\Delta u^m+\tau f(u^m)=0\quad\text{for }m\ge1,\qquad u^0=u_0,\tag{1.5}$$
and satisfying the discrete energy inequality (1.6) (see [21]). In this work, we derive weak rates of convergence for the time iteration scheme
$$u^m-u^{m-1}-\tau\Delta u^m+\tau f(u^m)=\Phi(u^{m-1})\Delta_mW\quad\text{for }m\ge1,\qquad u^0=u_0,\tag{1.7}$$
with $\Delta_mW=W(t_m)-W(t_{m-1})$, which addresses the SPDE (1.1) of gradient type and generalises (1.5). Again, its construction is motivated by the demand to inherit properties in terms of the underlying energy $E$, for which the concept of a variational strong solution of (1.1) is natural (see Definition 2.1 below); such a solution satisfies an energy identity (1.8), which generalises property (1.4) for (1.3); see e.g. [22], where so-called strong variational solutions to stochastic equations of gradient type are considered.
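As a hedged illustration of the structure-preserving property, the following sketch implements the deterministic scheme (1.5) under assumptions of our own: $d=1$, the periodic unit interval with $n$ grid points, a second-order finite-difference Laplacian, and Newton's method for each implicit step. It records a discrete analogue of (1.2) with $F(x)=\frac14(x^2-1)^2$. For $\tau<1$, testing a step with $u^m-u^{m-1}$ and using $F''\ge-1$ shows that this discrete energy cannot increase, which the final check confirms numerically; this is a sketch, not the estimate (1.6) from [21].

```python
import math


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= c * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lap(u, h):
    """Periodic second-order finite-difference Laplacian."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2 for i in range(n)]


def f(x):
    return x**3 - x


def energy(u, h):
    """Discrete E(u) = h * sum(0.5*|D+ u|^2 + 0.25*(u^2 - 1)^2) on the periodic grid."""
    n = len(u)
    grad = sum(((u[(i + 1) % n] - u[i]) / h) ** 2 for i in range(n))
    return h * (0.5 * grad + sum(0.25 * (v * v - 1.0) ** 2 for v in u))


def implicit_euler_step(g, tau, h, tol=1e-12, max_iter=50):
    """Newton's method for v - tau*lap(v) + tau*f(v) = g, started at v = g."""
    n = len(g)
    v = g[:]
    for _ in range(max_iter):
        lv = lap(v, h)
        res = [v[i] - tau * lv[i] + tau * f(v[i]) - g[i] for i in range(n)]
        if max(abs(r) for r in res) < tol:
            break
        J = [[0.0] * n for _ in range(n)]  # Jacobian Id - tau*lap + tau*diag(f'(v))
        for i in range(n):
            J[i][i] = 1.0 + 2.0 * tau / h**2 + tau * (3.0 * v[i] ** 2 - 1.0)
            J[i][(i + 1) % n] -= tau / h**2
            J[i][(i - 1) % n] -= tau / h**2
        dv = solve_dense(J, [-r for r in res])
        v = [a + b for a, b in zip(v, dv)]
    return v


n = 32
h = 1.0 / n
tau = 0.05
u = [0.8 * math.sin(2 * math.pi * i * h) + 0.3 * math.cos(6 * math.pi * i * h) for i in range(n)]
energies = [energy(u, h)]
for m in range(20):  # scheme (1.5): u^m - tau*lap(u^m) + tau*f(u^m) = u^{m-1}
    u = implicit_euler_step(u, tau, h)
    energies.append(energy(u, h))
print(f"E(u^0) = {energies[0]:.4f}, E(u^20) = {energies[-1]:.4f}")
```

Newton's method is a design choice for this sketch; any convergent nonlinear solver could be used, since for $\tau<1$ each step is a strictly convex minimisation problem.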
A discretisation close to (1.7) for (1.1) has been studied in [24] in this spirit, and it was shown that the strong error, i.e., the expectation of the discrete $L^\infty(0,T;L^2_x)\cap L^2(0,T;W^{1,2}_x)$ distance between discrete and continuous solution, is of order ${\mathcal O}(\sqrt\tau)$. This is in general optimal due to the low temporal regularity of the driving Wiener process in (1.1). Our goal in this work is to verify first-order weak error estimates for (1.7); it turns out that their derivation crucially depends on the kind of noise, which we assume to be spatially smooth throughout:
a) If $\Phi(u)\equiv\Phi$ in (1.1) generates additive noise, we even verify the strong convergence rate ${\mathcal O}(\tau)$ for scheme (1.7) in Theorem 3.1 for $d=1,2$ and $3$. Its derivation is based on the random PDE (3.2) for the transformation $y(t)=u(t)-\Phi W(t)$, which is differentiable in time (see Corollary 3.1), and on a simple use of the binomial formula to express $f(u(t))=f\big(y(t)+\Phi W(t)\big)$ right below (3.2). No discrete stability for iterates of (1.7) is needed here; instead, the one-sided Lipschitz property of the nonlinearity $f$, as well as its explicit form, are exploited to verify the strong error bounds in Section 3. This approach is motivated by [10].
b) If $\Phi(u)$ in (1.1) exerts multiplicative noise, we use the Kolmogorov equation (4.45) associated with (1.1), in combination with higher-order discrete stability for iterates $(u^m)_{m=0}^M$ from scheme (1.7); cf. Lemma 4.4. For estimate (4.32) in $d=3$, which concerns the second derivative of the time-discrete solution, we are forced to work under the assumption of affine linear noise, see (N2) below, whereas for $d=1,2$ the same argument even works for general nonlinear noise, see (N1) below. For the weak error analysis, we conceptually borrow tools from [7,17,5]; see Remark 5.1, item 3, where in particular the structural restrictions are detailed.
As in [18,17,5], the error $\mathbb E\big[\varphi(u^m)-\varphi(u(t_m))\big]$ for a smooth function $\varphi$ can be linked to the solution of the Kolmogorov equation by means of Itô's formula. Therefore we need a time-continuous interpolation of the discrete iterates $(u^m)_{m=0}^M$ which yields an $(\mathcal F_t)$-adapted process. Since we work with the fully implicit scheme (1.7), this is more complicated than in previous works [18,17,5]. A natural candidate for the interpolation $u_\tau$ is given in (4.26) below, where the solution map $T_\tau$ of (4.1) is the discrete nonlinear semigroup corresponding to $DE$, which we analyse in Section 4.1. We observe that the nonlinearity does not appear explicitly in the formula for $u_\tau$. In previous works the linear discrete semigroup $S_\tau=(\mathrm{Id}-\tau\Delta)^{-1}$ is used instead, and $f$ is evaluated explicitly. It turns out that, as a consequence of the one-sided Lipschitz property of $f$, the nonlinear map $T_\tau$ has nice properties similar to the linear case; see Lemma 4.1. In particular, $T_\tau$ can be linearised around the identity with an error of order ${\mathcal O}(\tau)$ in various norms; see Corollary 4.2.
The goal of this paper is a weak error analysis of (1.7) as an example of an implementable scheme for a broad range of applications, including multiphase flow dynamics in complicated domains, which inherits (long-time) discrete stability even for scalings $\varepsilon\ll1$ to properly simulate diffuse phase-field dynamics; see [20,19], and also item 5 in Remark 3.1. There exist several other works on weak error analysis for different schemes to solve (1.1), most of which apply to restricted data settings, such as domains for $d=1$ being intervals $(a,b)\subset\mathbb R$, or drift operators being generators of linear semigroups whose spectral properties need to be available for actual computations; see items 3 and 4 of Remark 3.1. In this context, admissible spatial meshes often need to be equidistant, which clearly limits the capacity of such schemes to simulate multiscale phase evolution via (1.1) for general data settings.

Mathematical framework
2.1. Probability setup. Let $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$ be a stochastic basis with a complete, right-continuous filtration. The process $W$ is a cylindrical Wiener process, that is, $W(t)=\sum_{k\ge1}\beta_k(t)e_k$ with $(\beta_k)_{k\ge1}$ being mutually independent real-valued standard Wiener processes relative to $(\mathcal F_t)_{t\ge0}$, and $(e_k)_{k\ge1}$ a complete orthonormal system in a separable Hilbert space $U$. Let us now give the precise definition of the diffusion coefficient $\Phi$, which takes values in the set of Hilbert-Schmidt operators $L_2(U;H)$, where $H$ can take the role of various Hilbert spaces such as $L^2(\mathbb T^d)$, $W^{1,2}(\mathbb T^d)$ and $W^{2,2}(\mathbb T^d)$, for which we use the shorthand notations $L^2_x$, $W^{1,2}_x$ and $W^{2,2}_x$.
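For illustration, increments of a truncated cylindrical Wiener process, composed with a spatially smooth ('colored') additive diffusion coefficient, can be sampled as below. The following choices are assumptions made only for this sketch: $d=1$, $U=L^2(0,1)$ with the real Fourier basis $(e_k)$, a grid of $n$ points, and a hypothetical operator with $\Phi e_k=\mu_ke_k$ for square-summable weights $\mu_k$. Then $\Phi\big(W(t+\tau)-W(t)\big)=\sum_k\mu_k\big(\beta_k(t+\tau)-\beta_k(t)\big)e_k$, whose expected squared $L^2$-norm equals $\tau\|\Phi\|_{L_2(U;L^2_x)}^2=\tau\sum_k\mu_k^2$; the Monte Carlo average checks this.

```python
import math
import random


def fourier_basis(k, x):
    """Real orthonormal Fourier basis of L^2(0,1): 1, sqrt2 cos(2 pi j x), sqrt2 sin(2 pi j x)."""
    if k == 0:
        return 1.0
    j = (k + 1) // 2
    if k % 2 == 1:
        return math.sqrt(2.0) * math.cos(2.0 * math.pi * j * x)
    return math.sqrt(2.0) * math.sin(2.0 * math.pi * j * x)


def noise_increment(grid, tau, mus, rng):
    """One sample of Phi(W(t+tau) - W(t)) on the grid, truncated after len(mus) modes."""
    dbeta = [rng.gauss(0.0, math.sqrt(tau)) for _ in mus]
    return [sum(mu * db * fourier_basis(k, x) for k, (mu, db) in enumerate(zip(mus, dbeta)))
            for x in grid]


n = 32
h = 1.0 / n
grid = [i * h for i in range(n)]
mus = [1.0 / (1 + k) ** 2 for k in range(15)]  # hypothetical weights mu_k (modes j <= 7)
tau = 0.01
rng = random.Random(0)
samples = 2000
mean_sq = sum(h * sum(v * v for v in noise_increment(grid, tau, mus, rng))
              for _ in range(samples)) / samples
hs_sq = sum(mu * mu for mu in mus)  # ||Phi||_{L_2(U; L^2)}^2 of the truncated operator
print(f"Monte Carlo E||Phi dW||^2 = {mean_sq:.5f}, tau*||Phi||_HS^2 = {tau * hs_sq:.5f}")
```

The grid has more points than twice the highest frequency used, so the sampled basis functions stay exactly orthonormal in the discrete $L^2$ inner product.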
In the following we formulate two sets of assumptions regarding the (regularity of the) diffusion coefficient $\Phi$, which will allow us to derive high moment bounds for higher derivatives of the solution of (1.1) in Section 2.2. In the first case (N1) below we consider general nonlinear multiplicative noise. In the second case (N2) we assume that $\Phi$ is an affine linear function of $u$. As we shall see below, assumption (N1) is always sufficient for our analysis in the case $d=1,2$; see Remark 5.1, item 3.
Note that (2.1) and (2.5) imply (2.9) and (2.10), whereas (2.1), (2.5) and (2.6) together yield (2.11) and (2.12). Assumption (2.1) allows us to define stochastic integrals. Given an $(\mathcal F_t)$-adapted process $u$, the stochastic integral $\int_0^{\cdot}\Phi(u)\,dW=\sum_{k\ge1}\int_0^{\cdot}\Phi(u)e_k\,d\beta_k$ is a well-defined process taking values in $L^2(\mathbb T^d)$ (see [16] for a detailed construction). Moreover, we can multiply by test functions to obtain $\big\langle\int_0^t\Phi(u)\,dW,\varphi\big\rangle=\sum_{k\ge1}\int_0^t\langle\Phi(u)e_k,\varphi\rangle\,d\beta_k$ for $\varphi\in L^2(\mathbb T^d)$. Similarly, we can define stochastic integrals with values in $W^{1,2}(\mathbb T^d)$ and $W^{2,2}(\mathbb T^d)$, respectively, if $u$ belongs to the corresponding class.
¹ We denote by $\nabla_x$ the derivative with respect to the first variable, i.e., the $\mathbb R^d$-valued spatial variable, and by $\nabla_\xi$ the derivative with respect to the second variable.
2.2. The concept of solutions. In this subsection we give a precise definition of a solution to (1.1) and derive some of its basic properties.
Definition 2.1. Let $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$ be a given stochastic basis with a complete right-continuous filtration and an $(\mathcal F_t)$-cylindrical Wiener process $W$. Suppose that $\Phi$ satisfies (N1)(a) and that $d=1,2,3$. Let $u_0$ be an $\mathcal F_0$-measurable random variable with values in $L^2(\mathbb T^d)$. Then $u$ is called a weak pathwise solution to (1.1) with the initial condition $u_0$ provided (a) the function $u$ is $(\mathcal F_t)$-adapted and …
The existence of a solution can be shown by the popular variational approach; see e.g. [26].
Theorem 2.1. Let $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$ be a given stochastic basis with a complete right-continuous filtration and an $(\mathcal F_t)$-cylindrical Wiener process $W$. Suppose that $\Phi$ satisfies (N1)(a) and that $d=1,2,3$. Let $u_0$ be an $\mathcal F_0$-measurable random variable such that $u_0\in L^q(\Omega;L^2(\mathbb T^d))$ for some $q>2$. Then there exists a unique weak pathwise solution to (1.1) in the sense of Definition 2.1 with the initial condition $u_0$.
(a) Assume that $E(u_0)\in L^q(\Omega)$ for some $q\ge1$. Then we have (2.13).
Proof. Part (a) is standard, and similar results can be found in the literature, see, e.g., [24]. For the reader's convenience we nevertheless give the details. We apply Itô's formula to the function $t\mapsto E(u(t))$; this can be justified by truncating the function $F$ and applying the Itô formula in Hilbert spaces from [16, Theorem 4.17]. The resulting correction terms are controlled by (2.1), and the stochastic integral is handled similarly by the Burkholder-Davis-Gundy inequality; this yields (2.13). For the second estimate we apply Itô's formula once more and argue similarly to (2.20): using (2.1) and (2.13), and arguing as above in the proof of (2.13), each of the resulting terms is bounded by $c\tau$. Plugging all together shows (2.14).
(a) Assume that $u_0\in L^p(\Omega;L^p(\mathbb T^d))$ for some $p\ge2$. Then we have …
(b) Assume that … hold and that $\Phi$ additionally satisfies (N1)(b). Then we have …
Proof. Ad (a). We apply Itô's formula (see [16, Thm. 4.17]) to the functional $t\mapsto\frac1p\|u(t)\|_{L^p_x}^p$ and obtain an identity whose terms we estimate separately; the correction term is controlled on account of (2.1).
Finally, we use the Burkholder-Davis-Gundy inequality, (2.1) and Young's inequality to estimate the stochastic integral, where Gronwall's lemma comes into play again. Combining everything, choosing $\kappa$ small enough and using Gronwall's lemma yields the claim. Ad (c). Differentiating (2.19) again yields an equation for the second spatial derivatives of $u$, indexed by $\gamma_1,\gamma_2\in\{1,2,3\}$. The nonlinearity can be handled as in (b), but the noise produces additional terms. The first four of them can be estimated similarly to (b) using (2.5), but the last one requires more care (note that it disappears for affine linear noise). For the correction term we obtain a bound on account of (N1). Applying expectations and using part (b) (with the assumption $u_0\in L^{2p}(\Omega;W^{1,p}(\mathbb T^d))$), the second term is bounded, whereas the first one can be handled by Gronwall's lemma. The supremum of the corresponding stochastic integral can be estimated by similar means.

2.3. The semigroup of the Laplace operator. In this subsection we recall some well-known facts concerning the Laplace operator and (the discretisation of) its semigroup. We denote by $\mathcal L(X,Y)$ the space of bounded linear operators between two Banach spaces $X$ and $Y$, and write $\mathcal L(X)$ for $\mathcal L(X,X)$. It is classical that there is an orthonormal basis of $L^2(\mathbb T^d)$ consisting of eigenfunctions $(v_j)_{j\ge1}$ of $A=-\Delta$ with non-negative eigenvalues $(\nu_j)_{j\ge1}$ such that $\nu_j\to\infty$ as $j\to\infty$. We can use this to define powers of $A$ via the spectral decomposition. Similarly, we can define for $r<0$ the space $W^{r,2}(\mathbb T^d)$ as the closure of $L^2$ with respect to the corresponding norm. It is classical that $-A$ generates a strongly continuous semigroup $S(t)=e^{-tA}$, and it is well known that its discretisation $S_\tau=(\mathrm{Id}+\tau A)^{-1}$ satisfies (2.22)-(2.24). Note that (2.23) follows from $\sup_t\|tAS(t)\|_{\mathcal L(L^2_x)}<\infty$, while (2.24) is a consequence of $\mathrm{Id}-S_\tau=\tau AS_\tau$ and (2.22) with $k=1$. Let us also mention the estimate (2.25), which controls the error between $S(\tau)$ and $S_\tau$. Finally, (2.25) easily implies (2.26).

2.4. Malliavin calculus. We recall some basic facts from Malliavin calculus; see [27] for a thorough introduction. Given a cylindrical $(\mathcal F_t)$-Wiener process as in Section 2.1, a smooth real-valued function $F$ defined on $U^n$ and $\psi_1,\dots,\psi_n\in L^2(0,T;U)$, we set … . This allows us to define $DF$ by $DF(t)f=D^f_tF$. With this definition it can be shown that $D$ defines a closable operator with values in $L^2(\Omega\times(0,T);U)$. A natural domain for $D$ is given by $\mathbb D^{1,2}$, which is the closure of the smooth random variables with respect to the norm $\|F\|_{\mathbb D^{1,2}}^2=\mathbb E|F|^2+\mathbb E\int_0^T\|D_tF\|_U^2\,dt$. We have a version of the chain rule: for $\varphi\in C^1_b(\mathbb R)$ and $F\in\mathbb D^{1,2}$ we have $\varphi(F)\in\mathbb D^{1,2}$ together with the usual formula $D\varphi(F)=\varphi'(F)DF$. Most important for us is the Malliavin integration by parts formula
$$\mathbb E\Big[F\int_0^T\langle\psi(t),dW(t)\rangle_U\Big]=\mathbb E\int_0^T\langle D_tF,\psi(t)\rangle_U\,dt.$$
It holds for all $F\in\mathbb D^{1,2}$ and all adapted $\psi\in L^2(\Omega\times(0,T);U)$. Similarly, we can define the Malliavin derivative of random variables taking values in $L^2(\mathbb T^d)$, for which again the chain rule holds.

In the case of additive noise $\Phi(u)\equiv\Phi$, we use the transformation $y(t)=u(t)-\Phi W(t)$ to recast (1.1) into the form of the random PDE (3.2), where the binomial formula is employed to expand $f(y+\Phi W)$ for the last identity. Assuming sufficient regularity of $\Phi$, Lemma 2.2 implies the following corollary concerning the regularity of $y$.
Corollary 3.1. Let $u$ be the unique weak pathwise solution to (1.1). (a) Assume that $u_0\in L^p(\Omega;W^{2,2}(\mathbb T^d))$ for some $p\ge2$ and $\Phi\in L_2(U;W^{2,2}(\mathbb T^d))$. Then we have …
A reformulation corresponding to (3.2) was used in [10] to accomplish a similar goal, leading to an improved (local) rate of convergence for a discretisation of the 2D Navier-Stokes equation with additive noise. In comparison, the nonlinearity $f=F'$ in (1.1) is again not Lipschitz, but it is one-sided Lipschitz, which here leads to a discretisation scheme with strong rate ${\mathcal O}(\tau)$.
For its derivation, we define $y^m:=u^m-\Phi W(t_m)$ for iterates $(u^m)_{m=1}^M$ from (1.7), which satisfy the identity
$$d_ty^m-\Delta y^m+f\big(y^m+\Phi W(t_m)\big)=\Delta\Phi W(t_m)\tag{3.5}$$
for $m\ge1$, and $y^0=u_0$. Here $d_t$ denotes the discrete time derivative, that is, $d_ty^m=\tau^{-1}(y^m-y^{m-1})$ for $m\ge1$. We make the following observations:
(1) An equivalent derivation of equation (3.5) is via an implicit Euler-based time discretisation of the random PDE (3.2). Equation (3.5) may be used instead of (1.7) as an alternative scheme to solve (1.1).
(2) In the strong error analysis for $(y^m)_{m=1}^M$ below, we prove order ${\mathcal O}(\tau)$; the proof rests on bounds for the time derivative of the (weak variational) solution $y$ of (3.2) in different norms, which are now available. Since $y^m-y(t_m)=u^m-u(t_m)$, this result transfers to (1.7); see also Remark 3.1, item 4.
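The transformation can be checked numerically in a minimal sketch, under assumptions of our own: $d=1$, the periodic interval $[0,2\pi)$, a finite-difference Laplacian, Newton's method for the implicit step, and a hypothetical additive noise $\Phi$ acting on two Brownian coordinates through the modes $0.5\cos x$ and $0.5\sin x$. The code runs (1.7) with $\Phi(u)\equiv\Phi$ and checks that $y^m=u^m-\Phi W(t_m)$ satisfies $d_ty^m-\Delta y^m+f\big(y^m+\Phi W(t_m)\big)=\Delta\Phi W(t_m)$ up to solver tolerance. This identity follows directly from (1.7), because $y^m-y^{m-1}=u^m-u^{m-1}-\Phi\Delta_mW=\tau\Delta u^m-\tau f(u^m)$.

```python
import math
import random


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= c * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lap(u, h):
    """Periodic second-order finite-difference Laplacian."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2 for i in range(n)]


def f(x):
    return x**3 - x


def T_tau(g, tau, h, tol=1e-12, max_iter=50):
    """Newton's method for v - tau*lap(v) + tau*f(v) = g, started at v = g."""
    n = len(g)
    v = g[:]
    for _ in range(max_iter):
        lv = lap(v, h)
        res = [v[i] - tau * lv[i] + tau * f(v[i]) - g[i] for i in range(n)]
        if max(abs(r) for r in res) < tol:
            break
        J = [[0.0] * n for _ in range(n)]
        for i in range(n):
            J[i][i] = 1.0 + 2.0 * tau / h**2 + tau * (3.0 * v[i] ** 2 - 1.0)
            J[i][(i + 1) % n] -= tau / h**2
            J[i][(i - 1) % n] -= tau / h**2
        dv = solve_dense(J, [-r for r in res])
        v = [a + b for a, b in zip(v, dv)]
    return v


n, T, M = 16, 0.5, 20
h, tau = 2.0 * math.pi / n, T / M
grid = [i * h for i in range(n)]
# hypothetical additive noise: Phi e_1 = 0.5 cos x, Phi e_2 = 0.5 sin x
modes = [[0.5 * math.cos(x) for x in grid], [0.5 * math.sin(x) for x in grid]]
rng = random.Random(1)
u = [0.6 * math.sin(x) for x in grid]
beta = [0.0, 0.0]  # values beta_k(t_m) of the two driving Brownian motions
y_prev = u[:]      # y^0 = u_0, since W(0) = 0
max_res = 0.0
for m in range(1, M + 1):
    dbeta = [rng.gauss(0.0, math.sqrt(tau)) for _ in beta]
    beta = [b + db for b, db in zip(beta, dbeta)]
    noise = [sum(dbeta[k] * modes[k][i] for k in range(2)) for i in range(n)]
    u = T_tau([u[i] + noise[i] for i in range(n)], tau, h)  # one step of (1.7)
    PhiW = [sum(beta[k] * modes[k][i] for k in range(2)) for i in range(n)]
    y = [u[i] - PhiW[i] for i in range(n)]
    ly, lw = lap(y, h), lap(PhiW, h)
    res = [(y[i] - y_prev[i]) / tau - ly[i] + f(y[i] + PhiW[i]) - lw[i] for i in range(n)]
    max_res = max(max_res, max(abs(r) for r in res))
    y_prev = y
print(f"max residual of d_t y - lap y + f(y + Phi W) - lap(Phi W): {max_res:.2e}")
```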
The following perturbation analysis illustrates the role that the nonlinearity $f=F'$ will play in the strong error analysis in Subsection 3.2.

3.1. A perturbation analysis for (3.5). We start with a perturbation analysis, where we denote by $(y^m_i)_{m=1}^M$ the solution of (3.5) with $y^0_i=y_{0,i}$, for $i\in\{1,2\}$. Subtracting both identities then leads to an equation (3.6) for $e^m:=y^m_2-y^m_1$, where $d_te^m:=\frac1\tau(e^m-e^{m-1})$; we write (3.6) with an obvious meaning of the terms $I^m$, $II^m$ and $III^m$. When written in weak form, we test with $e^m$ and treat the resulting terms for $I^m$ and $II^m$ independently. For the first term $I^m$ we use the identity (3.7), which is based on the binomial formula ($a,b\in\mathbb R$); with its help, we bound the contribution of $I^m$. Moreover, by the binomial formula and (3.7), the contribution of $II^m$ is bounded in terms of $\|e^m\|_{L^2_x}^2$, and the implicit version of Gronwall's lemma settles the estimate.
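Two elementary facts behind this perturbation argument can be sanity-checked on sample points: the factorisation $a^3-b^3=(a-b)(a^2+ab+b^2)$ with $a^2+ab+b^2=\frac34(a+b)^2+\frac14(a-b)^2\ge0$ (a binomial-type identity, not necessarily the exact form of (3.7)), and the resulting one-sided Lipschitz bound $\big(f(a)-f(b)\big)(a-b)\ge-(a-b)^2$ for $f(x)=x^3-x$. A finite grid check is of course no proof; it only illustrates the inequality the argument exploits.

```python
import itertools


def f(x):
    return x**3 - x


pts = [i / 10 for i in range(-30, 31)]  # sample points in [-3, 3]
pairs = list(itertools.product(pts, pts))

# binomial factorisation: a^3 - b^3 = (a - b) * (3/4 (a + b)^2 + 1/4 (a - b)^2)
identity_err = max(abs((a**3 - b**3) - (a - b) * (0.75 * (a + b) ** 2 + 0.25 * (a - b) ** 2))
                   for a, b in pairs)
# monotonicity of the cubic part: (a^3 - b^3)(a - b) >= 0
cubic_min = min((a**3 - b**3) * (a - b) for a, b in pairs)
# one-sided Lipschitz property of f with constant 1: (f(a) - f(b))(a - b) + (a - b)^2 >= 0
osl_min = min((f(a) - f(b)) * (a - b) + (a - b) ** 2 for a, b in pairs)
print(identity_err, cubic_min, osl_min)
```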
3.2. Error analysis for (3.5). We integrate (3.2) in time over $[t_{m-1},t_m]$ and define the corresponding remainder terms. In order to apply the arguments from Subsection 3.1, we rewrite the resulting identity accordingly. For the error $E^m:=y(t_m)-y^m$, using $IV^m(t_{m-1})$, we may now easily deduce an equation of the form (3.6), with terms similar to $I^m$, $II^m$ and $III^m$, and the non-vanishing right-hand side $\int_{t_{m-1}}^{t_m}\mathrm{rest}^m_{IV}(s)\,ds$ instead. When tested with $E^m$, the first three terms may then be estimated as in Subsection 3.1, leading to inequality (3.8), such that we finally arrive, $\mathbb P$-a.s., at an error inequality for $E^m$. For the remaining term on the right-hand side, we treat the three terms $\{\mathrm{rest}^{(i)}\}_{i=1}^3$ separately, where the first term is bounded by $C\tau^2$, thanks to (3.4) and Corollary 3.1. By Sobolev's embedding, we estimate the second one for arbitrary $\kappa>0$, where we also use interpolation of $L^3_x$ between $L^2_x$ and $W^{1,2}_x$. Now we can absorb the last term into the left-hand side by choosing $\kappa$ sufficiently small. By (3.10) and Corollary 3.1 we can estimate the first term by $C\tau^2$. The missing term $IV^m(s)$ can be estimated via an analogous (even easier) chain of inequalities. Estimating the final term by $C\tau^2$ and applying the discrete Gronwall lemma, the above considerations lead to the following theorem.
Theorem 3.1. Let $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$ be a given stochastic basis with a complete right-continuous filtration and an $(\mathcal F_t)$-cylindrical Wiener process $W$. Let $T>0$ be fixed. Assume that … Let $u$ be the solution to (1.1), and $(u^m)_{m=1}^M$ the solution to (1.7). Then we have the error estimate …
By its proof, this result also applies to $(y^m)_{m=1}^M$ from (3.5), and $y$ from (3.2).
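A rough numerical experiment in the spirit of Theorem 3.1 can be set up as follows; all choices are assumptions of this sketch: $d=1$ on $[0,2\pi)$ with $n=16$ finite-difference points, a hypothetical additive noise acting through the modes $0.5\cos x$ and $0.5\sin x$, four sample paths, and a reference solution computed by (1.7) itself with the much smaller step $T/128$ on the same Brownian paths. Coarse increments $\Delta_mW$ are obtained by summing the fine ones. If the strong rate ${\mathcal O}(\tau)$ is visible, halving $\tau$ should roughly halve the root-mean-square $L^2$ error at time $T$; the spatial discretisation error is not measured, since all runs share the same grid.

```python
import math
import random


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= c * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lap(u, h):
    """Periodic second-order finite-difference Laplacian."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2 for i in range(n)]


def f(x):
    return x**3 - x


def T_tau(g, tau, h, tol=1e-12, max_iter=50):
    """Newton's method for v - tau*lap(v) + tau*f(v) = g, started at v = g."""
    n = len(g)
    v = g[:]
    for _ in range(max_iter):
        lv = lap(v, h)
        res = [v[i] - tau * lv[i] + tau * f(v[i]) - g[i] for i in range(n)]
        if max(abs(r) for r in res) < tol:
            break
        J = [[0.0] * n for _ in range(n)]
        for i in range(n):
            J[i][i] = 1.0 + 2.0 * tau / h**2 + tau * (3.0 * v[i] ** 2 - 1.0)
            J[i][(i + 1) % n] -= tau / h**2
            J[i][(i - 1) % n] -= tau / h**2
        dv = solve_dense(J, [-r for r in res])
        v = [a + b for a, b in zip(v, dv)]
    return v


def run_scheme(u0, dbeta_fine, M, T, modes, h):
    """Scheme (1.7) with additive noise and M steps; coarse increments sum fine ones."""
    tau = T / M
    r = len(dbeta_fine[0]) // M
    u = u0[:]
    for m in range(M):
        db = [sum(path[m * r:(m + 1) * r]) for path in dbeta_fine]
        g = [u[i] + sum(db[k] * modes[k][i] for k in range(len(modes))) for i in range(len(u))]
        u = T_tau(g, tau, h)
    return u


n, T, M_ref, paths = 16, 1.0, 128, 4
h = 2.0 * math.pi / n
grid = [i * h for i in range(n)]
modes = [[0.5 * math.cos(x) for x in grid], [0.5 * math.sin(x) for x in grid]]  # hypothetical Phi
u0 = [0.6 * math.sin(x) for x in grid]
Ms = [8, 16, 32]
rng = random.Random(2)
sq_err = [0.0] * len(Ms)
for _ in range(paths):
    dbeta = [[rng.gauss(0.0, math.sqrt(T / M_ref)) for _ in range(M_ref)] for _ in modes]
    u_ref = run_scheme(u0, dbeta, M_ref, T, modes, h)  # reference on the same Brownian path
    for j, M in enumerate(Ms):
        u = run_scheme(u0, dbeta, M, T, modes, h)
        sq_err[j] += h * sum((a - b) ** 2 for a, b in zip(u, u_ref)) / paths
errs = [math.sqrt(e) for e in sq_err]
for M, e in zip(Ms, errs):
    print(f"tau = {T / M:.4f}   RMS L2 error at T: {e:.3e}")
```

With so few paths and such a coarse grid, this is only a qualitative check of the predicted first-order behaviour, not a substitute for the proof.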
Remark 3.1. 1. Problem (1.1) usually involves a small-scale parameter $\varepsilon>0$ to address the width of diffuse interfaces of adjacent material phases, whose resolution in terms of numerical scalings is crucial for accurate simulation; see [1]. In this work, we choose $\varepsilon=1$ to address the non-Lipschitzness of $f$ only, but expect a corresponding analysis as in [1] to go through for $\varepsilon\ll1$, avoiding a factor $\exp(T/\varepsilon)$ that would otherwise occur in a straightforward application of Gronwall's lemma.
2. The proof of a strong rate ${\mathcal O}(\sqrt\tau)$ in [24] for an implicit time discretisation (which slightly differs from (1.5)) in the case of (1.1) with multiplicative noise exploits its character as a structure-preserving discretisation, which inherits the gradient structure of the problem and related energy estimates. The crucial step in the error analysis then uses the weak monotonicity property of the cubic nonlinearity (see [24, (4.11)]) to avoid a truncation argument for the nonlinearity in the sense of [28] for the stochastic Navier-Stokes equation; see also [10].
3. The construction and analysis of numerical schemes in [24,1] are based on the strong variational solution concept for (1.1), in contrast to other numerical works in the literature where the linear semigroup $\mathcal S:=\{\mathcal S(t);\,t\ge0\}$, with $\mathcal S(t)=e^{t\Delta}$, is used as a building block to set up the mild solution concept (3.16) for (1.1) with additive noise for $d=1$ and $\varepsilon=1$. In this setting, the authors of [6,8] prove strong rates of convergence for a splitting scheme, even addressing space-time white noise. Splitting schemes have a long tradition in evolutionary problems as a means to avoid solving nonlinear PDEs for iterates, but they violate the structure of (1.1); in fact, the underlying stability bounds in [8, Prop. 3] and [6, Lemma 3.1] are not in the natural energy norm of the underlying problem (1.1) with gradient structure, and also require a Gronwall-type estimate, which is reflected in the smaller mesh sizes usually needed in splitting-scheme-based simulations at intermediate or large times $T\gg1$ to keep accuracy. In comparison, the (energy) structure-preserving schemes (1.5) and (3.5) are nonlinear, but the larger computational effort caused by fast Newton-type nonlinear solvers usually goes along with admissibly larger mesh sizes to attain the same accuracy.
4. In [6], the strong rate ${\mathcal O}(\tau)$ is shown for the scheme [6, (1.2)] in [6, Thm. 4.6]. A crucial step in their proof is also to employ a transformation, see [6, (2.4) and (2.7)], which is conceptually similar to (3.1), however in [6] on the level of mild solutions, using (3.16); in fact, the (corresponding) identity is used instead of SPDE (3.16). Eventually, only the random PDE is studied in the main part of the analysis in [6]. Here the last term is the 'stochastic convolution process' $W_\Delta:=\{W_\Delta(t);\,t\ge0\}$. From a practical point of view, however, the authors of [6] clearly mention the restricted applicability of their scheme, which requires the explicit knowledge of $\mathcal S$ and of (an approximation of) $W_\Delta$: its efficient use requires knowledge of the spectrum of $\Delta$, which restricts its practical application to prototypical domains $\mathcal O\subset\mathbb R^d$ where the eigenvalues of $\Delta$ are explicitly known. On the other hand, the approach in [6] allows less regular noise compared to here.
Corresponding restrictions hold for the schemes studied in [25,3,2,14], where again the construction rests on (3.18) and (3.17), and involves (the approximation of) $W_\Delta$.
5. To our knowledge, optimal strong convergence ${\mathcal O}(\tau)$ for a time discretization to solve (1.1) with additive trace-class noise was first established in [29]²: here, again, the authors start with a mild solution for (1.1), and base their error analysis of (1.5) on its reformulation [29, (4.2)] and the known (smoothing) properties of the semigroup $\mathcal S(t)=e^{t\Delta}$. The error analysis provided in this section instead exploits variational arguments: it is based on reformulation (3.5), as well as on simple, explicit calculations for the specific $f$ in Subsection 3.1, and may easily be generalized to weakly elliptic (e.g. non-selfadjoint) operators which do not generate a semigroup; see e.g. …

4. Preparations for the weak error analysis
4.1. The nonlinear semigroup. In this section we study properties of the discrete nonlinear semigroup $T_\tau$ on $L^2(\mathbb T^d)$, which is the solution operator to the equation
$$v-\tau\Delta v+\tau f(v)=g\quad\text{in }\mathbb T^d,\tag{4.1}$$
where $g\in L^2(\mathbb T^d)$ is given. We start with some stability estimates. We can write (4.1) equivalently as (4.2). Due to the choice of the continuous interpolation $u_\tau$ in (4.26) below, the weak error analysis in Section 5 heavily depends on stability estimates for $T_\tau$, which we derive in the following lemma. Afterwards, we estimate the distance of $T_\tau$ to the identity and consider its Fréchet derivatives. They are used for the same purpose due to the representation of $u_\tau$ in (4.27).
Lemma 4.1. Suppose that $\tau<\frac12$ and $p\ge2$. Then we have (4.3)-(4.5) for all functions $g$ for which the quantities on the right-hand side are finite. Here $c=c(p)>0$ is independent of $\tau$ and $g$.

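Proof. Ad (4.3). Testing (4.1) by $|v|^{p-2}v$ yields an identity (4.6) for $\|v\|_{L^p_x}^p$. The term on the right-hand side can be handled by Young's inequality, while the nonlinear term on the left-hand side is bounded from below by means of the one-sided Lipschitz property of $f$. This proves (4.3) for $\tau<\frac12$.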
Ad (4.4). Similarly, applying $\partial_i$ to (4.1), multiplying with $|\partial_iv|^{p-2}\partial_iv$ (and arguing as in (4.6)), integrating in space³ and summing with respect to $i\in\{1,2,3\}$ yields an analogous identity. The last two terms can be handled analogously to the proof of (4.3), such that (4.4) follows.
Ad (4.5). Now we apply $\partial_i\partial_j$ to the equation and multiply with $|\partial_i\partial_jv|^{p-2}\partial_i\partial_jv$. It remains to control the nonlinear term, as the rest can be handled analogously to the estimates above. On the right-hand side we obtain three terms. After integration over $\mathbb T^d$ the first term can be absorbed, while the other two can be controlled using (4.3) and (4.4).
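A minimal sketch of $T_\tau$ as the solution operator of $v-\tau\Delta v+\tau f(v)=g$, discretised with periodic finite differences on $[0,2\pi)$ and solved by Newton's method (both our choices). It checks two elementary properties numerically: the $L^2$ bound $\|T_\tau g\|\le(1-\tau)^{-1}\|g\|$, which follows from testing with $v$ and using $(f(v),v)\ge-\|v\|^2$ (a $p=2$ analogue of a stability estimate, not necessarily the exact form of (4.3)), and the fact that $\|T_\tau g-g\|$ scales linearly in $\tau$ for smooth $g$, in line with the linearisation of $T_\tau$ around the identity.

```python
import math


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= c * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lap(u, h):
    """Periodic second-order finite-difference Laplacian."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2 for i in range(n)]


def f(x):
    return x**3 - x


def T_tau(g, tau, h, tol=1e-12, max_iter=50):
    """Newton's method for v - tau*lap(v) + tau*f(v) = g, started at v = g."""
    n = len(g)
    v = g[:]
    for _ in range(max_iter):
        lv = lap(v, h)
        res = [v[i] - tau * lv[i] + tau * f(v[i]) - g[i] for i in range(n)]
        if max(abs(r) for r in res) < tol:
            break
        J = [[0.0] * n for _ in range(n)]
        for i in range(n):
            J[i][i] = 1.0 + 2.0 * tau / h**2 + tau * (3.0 * v[i] ** 2 - 1.0)
            J[i][(i + 1) % n] -= tau / h**2
            J[i][(i - 1) % n] -= tau / h**2
        dv = solve_dense(J, [-r for r in res])
        v = [a + b for a, b in zip(v, dv)]
    return v


def l2_norm(u, h):
    return math.sqrt(h * sum(v * v for v in u))


n = 32
h = 2.0 * math.pi / n
grid = [i * h for i in range(n)]

# (i) L^2 bound ||T_tau g|| <= ||g|| / (1 - tau) for a larger step and a larger datum
tau_big = 0.4
g_big = [1.2 * math.sin(x) + 0.5 * math.cos(3 * x) + 0.3 * math.sin(5 * x) for x in grid]
stab_ok = l2_norm(T_tau(g_big, tau_big, h), h) <= l2_norm(g_big, h) / (1.0 - tau_big) + 1e-12

# (ii) distance to the identity is O(tau): halving tau roughly halves ||T_tau g - g||
g = [math.sin(x) + 0.5 * math.cos(2 * x) for x in grid]
d1 = l2_norm([a - b for a, b in zip(T_tau(g, 0.004, h), g)], h)
d2 = l2_norm([a - b for a, b in zip(T_tau(g, 0.002, h), g)], h)
print(stab_ok, d1, d2, d2 / d1)
```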

Now we turn to the Fréchet derivative of $T_\tau$ and prove estimates on it, culminating in (4.14) and (4.15).
³ This step can be made rigorous by working with difference quotients.
(c) For all $g\in W^{2,2}(\mathbb T^d)$ we have …
Proof. Ad (a). By the inverse function theorem⁴ for Fréchet derivatives we have … . Hence the claim follows. Ad (b). We obtain …
The last term does not have an obvious sign and needs to be estimated. Estimating it by means of Sobolev's embedding, we conclude … . Replacing $h$ by $DT_\tau(g)h$ (recall that $T_\tau:L^2(\mathbb T^d)\to W^{2,2}(\mathbb T^d)$, cf. equation (4.1)) and using (4.14) shows … , and the claim follows. Ad (c). We have … , where the first term can be estimated by interpolation and Sobolev's embedding. We further conclude … , and the second term in (4.18) has a lower bound … . We conclude as in the proof of (4.15) … . Replacing $h$ by $DT_\tau(g)h$ and using (4.15) shows … , which yields the claim.
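Assuming that (4.1) reads $v-\tau\Delta v+\tau f(v)=g$, the map $T_\tau$ is the inverse of $G(v)=v-\tau\Delta v+\tau f(v)$, and the inverse function theorem gives $DT_\tau(g)h=\big(\mathrm{Id}-\tau\Delta+\tau f'(T_\tau g)\big)^{-1}h$. The following sketch, using periodic finite differences on $[0,2\pi)$ and Newton's method (our choices), compares this formula with a central difference quotient of $T_\tau$ in a fixed direction.

```python
import math


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= c * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lap(u, h):
    """Periodic second-order finite-difference Laplacian."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2 for i in range(n)]


def f(x):
    return x**3 - x


def jacobian(v, tau, h):
    """Matrix of DG(v) = Id - tau*lap + tau*diag(f'(v)) on the periodic grid."""
    n = len(v)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        J[i][i] = 1.0 + 2.0 * tau / h**2 + tau * (3.0 * v[i] ** 2 - 1.0)
        J[i][(i + 1) % n] -= tau / h**2
        J[i][(i - 1) % n] -= tau / h**2
    return J


def T_tau(g, tau, h, tol=1e-12, max_iter=50):
    """Newton's method for G(v) = v - tau*lap(v) + tau*f(v) = g, started at v = g."""
    v = g[:]
    for _ in range(max_iter):
        lv = lap(v, h)
        res = [v[i] - tau * lv[i] + tau * f(v[i]) - g[i] for i in range(len(g))]
        if max(abs(r) for r in res) < tol:
            break
        dv = solve_dense(jacobian(v, tau, h), [-r for r in res])
        v = [a + b for a, b in zip(v, dv)]
    return v


def DT_tau(g, direction, tau, h):
    """DT_tau(g) direction = DG(T_tau g)^{-1} direction (inverse function theorem)."""
    return solve_dense(jacobian(T_tau(g, tau, h), tau, h), direction)


n, tau, eps = 16, 0.1, 1e-5
h = 2.0 * math.pi / n
grid = [i * h for i in range(n)]
g = [1.2 * math.sin(x) + 0.4 * math.cos(2 * x) for x in grid]
direction = [math.cos(3 * x) + 0.5 for x in grid]
plus = T_tau([a + eps * b for a, b in zip(g, direction)], tau, h)
minus = T_tau([a - eps * b for a, b in zip(g, direction)], tau, h)
fd = [(p - q) / (2 * eps) for p, q in zip(plus, minus)]
lin = DT_tau(g, direction, tau, h)
max_diff = max(abs(a - b) for a, b in zip(fd, lin))
print(f"max |finite difference - inverse-function formula| = {max_diff:.2e}")
```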
Finally, we estimate the distance between $DT_\tau$ and the identity.
Using (4.5) we can finish the proof. Ad (c). Now we turn to the second derivative, which can be written as … . Using (4.14) and (4.3) with $p=6$ yields … for all $h_1,h_2\in W^{1,2}_x$. Hence the claim follows.
4.2. Semi-discretisation in time. We consider an equidistant partition of $[0,T]$ with mesh size $\tau=T/M$ and set $t_m=m\tau$. Let $u_0$ be an $\mathcal F_0$-measurable random variable with values in $W^{1,2}(\mathbb T^d)$. We aim at constructing iteratively a sequence of $\mathcal F_{t_m}$-measurable random variables $u^m$ with values in $W^{1,2}(\mathbb T^d)$ such that for every $\varphi\in W^{1,2}(\mathbb T^d)$ it holds true $\mathbb P$-a.s.
$$\int_{\mathbb T^d}(u^m-u^{m-1})\varphi\,dx+\tau\int_{\mathbb T^d}\nabla u^m\cdot\nabla\varphi\,dx+\tau\int_{\mathbb T^d}f(u^m)\varphi\,dx=\int_{\mathbb T^d}\Phi(u^{m-1})\Delta_mW\,\varphi\,dx.\tag{4.23}$$
The existence of a unique $u^m$ (given $u^{m-1}$ and $\Delta_mW$) solving (4.23) follows from its re-interpretation as a convex minimisation problem. Moreover, the discrete energy estimate … holds under the assumptions made in Lemma 4.4. In fact, we will study the stability of $u^m$ in detail in Section 4.3 and derive more general (and higher-order) estimates in Lemma 4.4.
For the weak error analysis in Section 5 it will turn out to be useful to write (4.23) as
$$u^m=T_\tau\big(u^{m-1}+\Phi(u^{m-1})\Delta_mW\big),$$
where $T_\tau$ is the discrete nonlinear semigroup corresponding to $DE$, which we analysed in Section 4.1 above. Note that, unlike previous works on weak error analysis, cf. [5,7,17], we treat the nonlinearity implicitly. Hence it is more complicated to define a time-continuous interpolant which coincides with $u^m$ at $t_m$ and is still progressively measurable. Note, however, that $T_\tau$ features nice properties similar to those of the linear discrete semigroup, as seen in Section 4.1. Setting $U_\tau$ as … , we introduce the $(\mathcal F_t)$-adapted process $u_\tau$ in (4.26), which coincides with $u^{m-1}$ at $t_{m-1}$ and with $u^m$ at $t_m$. In the following we linearise this formula around $U_\tau$, which gives a part which is (given $U_\tau$) linear in $u^{m-1}$ (similar to the method from [5,7,17]) plus an error term. The latter will turn out to be globally of order $\tau$, as required. Applying Itô's formula to the second term in (4.26) yields (4.27). Finally, we derive some uniform estimates for $U_\tau$ in terms of $u^m$. By the definition of $U_\tau$, the Burkholder-Davis-Gundy inequality, and estimates (4.3) and (2.1), we have for all $q\ge2$ … . A similar argument applies when we replace $L^2(\mathbb T^d)$ by $W^{1,2}(\mathbb T^d)$ or $W^{2,2}(\mathbb T^d)$, using this time (2.22) combined with (2.1) and (4.4), or (2.22), (2.5) and (4.5), respectively. We conclude (4.28) uniformly in $\tau$ for $q\ge2$, $t\in[t_{m-1},t_m]$ and $k=0,1,2$. By the estimates (4.3) and (4.4) in Lemma 4.1, formula (4.26) thus yields (4.29) uniformly in $\tau$ for $q\ge2$, $t\in[t_{m-1},t_m]$ and $k=0,1$. Similarly, (4.5) in Lemma 4.1 implies (4.30). Note that controlling the right-hand sides of (4.28)-(4.30) is not straightforward and will be done in the next subsection.

4.3. Estimates for the time-discrete solution. We now derive some uniform estimates for the solution of the time-discrete problem (4.23). These estimates involve the energy $E(\cdot)$ from (1.2) as the relevant Lyapunov functional, reflecting that the discrete problem (4.23) inherits the gradient-flow property of the original problem (1.1).
Lemma 4.4. Let $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge0},\mathbb P)$ be a given stochastic basis with a complete right-continuous filtration and an $(\mathcal F_t)$-cylindrical Wiener process $W$. Let $T\equiv t_M>0$, and assume that $E(u_0)\in L^{2^q}(\Omega)$ for some $q\in\mathbb N_0$. Choose $\tau\le\frac14$. The iterates $(u^m)_{m=1}^M$ from (4.23) satisfy the following estimates.
(a) Suppose that either (N1)(a) holds and $\Phi$ is bounded and summable in the sense that $\sup_{x,\xi}|\Phi(x,\xi)e_k|\le\mu_k$ with $\sum_{k\ge1}\mu_k<\infty$, or that assumption (N2) is in place. For all $q\in\mathbb N_0$ there exists $c=c(q,T,u_0)$ such that …
(b) Assume that … $\in L^{2^q}(\Omega)$ for some $q\in\mathbb N_0$, and that $\Phi$ satisfies (N1) if $d=1,2$, and (N2) if $d=3$. Then we have …

… (4.33)
Here $c=c(q,T,u_0)>0$ is independent of $\tau$.
Remark 4.1. Part (a) is an energy estimate for the solution $(u^m)_{m=1}^M$ from (4.23); in the context of phase-field models, where the interface width $\varepsilon>0$ enters, it avoids exponential growth with respect to $\varepsilon^{-1}$ in time, as does its continuous counterpart. Its derivation below is close to [1, Lemma 3.2], but here generalises to noise of type (N2). The remaining parts (b) and (c) give higher norm estimates.
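A small Monte Carlo illustration of the energy bound in part (a), under assumptions of our own: $d=1$ on $[0,2\pi)$ with periodic finite differences, $F(x)=\frac14(x^2-1)^2$ in the discrete energy, the scheme written as $u^m=T_\tau\big(u^{m-1}+\Phi(u^{m-1})\Delta_mW\big)$ with $T_\tau$ solving $v-\tau\Delta v+\tau f(v)=g$ by Newton's method, and a hypothetical affine linear noise $\Phi(u)e_1=\sigma u\cos x$, $\Phi(u)e_2=\sigma u\sin x$. It simply tracks the sample mean of $E(u^m)$; ten paths are far too few for quantitative statements.

```python
import math
import random


def solve_dense(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= c * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lap(u, h):
    """Periodic second-order finite-difference Laplacian."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / h**2 for i in range(n)]


def f(x):
    return x**3 - x


def energy(u, h):
    """Discrete E(u) = h * sum(0.5*|D+ u|^2 + 0.25*(u^2 - 1)^2) on the periodic grid."""
    n = len(u)
    grad = sum(((u[(i + 1) % n] - u[i]) / h) ** 2 for i in range(n))
    return h * (0.5 * grad + sum(0.25 * (v * v - 1.0) ** 2 for v in u))


def T_tau(g, tau, h, tol=1e-12, max_iter=50):
    """Newton's method for v - tau*lap(v) + tau*f(v) = g, started at v = g."""
    n = len(g)
    v = g[:]
    for _ in range(max_iter):
        lv = lap(v, h)
        res = [v[i] - tau * lv[i] + tau * f(v[i]) - g[i] for i in range(n)]
        if max(abs(r) for r in res) < tol:
            break
        J = [[0.0] * n for _ in range(n)]
        for i in range(n):
            J[i][i] = 1.0 + 2.0 * tau / h**2 + tau * (3.0 * v[i] ** 2 - 1.0)
            J[i][(i + 1) % n] -= tau / h**2
            J[i][(i - 1) % n] -= tau / h**2
        dv = solve_dense(J, [-r for r in res])
        v = [a + b for a, b in zip(v, dv)]
    return v


n, T, tau, sigma, paths = 16, 1.0, 0.05, 0.3, 10
h = 2.0 * math.pi / n
grid = [i * h for i in range(n)]
c1, s1 = [math.cos(x) for x in grid], [math.sin(x) for x in grid]
u0 = [0.6 * math.sin(x) for x in grid]
rng = random.Random(4)
M = round(T / tau)
mean_E = [0.0] * (M + 1)
for _ in range(paths):
    u = u0[:]
    mean_E[0] += energy(u, h) / paths
    for m in range(1, M + 1):
        db1, db2 = rng.gauss(0.0, math.sqrt(tau)), rng.gauss(0.0, math.sqrt(tau))
        # hypothetical affine linear noise: Phi(u)e_1 = sigma*u*cos x, Phi(u)e_2 = sigma*u*sin x
        g = [u[i] + sigma * u[i] * (c1[i] * db1 + s1[i] * db2) for i in range(n)]
        u = T_tau(g, tau, h)
        mean_E[m] += energy(u, h) / paths
print(f"E[E(u^0)] = {mean_E[0]:.3f}, max_m E[E(u^m)] = {max(mean_E):.3f}")
```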
Proof of Lemma 4.4. Ad (a). Choose $q=0$ first. We proceed in several steps: step I derives a first energy estimate, which creates a new term that is then bounded independently in step II. Extending estimate (a) to $q\ge1$ then follows from a general inductive argument as in [1, Lemma 3.2]; see step III below.
I. … which we write as (4.34). The … , whose short proof is recalled here for the reader's convenience, is based on repeated use of binomial formulae: since … , we have that … . Coming back to (4.34), we then obtain after summation … .
II. To proceed we must control the stochastic term $N_m:=\sum_{n=1}^mII^n$, where … . It follows from the definition of $E(u^n)$ that $N_m$ splits into $N^A_m$, $N^B_m$ and $N^C_m$, which we now bound independently.
I₁) By Young's inequality and the Itô isometry we have, for arbitrary $\kappa>0$, … , using also (2.1) in the last step. The first term can be absorbed for $\kappa\ll1$ on the left-hand side of (4.36) and, since $\|\nabla u^{n-1}\|_{L^2_x}^2\le2E(u^{n-1})$, the final term can be handled there by the (discrete) Gronwall lemma.
I₂) We split $N^B_m$ into $N^{B,1}_m$ and $N^{B,2}_m$, which is motivated by an algebraic identity … . By the generalised Young inequality we estimate ($\kappa>0$) … . For $\kappa\ll1$ sufficiently small, the first term may be absorbed on the left-hand side of (4.36); for the second term, we note that … . The latter part in the last inequality is part of $E(u^n)$, which is why this term may now be bounded via the discrete Gronwall inequality. For the final term we use estimates for the stochastic integral in UMD Banach spaces, see [30]. For a UMD Banach space $(X,\|\cdot\|)$ and a separable Hilbert space $H$ with orthonormal basis $(h_k)_{k\ge1}$ we denote by $\gamma(H,X)$ the space of γ-radonifying operators from $H$ to $X$ with norm
$$\|R\|_{\gamma(H,X)}=\Big(\mathbb E'\Big\|\sum_{k\ge1}\gamma_kRh_k\Big\|^2\Big)^{1/2},$$
where $(\gamma_k)_{k\ge1}$ is a sequence of independent standard Gaussian random variables on a probability space $(\Omega',\mathcal F',\mathbb P')$ with expectation $\mathbb E'$. Note that this differs from the original probability space $(\Omega,\mathcal F,\mathbb P)$. However, in the case of Hilbert spaces the norm above coincides with the Hilbert-Schmidt operator norm; in particular, the additional randomness disappears. In our case, it will be removed by the following estimate and is only introduced to quote the estimate from [30]. Using the assumption made on $\Phi$, we obtain … . Using now the estimate from [30] for $X=L^4_x$ as well as (2.1), we obtain … .
To bound the term $N^{B,2}_m$, we distinguish the type of admissible noise:
I₂₁) Let $\Phi$ satisfy (N1)(a) and be bounded. Then there exists $c\ge0$ such that (with $\kappa>0$ arbitrary) … . The first term will again be bounded via the discrete Gronwall inequality, while the second term will be bounded below.
I₂₂) Let $\Phi$ satisfy (N2). We use a compatibility property of the data $\Phi$ and $f$ from $DE$. For the derivation of the bound, it suffices to consider the case that all $\beta_k\equiv0$ in (N2). Now fix one $k\in\mathbb N$.
Then […] After summation and taking expectations, we first use Young's inequality and then the Itô isometry as well as (2.7) ($\kappa > 0$): […] The leading term may now be absorbed on the left-hand side of (4.36), while the last term is the same as in (4.38) and will be bounded below. The estimation of $\mathbb{E}\max_{1\le m\le M}\sum_{n=1}^{m} B^2_n$ is immediate.
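The remark that the $\gamma$-radonifying norm reduces to the Hilbert-Schmidt norm in Hilbert spaces can be checked in finite dimensions. The Python sketch below (an illustration only; the helper `gamma_norm_sq_monte_carlo`, the matrix size, sample count and seed are arbitrary choices) takes $H = \mathbb{R}^8$, $X = \mathbb{R}^5$ with Euclidean norms and the standard basis $(h_k)$, estimates $\mathbb{E}'\|\sum_k \gamma_k T h_k\|^2$ by Monte Carlo, and compares it with the squared Frobenius (Hilbert-Schmidt) norm of $T$.

```python
import numpy as np


def gamma_norm_sq_monte_carlo(T: np.ndarray, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E' || sum_k gamma_k T h_k ||^2, (h_k) the standard basis."""
    rng = np.random.default_rng(seed)
    # each row is one realisation of the i.i.d. standard Gaussians (gamma_1, ..., gamma_n)
    gammas = rng.standard_normal((n_samples, T.shape[1]))
    # sum_k gamma_k T h_k = T @ gamma, evaluated for all samples at once
    images = gammas @ T.T
    return float(np.mean(np.sum(images**2, axis=1)))


if __name__ == "__main__":
    T = np.random.default_rng(1).standard_normal((5, 8))
    mc = gamma_norm_sq_monte_carlo(T, n_samples=200_000)
    hs = float(np.linalg.norm(T, "fro") ** 2)  # squared Hilbert-Schmidt norm
    print(f"Monte Carlo: {mc:.3f}, Hilbert-Schmidt: {hs:.3f}")
```

The agreement reflects the identity $\mathbb{E}'\|T\gamma\|^2 = \operatorname{tr}(T^\top T) = \|T\|_{HS}^2$; in a non-Hilbert UMD space such as $L^4_x$ no such identity is available, which is why [30] is quoted with the $\gamma$-norm.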
[…] is a martingale, we finally obtain by the Burkholder-Davis-Gundy inequality, the expression $DE(u_{n-1}) = -\Delta u_{n-1} + f(u_{n-1})$ and (2.1) […], which can be handled by absorption (for $\kappa$ small enough) and Gronwall's lemma.
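The Itô isometry and the Burkholder-Davis-Gundy inequality enter the steps above through second moments of discrete stochastic sums. As a sanity check (not part of the proof), the Python sketch below simulates the discrete martingale $M_m = \sum_{n=1}^{m} a_{n-1}\,\Delta_n W$ with a hypothetical adapted integrand $a_{n-1} = \cos(W(t_{n-1}))$. It compares $\mathbb{E}|M_M|^2$ with $\tau\sum_n \mathbb{E}|a_{n-1}|^2$ (discrete Itô isometry) and checks the case $p = 2$ of the maximal inequality, $\mathbb{E}\max_m |M_m|^2 \le 4\,\mathbb{E}|M_M|^2$, which is Doob's $L^2$ inequality.

```python
import numpy as np


def simulate_discrete_martingale(n_paths: int, n_steps: int, T: float = 1.0, seed: int = 0):
    """Simulate M_m = sum_{n<=m} a_{n-1} dW_n with the adapted integrand a_{n-1} = cos(W(t_{n-1}))."""
    rng = np.random.default_rng(seed)
    tau = T / n_steps
    dW = rng.standard_normal((n_paths, n_steps)) * np.sqrt(tau)
    W = np.cumsum(dW, axis=1)
    # W at the left end point t_{n-1}: prepend W(0) = 0 and drop the last value
    W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])
    a = np.cos(W_left)
    M = np.cumsum(a * dW, axis=1)
    return M, a, tau


if __name__ == "__main__":
    M, a, tau = simulate_discrete_martingale(n_paths=20_000, n_steps=50)
    lhs_iso = np.mean(M[:, -1] ** 2)
    rhs_iso = tau * np.sum(np.mean(a**2, axis=0))  # tau * sum_n E|a_{n-1}|^2
    max_sq = np.mean(np.max(np.abs(M), axis=1) ** 2)
    print(f"Ito isometry: {lhs_iso:.4f} vs {rhs_iso:.4f}")
    print(f"Doob (p=2): E max|M_m|^2 = {max_sq:.4f} <= 4 E|M_M|^2 = {4 * lhs_iso:.4f}")
```

The integrand is evaluated at the left end point, so it is independent of the increment it multiplies; this adaptedness is exactly what makes the cross terms vanish in the isometry.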
By inserting the estimates I$_1$)-I$_3$) into (4.36) and choosing $\kappa \ll 1$ small enough to allow absorption of terms, we find that there exists $c \equiv c(t_M) > 0$ such that […] (4.41) where the terms $I^m_A$ and $I^m_B$ are from (4.35). Hence, we conclude that […] For $\tau \le \frac14$, and after summation over 1 […] By the Itô isometry, the last term is bounded by […]; inserting this estimate into (4.40) settles the estimate (a) for $q = 0$. Step III. We may now proceed inductively to settle (a) for $q \ge 1$ by multiplying (4.34) by $E(u_m)^{2^q-1}$ before summation; it is the implicit numerical treatment of the drift terms in scheme (4.23) that generates new numerical diffusion terms, which then control the newly arising terms; see [1, Lemma 3.2] for details.
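The role of the implicit treatment of the drift can be seen in a one-dimensional sketch. The Python code below is not the scheme (4.23) itself. It assumes, for illustration only, the variant $u_m - u_{m-1} = \tau(\Delta_h u_m - f(u_m)) + \Phi(u_{m-1})\Delta_m W$ on $\mathbb{T} = (0, 2\pi)$ with a periodic finite-difference Laplacian $\Delta_h$, Newton's method for the nonlinear system, and a hypothetical colored multiplicative noise $\Phi(u)\Delta_m W = \sigma\,u\sum_j (1+j)^{-2}\cos(jx)\,\Delta_m\beta_j$ over a few modes. The energy uses the antiderivative $F(u) = \frac14(u^2-1)^2$ of $f$ (our choice of constant). For $\sigma = 0$ the discrete energy is non-increasing whenever $\tau < 1$, since $f' \ge -1$.

```python
import numpy as np


def laplacian_periodic(n: int, length: float = 2 * np.pi) -> np.ndarray:
    """Second-order finite-difference Laplacian on a periodic grid with n points."""
    h = length / n
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A[0, -1] = A[-1, 0] = 1.0  # periodic wrap-around
    return A / h**2


def f(u):
    return u**3 - u


def energy(u: np.ndarray, length: float = 2 * np.pi) -> float:
    """Discrete E(u) = 1/2 |grad u|^2 + int F(u), with F(u) = (u^2 - 1)^2 / 4."""
    h = length / u.size
    grad = (np.roll(u, -1) - u) / h
    return float(h * np.sum(0.5 * grad**2 + 0.25 * (u**2 - 1.0) ** 2))


def implicit_step(u_old, A, tau, noise_increment, tol=1e-12, max_iter=50):
    """Solve u - tau*(A u - f(u)) = u_old + noise_increment by Newton's method."""
    rhs = u_old + noise_increment
    u = u_old.copy()
    I = np.eye(u.size)
    for _ in range(max_iter):
        residual = u - tau * (A @ u - f(u)) - rhs
        if np.linalg.norm(residual) < tol:
            break
        jacobian = I - tau * A + tau * np.diag(3 * u**2 - 1)  # uses f'(u) = 3u^2 - 1
        u = u - np.linalg.solve(jacobian, residual)
    return u


def simulate(u0, T=1.0, n_steps=100, sigma=0.0, n_modes=3, seed=0):
    """Hypothetical scheme: implicit drift, explicit multiplicative noise sigma * u * dW_Q."""
    rng = np.random.default_rng(seed)
    n = u0.size
    x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    A = laplacian_periodic(n)
    tau = T / n_steps
    # a few smooth Fourier modes with decaying weights (colored noise)
    modes = np.array([np.cos(j * x) / (1 + j) ** 2 for j in range(n_modes)])
    u, energies = u0.copy(), [energy(u0)]
    for _ in range(n_steps):
        dbeta = rng.standard_normal(n_modes) * np.sqrt(tau)
        noise = sigma * u * (dbeta @ modes)  # Phi(u_{m-1}) Delta_m W, evaluated explicitly
        u = implicit_step(u, A, tau, noise)
        energies.append(energy(u))
    return u, np.array(energies)
```

For example, with `x = np.linspace(0, 2*np.pi, 64, endpoint=False)` and `u0 = 0.5*np.cos(x) + 0.1`, the call `simulate(u0, sigma=0.0)` returns a non-increasing energy sequence; for `sigma > 0` the energy fluctuates, and estimates such as Lemma 4.4 (a) concern its moments.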
Ad (b). Before we come to the proof of (4.32), we need some preliminary estimates for lower-order derivatives. We choose $u_m$ as a test function to get […] where, clearly, […] By the binomial formula […], and iterating this estimate and applying Gronwall's lemma proves […] where […] is a (discrete) martingale such that, by the Burkholder-Davis-Gundy inequality, (2.1) and Young's inequality, […] for any $\kappa > 0$. Furthermore, we have for arbitrary $\kappa > 0$ […] due to Young's inequality, the Itô isometry and (2.1). Absorbing the $\kappa$-terms and applying Gronwall's lemma, we conclude […]
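For the reader's convenience, the binomial identity behind the step "by the binomial formula" reads, for elements $a, b$ of a real inner-product space (here $a = u_m$, $b = u_{m-1}$ in $L^2_x$):

```latex
2\,(a-b,\,a) = \|a\|^2 - \|b\|^2 + \|a-b\|^2 ,
\qquad\text{since}\qquad
\|a-b\|^2 = \|a\|^2 - 2(a,b) + \|b\|^2 .
```

Testing the increment $u_m - u_{m-1}$ with $u_m$ thus produces the telescoping difference $\frac12\|u_m\|^2_{L^2_x} - \frac12\|u_{m-1}\|^2_{L^2_x}$ together with the nonnegative numerical dissipation $\frac12\|u_m - u_{m-1}\|^2_{L^2_x}$, which is what allows the estimate to be iterated over $m$.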
Let now $q \in \mathbb{N}$: we argue by induction as in (a) (first multiply (4.42) by a suitable power of $\|u_n\|_{L^2_x}$, then take expectations to get bounds for the terms involved; afterwards, use these bounds when the maximum is applied before taking expectations; see [1, Lemma 3.2] for details) to get […] By formally testing with $-\Delta u_m$, using that […] and controlling the stochastic integral with the help of (2.3), we obtain similarly […] and again, for $q \in \mathbb{N}$, by multiplication with […] before taking expectations, […] Now we come to the proof of (4.32). We formally test the equation with $\Delta^2 u_m$. For the nonlinear term we obtain ($\kappa > 0$) […] Here the second term can be absorbed for $\kappa \ll 1$, whereas the first one (in expectation and summed form) can be controlled by (4.44) (with $q = 3$). If $d = 3$, we use that $\Phi$ is assumed to be affine linear in $u$, cf. assumption (N2); in this situation the stochastic terms can be estimated exactly as in the proof of (4.43). If $d = 1, 2$, this problem can be overcome by Ladyzhenskaya's inequality and (N1). In particular, we have by (2.8) […], which is uniformly bounded by (4.32) with $q = 2$. This settles the proof of (4.32) for $q = 1$. An inductive argument may now be employed to complete the proof of (4.32) for $q \in \mathbb{N}$.
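For completeness, the interpolation inequalities of Ladyzhenskaya type invoked above can be stated on the torus as follows. These are standard Gagliardo-Nirenberg forms; $C$ is a generic constant, and the full $W^{1,2}$-norm (rather than $\|\nabla v\|$) accounts for the mean value of $v$ on $\mathbb{T}^d$:

```latex
\|v\|_{L^4(\mathbb{T}^2)}^2 \le C\,\|v\|_{L^2(\mathbb{T}^2)}\,\|v\|_{W^{1,2}(\mathbb{T}^2)},
\qquad
\|v\|_{L^4(\mathbb{T})}^2 \le C\,\|v\|_{L^2(\mathbb{T})}^{3/2}\,\|v\|_{W^{1,2}(\mathbb{T})}^{1/2}.
```

In both cases the exponent of the $W^{1,2}$-norm follows from $\theta = d\big(\frac12 - \frac14\big) = \frac d4$ in the Gagliardo-Nirenberg inequality $\|v\|_{L^4} \le C\|v\|_{L^2}^{1-\theta}\|v\|_{W^{1,2}}^{\theta}$.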
Ad (c). The proof works along the same lines, testing with $\Delta^3 u_m$ and estimating the stochastic terms by means of (2.7). For the nonlinear term we obtain […] The second term on the right-hand side can be absorbed provided we choose $\kappa > 0$ small enough. The first one (summed with respect to $m$) is bounded by (4.32).

The Kolmogorov equation. We set
In the following we derive some estimates for $U$. The proofs are only formal and can be made rigorous by considering a finite-dimensional Galerkin approximation of (1.1) (leading to a finite-dimensional Kolmogorov equation approximating (4.45)) and establishing estimates which are uniform with respect to the Galerkin parameter. Such a procedure is technical and tedious but standard in the literature. We refer to [12] and [16, Chapter 9] for a detailed analysis of Kolmogorov equations.
A crucial ingredient in our proof is the monotonicity of the leading term in $f(u) = u^3 - u$. It is currently an open problem to obtain similar results for semilinear equations without this property, such as the 2D Navier-Stokes equations. Since, compared to previous papers, we allow for more regularity of the solution to (1.1) through more regular data, we only require estimates in $L^2_x$ for the weak error analysis. For example, in [5,7,17] estimates for the derivatives of $U$ are instead proved in fractional Sobolev spaces (with differentiability strictly smaller than 1/2); moreover, the situation here is more complicated than that in [5,17] due to the non-Lipschitz nonlinearity in (1.1). In [7], the Kolmogorov equation for (1.1) in 1D is considered (with additive space-time white noise), while in the following we study (1.1) with smooth multiplicative noise in the cases $d = 1, 2, 3$.
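The monotonicity of the leading term can be made explicit by a one-line computation, which also yields the one-sided Lipschitz property of $f$ used throughout: for $a, b \in \mathbb{R}$,

```latex
(a^3 - b^3)(a-b) = (a-b)^2\,(a^2 + ab + b^2)
 = (a-b)^2\Big[\big(a + \tfrac{b}{2}\big)^2 + \tfrac34\, b^2\Big] \ge 0,
\qquad\text{hence}\qquad
\big(f(a) - f(b)\big)(a-b) \ge -(a-b)^2
\quad\text{and}\quad
f'(z) = 3z^2 - 1 \ge -1 .
```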
for some $p \in (1,\infty)$ and suppose that (N1) (a) holds. Then we have, for $t \in (0,T)$ and $h \in L^p(\mathbb{T}^d)$, […] $\le c(p,\varphi,T)$.

Proof. We proceed formally. A rigorous proof can be obtained as follows:
• We first prove the result for $p = 2$ by means of a Galerkin approximation and pass to the limit, obtaining well-defined infinite-dimensional objects.
• In particular, we obtain a variational solution to (4.47) below, to which we can apply Itô's formula. This can be justified by truncating the function $z \mapsto z^p$ and applying the Itô formula in Hilbert spaces from [16, Theorem 4.17].
Differentiating $U$ with respect to $h$ in direction $g \in L^{p'}(\mathbb{T}^d)$ with $p' := p/(p-1)$ yields […] where $\eta^{h,g}$ solves
$$d\eta^{h,g} = \big(\Delta\eta^{h,g} - f'(u)\,\eta^{h,g}\big)\,dt + D\Phi(u)\eta^{h,g}\,dW, \qquad \eta^{h,g}(0) = g. \qquad (4.47)$$
Since $D\varphi$ is bounded, we have […] Applying Itô's formula to (4.47) yields […] Since $f' \ge -1$, the second term on the right-hand side is clearly bounded by […], while the third one vanishes under the expectation. Using (N1), the same bound follows for the last term. We conclude […] such that, by Gronwall's lemma, […] Plugging this into (4.48) and taking the supremum with respect to $g$ proves the claim.
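In the Hilbert-space case $p = p' = 2$, the argument above can be written out in a few lines. This is a sketch under the assumption, used here as our reading of (N1), that $\|D\Phi(u)\eta\|^2_{L_2(U;L^2_x)} \le c\,\|\eta\|^2_{L^2_x}$. Itô's formula for $\|\eta^{h,g}\|^2_{L^2_x}$ gives

```latex
d\|\eta^{h,g}\|_{L^2_x}^2
 = \Big(-2\|\nabla\eta^{h,g}\|_{L^2_x}^2 - 2\int_{\mathbb{T}^d} f'(u)\,|\eta^{h,g}|^2\,dx
   + \|D\Phi(u)\eta^{h,g}\|_{L_2(U;L^2_x)}^2\Big)\,dt
 + 2\big(\eta^{h,g},\, D\Phi(u)\eta^{h,g}\,dW\big)_{L^2_x},
\\[4pt]
\mathbb{E}\|\eta^{h,g}(t)\|_{L^2_x}^2 + 2\,\mathbb{E}\int_0^t \|\nabla\eta^{h,g}\|_{L^2_x}^2\,ds
 \le \|g\|_{L^2_x}^2 + (2 + c)\int_0^t \mathbb{E}\|\eta^{h,g}(s)\|_{L^2_x}^2\,ds ,
```

where $f' \ge -1$ bounds the nonlinear term by $2\|\eta^{h,g}\|^2_{L^2_x}$ and the stochastic integral has zero expectation. Gronwall's lemma then yields $\mathbb{E}\|\eta^{h,g}(t)\|^2_{L^2_x} \le e^{(2+c)t}\|g\|^2_{L^2_x}$.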
Note that it is easy to generalise the argument above to obtain […] for all $q \ge p'$. For that purpose it is sufficient to argue as before and to estimate the stochastic integral by means of the Burkholder-Davis-Gundy inequality and (2.1). By parabolic interpolation, (4.50) (with $p' = 2$) yields a bound for $\mathbb{E}\|\eta^{h,g}\|^q_{L^{10/3}(0,T;L^{10/3}_x)}$ […] and suppose that (N1) holds. Then we have for $t$ […] with $\zeta^{h,g,k}(0) = 0$, which can be seen from differentiating (4.47). Note that the second term in (4.52) can be estimated by means of (4.51). In order to estimate the first one, we apply Itô's formula to (4.53), yielding […] By Hölder's inequality we can bound the last line by the second-to-last one, whereas the stochastic integrals have zero expectation and thus can be ignored. By (2.1) we have […] Clearly, it also holds that […]
The remaining two terms are more complicated. First of all, (2.5) yields […], the expectation of which can be controlled in terms of $\|g\|_{L^2_x}$ and $\|k\|_{L^6_x}$ using (4.50). The most critical term is […] We can absorb the first term, whereas, using the embedding […], the second one is bounded by […] Hence we obtain from (2.13) (with $p = 6$) and (4.51) […] We conclude for (4.54) that […], which finishes the proof by bilinearity of $D^2U(t,h)$.
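The parabolic interpolation behind the $L^{10/3}(0,T;L^{10/3}_x)$-bounds can be made explicit in the worst case $d = 3$ via the Gagliardo-Nirenberg inequality. This is a standard computation, recorded here for convenience: with $\theta = d\big(\frac12 - \frac{3}{10}\big) = \frac35$,

```latex
\|v\|_{L^{10/3}_x} \le C\,\|v\|_{L^2_x}^{2/5}\,\|v\|_{W^{1,2}_x}^{3/5}
\quad\Longrightarrow\quad
\int_0^T \|v\|_{L^{10/3}_x}^{10/3}\,ds
 \le C \sup_{0\le s\le T}\|v(s)\|_{L^2_x}^{4/3} \int_0^T \|v\|_{W^{1,2}_x}^2\,ds .
```

Hence a bound in $L^\infty(0,T;L^2_x)\cap L^2(0,T;W^{1,2}_x)$, of the type provided by (4.50) with $p' = 2$, controls the $L^{10/3}(0,T;L^{10/3}_x)$-norm; for $d = 1, 2$ the exponent of the $W^{1,2}_x$-norm is smaller, so the same conclusion holds.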

Estimate (4.55) above can be strengthened to
for any $q \ge 2$: raising (4.54) to the power $q/2$, the only difference is that the stochastic integrals do not vanish and must be estimated. Using the Burkholder-Davis-Gundy inequality and (4.51), we obtain the upper bounds (assuming that $q \ge 4$) […] $\le c\kappa\,\mathbb{E}\|\zeta^{h,g,k}\|^q_{L^{10/3}(0,T;L^{10/3}_x)}$ […], as well as […], which can be handled by Gronwall's lemma. By parabolic interpolation we obtain (4.56) as well as […] for any $q \ge 2$.
and suppose that (N1) holds.Then we have for t ∈ (0, T ) where (by differentiating (4.53)) with ξ h,g,k,l (0) = 0.The last three terms in (4.58) can be estimated by means of (4.51) and (4.56).So, our focus is on the first one.We apply now Itô's formula and argue similarly to the proof of Lemmas 4.5 and 4.6.The correction terms can be handled as there using (2.1)-(2.6).
For instance, we have […] $+\,1\big)\,ds \le c\big(q, \|g\|_{L^2_x}, \|k\|_{L^6_x}, \|l\|_{L^6_x}\big)$, using (4.51) and (4.57). We clearly have […]
Let us now focus on the remaining terms arising from the nonlinearity, which are more critical. It holds that […]
Estimating the $W^{-1,2}_x$-norm by the $L^2_x$-norm and applying Hölder's inequality, the $c(\kappa)$ terms are controlled (line by line) by the terms […] -norms of $g$, $k$ and $l$.

Weak First Order Convergence Rate
This section is the heart of the paper and is dedicated to the proof of the following theorem, which establishes an optimal weak error rate for the time discretisation (4.23).
Theorem 5.1. Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbb{P})$ be a given stochastic basis with a complete right-continuous filtration and an $(\mathcal{F}_t)$-cylindrical Wiener process $W$. Suppose that $u_0 \in W^{3,2}(\mathbb{T}^d)$ and that $\Phi$ satisfies (N1) if $d = 1, 2$, and (N2) if $d = 3$. Let $u$ be the unique pathwise solution to (1.1) in the sense of Definition (2.1) and let $(u_m)_{m=1}^{M}$ be the solution to (4.23). For any […] where $c > 0$ depends on $\varphi$, $u_0$, $T$ and $\Phi$.
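Theorem 5.1 is a statement about the order of convergence in $\tau$. In numerical experiments such an order is usually checked by computing (Monte Carlo) weak errors for a sequence of step sizes and fitting the slope of $\log(\text{error})$ against $\log\tau$. The short Python helper below is hypothetical: it is not used in the paper, and the errors must be supplied from elsewhere. It performs this least-squares fit; a slope close to 1 is consistent with an error of order $\tau$.

```python
import numpy as np


def empirical_order(step_sizes, errors) -> float:
    """Least-squares slope of log(error) versus log(step size)."""
    log_tau = np.log(np.asarray(step_sizes, dtype=float))
    log_err = np.log(np.asarray(errors, dtype=float))
    slope, _intercept = np.polyfit(log_tau, log_err, deg=1)
    return float(slope)


if __name__ == "__main__":
    taus = [2.0**-k for k in range(3, 8)]
    # synthetic errors c * tau with small perturbations, for illustration only
    errs = [0.7 * t * (1 + 0.05 * (-1) ** k) for k, t in enumerate(taus)]
    print(f"estimated order: {empirical_order(taus, errs):.2f}")
```

For the fit to be meaningful in practice, the Monte Carlo sampling error has to be well below the weak error at the smallest step size.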
Remark 5.1. 1. For $d = 1$ and additive white noise in (1.1), a related result has been obtained in [7, Theorem 3.3] for the implementable splitting scheme [7, (1.1)$_2$], using bounds in the supremum norm for related iterates from [8, Proposition 3], whose derivation used a Gronwall argument. The result is obtained under a compatibility condition for white noise and drift operator [7, Section 2.1.2], whose physical interpretation is not immediate, and crucially exploits the additive character of the noise to accomplish the estimate right before [7, Section 2.1.2]. The key part of the weak error analysis is then based on the Kolmogorov equation [7, (2.11)], where $L^{(\Delta t)}$ is the generator of the semigroup associated with the regularized problem [7, (2.7)] with solution $X^{(\Delta t)}$, rather than with (1.1) as here; note that this modification also gives rise to a modified energy $E^{(\Delta t)}$ for the scalar order parameter $u$, as compared to the Helmholtz free energy functional $E$ in (1.2). In this work, we have profited heavily from the tools developed in the error analysis in [7], as well as in [18,17,5].
2. The estimates for the solution of the Kolmogorov equation from Section 4.4 are somewhat weaker than those used for the weak error analysis in [5,7,18,17] (see also [4,13,15] for related results). We compensate for this by assuming more regularity of the data (initial datum and noise), which yields the higher-order estimates for the time-discrete solution from Section 4.3. Our analysis of the additive case in Section 3 is also based on higher spatial regularity of the diffusion coefficient.
3. The restriction to affine linear noise in Theorem 5.1 for $d = 3$ is due only to estimate (4.32) in Lemma 4.4, which is used heavily. Apart from that, the proof of Theorem 5.1 does not require this restriction. In particular, in the 2D case the result also holds for nonlinear noise.
4. A large part of the analysis in this paper extends directly to the Dirichlet case, at least if the underlying domain is sufficiently smooth. Only the results from Section 4.1 require some additional technical effort: as we are working with nonlinear test functions, it is no longer possible to justify the estimates from Lemma 4.1 by means of a Galerkin approximation (using eigenfunctions of the Laplace operator). Instead, one has to prove localised estimates by means of cut-off functions. In order to obtain global estimates, one has to parametrise the boundary with local charts, change the coordinates, and reflect the transformed solution at the resulting flat boundary. The local estimates can then be applied to the reflected transformed solution. Such a procedure is tedious and technical but standard in the literature; a detailed presentation can be found, e.g., in [9, Section 4].
5. Large parts of the analysis in this section extend to more general functions $f$ in equation (1.1) with $q$-growth for some $q \ge 2$. However, controlling the terms in (5.8) requires that the leading part of $f$ is exactly of the form $f(z) = a z^q$ with $a > 0$ and $q \in \mathbb{N}$.
The remainder of this section is dedicated to the proof of Theorem 5.1, which is split into several subsections corresponding to the estimates of individual error terms.
We start by decomposing the error into several parts, which will be analysed in the subsequent subsections. Let $u$ be the solution of (1.1) and $(u_m)_{m=1}^{M}$ the solution to its time discretisation (4.23), both with respect to the initial datum $u_0 = h \in L^2(\mathbb{T}^d)$. The error is […] where $U$ is the solution to the Kolmogorov equation (4.45) with initial condition $\varphi \in C^2_b(L^2(\mathbb{T}^d))$. We decompose […] Recalling the definition […], we rewrite equation (4.27) as […] We apply Itô's formula (see [16, Thm. 4.17]) to $\Psi(t, u_\tau)$ to get, for $t$ […] Setting $\Psi(t,h) = U(T-t,h)$, we obtain by (4.45) […] We will estimate the terms (I)-(V) in the following five subsections.
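The identities $AS_\tau = S_\tau A$ and $A - S_\tau A = -\tau S_\tau A^2$, used below for the linear part (II), hold for the resolvent $S_\tau = (I - \tau A)^{-1}$, since $I - S_\tau = S_\tau\big((I - \tau A) - I\big) = -\tau S_\tau A$. Assuming this form of $S_\tau$ (our reading, consistent with both identities) and taking $A$ to be a periodic finite-difference Laplacian (an illustrative choice), the Python sketch below checks both identities numerically, together with the fact that $S_\tau$ is a contraction for symmetric negative semi-definite $A$.

```python
import numpy as np


def periodic_laplacian(n: int, length: float = 2 * np.pi) -> np.ndarray:
    """Finite-difference Laplacian with periodic boundary conditions (illustrative choice of A)."""
    h = length / n
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A[0, -1] = A[-1, 0] = 1.0
    return A / h**2


def resolvent(A: np.ndarray, tau: float) -> np.ndarray:
    """S_tau = (I - tau A)^{-1}; invertible because A is symmetric negative semi-definite."""
    return np.linalg.inv(np.eye(A.shape[0]) - tau * A)


if __name__ == "__main__":
    A = periodic_laplacian(32)
    tau = 0.01
    S = resolvent(A, tau)
    print(np.allclose(A @ S, S @ A))                 # A S_tau = S_tau A
    print(np.allclose(A - S @ A, -tau * S @ A @ A))  # A - S_tau A = -tau S_tau A^2
```

The eigenvalues of $S_\tau$ are $1/(1 - \tau\lambda) \in (0, 1]$ for the eigenvalues $\lambda \le 0$ of $A$, which is the discrete counterpart of the contractivity of the implicit Euler step.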
We clearly have […] Moreover, it holds by (4.21) and Lemma 4.4 […] such that, by (2.5) and (4. […] where […] In order to proceed, we apply Itô's formula to $f_j(u_\tau)$ and recall the definition of $u_\tau$ in (4.27). Using the orthonormal basis $(e_i)_{i\ge1}$ of the Hilbert space $U$ introduced in Section 2.1, we obtain […] where $\partial_j u$ and $Du$ are evaluated at $(T-s, u_\tau)$. We have by Lemma 4.5, (4.21) and (4.14) […] using also (4.3)-(4.5) in the last step. Finally, by (2.1), […] $\le c\tau$ on account of Lemma 4.4 and (4.29). In order to estimate (III)$^2_2$ we rewrite […] We obtain further with the help of (4.19) […] using also (4.14). We estimate further with the help of (2.1), Lemma 4.
One easily checks that the first term disappears when integrating over $[t_{m-1}, t_m]$, using that $f(0) = 0$. As far as the second term is concerned, we write […] (5.8) The first term vanishes under the expectation and hence can be ignored, while the expectation of the ($L^2_x$-norm of the) second one can be controlled by $\tau\,\mathbb{E}\big[\|u_{m-1}\|_{L^2_x}\|\Phi(u_{m-1})\|^2_{L_2(U;W^{2,2}_x)}\big]$ as a consequence of the Itô isometry. By (4.3) and (4.12) […] Finally, […] $\|D\Phi(u_\tau)e_i\|_{\mathcal{L}(L^2_x)}$. We conclude […]

We use time discretisation to approximate the stochastic Allen-Cahn equation
$$du = \big(\Delta u - (u^3 - u)\big)\,dt + \Phi(u)\,dW \ \text{ in } Q_T, \qquad u(0) = u_0 \ \text{ in } \mathbb{T}^d, \qquad (1.1)$$
in $Q_T := (0,T)\times\mathbb{T}^d$, where $\mathbb{T}^d = \big((-\pi,\pi)|_{\{-\pi,\pi\}}\big)^d$ (supplemented with periodic boundary conditions), with $T > 0$ and $d = 1, 2, 3$. Periodic boundary conditions are used for ease of presentation only; see Remark 5.1, item 4. The unknown u in (1.1

5.2. The error in the linear part (II). We use $AS_\tau = S_\tau A$ and $A - S_\tau A = -\tau S_\tau A^2$ to get […]

dσ ds $\le c\tau$, using Lemma 4.4 in the last step. Putting everything together, we have shown
$$|(II)| \le c\tau. \qquad (5.5)$$

5.3. The error in the non-linear part (III). The non-linear part can be written as