An adaptive time-stepping method based on a posteriori weak error analysis for large SDE systems

We develop an adaptive algorithm for large SDE systems, which automatically selects (quasi-)deterministic time steps for the semi-implicit Euler method, based on an a posteriori weak error estimate. The main tools to construct the a posteriori estimator are the representation of the weak approximation error via Kolmogorov's backward equation, a priori bounds for its solution, and the Clark–Ocone formula. For a certain class of SDE systems, we validate optimal weak convergence order 1 of the a posteriori estimator, and termination of the adaptive method based on it within O(Tol^{-1}) steps.


Introduction
Let L, K ∈ N\{0}, and T > 0. In this work, we study a new adaptive time-stepping strategy to efficiently approximate the R^L-valued solution X ≡ {X_t; t ∈ [0, T]} of the stochastic differential equation (SDE)

dX_t = [−A X_t + f(X_t)] dt + Σ_{k=1}^{K} σ_k(X_t) dβ_k(t) for all t ∈ [0, T], X_0 = y ∈ R^L, (1.1)

where {β_k(t); t ∈ [0, T]}, k = 1, …, K, are independent R-valued Wiener processes on the filtered probability space (Ω, F, {F_t}_{t≥0}, P), and A ∈ R^{L×L} is invertible and positive definite. We refer to Sect. 2, where proper settings for the data A, f, {σ_k}_k are given. Problem (1.1) may be motivated from a spatial discretization of the semilinear stochastic partial differential equation (SPDE) on a bounded domain D ⊂ R^d,

dX_t = [ε ΔX_t + [β · ∇]X_t + F(X_t)] dt + Σ_{k=1}^{K} σ_k(X_t) dβ_k(t) for all t ∈ [0, T], (1.2)

for given ε > 0, β : D → R^d constant for simplicity, and H a Hilbert space; see Sect. 5 for further details.

[Correspondence: Andreas Prohl, prohl@na.uni-tuebingen.de; Fabian Merle, merle@na.uni-tuebingen.de. Mathematisches Institut, Universität Tübingen, Auf der Morgenstelle 10, 72076 Tübingen, Germany.]

Our aim is an adaptive mesh strategy for the semi-implicit Euler method applied to (1.1), which, for every j ∈ N_0, automatically selects the new step size τ_{j+1} = t_{j+1} − t_j, and then determines the R^L-valued random variable Y_{j+1} from

Y_{j+1} = Y_j + τ_{j+1} [−A Y_{j+1} + f(Y_j)] + Σ_{k=1}^{K} σ_k(Y_j) Δ_{j+1}β_k (j ∈ N_0), (1.3)

for Δ_{j+1}β_k := β_k(t_{j+1}) − β_k(t_j), to approximate the solution X_{t_{j+1}} of (1.1) at time t_{j+1}. Conceptually, we base this local step size selection strategy on a (computable) a posteriori weak error estimator G(φ; τ_{j+1}, Y_j) in each step, i.e., on an estimate of the form

max_{1≤j≤J} |E[φ(X_{t_j})] − E[φ(Y_j)]| ≤ Σ_{j=0}^{J−1} τ_{j+1} G(φ; τ_{j+1}, Y_j), (1.4)

for φ ∈ C^3(R^L) with globally bounded first, second and third derivatives. A criterion may then be set up to select a new, large τ_{j+1} in every step, such that the right-hand side of (1.4) stays below a chosen tolerance Tol > 0.
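A single step of such a semi-implicit Euler method can be sketched as follows. This is a minimal sketch, assuming the standard semi-implicit form (I + τA) Y_{j+1} = Y_j + τ f(Y_j) + Σ_k σ_k(Y_j) Δβ_k (the linear drift treated implicitly, f and σ_k explicitly); the function names are illustrative and not taken from the paper.

```python
import numpy as np

def semi_implicit_euler_step(Y, A, f, sigmas, tau, rng):
    """One step of a semi-implicit Euler scheme for
    dX = (-A X + f(X)) dt + sum_k sigma_k(X) dbeta_k:
    the linear drift -A*Y is treated implicitly, f and sigma_k explicitly,
    i.e. (I + tau*A) Y_new = Y + tau*f(Y) + sum_k sigma_k(Y) * dbeta_k."""
    L = Y.shape[0]
    rhs = Y + tau * f(Y)
    for sigma_k in sigmas:
        dbeta_k = rng.normal(0.0, np.sqrt(tau))   # Wiener increment ~ N(0, tau)
        rhs = rhs + sigma_k(Y) * dbeta_k
    return np.linalg.solve(np.eye(L) + tau * A, rhs)

# sanity check: f = 0, no noise, A = I  =>  Y_new = Y / (1 + tau)
rng = np.random.default_rng(0)
Y1 = semi_implicit_euler_step(np.ones(2), np.eye(2), lambda y: 0.0 * y, [], 0.5, rng)
```

Only a linear system with the fixed matrix I + τA has to be solved per step, which is what makes the implicit treatment of the stiff part −A X_t affordable for large L.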
For the derivation of (1.4) we benefit from [28], where an expansion of the weak approximation error for uniform deterministic time steps (and originally for the explicit Euler method) was obtained via Kolmogorov's backward equation

∂_t u(t, x) + (L u)(t, x) = 0 for all (t, x) ∈ [0, T) × R^L, u(T, x) = φ(x) for all x ∈ R^L, (1.5)

where L ≡ L_X is the generator of the Markovian semigroup of X ≡ {X_t; t ∈ [0, T]} in (1.1),

(L u)(t, x) = ⟨−A x + f(x), D_x u(t, x)⟩ + ½ Σ_{k=1}^{K} Tr[σ_k(x) σ_k(x)^⊤ D²_x u(t, x)],

with σ(·) ≡ (σ_1(·), …, σ_K(·)) ∈ R^{L×K}. Under proper assumptions, such as for instance those stated in Sect. 2, the function u(t, x) = E[φ(X_T^{t,x})] is the unique solution of (1.5); see e.g. [15, p. 366ff.]. As usual, we denote by X^{t,x} ≡ {X_s^{t,x}; s ∈ [t, T]} the R^L-valued process which starts at time t ∈ [0, T] in x ∈ R^L.
As already mentioned, we are motivated by (a spatial discretization of) SPDE (1.2), which is why we aim for adaptive methods that are applicable to SDE (1.1) with L ≫ 1. In this respect, we prefer (quasi-)deterministic (rather than random) meshes {t_j}_{j≥0} ⊂ [0, T] to avoid excessive storage requirements, or time-consuming post-processing tasks to synchronize data, such as interpolation or projection. This approach lends itself to a vectorized implementation (see Algorithm 4.1) and is of advantage over procedurally generated meshes (such as those in [10,16,17]), for which an efficient vectorized implementation remains an unsolved problem.
The following example illustrates local mesh refinement and coarsening by the adaptive Algorithm 4.1, which is detailed in Sects. 4 and 5.
Example 1.1 Let L = 25. Consider SDE system (1.1) with K = 5, which results from a finite element discretization (with spatial mesh size h = 1/(L+1)) of SPDE (1.2) with ε = 1, β ≡ 0, and y(x) = sin(π x), x ∈ (0, 1); see Sect. 5.1 for details. For the test function φ(x) = √h ‖x‖_{R^L} to approximate the L²-norm, and an initial step of size 0.1, we observe an instantaneous refinement via Algorithm 4.1 to τ_1 ≈ 10⁻⁴; the mesh size then rapidly increases to values close to 10⁻¹ at times t ≈ 0.5, reflecting (spatial) smoothing dynamics; see Fig. 1b. Figure 1a shows a typical trajectory, where the buckling is caused by the driving noise.

Different adaptive methods to solve SDE (1.1) may be found in the literature, addressing diverse numerical goals: in [20], an adaptive time-meshing concept is combined with the Euler–Maruyama method to foster discrete stability of the explicit time-stepping scheme in cases where the locally Lipschitz drift only satisfies a 'one-sided Lipschitz condition'. Automatic mesh refinement (resp. coarsening) for each realization ω ∈ Ω is applied if rapid (resp. slow) changes in the drift at two subsequent states are observed, where a maximum mesh size Δ_max bounds the local (random) mesh sizes {τ_j(ω)}_{j≥0} ⊂ [0, Δ_max] to conclude asymptotic strong convergence. Another work which uses adaptive random meshes to strengthen the stability of the underlying explicit discretization of the above-mentioned class of SDEs (1.1) is [10]: here, a 'discrete one-sided Lipschitz condition' is used to generate random mesh sizes {τ_j(ω)}_{j≥0}, which are then further constrained to lie in [Δ_min, Δ_max]. The main result in [10] is the derivation of an optimal convergence rate O(Δ_max^{1/2}) on variable random meshes of size between Δ_min and Δ_max.
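The setup of Example 1.1 can be sketched numerically as follows. This is a hedged illustration: we use a plain finite-difference matrix as a stand-in for the mass-lumped finite element discretization of −ε ∂_xx, and all variable names are ours, not the paper's.

```python
import numpy as np

L = 25
h = 1.0 / (L + 1)                      # spatial mesh size as in Example 1.1
x = h * np.arange(1, L + 1)            # interior grid points of (0, 1)

# Stand-in for the discretization of -eps * d^2/dx^2 with eps = 1:
# tridiagonal, symmetric and positive definite, so (A1) holds.
eps = 1.0
A = (eps / h**2) * (2.0 * np.eye(L) - np.eye(L, k=1) - np.eye(L, k=-1))

y = np.sin(np.pi * x)                  # discretized initial datum y(x) = sin(pi*x)
phi = lambda v: np.sqrt(h) * np.linalg.norm(v)  # test function ~ L^2(0,1)-norm

phi_y = phi(y)                         # ~ ||sin(pi .)||_{L^2(0,1)} = 1/sqrt(2)
lam_min = np.linalg.eigvalsh(A).min()  # smallest eigenvalue: positivity check
```

Note that φ has uniformly bounded derivatives away from the origin, which is the kind of test function the a priori bounds (1.10) favor.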
Close to the goals and tools of this work are [16,17], where, again, setting the parameters Δ_min and Δ_max requires some a priori knowledge, and where the worst-case complexity of the method may depend on Δ_min^{−1} and on the dimension L of the problem, due to the explicit character of the discretization, which affects relevant discrete solution bounds; see also [13] in this respect.
A different line of research derives a posteriori error estimates (such as (1.4)) to judge the quality of the current approximation, and then uses them as a 'steering tool' to initiate an automatic remeshing strategy. While this conceptual idea for designing adaptive methods has long been known in the context of (certain) ODEs and PDEs, it was first introduced for SDEs in [23,27], in the context of weak approximation of SDE solutions (here again via Euler–Maruyama discretization). In these works, an (asymptotic weak) a posteriori error expansion (1.6) with computable {ρ_j}_{j=1}^{J} was first obtained. Its derivation in [23,27] rests on the weak error expansion of Talay and Tubaro [28] via Kolmogorov's backward equation (1.5), and numerically approximates derivatives of the solution u of the PDE (1.5), whose simulation is limited to small dimensions L. Then, (random) time meshes are generated automatically based on the computable part of the right-hand side of (1.6), with no minimum or maximum mesh sizes to be set, but only the parameter Tol (also serving as convergence parameter) to bound the leading error term on the right-hand side of (1.6). The iterative generation of an adapted time mesh requires the repeated computation of (approximations of) the global problem (1.5), as opposed to determining local time steps τ_{j+1} based only on the 'so far' computed solutions {Y_ℓ}_{ℓ=0}^{j+1}. From an analytical viewpoint, the results in [23,27] crucially rest on the assumed boundedness of the involved drift and diffusion functions to circumvent the deficiency of 'discrete stability' of the governing (explicit) Euler–Maruyama scheme; the advantage, however, is a theoretical backup for this weak adaptive algorithm in terms of termination at optimal rate, and the asymptotic weak a posteriori error estimate (1.6).
Conceptually, the derivation of the adaptive Algorithm 4.1 below is close to [23,27] and uses a related weak error representation (see (1.8) below) for the semi-implicit Euler scheme (1.3) with the help of the solution u of (1.5)-but differs in some relevant aspects: the first is the use of the semi-implicit Euler scheme (1.3), which allows for L−independent (higher moment) stability bounds for its solution in case data satisfy (A1)-(A3) in Sect. 2.1; see Lemma 2.6. These stability bounds for (1.3) in Lemma 2.6 are the relevant property to show optimal order of weak convergence of the a posteriori weak error estimator proposed in Theorem 3.1 on given meshes; cf. Theorem 3.5.
A second difference to [23,27] is that we bound derivatives of u that appear in (1.5) in the weak error representation (1.9) by a priori bounds (see (1.10) below) in terms of derivatives of φ, which removes the neccessity to numerically approximate derivatives of the solution of (1.5)-and thus enables the applicability of Algorithm 4.1 to large SDE systems, as they e.g. come from SPDE (1.2) via spatial discretization (in Example 1.1).
See Lemma 3.2 for the justification of identity (1.9). Conceptually, the right-hand side of (1.9) uses the continuified process Y built from the iterates Y_j and Y_{j+1}, and first and second derivatives of the solution u of (1.5). An interpretation of (the right-hand side of) (1.9), in a corresponding setting in [23,27], is its view as products of (local) error indicators for the drift and diffusion, and weights D_x u, D²_x u, which 'encode' the chosen test function φ. In the next step, we use the first to third variation equations for (1.1), see (2.2)–(2.4) in Sect. 2.3, to deduce the bounds (1.10) for derivatives of the solution u of (1.5), where the constants C_{ℓ,i} > 0 do not depend on the dimension L. Note that only derivatives of u are involved in (1.9), so their estimation with the help of (1.10) suggests choosing test functions φ : R^L → R in (1.4) whose derivatives are uniformly bounded on R^L, such as norms; see also Example 1.1.
While the derivation of estimate (1.10) is known for a general class of SDEs, see e.g. [3, Sec. 1.3], we calculate the constants {C_{ℓ,i}; 1 ≤ i ≤ ℓ, 1 ≤ ℓ ≤ 3} under the assumptions (A1)–(A3), which are needed in the a posteriori error estimate (1.4); see Lemma 2.5. A further tool to derive (1.4) is the use of common Malliavin calculus techniques, such as the Clark–Ocone formula, to avoid that the error estimator uses the interpolated process Y in (1.7) rather than the computable iterates Y_j from (1.3). Here, we benefitted from similar ideas and concepts, which were used in [6] in the context of a priori weak error analysis of SPDEs of the form (1.2) with β ≡ 0; see Remark 3.2 for further details. For a fixed φ ∈ C³(R^L), our first main result in this work then is the weak a posteriori error estimate (1.4) (see also Theorem 3.1), giving quantitative error bounds for iterates {Y_j}_{j≥0} solving (1.3) on a given mesh {t_j}_{j≥0} covering [0, T], with the help of computable (local) weak error estimators {G(φ; τ_{j+1}, Y_j)}_{j≥0}.
In Sect. 4, we use the a posteriori error estimate (1.4) for iterates {Y_j}_{j≥0} of (1.3) to automatically steer the computation of local mesh sizes τ_{j+1} via the adaptive Algorithm 4.1, yielding tuples (τ_{j+1}, Y_{j+1})_{j≥0}. Given j ≥ 0, the guiding criterion for admissibility of a new tuple (τ_{j+1}, Y_{j+1}) is that the evaluation with the local error estimator yields G(φ; τ_{j+1}, Y_j) ≤ Tol/T, where the tolerance Tol > 0 is provided by the user. We generate such an admissible tuple by successively halving the previous time step, thus generating a sequence {τ_{j+1,ℓ}}_{ℓ≥0} ⊂ R₊ with τ_{j+1,ℓ} = τ_{j+1,0}/2^ℓ and τ_{j+1,0} ≡ τ_j, until admissibility of a tuple for some τ_{j+1,ℓ*_{j+1}} is attained; see Fig. 2. This sequence of steps precedes a single potential step of coarsening; see Algorithm 4.1 for further details. As a result, we obtain an adaptive method where only Tol > 0 needs to be set, and where admissible tuples (τ_{j+1}, Y_{j+1})_{j≥0} satisfy (1.4), where the right-hand side is now bounded by Tol. The second main result in this paper then is Theorem 4.2, which ensures computation of each new time step τ_{j+1} in Algorithm 4.1 after no more than ℓ*_{j+1} = O(log(Tol⁻¹)) many iterations (indexed by ℓ; local determination), and at most J = O(Tol⁻¹) many steps to reach T (global termination); its proof again rests on the stability bounds given in Lemma 2.6 and yields the existence of a lower bound for the step sizes generated via Algorithm 4.1; see also Fig. 2. We remark that this local construction of the new mesh size τ_{j+1} with the help of only Y_j differs from the strategy in [23,27], where admissible meshes are obtained by iterative computation of global problems ('approximate Kolmogorov equation'), and where again the assumed boundedness of drift and diffusion is crucial to conclude optimality of the attained meshes.
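The halving strategy just described can be condensed into a short driver loop. This is a sketch only: `G` stands for any computable estimator with the (hypothetical) interface `G(tau, Y)`, the admissibility criterion G ≤ Tol/T is taken from the text, and the single coarsening step is one plausible realization of step (3) of Algorithm 4.1.

```python
def select_step(G, Y_j, tau_prev, Tol, T, coarsen=2.0):
    """Sketch of local step-size selection: start from the previous step,
    halve until the admissibility criterion G(tau, Y_j) <= Tol/T holds
    (refinement loop, tau_{j+1,l} = tau_{j+1,0} / 2^l), then allow a single
    tentative coarsening step if no refinement was necessary."""
    tau = tau_prev
    n_halvings = 0
    while G(tau, Y_j) > Tol / T:        # refinement loop
        tau /= 2.0
        n_halvings += 1
    # one potential coarsening step, accepted only if still admissible
    if n_halvings == 0 and G(coarsen * tau, Y_j) <= Tol / T:
        tau *= coarsen
    return tau

# toy estimator G(tau, Y) = tau (weak order 1 in the step size)
tau_star = select_step(lambda tau, Y: tau, None, 1.0, Tol=0.1, T=1.0)
```

With the toy estimator, starting from τ = 1 and Tol/T = 0.1, four halvings yield the first admissible step τ = 0.0625.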
Section 5 then reports on computational studies for different SPDEs (1.2) after finite element discretization, carried out with the adaptive Algorithm 4.1: we specify the corresponding a posteriori error estimator, and pinpoint those computable expressions {E_ℓ}_{ℓ≥1} in the error estimator of Example 1.1 which are mainly responsible for local mesh adjustments. For the different examples, including one which is convection dominated, the results evidence efficiency in comparison with uniform meshing, and accuracy of the weak adaptive Algorithm 4.1.
The paper is organized as follows: Sect. 2 collects the assumptions needed for the data A, f, {σ_k}_k of (1.1) and recalls relevant tools from Malliavin calculus; moreover, variation equations for (1.1) are recalled to verify the bounds (1.10).

Assumptions
Throughout this work, (Ω, F, {F_t}_{t≥0}, P) is a given filtered probability space with the natural filtration of the Wiener processes in (1.1). Below, we use positive constants, λ_A among them, to specify dependence on the data A, f, σ ≡ (σ_1, …, σ_K) in (1.1); none of these constants depend on L. For a sufficiently smooth g ∈ C(R^L; R^n), the corresponding (multilinear) operator norms are defined in the usual way (n, L ∈ N\{0}, x ∈ R^L), where ‖·‖_{R^n} denotes the (Euclidean) vector norm of an R^n-valued vector. If n = L, we write L ≡ L(R^L × ⋯ × R^L; R^L). If n = 1, D ≡ D_x denotes the gradient and D² ≡ D²_x the Hessian matrix of g, and we also write L ≡ L(R^L × ⋯ × R^L; R). Moreover, ‖·‖_{R^{L×L}} denotes the spectral (matrix) norm.

(A1) (a)
The matrix A ∈ R^{L×L} is invertible and positive definite, i.e., there exists a constant λ_A > 0 such that ⟨A x, x⟩_{R^L} ≥ λ_A ‖x‖²_{R^L} for all x ∈ R^L.
(A2) The maps σ_k ∈ C³(R^L; R^L) for every k = 1, …, K, and there exist constants bounding these maps and their derivatives.

Throughout this work, we admit test functions φ ∈ C³(R^L) with globally bounded first, second and third derivatives.

Malliavin calculus
We briefly recall the Malliavin derivative, the chain rule for Malliavin derivatives, and the Clark–Ocone formula; for further details, we refer to [25]. We denote by C^∞_p(R^L) the space of all smooth functions g : R^L → R such that g and all of its partial derivatives have polynomial growth. Let P be the set of R-valued smooth random variables of the form g(β(h_1), …, β(h_L)), g ∈ C^∞_p(R^L). For any F ∈ P, we define its R-valued Malliavin derivative process DF via (2.1). For any p ≥ 1, let D^{1,p} denote the closure of the class of smooth random variables with respect to the corresponding norm. Next, we recall the chain rule for Malliavin derivatives; see [25, p. 28]: let ϕ : R^L → R be a continuously differentiable function with bounded partial derivatives of order 1, and let p ≥ 1 be fixed. Let further F = (F_1, …, F_L) be a random vector whose components belong to the space D^{1,p}. Then ϕ(F) ∈ D^{1,p}, and the chain rule holds. Finally, we recall the Clark–Ocone representation formula; see [25, p. 46, Prop. 1.3.14].

Lemma 2.1 Let F ∈ D^{1,2}, and let β be a one-dimensional Wiener process. Then F = E[F] + ∫₀^T E[D_t F | F_t] dβ(t).

Variation equations for (1.1) and a priori bounds for
, whenever assumptions (A1)–(A2) hold; see e.g. [15, p. 366ff.]. The derivation in Sect. 3 requires explicit bounds for the derivatives of u which, in particular, are uniform in L. To this end, we use the variation equations corresponding to (1.1) and derive these results here. For a detailed verification of the upcoming results, we refer to [22].
For y ∈ R^L fixed, we denote by X ≡ X^{0,y} the solution of (1.1). Let h ∈ R^L. Following [3, p. 37ff.], we recall the first variation equation corresponding to (1.1). Using assumptions (A1)–(A2), and then applying Gronwall's inequality, leads to the first result. (b) p = 1: This follows by using Jensen's inequality.
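For reference, the first variation equation referred to above presumably takes the standard form below (a hedged reconstruction from [3, p. 37ff.]; the symbol η^h for the variation process in direction h is our notation):

```latex
\mathrm{d}\eta^{h}_t
  = \bigl[-\mathbf{A}\,\eta^{h}_t + D\mathbf{f}(X_t)\,\eta^{h}_t\bigr]\,\mathrm{d}t
  + \sum_{k=1}^{K} D\boldsymbol{\sigma}_k(X_t)\,\eta^{h}_t\,\mathrm{d}\beta_k(t),
  \qquad t \in (0,T], \qquad \eta^{h}_0 = h .
```

Its linearity in η^h is what allows moment bounds proportional to ‖h‖, with constants that do not depend on L.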
Next, following [3, p. 39ff.], we consider the second variation equation corresponding to (1.1), that is, for h, w ∈ R^L. Proof The proof follows the steps outlined in the proof of Lemma 2.2: by Itô's formula, we may control E‖ζ_t^{h,w}‖^p_{R^L} in terms of the data, and of h, w in (2.3); the assertion then follows with the help of generalized Young and Gronwall inequalities.
Here, S₃ denotes the set of all permutations of a set of three elements. Since (A1)–(A2) apply, there exists a unique solution of (2.4). Moment bounds for the solution of (2.4) may be obtained as in Lemmata 2.2 and 2.3; see [22] for further details. In view of Lemma 2.5 below, it suffices to consider only second moment bounds in the following lemma.
Let t ∈ [0, T] and x ∈ R^L. In order to obtain global bounds for the first and higher derivatives of u in (1.5), we use the identities (2.5)–(2.7), which can be found in e.g. [3, p. 94] in connection with Kolmogorov's forward equation, which is just a time reversal t ↦ T − t and a change of the terminal condition into an initial condition in (1.5). In the following Lemma 2.5, based upon (2.5)–(2.7) and Lemmata 2.2, 2.3 and 2.4, we derive global bounds for the first, second and third derivatives of the solution u of (1.5) in terms of derivatives of φ, which are independent of the dimension L.
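The first two of these identities presumably take the standard form below (a hedged reconstruction from [3, p. 94]; η^h and ζ^{h,w} denote the first and second variation processes started at time t):

```latex
D_x u(t,x)\,h
  = \mathbb{E}\bigl[\langle D\phi(X^{t,x}_T),\,\eta^{h}_T\rangle\bigr],
\qquad
D^2_x u(t,x)[h,w]
  = \mathbb{E}\bigl[D^2\phi(X^{t,x}_T)[\eta^{h}_T,\eta^{w}_T]
      + \langle D\phi(X^{t,x}_T),\,\zeta^{h,w}_T\rangle\bigr].
```

Combined with the moment bounds for η^h and ζ^{h,w}, these representations transfer bounds on the derivatives of φ to bounds on the derivatives of u.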
We apply the Cauchy–Schwarz inequality to the identity (2.5) and use Lemma 2.2. Taking h = D_x u(t, x) then immediately yields the assertion.

Stability bounds for iterates {Y j } j≥0 from (1.3)
We derive L-independent stability bounds for the iterates of (1.3). We give the proof for p = 1, 2; the proof for general p > 2 then follows inductively.
and use the binomial formula, as well as Young's inequality (δ_1, δ_2 ≥ 1), to arrive at estimate (2.8). Note that the last term vanishes if E[·] is applied. By (A1)–(A2), the tower property for expectations, and the identity E[|Δ_{j+1}β_k|²] = τ_{j+1}, we further conclude the corresponding bound. Summation over all iteration steps, and using (A3), then lead to an estimate to which the discrete Gronwall inequality applies and yields the assertion. (b) p = 2: Multiply (2.8) with ‖Z_{j+1}‖²_{R^L} and use the binomial formula to get the next estimate. We now add and subtract ‖Z_j‖²_{R^L} in the two terms on the right-hand side which involve random increments, to then absorb part of it into the second term on the left-hand side. For δ̄_1, δ̄_2, δ̄_3, δ̄_4 > 0, thanks to (A1)–(A2), taking expectations and using the tower property leads to a further bound. Choosing δ_1, δ_2 ≥ 1 and δ̄_1, δ̄_2, δ̄_3, δ̄_4 ≥ 4, and using (A1), similar arguments as in (a) then yield the assertion.

A posteriori weak error estimates for the Scheme (1.3)
In Theorem 3.1, we derive an a posteriori error estimate for iterates {Y_j}_{j=0}^{J} of scheme (1.3). Theorem 3.5 shows that this error estimator converges with optimal order on uniform meshes, recovering a result from the a priori weak error analysis in [6] for a related time discretization of (1.2); the relevant tools to verify Theorem 3.5 are the (discrete) stability properties of (1.3) in Lemma 2.6.

A posteriori weak error estimation: derivation and properties
We bound the weak approximation error max_{1≤j≤J} |E[φ(X_{t_j})] − E[φ(Y_j)]| in Theorem 3.1. For this purpose, we employ the (data-dependent) estimates in Lemma 2.5, and therefore define the constants C_D(φ), C_{D²}(φ) [cf. also (1.10)] entering the estimator. The following result estimates the weak error caused by {Y_j}_{j=0}^{J} from (1.3), on a mesh with local mesh sizes {τ_{j+1}}_{j=0}^{J−1}, in terms of a computable a posteriori error estimator.

Remark 3.1 1. In the deterministic case σ_k ≡ 0 (k = 1, …, K), a different approach to derive a (residual-based) a posteriori estimate on a mesh is via duality methods [8], which exploit (strong stability properties of) the related adjoint equation; see also 2. below. Another variational approach that avoids duality methods is [24], where an inherited '(discrete) energy dissipation' property of the implicit discretization of (1.1) is used to bound max_{1≤j≤J} ‖X_{t_j} − Y_j‖_{R^L} for cases where the drift operator in (1.1) is the gradient of a convex functional. We also mention [29, Ch. 6], where (residual-based) a posteriori estimates are derived by variational methods for space-time discretizations of the more general (1.2) with vanishing diffusion.
2. For finite element based discretizations of (linear elliptic, parabolic) PDEs A(u) = f, residual-based a posteriori estimates are obtained in [8], where dual/adjoint problems are the relevant tool; their (global) stability properties may then be exploited to bound the error in terms of the residual ρ(u_h) = A(u_h) − f of the computed solution u_h, times a related stability constant. In later works, dual problems involve functionals φ, and their solution z is computed approximately, to then enter as local weights ω(z_h) in the 'duality-based weighted residual' estimator, sharpening computable error bounds; see the surveys [2,11].
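Schematically, such a dual-weighted residual estimator takes the form below (the cell index K and the weights are schematic, following the surveys cited; this is our paraphrase, not a formula from the paper):

```latex
\bigl|\phi(u) - \phi(u_h)\bigr|
  \;\lesssim\; \sum_{K} \rho_K(u_h)\,\omega_K(z_h),
```

where ρ_K(u_h) are local residuals of the computed solution and ω_K(z_h) are weights obtained from the approximately computed dual solution z_h.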
The derivation of the a posteriori error estimate (3.1) for iterates of (1.3) uses the (backward) Kolmogorov equation (1.5), instead of an adjoint evolutionary problem on (0, t_{n+1}) as motivated from optimal control: the works [23,27] approximate derivatives of u to build the local weights contained in {ρ_j}_{j=1}^{J} in (1.6), which is possible for small L; in this work, we use the global stability estimate (1.10), which leads to the a posteriori error estimator in (3.1). The a posteriori weak error analysis now starts again with (1.8), but lacks Itô's formula in (1.9), and thus proceeds with the mean value theorem and Taylor's formula; see (3.3). The proof of Theorem 3.1 consists of several steps: Lemma 3.2 is based on the identity (1.8) in the introduction, where we represent the weak approximation error via (1.5). Lemma 3.3 then examines the first expectation on the right-hand side of (3.3), where only the drift term of (1.3) appears; in a similar manner, Lemma 3.4 examines the second expectation on the right-hand side of (3.3), where only the diffusion term is involved. The proof of Theorem 3.1 then follows by combining these lemmata. Note that similar concepts for the investigation of the weak approximation error for the explicit Euler scheme have been proposed in e.g. [28], which here are adapted to the implicit scheme (1.3); see also [6], where weak a priori error estimates are obtained for a time-implicit discretization of SPDE (1.2) with β ≡ 0, and Remark 3.2. The next lemma examines the first expectation appearing on the right-hand side of (1.9).

Lemma 3.3 Suppose the setting in Lemma 3.2. Then we have
In the proof of Lemma 3.3, we use Lemma 2.5 to (globally) bound the first and second derivatives of u by C_D(φ) and C_{D²}(φ), respectively. Besides some standard arguments, we also use Malliavin calculus techniques to validate the O(|τ_{j+1}|²) bound on the right-hand side of (3.6).
Proof Using the identity which holds for any v ∈ R^L, we obtain in a first calculation (3.7). We use this identity and Lemma 2.5 to bound the left-hand side of (3.6) by the terms in (3.8), which we estimate independently, starting with Step 1: (Estimation of III) (a) We apply Itô's formula. We add and subtract Df(Y_j) and D²f(Y_j) as integrands in order to get closer to computable optimal terms. This step leads to additional terms, which will be estimated in (c) below. We apply the Cauchy–Schwarz inequality, Lemma 2.5 (i), and some standard calculations. To estimate the remaining terms (cf. [1,25]), we apply the Clark–Ocone formula in (3.13). Now, inserting (3.12) into (3.11), and using the fact that the expectation of a stochastic integral w.r.t. the Wiener process is zero, we conclude (3.14), where in the last step we used a (generalized) Itô isometry argument; see e.g. [12, p. 135, Thm. 4.2.3]. An application of the tower property (law of total expectation) in (3.14), together with (3.13), then leads to the next bound. In the next step, we add and subtract Df(Y_j) in the second argument as well. Using Lemma 2.5 (ii), and Tr(B v w^⊤) ≤ ‖B‖_{R^{L×L}} ‖v‖_{R^L} ‖w‖_{R^L} for any B ∈ R^{L×L}, v, w ∈ R^L, consequently isolates the higher order terms in (3.5), which account for the difference between Y and {Y_j}_{j≥0}. For their treatment, we start with a standard calculation: for s ∈ [t_j, t_{j+1}] and r ∈ [t_j, s], considering the continuified process (1.7) and using some standard calculations, the Cauchy–Schwarz inequality and Lemma 2.5 (i), we get (3.16). Using (3.16) consequently leads to (3.17); in a similar way, we obtain (3.18) and (3.19). Step 2: (Estimation of II) Similar arguments as used for III in Step 1 give the bound (3.20). Step 3: (Finishing the proof) We combine (3.17), (3.18), (3.19) and (3.15) with (3.10), and plug the resulting expression, as well as (3.20), into (3.8), which proves the assertion.
We now bound the last sum on the right-hand side of (3.3).

(3.21)
Proof Similarly to (3.7), we start with a straightforward calculation for k = 1, …, K, which yields (3.22). We plug (3.22) into the left-hand side of (3.21). Step 1: (Estimation of IV) Some standard calculations and Lemma 2.5 (ii) lead to (3.23). Step 2: (Estimation of V) Again, standard calculations and Lemma 2.5 (ii) lead to (3.24). Step 3: (Estimation of T₁) (a) For k = 1, …, K, we add and subtract σ_k(Y_j)σ_k(Y_s) and use Tr(B) = Tr(B^⊤) for B ∈ R^{L×L} to obtain (3.25). Plugging (3.25) into T₁ immediately leads to a decomposition, which, with the help of Lemma 2.5 (ii), Assumption (A2) and (3.16), yields a first bound. (c) Next, we consider the expression T_{1,a}. We apply Itô's formula with {σ_k^{(i)}; 1 ≤ k ≤ K, 1 ≤ i ≤ L} to (1.7), and use standard arguments to split T_{1,a} into T_{1,a,1} and T_{1,a,2}. Almost the same arguments as used for the treatment of (3.10) in Lemma 3.3, that is, generating additional higher order terms to get closer to computable terms, give bounds for T_{1,a,1} and T_{1,a,2}. (e) Hence, plugging (3.30), (3.31) and (3.32) into (3.28) yields (3.34). (g) It remains to examine the terms K₄, K₅ and K₆, and to show that these are indeed higher order terms; again, a treatment similar to that of the higher order terms above applies. Step 4: (Finishing the proof) We combine (3.35), (3.36) and (3.37) with (3.34), and then combine the resulting expression with (3.23) and (3.24), which proves the assertion.
Next, we show convergence with optimal weak order O(τ) for the a posteriori error estimator in (3.1), on a mesh with maximum mesh size τ > 0, with the help of Lemma 2.6; thanks to Theorem 3.1, the same order then holds for the weak error of {Y_j}_{j=0}^{J} from (1.3).
and maximum mesh size τ = max_j τ_{j+1}. Then, there exists C ≡ C(φ) > 0, independent of L, such that the estimator converges with optimal weak order O(τ).

Remark 3.2 1. The work [7] derives a weak a priori error estimate for the linear stochastic heat equation with additive noise, where the analysis exploits the representation formula for the mild solution, and a transformation of it to another process which solves a further SPDE without drift term and additive noise. In contrast, the weak a priori error analysis in [6] for SPDE (1.2) with β ≡ 0 requires Malliavin calculus to efficiently estimate additionally appearing stochastic integral terms due to the nonlinearities F and the diffusion, which are of similar type as M₁ in (3.9) and M₂ in (3.29) appearing here. The a posteriori error analysis to verify Theorem 3.1 with estimators {G(φ; τ_{j+1}, Y_j)}_{j≥0} also exploits Malliavin calculus, and eventually enables Theorem 3.5. We remark that the tools from Malliavin calculus used here slightly differ from those in [6]: while [6] utilizes an integration by parts formula (cf. [6, Lemma 2.1]), we make use of the Clark–Ocone formula (cf. Lemma 2.1). 2. The work [6] uses less regular initial data, in particular, and exploits the regularizing effect of the involved semigroup. In this work, we assume (A3) to verify Theorem 3.5; however, we believe a corresponding result to hold for less regular initial data, by using a modification of Lemma 2.6 that involves temporal weights in the functional, which would mimic the regularizing effect in the present context.
(c): By means of (a) and (b), we can find a constant C̄ ≥ 1, independent of L and j, such that (3.38) holds for all j ≥ 0. Hence, plugging (3.38) into (3.1), using τ_{j+1} ≤ τ for every j ≥ 0, and setting C := C̄ · T yields the assertion.
In the next section, we base an adaptive method on the a posteriori error estimate (3.1) to automatically select local step sizes. For every j ≥ 0, we show that the adaptive method selects a new time step τ j+1 within finitely many steps, and that the algorithm reaches the terminal time T > 0 after finitely many steps as well (global termination).

Weak adaptive approximation via (1.1): algorithm and convergence
By Theorem 3.1, the weak error caused by scheme (1.3) on a given partition {t_j}_{j=0}^{J} ⊂ [0, T] is controllable via the a posteriori error estimate (1.4). In this section, we use this result for an adaptive method that automatically steers the local mesh size selection. For this purpose, we check whether the criterion G(φ; τ_{j+1}, Y_j) ≤ Tol/T is met: if so, τ_{j+1} is admissible, bounding the new local error in such a way that the overall error will be bounded by Tol through Theorem 3.1; otherwise, τ_{j+1} is replaced by the refined mesh size τ_{j+1} := τ_{j+1}/2, and the criterion is checked again. The following algorithm contains a refinement step (1) to generate {τ_{j+1,ℓ}}_{ℓ≥0} and, if G(φ; τ_{j+1,ℓ}, Y_j) is 'too small', a final coarsening step (3) in the loop of generating the subsequent iterate from (1.3), after accepting the possible underestimation of G(φ; τ_{j+1,ℓ}, Y_j). Proof (a) Termination for each j ≥ 0: Fix j ≥ 0, and recall (3.38) in the proof of Theorem 3.5. Since the constant C̄ ≥ 1 appearing there does not depend on j, we generate a finite sequence {τ_{j+1,ℓ}}_{ℓ=0}^{ℓ*_{j+1}} with τ_{j+1,ℓ} = τ_{j+1,0}/2^ℓ, ℓ = 0, …, ℓ*_{j+1}, according to the refinement mechanism (1) of Algorithm 4.1, until either (2) or (3) is met. In view of (3.38), we find that ℓ = ⌈log(τ_{j+1,0} C̄ T/Tol)/log(2)⌉ is the smallest natural number such that the admissibility criterion holds. Consequently, we need a maximum of O(log(Tol⁻¹)) (refinement) steps to accept the local step size.
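The halving count in part (a) can be checked numerically. This is a toy check of the ceiling formula under the reading τ_{j+1,0} C̄ T / 2^ℓ ≤ Tol; the values of τ_{j+1,0}, C̄, T, Tol below are placeholders.

```python
import math

def num_halvings(tau0, Cbar, T, Tol):
    """Smallest l such that tau0/2^l * Cbar * T <= Tol, i.e. the number of
    refinement steps; compare l* = ceil(log(tau0*Cbar*T/Tol)/log 2)."""
    l = 0
    while tau0 / 2**l * Cbar * T > Tol:
        l += 1
    return l

l_star = num_halvings(tau0=0.1, Cbar=1.0, T=1.0, Tol=1e-3)
closed_form = math.ceil(math.log(0.1 * 1.0 * 1.0 / 1e-3) / math.log(2.0))
```

Both the loop and the closed form grow like log(Tol⁻¹), matching the O(log(Tol⁻¹)) bound on the number of refinement steps.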

(b) Global termination rate:
We show by induction that τ_{j+1} ≥ Tol/(C̄ T) for all j ≥ 0, where C̄ ≥ 1 is the constant in (3.38). In particular, this means that T is reached within J = O(Tol⁻¹) many time steps. The base case follows by the choice of the initial mesh size τ_1 ≥ Tol/T. Now suppose that we have generated Y_j with step size τ_j ≥ Tol/(C̄ T). Since ⌈x⌉ < 1 + x for x ∈ R, we conclude the induction step by means of (4.2). (c) Estimate (4.1) immediately follows from (3.1) and part (2) of Algorithm 4.1.

Computational experiments
and which now holds with high probability. In order to obtain (5.1), we first add and subtract E[φ(Y_j)] and use Theorem 3.1, which yields (5.2). Replacing all arising expectations E_ℓ(·), ℓ = 1, …, 15, in the representation of the error estimator G by their corresponding empirical means E_ℓ^{(M)}(·), and writing G^{(M)} for the related (empirical) error estimator, further leads to a bound in which ERR_G(M) and ERR_φ(M) denote the statistical errors resulting from the approximation of the error estimator G and from the approximation of the expectation of the test function φ, respectively. Algorithm 4.1 controls the first expression on the right-hand side of (5.2). In order to control the remaining statistical errors, i.e., to ensure that ERR_G(M) + ERR_φ(M) ≤ Tol holds with high probability, and to conclude (5.1), one can (asymptotically) determine a number M ≡ M(Tol) ∈ N\{0} of Monte-Carlo samples by means of concentration inequalities, the central limit theorem, or other (non-)asymptotic controls; we refer to [12] for more details in this direction. In the computational studies reported below, we mostly chose M = 10⁴, for which the Monte-Carlo simulations performed stably.
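For instance, a CLT-based determination of M might look as follows. This is a sketch under the assumption of a Gaussian limit; the confidence quantile z and the sample standard deviation used below are illustrative inputs, not values from the paper.

```python
import math

def mc_sample_size(sample_std, tol, z=2.0):
    """CLT-based sample size: requiring z * std / sqrt(M) <= tol for the
    statistical error of an empirical mean gives M >= (z * std / tol)^2."""
    return math.ceil((z * sample_std / tol) ** 2)

# toy values: std = 0.5, tolerance 0.25, ~95% quantile rounded to z = 2
M = mc_sample_size(sample_std=0.5, tol=0.25)
```

In practice the standard deviation would itself be estimated from a pilot run, so M is determined only asymptotically, consistent with the text.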
For different noise parameters K ∈ {0, 1, 3, 5, 6, 8, 10}, Fig. 4 compares the total number of time steps needed for adaptive and uniform meshes to perform equally well, indicating that the larger K is, the greater the performance gain that may be expected from the adaptive Algorithm 4.1 over scheme (1.3) with uniform time steps.

A convection dominated (stochastic) problem
Example 5.1 Consider (1.2) on D = (0, 1), T > 0, with ε > 0, β ∈ R, F ≡ 0 and homogeneous Dirichlet boundary conditions. After a finite element discretization, using 'mass lumping', and h = 1/(L+1) for some L ∈ N\{0}, we obtain a system of the form (1.1), where σ_k(X_t^h) is given as in Sect. 5.1. We study three different cases given in Table 1, where we, among other things, discuss the influence of varying ε on the total number of adaptive versus uniform time steps. We use the test function φ(x) = √h ‖x‖_{R^L} (resp. φ(x) = ‖x‖_∞ := max_{ℓ=1,…,L} |x_ℓ|) to approximate the L²-norm (resp. the L^∞-norm) of (the finite element approximation of) {X_t; t ∈ [0, T]} from SPDE (1.2). Although both figures illustrate similar behaviours of time step size, error and a posteriori error plots, the number of time steps needed to stay below the given error threshold Tol in Fig. 7 is larger than in Fig. 6, which is due to the different scalings of the considered norms.
Different choices of β and ε affect the total number of steps generated via Algorithm 4.1. A larger β increases the convection effect and leads to more time steps; see Figs. 10 and 6. In turn, larger values of ε reduce the transport, which requires fewer time steps; see Fig. 9. For different parameters ε ∈ {h, 2h, 5h, 15h, 30h, 1}, Fig. 5 compares the total number of time steps needed for adaptive and uniform meshes to perform equally well, indicating that the smaller ε is, the more savings are obtained via the adaptive Algorithm 4.1, compared to scheme (1.3) with uniform time steps.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.