Maximal inequalities for stochastic convolutions and pathwise uniform convergence of time discretisation schemes

We prove a new Burkholder-Rosenthal type inequality for discrete-time processes taking values in a 2-smooth Banach space. As a first application we prove that if $(S(t,s))_{0\leq s\leq t\leq T}$ is a $C_0$-evolution family of contractions on a $2$-smooth Banach space $X$ and $(W_t)_{t\in [0,T]}$ is a cylindrical Brownian motion on a probability space $(\Omega,P)$, then for every $0<p<\infty$ there exists a constant $C_{p,X}$ such that for all progressively measurable processes $g: [0,T]\times \Omega\to X$ the process $(\int_0^t S(t,s)g_sdW_s)_{t\in [0,T]}$ has a continuous modification and $$\mathbb{E}\sup_{t\in [0,T]}\Big\| \int_0^t S(t,s)g_sdW_s \Big\|^p\leq C_{p,X}^p \mathbb{E} \Bigl(\int_0^T \| g_t\|^2_{\gamma(H,X)}dt\Bigr)^{p/2}.$$ Moreover, for $2\leq p<\infty$ one may take $C_{p,X} = 10 D \sqrt{p},$ where $D$ is the constant in the definition of $2$-smoothness for $X$. Our result improves and unifies several existing maximal estimates and is even new in case $X$ is a Hilbert space. Similar results are obtained if the driving martingale $g_tdW_t$ is replaced by more general $X$-valued martingales $dM_t$. Moreover, our methods allow for random evolution systems, a setting which appears to be completely new as far as maximal inequalities are concerned. As a second application, for a large class of time discretisation schemes we obtain stability and pathwise uniform convergence of time discretisation schemes for solutions of linear SPDEs $$ du_t = A(t)u_tdt + g_tdW_t, \quad u_0 = 0,$$ where the family $(A(t))_{t\in [0,T]}$ is assumed to generate a $C_0$-evolution family $(S(t,s))_{0\leq s\leq t\leq T}$ of contractions on a $2$-smooth Banach space $X$. Under spatial smoothness assumptions on the inhomogeneity $g$, contractivity is not needed and explicit decay rates are obtained. In the parabolic setting this sharpens several known estimates in the literature; beyond the parabolic setting this seems to provide the first systematic approach to pathwise uniform convergence of time discretisation schemes.

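Abstract. We prove a new Burkholder-Rosenthal type inequality for discrete-time processes taking values in a 2-smooth Banach space. As a first application we prove that if $(S(t,s))_{0\leq s\leq t\leq T}$ is a $C_0$-evolution family of contractions on a 2-smooth Banach space $X$ and $(W_t)_{t\in[0,T]}$ is a cylindrical Brownian motion on a probability space $(\Omega,P)$ adapted to some given filtration, then for every $0<p<\infty$ there exists a constant $C_{p,X}$ such that for all progressively measurable processes $g:[0,T]\times\Omega\to X$ the process $(\int_0^t S(t,s)g_s\,dW_s)_{t\in[0,T]}$ has a continuous modification and $$\mathbb{E}\sup_{t\in [0,T]}\Big\| \int_0^t S(t,s)g_s\,dW_s \Big\|^p\leq C_{p,X}^p\, \mathbb{E} \Bigl(\int_0^T \| g_t\|^2_{\gamma(H,X)}\,dt\Bigr)^{p/2}.$$ Moreover, for $2\leq p<\infty$ one may take $C_{p,X} = 10D\sqrt{p}$, where $D$ is the constant in the definition of 2-smoothness for $X$. The order $O(\sqrt{p})$ coincides with that of Burkholder's inequality and is therefore optimal as $p\to\infty$.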
Our result improves and unifies several existing maximal estimates and is even new in case $X$ is a Hilbert space. Similar results are obtained if the driving martingale $g_t\,dW_t$ is replaced by more general $X$-valued martingales $dM_t$. Moreover, our methods allow for random evolution systems, a setting which appears to be completely new as far as maximal inequalities are concerned.
As a second application, for a large class of time discretisation schemes (including splitting, implicit Euler, Crank-Nicolson, and other rational schemes) we obtain stability and pathwise uniform convergence of time discretisation schemes for solutions of linear SPDEs $$du_t = A(t)u_t\,dt + g_t\,dW_t, \quad u_0 = 0,$$ where the family $(A(t))_{t\in[0,T]}$ is assumed to generate a $C_0$-evolution family $(S(t,s))_{0\leq s\leq t\leq T}$ of contractions on a 2-smooth Banach space $X$. Under spatial smoothness assumptions on the inhomogeneity $g$, contractivity is not needed and explicit decay rates are obtained. In the parabolic setting this sharpens several known estimates in the literature; beyond the parabolic setting this seems to provide the first systematic approach to pathwise uniform convergence of time discretisation schemes.

Introduction
In this paper we study maximal inequalities for the mild solutions of time-dependent stochastic evolution equations of the form
$$du_t = A(t)u_t\,dt + g_t\,dW_t, \quad u_0 = 0. \tag{1.1}$$
Here, $(A(t))_{t\in[0,T]}$ is a family of closed operators acting in a Banach space $X$ generating a $C_0$-evolution family $(S(t,s))_{0\leq s\leq t\leq T}$, $(W_t)_{t\in[0,T]}$ is a Brownian motion defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$, adapted to some given filtration $(\mathcal{F}_t)_{t\in[0,T]}$, and $(g_t)_{t\in[0,T]}$ is a progressively measurable stochastic process with values in $X$. Under these assumptions the mild solution is given, at least formally, by the $X$-valued stochastic convolution-type integral
$$u_t = \int_0^t S(t,s)g_s\,dW_s. \tag{1.2}$$
An important special case of (1.1) is the time-independent case where $A(t)\equiv A$ generates a $C_0$-semigroup $(S(t))_{t\geq 0}$ on $X$ and $S(t,s) = S(t-s)$. More generally we will consider stochastic convolutions driven by cylindrical Brownian motions and assume that $g$ is operator-valued; this extension is mostly routine and for ease of presentation will not be considered in this introduction.
In order to give a rigorous meaning to the stochastic integral in (1.2) one needs to impose suitable measurability and integrability assumptions on $g$ and geometrical properties on $X$, such as 2-smoothness [BD90, Brz95, Brz97, Det89, Det91, Nei78, Ond04, Ond05] or the UMD property [NVW07, NVW08]. The UMD theory is in some sense the definitive theory, in that it features a two-sided Burkholder inequality and completely natural extensions of the martingale representation theorem [NVW07, NVW08] and the Clark-Ocone theorem [MvN08]; from the point of view of applications to SPDE it provides stochastic maximal $L^p$-regularity for parabolic problems [NVW12a, NVW12b, NVW15, PV19] which in turn can be used to study quasi- and semi-linear PDEs [AV20]. The 2-smooth theory only allows for limited versions of these results, but it is easier in its basic constructions and adequate for many other purposes, and will provide the setting for this paper.
Instrumental in proving pathwise continuity of mild solutions to (1.1) is the availability of suitable estimates for the maximal function $u^\star:\Omega\to[0,\infty)$,
$$u^\star := \sup_{t\in[0,T]}\|u_t\|,$$
where $(u_t)_{t\in[0,T]}$ is the process defined by (1.2); norms are taken in $X$ pointwise on $\Omega$. The first such estimate was obtained by Kotelenez [Kot83, Kot84] who showed that if $(A(t))_{t\in[0,T]}$ generates a contractive evolution family $(S(t,s))_{0\leq s\leq t\leq T}$ on a Hilbert space $X$, then the process $(u_t)_{t\in[0,T]}$ defined by (1.2) has a continuous modification which satisfies the maximal inequality
$$\mathbb{E}\sup_{t\in[0,T]}\|u_t\|^2 \leq C^2\,\mathbb{E}\int_0^T\|g_t\|^2\,dt, \tag{1.3}$$
where $C$ is some absolute constant. The extension of (1.3) to 2-smooth Banach spaces and general exponents $0<p<\infty$ has been investigated by many authors [BP00, HS01, HS08, Ich86, NZ11, Tub84] who all limited themselves to the special case of contraction semigroups. This development is surveyed in [NV20], where also some extensions to evolution families are discussed. The more general case of stochastic convolutions driven by Lévy processes has been studied in the 2-smooth setting in [ZBH17, ZBL19].
For Brownian motion as the driving process, the best result available to date is due to Zhu and the first author in [NZ11], where it was shown that if $(S(t))_{t\geq 0}$ is a $C_0$-semigroup of contractions on a 2-smooth Banach space $X$ and $(g_t)_{t\in[0,T]}$ is a progressively measurable process with values in $X$, then the process $(u_t)_{t\in[0,T]}$ defined by the stochastic convolution
$$u_t := \int_0^t S(t-s)g_s\,dW_s$$
has a continuous modification which satisfies, for every $0<p<\infty$,
$$\mathbb{E}\sup_{t\in[0,T]}\|u_t\|^p \leq C_{p,X}^p\,\mathbb{E}\Bigl(\int_0^T\|g_t\|^2\,dt\Bigr)^{p/2}, \tag{1.4}$$
where $C_{p,X}$ is a constant depending only on $p$ and $X$. In certain applications it is important to have explicit information on the constant in the asymptotic regime $p\to\infty$. In the special case $S(t)\equiv I$ the estimate (1.4) reduces to the Burkholder inequality for 2-smooth Banach spaces, for which the asymptotic dependence of $C_{p,X}$ is known to be of order $O(\sqrt{p})$ as $p\to\infty$ [Sei10]. For Hilbert spaces $X$ and $C_0$-semigroups of contractions, (1.4) is known to hold to order $O(\sqrt{p})$ as $p\to\infty$ [HS01, HS08]. In that setting the Sz.-Nagy dilation theorem can be used to reduce matters to the Burkholder inequality. The order $O(\sqrt{p})$ can be used to derive exponential estimates which in turn can be used to study large deviations (see [Cho92] and [Pes94]). Inspecting the proof of (1.4) in [NZ11] in the 2-smooth case, it is seen that the asymptotic $p$-dependence of the constant in that paper is non-optimal. The aim of the present paper is to simultaneously improve the results cited above in two directions:
• to extend (1.4) to arbitrary $C_0$-evolution families of contractions on 2-smooth Banach spaces $X$ (not even assuming the existence of a generating family $(A(t))_{t\in[0,T]}$);
• to show that the constant $C_{p,X}$ in the resulting maximal inequality is of order $O(\sqrt{p})$ as $p\to\infty$.
The precise statement of our main result, which corresponds to the special case of Theorem 4.1 for Brownian motion, is as follows.
Theorem 1.1. Let $(S(t,s))_{0\leq s\leq t\leq T}$ be a $C_0$-evolution family of contractions on a 2-smooth Banach space $X$. Let $(W_t)_{t\in[0,T]}$ be an adapted Brownian motion on a probability space $(\Omega,\mathbb{P})$, and let $(g_t)_{t\in[0,T]}$ be a progressively measurable process with values in $X$. Then the $X$-valued process $(u_t)_{t\in[0,T]}$ defined by
$$u_t := \int_0^t S(t,s)g_s\,dW_s$$
has a continuous modification which satisfies, for every $0<p<\infty$,
$$\mathbb{E}\sup_{t\in[0,T]}\|u_t\|^p \leq C_{p,X}^p\,\mathbb{E}\Bigl(\int_0^T\|g_t\|^2\,dt\Bigr)^{p/2},$$
where the constant $C_{p,X}$ only depends on $p$ and the constant $D$ in the definition of 2-smoothness for $X$. For $2\leq p<\infty$ the inequality holds with
$$C_{p,X} = 10D\sqrt{p}.$$

Theorem 4.1 considers the more general situation of a cylindrical Brownian motion with covariance given by the inner product of a Hilbert space $H$ and progressively measurable processes $g$ with values in the space $\gamma(H,X)$ of $\gamma$-radonifying operators from $H$ to $X$ (the definition of this space is recalled in Section 2).
For evolution families, Theorem 1.1 is new even for Hilbert spaces $X$. In the 2-smooth case it completely settles the asymptotic optimality problem; this is new even in the semigroup case. The proof of the theorem is very different from [HS01, HS08] and [NZ11] and combines ideas of Kotelenez [Kot83] and Seidler [Sei10]. Seidler's proof of the $O(\sqrt{p})$ bound for the constant in the Burkholder inequality in 2-smooth Banach spaces is based on a clever modification of the Burkholder-Rosenthal inequality due to Pinelis [Pin94]. We further extend Pinelis's inequality by accommodating additional predictable contraction operators in it, which enables us to merge the inequality with a splitting technique already used by Kotelenez. Theorems 1.1 and 4.1 are also applicable in the setting where the evolution family $S$ itself is not contractive, but admits a dilation to a contractive evolution family on a 2-smooth Banach space. In the semigroup case, the boundedness of the $H^\infty$-calculus of the generator $A$ of angle $<\frac12\pi$ implies that the semigroup has a dilation to an isometric $C_0$-group (see [FW06, HNVWxxa, NV20, Sei10, VW11]). In this case, however, there is no need to use Theorem 1.1 since one can apply the simpler method of [HS01, HS08].
Our method can be used quite naturally to prove the stability (uniformly in time) of certain numerical schemes associated with (1.1). This is pursued in Section 5, where we prove that if $(S(t))_{t\geq 0}$ is a $C_0$-semigroup of contractions on a $(2,D)$-smooth Banach space $X$ with generator $A$, and $u$ is a continuous modification of the process $(\int_0^t S(t-s)g_s\,dW_s)_{t\in[0,T]}$, then for any contractive approximation scheme $R$ which approximates $(S(t))_{t\geq 0}$ to some order $\alpha\in(0,1]$ on the domain $D(A)$ one has $\mathbb{E}\sup_{j=0,\dots,n}$ […] The crucial observation underlying (1.5) is that the sequence $(u_j^{(n)})_{j=0}^n$ defined by (1.6) is precisely of the right format to apply our extension of Pinelis's inequality. For $C_0$-semigroups which are not necessarily contractive and functions $g\in L^p(\Omega;L^2(0,T;D(A)))$, we show that convergence holds with the following explicit rate: […] where $C$ is a constant independent of $n$ and $g$. This estimate is somewhat simpler, in that it directly uses Seidler's version of the Burkholder inequality of Proposition 2.6 in combination with a simple trick, in Proposition 2.7, involving switching back and forth from $\ell^\infty_n(X)$ to $\ell^{q(n)}_n(X)$ for a clever choice of $q(n)$. This can be done at the expense of a constant $n^{1/q(n)}$, exploiting the fact proven in Proposition 2.2 that $\ell^{q(n)}(X)$ is 2-smooth for $2\leq q<\infty$ with constant of order $q(n)$. This appears to be a new technique whose potential deserves further investigation.
Examples of numerical schemes to which our abstract results can be applied include the splitting method (with $R(t) = S(t)$ and $\alpha = 1$), the implicit Euler method (with $R(t) = (I - tA)^{-1}$ and $\alpha = 1/2$), and the Crank-Nicolson method (with $R(tA) = (2 + tA)(2 - tA)^{-1}$ and $\alpha = 2/3$). Moreover, if $g$ takes values in suitable intermediate spaces between $X$ and $D(A^m)$ with $m\geq 1$, appropriate rates of convergence can be obtained for each of these methods.
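As a purely illustrative aside, the three schemes just listed can be compared on a scalar model problem. The sketch below is not taken from the paper: it replaces the generator $A$ by a real number `lam <= 0` (an assumption made only so that everything is computable by hand), and the function names are hypothetical. It checks that each $R$ is contractive on the negative half-line and that $R(t/n)^n$ approaches $e^{t\,\mathrm{lam}}$; it says nothing about the orders $\alpha$ quoted above, which concern the Banach space setting.

```python
import math

# Scalar stand-in: the generator A is replaced by a number lam <= 0,
# so that S(t) = exp(t * lam) is a contraction. Here z = t * lam.

def splitting(z):
    """R(t) = S(t): the exact semigroup."""
    return math.exp(z)

def implicit_euler(z):
    """R(t) = (I - tA)^{-1} evaluated at z = t * lam."""
    return 1.0 / (1.0 - z)

def crank_nicolson(z):
    """R(tA) = (2 + tA)(2 - tA)^{-1} evaluated at z = t * lam."""
    return (2.0 + z) / (2.0 - z)

def n_step_error(scheme, lam, t, n):
    """|R(t/n)^n - exp(t * lam)| for the scalar model problem."""
    return abs(scheme(lam * t / n) ** n - math.exp(lam * t))
```

For instance, `n_step_error(implicit_euler, -1.0, 1.0, n)` can be evaluated for increasing `n` to watch the scalar error decrease.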
We expect that the new results in the simple linear setting will provide new insights for approximation of nonlinear SPDEs also by adapted time schemes and plan to address this in future work.
To illustrate the main result we consider the stochastic heat equation with the implicit Euler scheme (cf. Example 5.15). For simplicity, here we state the result in terms of Sobolev spaces. In Example 5.15, the use of Bessel potential spaces allows us to take the smoothness exponent m fractional and also negative. Further examples can be found in Section 5.3.
Example 1.2 (Heat equation, implicit Euler scheme). Consider the inhomogeneous stochastic heat equation on $\mathbb{R}^d$: […] Here, $W = (W^k)_{k\geq 1}$ is a sequence of independent standard Brownian motions. We further assume that each $g^k$: […] is finite. For $n = 1, 2, \dots$ set $t_j^{(n)} := jT/n$ and consider the partition $\pi^{(n)} := \{t_j^{(n)} : j = 0,\dots,n\}$. Let $(S(t))_{t\geq 0}$ denote the heat semigroup on $L^q(\mathbb{R}^d)$ and set […] This stochastic integral is well defined as an $L^q(\mathbb{R}^d)$-valued Itô integral by Proposition 2.6 and (2.8).
Define the discrete approximation by $u_0^{(n)} := 0$, and […] Let $W^{j,q}(\mathbb{R}^d)$ be the Sobolev space of smoothness $j$ and integrability $q$. Then the following results hold: […] This follows from Theorems 5.13 and 5.14.
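To make the implicit Euler step for the heat equation concrete, one can look at a single Fourier mode. The sketch below rests on two assumptions that are not confirmed by the text above: that the generator is the Laplacian $\Delta$ (so that $S(h)$ and $R(h) = (I - h\Delta)^{-1}$ act as the multipliers $e^{-h|\xi|^2}$ and $(1 + h|\xi|^2)^{-1}$), and that the scheme has the recursive form $u_j = R(h)(u_{j-1} + \text{noise}_j)$. All function names are hypothetical.

```python
import math

def heat_symbol(h, xi_sq):
    """Fourier multiplier of S(h) = exp(h * Laplacian) at |xi|^2 = xi_sq."""
    return math.exp(-h * xi_sq)

def implicit_euler_symbol(h, xi_sq):
    """Fourier multiplier of R(h) = (I - h * Laplacian)^{-1}."""
    return 1.0 / (1.0 + h * xi_sq)

def scheme_path(h, xi_sq, noise):
    """Assumed recursion u_0 = 0, u_j = R(h)(u_{j-1} + noise_j), one mode only."""
    r = implicit_euler_symbol(h, xi_sq)
    u = [0.0]
    for increment in noise:
        u.append(r * (u[-1] + increment))
    return u
```

Both multipliers lie in $(0,1]$, which is the scalar shadow of contractivity; on the grid checked in the accompanying test their difference is non-negative and at most $(h|\xi|^2)^2/2$.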
In the final Section 6 we extend some of the results to stochastic convolutions involving random evolution families, which arise naturally if the operator family $(A(t))_{t\in[0,T]}$ in (1.1) depends on a random parameter in an adapted way. That this is possible at all in the abstract setting of evolution equations in infinite dimensions is quite remarkable. It requires replacing the Itô integral with the forward integral of [RV93] in order to avoid adaptedness problems. Stochastic convolution in the forward sense is known to still give the weak solution to (1.1) (see [LN98, Proposition 5.3], [PV14, Theorem 4.9] and Theorem 6.6 below). In the parabolic setting, space-time regularity results have been derived by Pronk and the second-named author in [PV14] using so-called pathwise mild solutions (see Proposition 6.2) and a simple integration by parts trick. Pathwise mild solutions have recently been used to study quasilinear PDEs in [FS15, KN20, MS17] and random attractors in [KNS21]. The new maximal estimates proved in our current paper are expected to have implications for these results as well.
For adapted families $(A(t))_{t\in[0,T]}$, maximal inequalities can alternatively be derived via Itô's formula (see [NV20] and references therein). In contrast to the results obtained here, however, this does not lead to constants of order $O(\sqrt{p})$ as $p\to\infty$.
In the setting of monotone (possibly nonlinear) operators and $p = 2$, the Itô formula argument is applicable in a wider setting (see [LR15]). Some extensions to $p > 2$ have been obtained recently in [NŠ19].

Preliminaries
Throughout this paper we work over the real scalar field. Unless otherwise stated, random variables and stochastic processes are defined on a probability space $(\Omega,\mathcal{F},\mathbb{P})$ which we consider to be fixed throughout. On this probability space we fix a filtration $(\mathcal{F}_t)_{t\in[0,T]}$ once and for all. Standard notions from the theory of stochastic processes always refer to this filtration. Whenever we consider stochastic integrals with respect to a (cylindrical) Brownian motion or a more general type of driving process, it is always assumed that it is adapted with respect to this filtration. The conditional expectation of a random variable $\xi$ with respect to a sub-$\sigma$-algebra $\mathcal{G}\subseteq\mathcal{F}$ will be denoted by $\mathbb{E}_{\mathcal{G}}(\xi)$. The progressive $\sigma$-algebra associated with $(\mathcal{F}_t)_{t\in[0,T]}$, i.e., the $\sigma$-algebra generated by sets of the form $B\times A$ with $B\in\mathcal{B}([0,t])$ and $A\in\mathcal{F}_t$, where $t$ ranges over $[0,T]$, is denoted by $\mathcal{P}$. We will use the subscript $\mathcal{P}$ to denote the closed subspace of all progressively measurable processes in a given space of processes.
When $X$ is a Banach space, by an $X$-valued random variable we understand a strongly measurable function (i.e., a function which is the pointwise limit of a sequence of simple functions) from $\Omega$ into $X$; for details the reader is referred to [HNVW16, HNVW17]. For the purposes of this article, an $X$-valued process is a family of $X$-valued random variables indexed by $[0,T]$. Two processes $(g_t)_{t\in[0,T]}$ and $(h_t)_{t\in[0,T]}$ are said to be modifications of each other if for all $t\in[0,T]$ we have $g_t = h_t$ almost surely (with an exceptional set that may depend on $t$). A process $(g_t)_{t\in[0,T]}$ with values in $X$ is said to be progressively measurable if $g$ is strongly measurable as an $X$-valued function on the measurable space $([0,T]\times\Omega,\mathcal{P})$. It is a deep result in the theory of stochastic processes that every adapted and strongly measurable $X$-valued stochastic process admits a progressively measurable modification; an elementary proof is offered in [OS13].
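A Banach space $X$ is called $(p,D)$-smooth, where $1\leq p\leq 2$ and $D\geq 1$, if for all $x,y\in X$ we have
$$\|x+y\|^p + \|x-y\|^p \leq 2\|x\|^p + 2D^p\|y\|^p.$$
A Banach space is called $p$-smooth if it is $(p,D)$-smooth for some $D\geq 1$.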
By a fundamental result due to Pisier [Pis75] every $p$-smooth Banach space has martingale type $p$, and conversely every Banach space with martingale type $p$ admits an equivalent $p$-smooth norm. Moreover, if $X$ has martingale type $p$ with constant $C$, an equivalent $(p,C)$-smooth norm can be found; if $X$ is $(p,D)$-smooth, then $X$ has martingale type $p$ with constant at most $2D$ (and the constant 2 can be omitted for $p = 2$, see Remark 2.5). Detailed proofs of these facts can be found in [Pis16, Wen05, Woy19].
The class of 2-smooth Banach spaces is of particular interest from the point of view of stochastic analysis. It includes all Hilbert spaces (with $D = 1$, by the parallelogram identity) and the spaces $L^p(\mu)$ with $2\leq p<\infty$ (with $D = \sqrt{p-1}$, see [Pin94, Proposition 2.1] and Proposition 2.2 below). The reason for being interested in 2-smooth spaces rather than spaces with martingale type 2 is as follows. Martingale type 2 is preserved under passing to equivalent norms, but this is not the case for 2-smoothness. In the results to follow, semigroups and evolution families of contractions (i.e., operators of norm $\leq 1$) play a distinguished role. Since contractivity need not be preserved under passing to equivalent norms, such a distinguished role cannot be expected in the setting of martingale type 2 spaces. In this connection the following interesting question seems to be open: if $X$ has martingale type 2 and supports a $C_0$-semigroup (or $C_0$-evolution family), does there exist an equivalent $(2,D)$-smooth norm with respect to which the semigroup (or evolution family) is contractive?
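The two examples above can be sanity-checked numerically in the finite-dimensional spaces $\ell^p_n$. The sketch below assumes the usual form of the defining inequality for $p = 2$, namely $\|x+y\|^2 + \|x-y\|^2 \leq 2\|x\|^2 + 2D^2\|y\|^2$; the helper names are hypothetical, and a handful of test vectors is of course no substitute for the proofs cited above.

```python
def lp_norm(x, p):
    """The l^p norm of a finite sequence x."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def smoothness_gap(x, y, p, d_sq):
    """2||x||^2 + 2 D^2 ||y||^2 - ||x+y||^2 - ||x-y||^2 in l^p, with D^2 = d_sq.

    A non-negative value means the assumed (2, D)-smoothness inequality
    holds for this particular pair (x, y).
    """
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (2.0 * lp_norm(x, p) ** 2 + 2.0 * d_sq * lp_norm(y, p) ** 2
            - lp_norm(plus, p) ** 2 - lp_norm(minus, p) ** 2)
```

With `p = 2` and `d_sq = 1` the gap vanishes (parallelogram identity). For `p = 4` the pair `x = [1, 1]`, `y = [1, -1]` gives a negative gap with `d_sq = 1` but a positive one with `d_sq = p - 1 = 3`, which shows that $D = 1$ cannot be used in $\ell^4_2$.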
In what follows we recall some useful properties of 2-smooth Banach spaces that will be needed in this paper.
If $X$ is $(2,D)$-smooth, then by [NZ11, Lemma 2.1] and its proof the function $\rho(x) := \|x\|^2$ is Fréchet differentiable on $X$ and its derivative is Lipschitz continuous. Conversely, if $\rho$ is twice Fréchet differentiable and $\rho''(x)(y,y)\leq 2D^2\|y\|^2$ at every $x\in X$, then $X$ is $(2,D)$-smooth (see [Pin94] for a more general version of this converse). Unlike in finite dimensions, Lipschitz continuity does not imply almost everywhere differentiability (the latter even being meaningless in the absence of a reference measure). One way to get around this is to consider the functions $\rho_{x,y}(t) := \rho(x+ty) = \|x+ty\|^2$.
The following lemma is implicit in [Pin94]. For the reader's convenience we include a proof.
(2)⇒(1): For all $x,y\in X$ we have […]

As an application we prove the following vector-valued analogue of [Pin94, Proposition 2.1]. It will be needed in the proof of Proposition 2.7, which in turn is applied in Section 5.
Proposition 2.2. Let $(S,\mathcal{A},\mu)$ be a measure space and $X$ be a $(2,D)$-smooth Banach space. Then for all $2\leq p<\infty$ the space $L^p(S;X)$ is $(2,\sqrt{p-2+D^2})$-smooth.

Notice that $D\geq 1$ implies $p-2+D^2\leq D^2(p-1)$, so in particular $L^p(S;X)$ is $(2,D\sqrt{p-1})$-smooth.

Proof. The proof is based on the equivalence in Proposition 2.1. For Banach spaces $X$ with the property that $\|\cdot\|^2$ is twice continuously Fréchet differentiable the proof can be somewhat simplified. Throughout the proof we use $\|\cdot\|$ and $\|\cdot\|_p$ to denote the norms of $X$ and $L^p(S;X)$, respectively. Thus if $f\in L^p(S;X)$, then $\|f\|$ is the function $s\mapsto\|f(s)\|$ in $L^p(S)$.

As in [DGZ93, Theorem V.1.1] one checks that the functions […] where the duality $\langle\cdot,\cdot\rangle$ between $X$ and $X^*$ is applied pointwise on $S$. For $q\in\mathbb{R}$ let
$$w_{q;x,y}(t) := \|x+ty\|^q, \quad x,y\in X; \qquad W_{q;f,g}(t) := \|f+tg\|_p^q, \quad f,g\in L^p(S;X).$$
The Fréchet differentiability of $\psi_p$ and $\Psi_p$ implies the differentiability of $w_{q;x,y}$ and $W_{q;f,g}$ (except possibly at $t = 0$ when $x = 0$ and $y\neq 0$, respectively $f = 0$ and $g\neq 0$). Denoting derivatives with respect to $t$ by $\partial_t$, for $q\neq 0$ the chain rule gives […] where $\Psi_2(g) := \|g\|_p^2$. Also, […] Combining these identities with (2.1), we obtain […] (2.2)

Since $X$ is 2-smooth and Lipschitz functions are almost everywhere differentiable, for all $x,y\in X$ the function $w_{2;x,y}$ is twice differentiable almost everywhere by Proposition 2.1. The exceptional set may depend on the pair $(x,y)$, however, so in order to be able to differentiate the right-hand side of (2.2) under the integral we will consider simple functions $f,g\in L^p(S;X)$ from this point onward. Then the right-hand side of (2.2) is differentiable for almost all $t\in\mathbb{R}$ and […] (2.3)

Since $f$ and $g$ are simple, the 2-smoothness of $X$ and Proposition 2.1 imply that $t\mapsto\partial_t W_{2;f,g}$ is Lipschitz continuous. Therefore it follows from (2.3) that $t\mapsto\partial_t W_{2;f,g}$ is Lipschitz continuous with Lipschitz constant $2(D^2+p-2)\|g\|_p^2$. The proof of the implication (2)⇒(1) of Proposition 2.1 then gives the inequality
$$\|f+g\|_p^2 + \|f-g\|_p^2 \leq 2\|f\|_p^2 + 2(D^2+p-2)\|g\|_p^2$$
for simple $f,g\in L^p(S;X)$. The inequality for general $f,g\in L^p(S;X)$ follows by approximation.
Remark 2.5. Applying the first part of this lemma iteratively to Rademacher sums, we obtain the folklore result that (2, D)-smoothness implies martingale type 2 with constant D.

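2.2. Stochastic integration in 2-smooth Banach spaces. Let $\mathcal{H}$ be a Hilbert space. An $\mathcal{H}$-isonormal process on $\Omega$ is a mapping $W:\mathcal{H}\to L^2(\Omega)$ with the following two properties:
(i) for all $h\in\mathcal{H}$ the random variable $Wh$ is Gaussian;
(ii) for all $h_1,h_2\in\mathcal{H}$ we have $\mathbb{E}(Wh_1\cdot Wh_2) = (h_1|h_2)$.
It is easy to see that every $\mathcal{H}$-isonormal process is linear and that for all $h_1,\dots,h_N\in\mathcal{H}$ the $\mathbb{R}^N$-valued random variable $(Wh_1,\dots,Wh_N)$ is jointly Gaussian. For more details the reader is referred to [HNVWxxb, Nua06].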
If $H$ is another Hilbert space, an $H$-cylindrical Brownian motion indexed by $[0,T]$ is an isonormal process $W: L^2(0,T;H)\to L^2(\Omega)$. Following common practice we write $W_t h := W(\mathbf{1}_{(0,t)}\otimes h)$ for $t\in[0,T]$ and $h\in H$. In what follows we always assume that $H$-cylindrical Brownian motions are adapted to $(\mathcal{F}_t)_{t\in[0,T]}$.

The space of finite rank operators from a Hilbert space $H$ into a Banach space $X$ is denoted by $H\otimes X$. Every finite rank operator $T\in H\otimes X$ can be represented in the form $T = \sum_{n=1}^N h_n\otimes x_n$ with $(h_n)_{n=1}^N$ orthonormal in $H$ and $(x_n)_{n=1}^N$ a sequence in $X$. We then define
$$\|T\|_{\gamma(H,X)}^2 := \mathbb{E}\Bigl\|\sum_{n=1}^N\gamma_n x_n\Bigr\|^2, \tag{2.7}$$
where $(\gamma_n)_{n=1}^N$ is a sequence of independent standard Gaussian random variables. It is an easy consequence of the preservation of joint Gaussianity under orthogonal transformations that the norm $\|\cdot\|_{\gamma(H,X)}$ is well defined. The completion of $H\otimes X$ with respect to this norm is denoted by $\gamma(H,X)$. The natural inclusion mapping from $H\otimes X$ into $\mathcal{L}(H,X)$ extends to a contractive inclusion mapping $\gamma(H,X)\subseteq\mathcal{L}(H,X)$. A linear operator $T\in\mathcal{L}(H,X)$ is said to be $\gamma$-radonifying if it belongs to $\gamma(H,X)$. For $1\leq p<\infty$ the Kahane-Khintchine inequalities guarantee that replacing $L^2$-norms by $L^p$-norms in (2.7) gives an equivalent norm on $\gamma(H,X)$. The space $\gamma(H,X)$, when endowed with this equivalent norm, will be denoted by $\gamma_p(H,X)$.
For Hilbert spaces $K$ we have $\gamma(H,K) = \mathcal{L}_2(H,K)$ isometrically, where $\mathcal{L}_2(H,K)$ is the space of Hilbert-Schmidt operators from $H$ to $K$. For $1\leq p<\infty$ and any Banach space $X$ the identity mapping on $H\otimes L^p(\mu)$ extends to an isometric isomorphism of Banach spaces […] with the space $L^p(\mu;L^2(\nu))$ of 'square functions' using terminology from harmonic analysis. For more details the reader is referred to [HNVW17, Chapter 9].
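In the Hilbert space case the independence of the representation can be verified by hand, since orthogonality of the Gaussians gives $\mathbb{E}\|\sum_n\gamma_n x_n\|^2 = \sum_n\|x_n\|^2$. The following sketch, with hypothetical helper names and $K = \mathbb{R}^d$ chosen only for illustration, rotates a two-element orthonormal system $(h_1,h_2)$ and confirms that the resulting coefficient vectors have the same sum of squared norms.

```python
import math

def hs_norm_sq(vectors):
    """Sum of squared Euclidean norms of x_1, ..., x_N (Hilbert-Schmidt norm squared)."""
    return sum(c * c for x in vectors for c in x)

def rotate_pair(x1, x2, theta):
    """Coefficients of T = h_1 (x) x_1 + h_2 (x) x_2 in a rotated orthonormal system.

    With h_1' = cos(theta) h_1 + sin(theta) h_2 and
    h_2' = -sin(theta) h_1 + cos(theta) h_2, the same operator reads
    T = h_1' (x) y_1 + h_2' (x) y_2, where y_i = T h_i'.
    """
    c, s = math.cos(theta), math.sin(theta)
    y1 = [c * a + s * b for a, b in zip(x1, x2)]
    y2 = [-s * a + c * b for a, b in zip(x1, x2)]
    return y1, y2
```

For a general Banach space $X$ no such closed formula is available, which is why the argument via joint Gaussianity is needed.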
A stochastic process $\Phi:[0,T]\times\Omega\to\mathcal{L}(H,X)$ is called an adapted finite rank step process if there exist $0 = s_0<s_1<\dots<s_n = T$, random variables $\xi_{ij}\in L^\infty(\Omega,\mathcal{F}_{s_{j-1}})\otimes X$ (the subspace of $L^\infty(\Omega;X)$ of strongly $\mathcal{F}_{s_{j-1}}$-measurable random variables taking values in a finite-dimensional subspace of $X$) for $i = 1,\dots,m$ and $j = 1,\dots,n$, and an orthonormal system $h_1,\dots,h_m$ in $H$ such that
$$\Phi = \sum_{j=1}^n\mathbf{1}_{(s_{j-1},s_j]}\sum_{i=1}^m h_i\otimes\xi_{ij}.$$
For such processes the stochastic integral with respect to the $H$-cylindrical Brownian motion $W$ is defined by
$$\int_0^t\Phi_s\,dW_s := \sum_{j=1}^n\sum_{i=1}^m\bigl(W_{s_j\wedge t}h_i - W_{s_{j-1}\wedge t}h_i\bigr)\xi_{ij}, \quad t\in[0,T].$$
Since $t\mapsto W_t h$, being a Brownian motion, has a continuous modification, it follows that $t\mapsto\int_0^t\Phi_s\,dW_s$ has a continuous modification. Such modifications will always be used in the sequel. It was shown by Neidhardt in his PhD thesis [Nei78] (see also [Det89], [NVW15]) that if $\Phi$ is an adapted finite rank step process, then […] (2.10), with the norm of $\Phi$ in $L^2(\Omega;L^2(0,T;\gamma(H,X)))$ on the right-hand side. By (2.10), standard localisation arguments, and Doob's inequality, the stochastic integral can be extended to arbitrary progressively measurable processes $\Phi:[0,T]\times\Omega\to\gamma(H,X)$ for which the $L^2(0,T;\gamma(H,X))$-norm is finite almost surely, and the resulting stochastic integral process $(\int_0^t\Phi_s\,dW_s)_{t\in[0,T]}$ has a continuous modification. At this juncture it is useful to observe that a process $\Phi:[0,T]\times\Omega\to\gamma(H,X)$ is progressively measurable (as a process with values in the Banach space $\gamma(H,X)$) if and only if $\Phi h:[0,T]\times\Omega\to X$ is progressively measurable (as a process with values in $X$) for all $h\in H$; this follows from [HNVW17, Example 9.1.16].
The following version of the classical Burkholder inequality is the result of contributions of many authors [BD90,Brz97,Brz03,Det89,Det91,Ond04].
Proposition 2.6. Let $X$ be a $(2,D)$-smooth Banach space, let $W$ be an adapted $H$-cylindrical Brownian motion on $\Omega$, and let $0<p<\infty$. For all adapted finite rank step processes $\Phi$: […] where $C_{p,D}$ is a constant depending only on $p$ and $D$.
By using Pinelis's version of the Burkholder-Rosenthal inequalities [Pin94], Seidler [Sei10] has shown that the constant $C_{p,D}$ has the same asymptotic behaviour for $p\to\infty$ as in the scalar-valued setting, i.e., $C_{p,D} = O(\sqrt{p})$ as $p\to\infty$. As a special case of our main result we will recover Seidler's result, with $C_{p,D} = 10D\sqrt{p}$ if $2\leq p<\infty$, by setting $S(t,s)\equiv I$ in Theorem 4.1.

As a consequence of Proposition 2.6 we obtain the following result, which will be useful in the error analysis of numerical schemes for SPDEs in Section 5.
Proof. The method of proof is inspired by [DGVW10]. The idea is to view the sequence $\Phi = (\Phi^{(k)})_{k=1}^n$ as an $\ell^q_n(X)$-valued process for a clever choice of $q = q(n)\in[2,\infty)$.
To prove (2.12) we argue in the same way, but this time we use that for a sequence $\Gamma$: […], applying the Kahane-Khintchine inequalities (see [HNVW17, Theorem 6.2.6]) in the last step. Now (2.12) follows by taking $q = \log n$.
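The cost of switching between $\ell^\infty_n$ and $\ell^q_n$ is easy to see in the scalar case: $\|x\|_\infty\leq\|x\|_q\leq n^{1/q}\|x\|_\infty$, and with $q = \log n$ the factor $n^{1/q}$ equals $e$ for every $n$. The sketch below, with hypothetical helper names, illustrates only this elementary bookkeeping and not the stochastic part of the argument.

```python
import math

def lq_norm(x, q):
    """The l^q norm of a finite sequence x."""
    return sum(abs(t) ** q for t in x) ** (1.0 / q)

def sup_norm(x):
    """The l^infinity norm of a finite sequence x."""
    return max(abs(t) for t in x)

def switching_constant(n, q):
    """n^{1/q}: the price of replacing l^infinity_n by l^q_n."""
    return n ** (1.0 / q)
```

For example, `switching_constant(n, math.log(n))` returns Euler's number up to rounding, whatever the value of `n > 1`.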
Remark 2.8. The same method of proof can be used to show that if X is (2, D)smooth, then ℓ ∞ n (X) has martingale type 2 with constant D 2 − 2 + 2 log n if n 3.

Extending Pinelis's Burkholder-Rosenthal inequality
On the probability space $(\Omega,\mathcal{F},\mathbb{P})$ we consider a finite filtration $(\mathcal{F}_j)_{j=0}^k$ and denote by $\mathbb{E}_j := \mathbb{E}_{\mathcal{F}_j}$ the conditional expectation with respect to $\mathcal{F}_j$. When $(f_j)_{j=0}^k$ is an $X$-valued martingale with respect to $(\mathcal{F}_j)_{j=0}^k$, we denote by $(df_j)_{j=1}^k$ its difference sequence, i.e., $df_j := f_j - f_{j-1}$. We further define the non-negative random variables $f_j^\star$ (for $0\leq j\leq k$) and $df_j^\star$ and $s_j(f)$ (for $1\leq j\leq k$) by […]

If $\mathcal{G}$ is a sub-$\sigma$-algebra of $\mathcal{F}$, we call the $X$-valued random variables $\xi$ and $\eta$ conditionally equi-distributed given $\mathcal{G}$ if for all Borel sets $B\subseteq X$ we have
$$\mathbb{P}(\xi\in B\,|\,\mathcal{G}) = \mathbb{P}(\eta\in B\,|\,\mathcal{G}).$$
As in [HNVW16, Lemma 4.4.5] one sees that this is equivalent to the requirement that […] (3.1)

An adapted $X$-valued sequence $(\xi_j)_{j=1}^k$ is called conditionally symmetric given $(\mathcal{F}_j)_{j=0}^k$ if for all $1\leq j\leq k$ the random variables $\xi_j$ and $-\xi_j$ are conditionally equi-distributed given $\mathcal{F}_{j-1}$. Taking $f(x) = \mathbf{1}_{\{\|x\|\leq r\}}x$ in (3.1), it follows that for conditionally symmetric sequences we have
$$\mathbb{E}_{j-1}\bigl(\mathbf{1}_{\{\|\xi_j\|\leq r\}}\xi_j\bigr) = 0.$$
A random operator on $X$ is a mapping $V:\Omega\to\mathcal{L}(X)$ such that $\omega\mapsto V(\omega)x$ is strongly measurable for all $x\in X$, and a random contraction on $X$ is a random operator on $X$ whose range consists of contractions.
The main result of this section is the following extension of Pinelis's version of the Rosenthal-Burkholder inequality [Pin94]. Recently, other extensions of some of Pinelis's estimates for p-smooth Banach spaces have been obtained in [Luo21].
Theorem 3.1. Let $X$ be a $(2,D)$-smooth Banach space. Suppose that $(f_j)_{j=0}^k$ is an adapted sequence of $X$-valued random variables, $(g_j)_{j=0}^k$ is an $X$-valued martingale, $(V_j)_{j=1}^k$ is a sequence of random contractions on $X$ which is strongly predictable (i.e., each $V_j x$ is strongly $\mathcal{F}_{j-1}$-measurable for all $x\in X$), and assume that we have $f_0 = g_0 = 0$ and
$$f_j = V_j f_{j-1} + dg_j, \quad j = 1,\dots,k.$$
Then for all $2\leq p<\infty$ we have […] If, moreover, $(g_j)_{j=0}^k$ has conditionally symmetric increments, then […]

Here and in the rest of the paper, $\|\cdot\|_p$ is the norm of $L^p(\Omega)$. The proof of Theorem 3.1 closely follows that of [Pin94, Theorem 4.1] (which, up to the value of the constants, corresponds to taking $V_j = I$ and $g_j = f_j$). We point out that even in the case $p = 2$, Theorem 3.1 is not obvious because the additional predictable sequence $(V_j)_{j=1}^k$ destroys the martingale structure of $f$. The proof in [Pin94] is written up rather concisely and therefore we shall present the proof of Theorem 3.1 in full detail. At the same time this provides the opportunity to give more precise information on the constants.
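The structure of the sequence $(f_j)$ can be made tangible in the scalar case. The sketch below assumes that the recursion in Theorem 3.1 reads $f_0 = 0$, $f_j = V_j f_{j-1} + dg_j$ (this form is inferred from the proofs that follow), takes $X = \mathbb{R}$ so that contractions are numbers in $[-1,1]$, and uses hypothetical function names.

```python
def perturbed_sums(increments, contractions):
    """Scalar version of the assumed recursion f_0 = 0, f_j = V_j * f_{j-1} + dg_j."""
    f = [0.0]
    for dg, v in zip(increments, contractions):
        if abs(v) > 1.0:
            raise ValueError("each V_j must be a contraction, i.e. |V_j| <= 1")
        f.append(v * f[-1] + dg)
    return f

def maximal(f):
    """The maximal function max_j |f_j| of a finite scalar sequence."""
    return max(abs(t) for t in f)
```

With all `contractions` equal to `1.0` the output is the sequence of partial sums of the increments, i.e. the martingale $g$ itself, matching the remark that $V_j = I$ gives $g_j = f_j$. With other choices of $V_j$ the sequence $f$ is in general no longer a martingale, which is the difficulty the theorem addresses.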
We need some auxiliary results, the first of which is a classical 'good λ' inequality (see [Bur73,Lemma 7.1]).
Lemma 3.2. Suppose that $g$ and $h$ are non-negative random variables and suppose that $\beta>1$, $\delta>0$, and $\varepsilon>0$ are such that for all $\lambda>0$ we have […] If $1\leq p<\infty$ and $\beta^p\varepsilon<1$, then […]

The next lemma is a minor extension of [Pin94, Theorem 3.4].
Lemma 3.3. Suppose that $(g_j)_{j=0}^k$ is a martingale with values in a $(2,D)$-smooth Banach space $X$ with $g_0 = 0$ and let $(h_j)_{j=0}^{k-1}$ be an adapted sequence of random variables with values in $X$. Set
$$f_0 := 0, \qquad f_j := dg_j + h_{j-1}, \quad j = 1,\dots,k,$$
and assume that $\|h_j\|\leq\|f_j\|$ almost surely for all $0\leq j\leq k-1$. Suppose further that $\|dg^\star\|_\infty\leq a$ and $\|s(g)\|_\infty\leq b/D$ for some $a>0$ and $b>0$. Then for all $r>0$ we have […]

Proof. We begin by noting that the almost sure conditions $f_0 = 0$, $\|h_{j-1}\|\leq\|f_{j-1}\|$, $f_j := dg_j + h_{j-1}$, and $\|dg_j\|\leq a$ imply that the random variables $h_{j-1}$ and $f_j$, $j = 1,\dots,k$, are essentially bounded and $h_0 = 0$ almost surely.

Fix $\lambda>0$ and $1\leq j\leq k$. By Lemma 2.4, […] Note that the random variables $e_j$ are non-negative. This means that the sequence $(G_j)_{j=0}^k$ defined by […] is a positive supermartingale. Fix $r>0$ and set $\tau := \min\{1\leq j\leq k: \|f_j\|\geq r\}$ on the set $\{f^\star\geq r\} = \{\max_{1\leq j\leq k}\|f_j\|\geq r\}$ and $\tau := \infty$ on its complement. By the optional sampling theorem, the sequence $(G_{\tau\wedge j})_{j=0}^k$ is a positive supermartingale. It follows that $\mathbb{E}\mathbf{1}_{\{\tau\leq k\}}G_\tau\leq\mathbb{E}G_{\tau\wedge k}\leq\mathbb{E}G_0 = 1$. Therefore, by the inequality $\cosh u>\frac12 e^u$ and Chebyshev's inequality, […] the last inequality being elementary. The function defined by $\psi(0) := \frac12$ and $\psi(u) := (e^u-1-u)/u^2$ for $u\neq 0$ is increasing, and therefore for all $\lambda>0$ we have […] Combining this with the definition of the random variables $e_j$ and the assumption $\|s(g)\|_\infty\leq b/D$, we obtain the pointwise inequalities […] Taking the supremum norm and substituting the result into the above tail estimate for $f^\star$ we arrive at […] Up to this point the choice of $\lambda>0$ was arbitrary. Optimising the choice of $\lambda>0$ leads to the estimate […] which, by elementary estimates, implies the inequality in the statement of the lemma.
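The two elementary facts used in this proof, namely that $\psi(u) = (e^u-1-u)/u^2$ (with $\psi(0) = \frac12$) is increasing and that $\cosh u>\frac12 e^u$, can be spot-checked numerically. The sketch below is only a sanity check on a grid, not a proof; it uses `math.expm1` to limit cancellation near $u = 0$.

```python
import math

def psi(u):
    """psi(0) = 1/2 and psi(u) = (e^u - 1 - u) / u^2 for u != 0."""
    if u == 0.0:
        return 0.5
    return (math.expm1(u) - u) / (u * u)

def cosh_margin(u):
    """cosh(u) - e^u / 2, which equals e^{-u} / 2 and is therefore positive."""
    return math.cosh(u) - 0.5 * math.exp(u)
```

Evaluating `psi` on a grid such as $-5, -4.75, \dots, 5$ gives a strictly increasing list of values passing through $\frac12$ at $u = 0$.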
The next lemma gives a sufficient condition in order that Lemma 3.2 can be applied and extends [Pin94,Lemma 4.2]. Terminology is as in Theorem 3.1.
[…] $(g_j)_{j=0}^k$ is a martingale with values in $X$ with $g_0 = 0$ such that each $dg_j$ is $\mathcal{F}_{j-1}$-conditionally symmetric, and the sequence of random operators $(V_j)_{j=1}^k$ on $X$ is strongly predictable and contractive. Let $(f_j)_{j=0}^k$ be the sequence of random variables defined by
$$f_0 := 0, \qquad f_j := V_j f_{j-1} + dg_j, \quad j = 1,\dots,k.$$
Then for all $\lambda,\delta_1,\delta_2>0$ and $\beta>1+\delta_2$ we have […]

Proof. Fix $\lambda,\delta_1,\delta_2>0$ and $\beta>1+\delta_2$. Setting $\tilde g_0 := 0$ and […] where the stopping times $\mu$, $\nu$, and $\tau$ are defined by […] we set $\mu := \infty$, $\nu := \infty$, and $\tau := \infty$ if the respective sets over which the infima are taken are empty. Note that the sequence $(h_j)_{j=0}^k$ is adapted. Notice that $h_j = 0$ on the set $\{j\leq\mu\}$; in particular $h_\mu = 0$.
On the set {w λ} we have dg ⋆ δ 2 λ and in particular dg i δ 2 λ and therefore dg i = dg i for all i = 0, . . . , k, so f j = f j for all j = 0 . . . k. It follows that It also follows that s(g) On this set we also have The second identity follows from h µ = 0 and induction pointwise on Ω, noting that if µ n < n + 1 ν, then where we used the definitions of h and f , the linearity of V n+1 , and the induction hypothesis. Therefore, on the set {f ⋆ > βλ, w λ}, we obtain We have shown that Let 0 n k be such that P(Ω n ) > 0, with Ω n := {µ = n}. We claim that (a) the random variables 1 {µ<j τ ∧ν} dg j form a martingale difference sequence on the probability space (Ω n , F | Ωn , P n ), where F | Ωn := {F ∩ Ω n : F ∈ F } and P n := P/P(Ω n ), and (b) for this martingale difference sequence the conditions of Lemma 3.3 are satisfied on the probability space Ω n , with f j , g j , and h j replaced by the restrictions to Ω n of h j , γ j := 1 {µ<j τ ∧ν} dg j , and V j+1 h j respectively, and with a = δ 2 λ, and b = δ 1 λ.
Indeed, fix 1 j k. If j n, then j µ on Ω n and therefore γ j = 1 {µ<j τ ∧ν} dg j = 0 on Ω n . If j > n, then {µ < j τ ∧ ν} ∩ Ω n = {j τ ∧ ν} ∩ Ω n is F j−1 -measurable as a subset of Ω and F j−1 | Ωn -measurable as a subset of Ω n and consequently for all This proves part (a) of the claim. Turning to part (b) of the claim, the condition dγ ⋆ δ 2 λ of Lemma 3.3 is immediate from the definition, and the adaptedness of V j+1 f j as well as the pointwise inequalities V j+1 f j f j are also clear. The pointwise inequality s(γ) δ 1 D −1 λ on Ω n = {µ = n} follows from where ( * ) follows from the F j−1 | Ωn -measurability of {j τ ∧ ν} ∩ Ω n for j > n and the last step uses the definition of τ .
Putting together the various inequalities and applying Lemma 3.3 on the space Ω n as indicated above, taking r = (β − 1 − δ 2 )λ, and using that h ⋆ = 0 on {µ = ∞}, by definition of ε we arrive at Proof of Theorem 3.1.
With the choices so Lemma 3.2 can be applied with these choices. This . and consequently This completes the proof in the conditional symmetric case.
Step 2. The general case will be reduced to the conditional symmetric case. This is a variation of a standard symmetrisation argument (cf. the proof of [Hit90, Theorem 4.1]). In view of the rather intricate setting and in order to obtain explicit constants, we present some details.
Using the terminology of [dlPnG99, Chapter 6], let (d g j ) k j=0 be the decoupled tangent sequence of (dg j ) k j=0 on a possibly enlarged probability space. There exists a σ-algebra G such that the sequence (d g j ) k j=0 is G -conditionally independent and such that Moreover we may assume that G = F k , trivially extending the latter σ-algebra to the larger probability space (see [dlPnG99,p. 294]). Let f 0 := f 0 = 0 and The differences dG j are conditionally symmetric. Therefore, by the symmetric case of Theorem 3.1, We estimate each of the three terms on the right-hand side.
As in [Hit88, Lemma 1 and p. 227], To estimate s(G) we note that s(G) s(g) + s( g) = 2s(g), where we used that [HNVW16,Lemma 4.4.5]). Thus j=1 be yet another decoupled tangent sequence of (dg j ) k j=1 on a further enlarged probability space. This sequence can be chosen in such a way that (dg j ) k j=1 and (d g j ) k j=1 are G -conditionally independent with G as before. Let f 0 := f 0 = 0 and f j : are G -conditionally independent. Therefore, by Jensen's inequality and the fact that E G f j = 0 (which follows by induction using E G dg j = 0), As before, (G j ) n j=1 is conditionally symmetric and therefore, by the symmetric case of Theorem 3.1, where the last step is the same as (3.4). The desired inequality is obtained by combining all estimates.
Remark 3.5. Other choices of the parameters $\beta$, $\delta_1$ and $\delta_2$ lead to related inequalities, with a different behaviour of the constants in $p$. In particular, as in [Pin94, Theorem 4.1] one can prove that there exists a constant $C$ such that for all $p \in [2, \infty)$ , and the latter growth is known to be optimal in the scalar case (see [Hit90]).
The next result extrapolates Theorem 3.1 to exponents 0 < p < 2. By using a variation of the method in [Bur73,, an estimate is obtained without the term dg * p .
Corollary 3.6. Let X be a (2, D)-smooth Banach space. Suppose that (f j ) k j=0 is an adapted sequence of X-valued random variables, (g j ) k j=0 is an X-valued martingale, (V j ) k j=1 is a sequence of random contractions on X which is strongly predictable (i.e., each V j x is strongly F j−1 measurable for all x ∈ X), and assume that we have f 0 = g 0 = 0 and Then for all 0 < p < 2 we have If, moreover, (g j ) k j=0 has conditionally symmetric increments, then Proof. By Doob's maximal inequality and the fact that X has martingale type 2 with constant D (by Remark 2.5) Therefore, Theorem 3.1 implies where (A, B) = (10, 10 √ 2) if g has conditionally symmetric increments and (A, B) = (60, 40 √ 2) in the general case. For non-negative random variables Z and exponents 0 < q < 1 we have the identity Once this has been verified, upon taking q = p/2, Z = |f ⋆ | 2 , and then Z = s(g) in (3.6), we obtain and the result follows.
To prove the claim, set τ := inf{0 n k − 1 : n+1 j=1 E j−1 dg j 2 λ}, with the convention that τ := k if the set is empty. Let the adapted sequence of random variables (F j ) k j=0 be defined by F 0 := 0 and F j := W j F j−1 + dG j , j = 1, . . . , k, where W j := V j if 0 j τ , W j := I if j > τ , and dG j := 1 {0 j τ } dg j . One checks that f j∧τ = F j for all j = 0, . . . , k. Applying (3.5) to F gives which gives the claim.

4. Maximal inequalities for stochastic convolutions
A family $(S(t,s))_{0\le s\le t\le T}$ of bounded operators on a Banach space $X$ is called a $C_0$-evolution family if: (1) $S(t,t) = I$ for all $t \in [0,T]$; (2) $S(t,r) = S(t,s)S(s,r)$ for all $0 \le r \le s \le t \le T$; (3) the mapping $(t,s) \mapsto S(t,s)$ is strongly continuous on the set $\{0 \le s \le t \le T\}$.
$C_0$-evolution families typically arise as the solution operators for the linear time-dependent problem $u'(t) = A(t)u(t)$ in much the same way as $C_0$-semigroups solve the time-independent problem $u'(t) = Au(t)$. The reader is referred to [EN00, Paz83, Tan79] for systematic treatments. If $(S(t))_{t\ge 0}$ is a $C_0$-semigroup on $X$, then $S(t,s) := S(t-s)$ defines a $C_0$-evolution family $(S(t,s))_{0\le s\le t\le T}$ for every $0 < T < \infty$.
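To make the definition concrete, the following minimal Python sketch checks properties (1) and (2) numerically in a scalar example. The choice $A(t) = -(1+t)$ on $X = \mathbb{R}$, for which $S(t,s) = \exp(-\int_s^t (1+r)\,dr)$, and all function names are illustrative assumptions and are not taken from the paper.

```python
import math


def primitive(t: float) -> float:
    """Integral of a(r) = 1 + r over [0, t] (illustrative choice of a)."""
    return t + 0.5 * t * t


def S(t: float, s: float) -> float:
    """Scalar evolution family solving u'(t) = -(1 + t) u(t), for s <= t."""
    if s > t:
        raise ValueError("S(t, s) is only defined for s <= t")
    return math.exp(-(primitive(t) - primitive(s)))


def check_evolution_family(T: float = 1.0, steps: int = 20) -> bool:
    """Check S(t,t) = 1, S(t,r) = S(t,s) S(s,r) and |S(t,s)| <= 1 on a grid."""
    grid = [T * i / steps for i in range(steps + 1)]
    for t in grid:
        if S(t, t) != 1.0:
            return False
    for r in grid:
        for s in grid:
            for t in grid:
                if r <= s <= t:
                    # Property (2): composition along r <= s <= t.
                    if abs(S(t, r) - S(t, s) * S(s, r)) > 1e-12:
                        return False
                    # Contractivity, since a(r) >= 0 in this example.
                    if S(t, s) > 1.0:
                        return False
    return True


if __name__ == "__main__":
    print(check_evolution_family())
```

Because $a(r) = 1 + r$ is not constant, this family is not of the form $S(t-s)$, so it illustrates a genuinely time-dependent case.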
4.1. The main result. The following theorem is the main result of this paper.
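Theorem 4.1. Let $(S(t,s))_{0\le s\le t\le T}$ be a $C_0$-evolution family of contractions on a $(2,D)$-smooth Banach space $X$ and let $W$ be an adapted $H$-cylindrical Brownian motion on $\Omega$. Then for every $g \in L^0_{\mathcal P}(\Omega; L^2(0,T;\gamma(H,X)))$ the process $\bigl(\int_0^t S(t,s)g_s\,dW_s\bigr)_{t\in[0,T]}$ has a continuous modification which satisfies, for all $0 < p < \infty$, $$\mathbb{E}\sup_{t\in[0,T]}\Bigl\|\int_0^t S(t,s)g_s\,dW_s\Bigr\|^p \le C_{p,D}^p\,\mathbb{E}\Bigl(\int_0^T \|g_t\|^2_{\gamma(H,X)}\,dt\Bigr)^{p/2},$$ and for $2 \le p < \infty$ one may take $C_{p,D} = 10D\sqrt{p}$. The stochastic integral is well defined by (2.10). By rescaling, more generally it may be assumed that there exists a $\lambda \ge 0$ such that $\|S(t,s)\| \le e^{\lambda(t-s)}$ for all $0 \le s \le t \le T$. The estimate of the theorem then holds with constant $C_{p,D}$ replaced with $e^{\lambda T}C_{p,D}$.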
Proof. The proof is split into four steps. In the first two steps we prove the theorem for $2 \le p < \infty$, in the third step we consider the case $0 < p < 2$, and in the fourth the pathwise continuity assertion for $p = 0$.
which can be seen to have a continuous modification. Working with such a modification, we will first prove that for all 2 p < ∞ we have √ p g L p (Ω;L 2 (0,T ;γ(H,X))) . (4.1) By a limiting argument it suffices to consider p > 2.
For the proof of (4.1), by density we may assume that g is as in (2.9), i.e., where 0 = s 0 < s 1 < . . . < s k = T and h i and ξ ij are as in (2.9). Refining π if necessary, we may assume that s j ∈ π for all j = 0, . . . , k. We prove (4.1) in two steps.
For fixed j = 1, . . . , m we have, by property (iii), where we set V j := S(t j , t j−1 ) and dG j := tj tj−1 K(t j , s)g s dW s . We further set f 0 := 0 and G 0 := 0. By using the symmetry of normally distributed random variables as in [HNVW16,Proposition 4.4.6] it is seen that the difference sequence (dG j ) m j=1 is conditionally symmetric. Therefore, by Theorem 3.1, where f = (f j ) m j=0 and G = (G j ) m j=0 .
Since p > 2, this proves (4.2) for finite rank adapted step processes g.
Since the right-hand side tends to zero by the dominated convergence theorem, (v (n) ) n 1 is a Cauchy sequence in L p (Ω; C([0, T ]; X)) and hence converges to some
Step 3. In the case 0 < p < 2 one can argue in the same way as in the previous steps, using Corollary 3.6 instead of Theorem 3.1. The estimate (4.3) simplifies as the term dG * p does not appear anymore. Alternatively, one could use a standard extrapolation argument involving Lenglart's inequality [RY99,Proposition IV.4.7].
Step 4. The continuity assertion for p = 0 follows by a standard localisation argument.
As a consequence of Theorem 4.1, a simple optimisation argument in the exponent p gives the following exponential tail estimate (see [NV20,Corollary 4.4] for details).
Corollary 4.2 (Exponential tail estimate). If, in addition to the conditions of Theorem 4.1, we have g ∈ L ∞ (Ω; L 2 (0, T ; γ(H, X))), then P sup where σ 2 = 100eD 2 g 2 L ∞ (Ω;L 2 (0,T ;γ(H,X))) . This method to derive exponential tail estimates only uses that the constant C p,X in the maximal estimate has order O( √ p) for p → ∞. By the same method, similar exponential tail estimates can therefore be deduced from all other results in this paper where the constant is of asymptotic order O( √ p) .
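The optimisation in the exponent $p$ can be illustrated numerically. The sketch below assumes only a moment bound of the form $\mathbb{E}\xi^p \le (C\sqrt{p}\,a)^p$ for $p \ge 2$, where $\xi$ stands for the supremum of the norm of the stochastic convolution, $C = 10D$ and $a = \|g\|_{L^\infty(\Omega;L^2(0,T;\gamma(H,X)))}$. Chebyshev's inequality gives $P(\xi \ge r) \le (C\sqrt{p}\,a/r)^p$, and the choice $p = r^2/\sigma^2$ with $\sigma^2 = eC^2a^2$ turns the right-hand side into $\exp(-r^2/(2\sigma^2))$. The function names and the numerical values are hypothetical, and the sketch does not reproduce the exact prefactor of the corollary.

```python
import math


def chebyshev_tail_bound(r: float, p: float, C: float, a: float) -> float:
    """Chebyshev bound (C sqrt(p) a / r)^p derived from the p-th moment bound."""
    return (C * math.sqrt(p) * a / r) ** p


def optimised_tail_bound(r: float, C: float, a: float) -> float:
    """Value of the Chebyshev bound at p* = r^2 / sigma^2, sigma^2 = e C^2 a^2.

    The moment bound is assumed for p >= 2 only, so for p* < 2 the trivial
    bound 1 is returned.
    """
    sigma2 = math.e * C ** 2 * a ** 2
    p_star = r ** 2 / sigma2
    if p_star < 2.0:
        return 1.0
    return math.exp(-r ** 2 / (2.0 * sigma2))


def grid_minimum(r: float, C: float, a: float, p_max: float = 50.0,
                 step: float = 1e-3) -> float:
    """Brute-force minimum of the Chebyshev bound over p in [2, p_max]."""
    count = int(round((p_max - 2.0) / step))
    return min(chebyshev_tail_bound(r, 2.0 + i * step, C, a)
               for i in range(count + 1))


if __name__ == "__main__":
    D, a, r = 1.0, 1.0, 60.0          # hypothetical values
    C = 10.0 * D                       # constant 10 D from the maximal estimate
    print(optimised_tail_bound(r, C, a), grid_minimum(r, C, a))
```

With $C = 10D$ the variance parameter is $\sigma^2 = 100eD^2a^2$, which agrees with the value of $\sigma^2$ stated in the corollary.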
Remark 4.3. Under additional assumptions on the evolution family (which are satisfied in the case of $C_0$-semigroups of contractions), a variant of Itô's formula can be used to give an alternative proof of the estimate of Corollary 4.2 with sharper variance $\sigma^2 = 2D^2\|g\|^2_{L^\infty(\Omega;L^2(0,T;\gamma(H,X)))}$ (see [NV20, Theorem 5.6]).

4.2. The non-contractive case. We briefly discuss two sets of sufficient conditions for the existence of continuous versions and the validity of maximal estimates for general (i.e., not necessarily contractive) $C_0$-evolution families $(S(t,s))_{0\le s\le t\le T}$. The first of these replaces the condition '$g \in L^0_{\mathcal P}(\Omega; L^2(0,T;\gamma(H,X)))$' by '$g \in L^0_{\mathcal P}(\Omega; L^q(0,T;\gamma(H,X)))$ for some $q > 2$'. Under this stronger assumption, a maximal inequality for general $C_0$-semigroups on Hilbert spaces was obtained by Da Prato, Kwapień, and Zabczyk [DPKZ87] by the so-called factorization method. It was extended to $C_0$-evolution families on Hilbert spaces by Seidler [Sei93]. His proof extends mutatis mutandis to give the following result, which is taken from [NV20], where a further discussion can be found.
Proposition 4.4 (Additional time regularity). Let $(S(t,s))_{0\le s\le t\le T}$ be a $C_0$-evolution family on a $(2,D)$-smooth Banach space $X$ and let $2 < q < \infty$. For all $g \in L^0_{\mathcal P}(\Omega; L^q(0,T;\gamma(H,X)))$ the process (

In the second result we assume that $g$ has additional space regularity. Although this may not seem surprising, we have not been able to find a reference for this in the literature, and for this reason we provide a detailed proof. The result will play a role in Theorem 5.13, where convergence rates for time discretisation schemes are studied under space regularity assumptions on $g$. When $A$ is the generator of a $C_0$-semigroup on the Banach space $X$, for $\nu \in (0,1)$ we denote by $X_{\nu,\infty} := (X, D(A))_{\nu,\infty}$ the real interpolation space between $X$ and $D(A)$ (see [Lun18] for more details).
Proposition 4.5 (Additional space regularity). Let A be the generator of a C 0semigroup S = (S(t)) t 0 on a (2, D)-smooth Banach space X and let 0 < ν < 1.
Proof. By localisation and Lenglart's inequality, it suffices to prove the continuity and maximal estimate for p > 1 2ν .
By Proposition 2.6, v has a continuous version satisfying the required maximal estimate, so it remains to prove the same for u. For this we will use the Kolmogorov-Chentsov continuity criterion [RY99, Theorem I.2.1].
For $0 \le s \le t \le T$ we have Therefore, by interpolation, Next, for $0 \le s \le t \le T$ we have (K|t − s| ν ) p E g p L 2 (0,T ;γ(H,Xν,∞)) . Xν,∞)) . Now we will use the assumption p > 1 2ν , which allows us to apply the Kolmogorov-Chentsov continuity criterion. It implies that for 0 < δ < 2ν − 1 p the process $u$ has a ($\delta$-Hölder) continuous version which satisfies

Remark 4.6. The same result holds if we replace $X_{\nu,\infty}$ by any Banach space which continuously embeds into $X_{\nu,\infty}$. In particular, this applies to complex interpolation spaces and fractional domain spaces.

4.3. Martingales as integrators: Hilbert spaces.
In the remainder of this section we consider stochastic convolutions driven by an L 2 -martingale (M t ) t∈[0,T ] with values in a separable Hilbert space H. For details on stochastic integration in this setting we refer to [MP80,Mét82] and the summary in [HS08]. We will use a couple of notions from the theory of stochastic processes that have not been introduced in Section 2 but are otherwise completely standard; see for instance [Kal02,RY99].
In the present subsection we also let X be a Hilbert space; the case where X is a 2-smooth Banach space is discussed in the next subsection. By a standard argument involving the essential separability of the ranges of strongly measurable functions, there is no loss of generality in assuming X to be separable. This is relevant as we cite some results from the literature which are stated for separable spaces.
For details on the concepts we introduce below we refer to [Mét82, Chapter 4], where proofs of the various claims made below can be found. We denote by whenever the right-hand side of (4.6) is finite. Moreover, the predictable quadratic variation is given by In these identities, L 2 (H, X) denotes the space of Hilbert-Schmidt operators from H to X.
The following theorem shows that the main result of [Kot83] also holds with a strong type estimate instead of a weak estimate. A similar result was obtained in [Kot84] under additional assumptions on the evolution family (S(t, s)) 0 s t T . The result also covers the Poisson case; this can be seen in the same way as in [HS08, Section 3]. This result can be extended to a larger class of processes g by a density argument, but the description of the space is quite technical. The interested reader is referred to [HS08,Mét82].
Proof. By Lenglart's theorem and a localisation argument as in Theorem 4.1 it suffices to consider p = 2. Moreover, by localisation we may assume that M is a continuous (respectively, càdlàg) L 2 -martingale. By approximation it furthermore suffices to consider adapted step processes g. We will focus on the continuous case, the càdlàg case being similar. Only the required changes in the proof of Theorem 4.1 will be indicated.
First of all, g L 2 (0,T ;γ(H,X)) must be replaced by With this adjustment, up to (4.3) the proof is verbatim the same. By Theorem 3.1 with p = 2 we find that Noting that dG ⋆ 2 2 G ⋆ 2 4 G 2 = 4 s(G) 2 by Doob's maximal inequality and combining the above with (4.6) and the bound K(t j , s) 1, we obtain where C = 240 + 40 √ 2 < 300. The proof is similar to those of Theorems 4.1 and 4.7, but some modifications are required which we sketch below.
By a stopping time argument we may assume that M and M are uniformly bounded on [0, T ] × Ω. By approximation it can be assumed that g is an adapted finite rank step process. Then up to (4.3) the proof is the same. Theorem 3.1 gives that Moreover the following extension of (4.6) holds: Since g is uniformly bounded it follows that By dominated convergence the right-hand side tends to zero as the mesh size tends to 0. The result follows once we have shown that with convergence in L p/2 (Ω). If we replace s(G) 2 by s(G) 2 := m j=1 dG j 2 this follows from (4.7) (as explained in [Bur88, Section 4], the scalar case considered in [Dol69] extends to the Hilbert space). The proof will be completed by showing that for any q ∈ [1, ∞). Without loss of generality we may take q 2 and since g is an adapted finite rank step process. To prove the convergence in L q (Ω) we note that by the scalar case of Theorem 3.1, applied with V j = I and martingale differences dL j = dG j 2 − E j−1 ( dG j 2 ), for all 2 q < ∞ we have We have already seen that the first term tends to 0 as the mesh size tends to zero. For the second term we use [HNVW16, Proposition 3.2.8] and Hölder's inequality to find that 4.4. Martingales as integrators: 2-smooth UMD Banach spaces. As before we let H be a separable Hilbert space and turn to the case where X is a (2, D)smooth Banach space with the UMD property. Discussions of UMD spaces can be found in [HNVW16,Pis16]. Rather than introducing this property here, we content ourselves by mentioning that examples of Banach spaces with this property include Hilbert spaces, L p -spaces with 1 < p < ∞ and most classical function spaces constructed from these. We will prove an extension of the maximal estimate of the preceding subsection to this setting by using some results from [Yar20a]. To avoid technicalities with non-predictable quadratic variations we only consider continuous local martingales with values in H. 
In that case the quadratic variation considered in [Yar20a] coincides with the one of Subsection 4.3 (see [Mét82,Theorem 20.5]). Let g : [0, T ] × Ω → L (H, X) be a process such that g(h) is predictable for all h ∈ H and g t Q whenever the expression on the right-hand side is finite. If in addition X has type 2 (which holds if X is 2-smooth), then by [NW05, Theorem 6.1] where τ 2,X is the type 2 constant of X. We will consider processes for which the right-hand side is finite almost surely. Proof. We argue as in Theorem 4.7 and Remark 4.8. Since we may assume that g takes values in a finite dimensional subspace of X, as in Remark 4.8 it follows that dG ⋆ p → 0 as the mesh(π) → 0. It remains to estimate s(G). By a standard argument (4.8) and (4.9) imply where C X is a constant only depending on X. Therefore, by [HNVW16, Proposition 3.2.8], The proof can now be completed as before.
Observe that this method gives the result with C p,X = 40 √ 2 pC X for p 2, which is linear in p as p → ∞; this contrasts with the O( √ p) growth obtained in all other places in the paper.
The infinite dimensional version of the Dambis-Dubins-Schwarz theorem of [VY16, Theorem 4.9] suggests that the correct order of the constant in Theorem 4.9 is O( √ p).
We expect that a large portion of Theorem 4.9 extends to the setting of (not necessarily continuous) local martingales if one replaces the predictable quadratic variation $\langle M\rangle$ by the process $[M]$ as defined in [Mét82, Theorem 20.5]. However, one usually prefers to work with a predictable quadratic variation. An alternative substitute for predictability has recently been developed in [Dir14] in the Poisson case and in [DY19, Yar20b] for general local martingales, but the norms are much more complicated to work with. It would be interesting to see if one can combine our techniques with the estimates in [Dir14, DY19] for $X = L^q$ with $2 \le q < \infty$, or in [Yar20b] for more general Banach spaces $X$.

5. Applications to time discretisation
In this section we will apply our abstract results to prove stability of certain numerical approximations of stochastic evolution equations with additive noise of the form $$du_t = A(t)u_t\,dt + g_t\,dW_t, \quad u_0 = 0. \qquad (5.1)$$ This setting covers both parabolic and hyperbolic time-dependent SPDEs; the latter class includes the stochastic wave equation and the Schrödinger equation. To solve (5.1) numerically one typically uses discretisation in time and space [JK11, LPS14]. Here we will only consider time discretisation, leaving space-time discretisation and the extension to semi-linear equations with multiplicative noise for a future publication. In that respect the results presented here serve as a proof of principle only. We mainly focus on the splitting scheme and the implicit Euler scheme, although the method is robust and can be applied to other schemes as well.
In what follows, for $n = 1, 2, \dots$ we set $t_j^{(n)} := jT/n$ and consider the partition as a discretisation of the interval $[0,T]$. We fix a process $g \in L^0_{\mathcal P}(\Omega; L^2(0,T;\gamma(H,X)))$ and consider the continuous martingale For $j = 0, \dots, n$ we set In the presence of a $C_0$-evolution family $(S(t,s))_{0\le s\le t\le T}$ we set This covers the special case of $C_0$-semigroups by letting $S(t,s) = S(t-s)$.
5.1. The splitting method. Our first result gives stability of a time discretisation scheme for the stochastic convolution process involving a C 0 -evolution family of contractions called the splitting method (also called the exponential Euler method). This scheme has already been employed in the proof of Theorem 4.1. An extension to random evolution families is discussed in Remark 6.8.
Theorem 5.1 (Uniform convergence of the splitting method). Let (S(t, s)) 0 s t T be a C 0 -evolution family of contractions on a (2, D)-smooth Banach space X. Let g ∈ L p P (Ω; L 2 (0, T ; γ(H, X))) with 0 < p < ∞. Define, for n 1, u (n) 0 j M is given by (5.2). Then for all n 1 we have The process u has a continuous modification by Theorem 4.1. We will not need this modification in the proof, because the suprema in (5.3) and (5.4) are taken with respect to finite index sets. This remark applies to all results in this subsection and the next (in Theorem 5.13 the existence of the continuous modification follows from Proposition 4.5).
Proof. To simplify notation we fix n 1 and write t j := t Therefore,  X)) . The assertion E n → 0 as n → ∞ follows by dominated convergence in combination with the convergence criterion [HNVW17, Theorem 9.1.14].
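The pathwise uniform convergence of the splitting method can be observed in a scalar toy problem. The following Python sketch is an illustration under several assumptions that are not taken from the paper: $X = H = \mathbb{R}$, $S(t) = e^{-at}$ with $a > 0$, $g \equiv 1$, and the recursion $u_j^{(n)} = S(T/n)\bigl(u_{j-1}^{(n)} + d_jM\bigr)$ with $d_jM = W_{t_j} - W_{t_{j-1}}$ as the form of the splitting scheme. The reference solution $u_t = \int_0^t e^{-a(t-s)}\,dW_s$ is sampled exactly on a finer grid, jointly with the Brownian increments, and the Monte Carlo mean of $\sup_j |u_{t_j} - u_j^{(n)}|$ is returned.

```python
import math
import random


def mean_sup_error(n: int, a: float = 1.0, T: float = 1.0, fine: int = 8,
                   n_paths: int = 200, seed: int = 0) -> float:
    """Monte Carlo mean of sup_j |u(t_j) - u_j^(n)| for the scalar toy problem."""
    rng = random.Random(seed)
    delta = T / (n * fine)                      # fine step for the reference path
    # Over one fine step, (dW, xi) is a centred Gaussian pair, where
    # xi = int_0^delta exp(-a (delta - s)) dW_s is the exact convolution increment.
    var_w = delta
    var_x = -math.expm1(-2.0 * a * delta) / (2.0 * a)
    cov = -math.expm1(-a * delta) / a
    # Cholesky factor of the 2x2 covariance matrix of (dW, xi).
    l11 = math.sqrt(var_w)
    l21 = cov / l11
    l22 = math.sqrt(max(var_x - l21 * l21, 0.0))
    decay_fine = math.exp(-a * delta)
    decay_coarse = math.exp(-a * T / n)         # S(T/n)
    total = 0.0
    for _path in range(n_paths):
        u = 0.0        # exact stochastic convolution
        v = 0.0        # splitting approximation
        sup_err = 0.0
        for _j in range(n):
            d_m = 0.0  # Brownian increment over the coarse step
            for _k in range(fine):
                z1 = rng.gauss(0.0, 1.0)
                z2 = rng.gauss(0.0, 1.0)
                d_w = l11 * z1
                xi = l21 * z1 + l22 * z2
                u = decay_fine * u + xi
                d_m += d_w
            v = decay_coarse * (v + d_m)        # assumed splitting recursion
            sup_err = max(sup_err, abs(v - u))
        total += sup_err
    return total / n_paths


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, mean_sup_error(n))
```

Since $g \equiv 1$ is as regular in space as possible in this example, the observed error should decay roughly like $n^{-1}$; for rougher $g$ the theorem only guarantees convergence without a rate.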
In the next corollary we obtain explicit convergence rates for processes g taking values in intermediate spaces. In order to make the statement easy to formulate we only consider the case of semigroup generators.
Corollary 5.2 (Uniform convergence of the splitting method with decay rate). Let (S(t)) t 0 be a C 0 -contraction semigroup on a (2, D)-smooth Banach space X. As in the preceding theorem, for n 1 let u (n) 0 j M is given by (5.2). Let X ν := (X, D(A)) ν,∞ for ν ∈ (0, 1) and X 1 := D(A), where A is the generator of the semigroup. If g ∈ L p P (Ω; L 2 (0, T ; γ(H, X ν ))) with 0 < p < ∞, then for all n 1 we have For 2 p < ∞ the inequality holds with C p,D = 10D √ p.
A version of the above result for $C_0$-semigroups which are not necessarily contractive and a general class of discretisation schemes will be proved in Theorem 5.13.
Proof. Since  the sum on the right-hand side being convergent since (ν − β)p > 1.

5.2. General time discretisation methods. We now investigate whether analogues of Theorem 5.1 hold for general time discretisation methods. Before returning to convergence questions, we consider a stability result for abstract numerical schemes featuring random operators $V_{j,n}$ satisfying an $\mathcal F_{t_{j-1}}$-measurability condition. In particular, the operators are allowed to depend on $u$ and $g$ up to time $t_{j-1}$. This makes the result applicable to nonlinear problems.
Proposition 5.4 (Stability). Let X be a (2, D)-smooth Banach space and assume that g ∈ L p P (Ω; L 2 (0, T ; γ(H, X))) with 2 p < ∞. For n = 1, 2, . . . and j = 1, . . . , n assume that the random contraction V j,n : Ω → L (X) is such that V j,n x is strongly F t (n) j−1 -measurable for all x ∈ X, and define u (n) 0 Proof. We fix n 1 and write t j := t We will estimate the terms on the right-hand side separately. By Proposition 2.6, M T p 10D √ p g L p (Ω;L 2 (0,T ;γ(H,X))) .
To estimate s(M ), by (2.10) we have . By the dual of Doob's maximal inequality (see [HNVW16, Proposition 3.2.8]) and using p/2 1 The required estimate follows by combining the estimates.
Remark 5.5. For p = 2 the inequality holds with K 2,D = 40D + 10 √ 2D 2 . This is because in the case p = 2 we can use (2.10) instead of Proposition 2.6.
Remark 5.6. In the setting of monotone operators on Hilbert spaces, a related stability result for p = 2 for the implicit Euler method can be found in [GM07,Theorem 2.6].
Returning to the problem of convergence, the convergent numerical schemes which we will consider are given in the following definition.
Definition 5.7. Let $X$ be a Banach space. An $\mathcal L(X)$-valued scheme is a function $R : [0,\infty) \to \mathcal L(Y, X)$. If $A$ generates a $C_0$-semigroup $S$ on $X$ and $Y$ is a Banach space continuously and densely embedded in $X$, an $\mathcal L(X)$-valued scheme $R$ is said to approximate $S$ to order $\alpha > 0$ on $Y$ if for all $T > 0$ there exists a constant $K \ge 0$ such that for all integers $n \ge 1$ and $t \in [0,T]$ we have 1 for all $n \ge 1$ and $t \ge 0$.
If R approximates S to order α on Y and there exists a constant C 0 such that R(t/n) n C and S(t) C for all n 1, t ∈ [0, T ], then by real interpolation it approximates S to order θα on the real interpolation spaces (X, Y ) θ,∞ for θ ∈ (0, 1) with estimate An interesting special case arises when Y = D(A m ). If an L (X)-valued scheme R approximates S to order α on D(A m ), then R approximates S to order θα on (X, D(A m )) θ,∞ .
Proposition 5.8. Let T > 0 and suppose that there exists a constant C 0 such that for all t ∈ (0, T ] and integers n 1, R(t/n) n C and S(t) C. Suppose that the L (X)-valued scheme R approximates S to order α on D(A m ) for some integer m 1, and let 0 < θ < 1. Then R approximates S to order θα on (X, D(A m )) θ,∞ .
Since the continuous embedding D((−A) θm ) ֒→ (X, D(A m )) θ,∞ holds, we obtain the following: If S(t) M e µt for all t 0, with M 1 and µ ∈ R, then R approximates S to order θα on the fractional domain D((µ − A) θm ).
We will now review some examples of numerical schemes satisfying the conditions of the above definition. Classical references include [BT79,HK79] and, for analytic semigroups, [CLPT93]. A new and unified approach to approximation of semigroups which sharpens several classical estimates has been recently developed in [GT14,GKT19].
Theorem 5.9 (Time discretisation). Let $r : \mathbb{C} \to \mathbb{C}$ be a rational function such that $|r(z)| \le 1$ for all $\Re z \le 0$, and assume that there exists an integer $\ell \ge 1$ such that $|r(z) - e^z| = O(|z|^{\ell+1})$ as $z \to 0$.
Let $A$ be the generator of a bounded $C_0$-semigroup $(S(t))_{t\ge 0}$ on a Banach space $X$ and set $R(t) := r(tA)$, $t \ge 0$.
Then R approximates S in each of the following cases: (1) splitting: r(z) = e z , to any order on X.
(2) implicit Euler: Example 5.11 (Time discretisation for analytic C 0 -semigroups). Let A be the generator of a bounded analytic C 0 -semigroup (S(t)) t 0 on X. For each of the functions r below we set R(t) := r(tA), t 0.
Then R approximates S in each of the following cases: (1) splitting: r(z) = e z , to any order on X.
(3) Crank-Nicholson: r(z) = (1 + 1 2 z)(1 − 1 2 z) −1 , to order 2ν on D(A 2ν ) for any ν ∈ (0, 1]. If A generates a contractive C 0 -semigroup (S(t)) t 0 the splitting method and implicit Euler methods lead to contractive approximants S n (t). In the following proposition we discuss another class of examples where this holds. It applies to all numerical schemes of the form R(t) = r(tA) considered in Theorem 5.9 and includes all schemes considered in [BT79,HK79]. We use the notation where the argument is taken from (−π, π]. Proposition 5.12. Let A be the generator of a C 0 -semigroup of contractions on a Hilbert space. Suppose that r : Σ σ → C is holomorphic for some 1 2 π < σ < π and satisfies |r(z)| 1 for all ℜz 0. Then r(−tA) 1 for all t > 0, where r(−tA) is defined through the H ∞ -calculus of −A.
The proof is immediate from [HNVW17, Theorem 10.2.24]. The proposition is false beyond the Hilbert space setting. Indeed, for the operator $A = d/dx$ on $X = L^p(\mathbb{R})$ with $p \neq 2$ or $X = C_0(\mathbb{R})$, it was shown in [BT70] that contractivity of $R(t)$ fails for a general class of schemes (see also [CLPT93] for the Crank-Nicholson scheme).
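The two scalar conditions imposed on the function $r$ in this subsection, namely $|r(z)| \le 1$ for $\Re z \le 0$ and $|r(z) - e^z| = O(|z|^{\ell+1})$ as $z \to 0$, are easy to check numerically. The Python sketch below does this for the Crank-Nicholson function $r(z) = (1 + \frac12 z)(1 - \frac12 z)^{-1}$ quoted above and for the standard implicit Euler function $r(z) = (1 - z)^{-1}$; the latter formula, the grid parameters and the function names are assumptions made for the illustration.

```python
import cmath


def r_implicit_euler(z: complex) -> complex:
    """Standard implicit Euler stability function (assumed form)."""
    return 1.0 / (1.0 - z)


def r_crank_nicholson(z: complex) -> complex:
    """Crank-Nicholson stability function (1 + z/2) / (1 - z/2)."""
    return (1.0 + 0.5 * z) / (1.0 - 0.5 * z)


def max_modulus_left_half_plane(r, radius: float = 50.0, m: int = 60) -> float:
    """Largest |r(z)| over a rectangular grid contained in {Re z <= 0}."""
    worst = 0.0
    for i in range(m + 1):
        x = radius * i / m                       # -x is the real part
        for k in range(-m, m + 1):
            y = radius * k / m
            worst = max(worst, abs(r(complex(-x, y))))
    return worst


def consistency_ratio(r, ell: int, z: complex) -> float:
    """|r(z) - exp(z)| / |z|^(ell + 1); stays bounded as z -> 0 for order ell."""
    return abs(r(z) - cmath.exp(z)) / abs(z) ** (ell + 1)


if __name__ == "__main__":
    for name, r, ell in (("IE", r_implicit_euler, 1),
                         ("CN", r_crank_nicholson, 2)):
        print(name, max_modulus_left_half_plane(r),
              consistency_ratio(r, ell, complex(-0.01, 0.0)))
```

The Taylor expansions $r(z) - e^z = \frac12 z^2 + O(z^3)$ for implicit Euler and $r(z) - e^z = \frac1{12} z^3 + O(z^4)$ for Crank-Nicholson explain the limiting values $\frac12$ and $\frac1{12}$ of the ratios, corresponding to $\ell = 1$ and $\ell = 2$ respectively.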
In what follows we restrict ourselves to the semigroup setting, but expect the results to extend to evolution families under suitable additional conditions. In the next theorem we obtain convergence rates for a rather general class of discretisation schemes, which in the case of the splitting method turn out to be equal to the ones of Corollary 5.2 up to a logarithmic term. Modulo this term, the theorem extends Corollary 5.2 in two ways:
• contractivity of $S$ is not needed;
• the result holds for arbitrary approximation schemes.
The proof directly uses Seidler's version of the Burkholder inequality of Proposition 2.6 in combination with Proposition 2.7 and works for $C_0$-semigroups and numerical schemes that are not necessarily contractive. The results of Sections 3 and 4 are not used. One should carefully note, however, that inhomogeneities $g$ taking values in $\gamma(H, X_\nu)$ are considered, where $X_\nu$ is a suitable intermediate space between $X$ and $D(A^m)$. The case of inhomogeneities $g$ taking values in $\gamma(H, X)$ will be considered in Theorem 5.14 and does require contractivity.
Examples of numerical schemes satisfying the conditions of the theorem can be obtained from Examples 5.10 and 5.11. Note that the embedding condition Y ֒→ X α is satisfied for the real interpolation spaces (X, D(A)) α,r with 1 r ∞, the complex interpolation spaces [X, D(A)] α and the fractional domain spaces D((µ − A) α ) for suitable µ ∈ ̺(A) for all α ∈ (0, 1).
With this notation, Therefore, By the bound (2.11) in Proposition 2.7, for n 3 we have where we may take C p,D = 10D √ 2ep if 2 p < ∞. By (4.5), for 0 s t T we have Hence from the assumption on the numerical scheme we conclude that for all s ∈ [t For C 0 -semigroups of contractions and contractive discretisation schemes, the next theorem provides uniform convergence in time for inhomogeneities g taking values in γ(H, X).
Theorem 5.14 (Convergence for contractive schemes). Let A be the generator of a C 0 -contraction semigroup S = (S(t)) t 0 on a (2, D)-smooth Banach space X. Let R be an L (X)-valued contractive scheme approximating S to some order α ∈ (0, 1] on D(A). Let g ∈ L p P (Ω; L 2 (0, T ; γ(H, X))) with 2 p < ∞ and let u t := By Theorem 4.1 and Proposition 5.4, the operators J and J (n) are (uniformly) bounded with J C p,D and J n K p,D respectively, the latter constant being defined as in Proposition 2.7.
To prove convergence in Z (n) p , fix ε > 0 and let f ∈ L p (Ω; L 2 (0, T ; γ(H, D(A)))) be such that g − f L p (Ω;L 2 (0,T ;γ(H,X))) < ε. By the boundedness and linearity of J and J (n) , , and the last term tends to zero as n → ∞ by Theorem 5.13. Since ε > 0 was arbitrary the result follows.
5.3. Applications to SPDE. We will now apply the results to some simple examples of stochastic PDE and compare the results with results available in the literature. It goes without saying that with additional work more sophisticated problems can be treated. While this will be taken up in forthcoming work, the objective here is to treat some model problems in order to see where our methods can be expected to improve the presently available rates.
We begin with the stochastic heat equation. The results of the next example can be extended to more general uniformly elliptic operators with space-dependent coefficients. As will follow from Section 6, if one is only interested in the splitting method the coefficients can even be taken progressively measurable in (t, ω).
Example 5.15 (Stochastic heat equation). Consider the inhomogeneous stochastic heat equation on $\mathbb{R}^d$: We assume that $g = (g^k)_{k\ge 1}$ belongs to $L^p_{\mathcal P}(\Omega; L^2(0,T; H^{\lambda,q}(\mathbb{R}^d;\ell^2)))$ with $0 < p < \infty$, and $W = (W^k)_{k\ge 1}$ is a sequence of independent standard Brownian motions. We can view $W$ as an $\ell^2$-cylindrical Brownian motion in a natural way by putting, for $h = (h_k)_{k\ge 1} \in \ell^2$, $W_t h := W(1_{(0,t)} \otimes h) := \sum_{k\ge 1} h_k W^k_t$, noting that the sum on the right-hand side converges in $L^2(\Omega)$. As is well known, the operator $\Delta$ generates an analytic $C_0$-semigroup of contractions on the Bessel potential spaces $H^{\lambda,q}(\mathbb{R}^d)$ and $D(\Delta) = H^{\lambda+2,q}(\mathbb{R}^d)$ for all $\lambda \in \mathbb{R}$ and $1 < q < \infty$.
Let us now assume that 2 q < ∞. By Theorem 4.1, the mild solution u to the problem (5.8) has a continuous modification with values in H λ,q (R d ) which satisfies where we may take C p,q = 10 √ p(q − 1) if 2 p < ∞. Here we used that H λ,q (R d ) is (2, √ q − 1)-smooth by Proposition 2.2 and that g t γ(ℓ 2 ,H λ,q (R d )) g t γq(ℓ 2 ,H λ,q (R d )) = γ q g t H λ,q (R d ;ℓ 2 ) by Hölder's inequality and [HNVW17, Proposition 9.3.2]), where γ is a standard Gaussian random variable (whose moments satisfy γ q √ q − 1). We consider the approximation scheme (5.6) for the splitting (S), implicit Euler (IE), and Crank-Nicholson (CN) schemes discussed in Example 5.11. Each of them leads to a sequence of approximate solutions (u (n) j ) n j=0 , n 1, for which we define the approximation errors These numbers also depend on p, q, λ and d, but the rates in the estimates below will be independent of these parameters. By Theorem 5.14, E n,0 → 0 for (S) and (IE). For q = 2, (CN) is contractive by Proposition 5.12 and again we obtain E n,0 → 0. Moreover, we can give rates of convergence for each of these methods. These are given in Table 1 for the errors E n,β with β ∈ (0, 1] (up to constants depending on p, q). The assertions follow from Example 5.11, and Corollary 5.2 and Theorem 5.13 applied with Up to a logarithmic term the convergence rates are the same for the three schemes, independently of p ∈ (0, ∞). Although (S) and (CN) have better orders of convergence, the convergence rate of the approximation errors E n,β cannot exceed β due to limitations in Corollary 5.2 and Theorem 5.13. We next consider a simple non-parabolic equation. Here, higher order schemes give better rates of convergence. Other non-parabolic examples, including wave equation on R d (for q = 2), can be treated similarly.
Let us now assume that $2\le q<\infty$. As before, by Theorem 4.1, the mild solution $u$ to the problem (5.9) has a continuous modification with values in $H^{\lambda,q}(\mathbb{R})$ which satisfies the corresponding maximal estimate, where we may take $C_{p,q} = 10\sqrt{p}\,(q-1)$ if $2\le p<\infty$. As before, for $\beta\ge 0$ let $E_{n,\beta}$ denote the approximation errors. By Theorem 5.14 we have $E_{n,0}\to 0$ for (S) and (IE), and if $q=2$ the same holds for (CN) by Proposition 5.12. Table 2 gives the estimates for the errors $E_{n,\beta}$ for suitable intervals for $\beta$ (up to constants depending on $p$, $q$). The assertions follow from Example 5.10 (using Proposition 5.12 for (CN) if $q=2$), Corollary 5.2, and Theorem 5.13 applied with $X = H^{\lambda-\beta,q}(\mathbb{R})$, $D(A^m) = H^{\lambda-\beta+m,q}(\mathbb{R})$ and $Y = H^{\lambda,q}(\mathbb{R}) = [X, D(A^m)]_{\beta/m}$, with $m=1$ for (S), $m=2$ for (IE), and $m=3$ for (CN). Note that $\phi(8/5) = 1$; since the convergence rate cannot exceed $1$, there is no point in considering values $\beta > 8/5$.
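The role of the exponents $m=2$ for (IE) and $m=3$ for (CN) can be illustrated on the scalar test equation $u' = zu$. The following minimal sketch assumes the textbook stability functions $r(z) = (1-z)^{-1}$ and $r(z) = (1+z/2)/(1-z/2)$; these, and all function names, are illustrative assumptions and are not taken from the scheme (5.6). It estimates the exponent $m$ in $|r(hz) - e^{hz}| \approx c\,h^m$ and checks contractivity $|r(z)|\le 1$ on sample points of the closed left half-plane.

```python
import cmath
import math

# Stability functions r(z) for the scalar test equation u' = z u.
# These are the standard textbook forms (an assumption of this sketch).
def splitting(z):
    return cmath.exp(z)  # exact semigroup: r(z) = e^z

def implicit_euler(z):
    return 1.0 / (1.0 - z)

def crank_nicolson(z):
    return (1.0 + z / 2.0) / (1.0 - z / 2.0)

def observed_order(r, z=-1.0, h=1e-2):
    """Estimate m such that |r(hz) - e^{hz}| behaves like h^m as h -> 0."""
    err_h = abs(r(h * z) - cmath.exp(h * z))
    err_half = abs(r(h * z / 2.0) - cmath.exp(h * z / 2.0))
    return math.log(err_h / err_half, 2)

def is_contractive(r, samples=200):
    """Check |r(z)| <= 1 on a grid of points with Re z <= 0."""
    points = [complex(-0.1 * a, 0.5 * b - 50.0)
              for a in range(50) for b in range(samples)]
    return all(abs(r(z)) <= 1.0 + 1e-12 for z in points)

if __name__ == "__main__":
    for name, r in [("IE", implicit_euler), ("CN", crank_nicolson)]:
        print(name, round(observed_order(r), 2), is_contractive(r))
```

The scalar check only concerns the one-step consistency error of the rational approximation; it says nothing about the stochastic term, which is what limits the rates in the tables.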
Our final example concerns the Schrödinger equation.
Example 5.17 (Stochastic Schrödinger equation). Consider the stochastic Schrödinger equation on $\mathbb{R}^d$. We assume that $g\in L^p_{\mathcal{P}}(\Omega; L^2(0,T; H^{\lambda}(\mathbb{R}^d;\ell^2)))$ for some $0<p<\infty$, where $H^{\lambda}(\mathbb{R}^d) = H^{\lambda,2}(\mathbb{R}^d)$. It is well known that $i\Delta$ generates a unitary $C_0$-group on $H^{\lambda}(\mathbb{R}^d)$ for all $\lambda\in\mathbb{R}$. As before, by Theorem 4.1, the mild solution $u$ to the problem (5.9) has a continuous modification with values in $H^{\lambda}(\mathbb{R}^d)$ which satisfies
$$\mathbb{E}\sup_{t\in[0,T]}\|u_t\|^p_{H^{\lambda}(\mathbb{R}^d)} \le C_p^p\,\|g\|^p_{L^p(\Omega;L^2(0,T;H^{\lambda}(\mathbb{R}^d;\ell^2)))},$$
where we may take $C_p = 10\sqrt{p}$ if $2\le p<\infty$. As before, let $E_{n,\beta}$ denote the approximation errors. By Theorem 5.14, $E_{n,0}\to 0$ for (S), (IE), and (CN) (using Proposition 5.12 for the latter). Table 3 gives the estimates for the errors $E_{n,\beta}$ (up to constants depending on $p$) for suitable intervals for $\beta$. The assertions follow from Example 5.10, Corollary 5.2, and Theorem 5.13 applied with $X = H^{\lambda-2\beta}(\mathbb{R}^d)$ and $Y = H^{\lambda}(\mathbb{R}^d) = [X, D(A^m)]_{\beta/m}$, with $m=1$ for (S), $m=2$ for (IE), and $m=3$ for (CN).
to be assumed and convergence in Hölder norms is obtained under L p -integrability conditions in time with p > 2. See Table 4 for a comparison of the convergence rates.
In [CvN13] (in the setting of UMD spaces) and [GM07] (in the setting of monotone operators on Gelfand triples $V \hookrightarrow X \hookrightarrow V^*$), the implicit Euler scheme was considered with uniform convergence in time, but these results seem not to be comparable to ours because an additional discretisation of the noise term is allowed. In the latter reference, convergence rates of order $n^{-\nu}$ are obtained under the assumption that the solution $u$ belongs to $C^{\nu}([0,T]; L^2(\Omega;V)) \cap L^2(\Omega; L^\infty(0,T;V))$. Results on uniform convergence in time (and sometimes even convergence in Hölder norms in time) for schemes involving space and time discretisation can be found in many papers, including [CH12, CH13, CHJ+16, Gyö99, GM09, Yoo00, Jen09, PS05]. Results concerning uniform convergence in the case of white noise and discretisation in time only can be found in [BCH19, BG19, GN95, GN97]. Some of these results come with explicit rates and some do not, but the schemes considered in these papers are different.
In the parabolic setting, results on convergence of the form (5.10), with $\sup_{j=0,\dots,n}$ taken outside the expectation (notice the reversed order of supremum and expectation) and with explicit rates, which can even be faster than $1/n$, can be found in [CvN10, JK11, LPS14] and references therein.
Table 4. Comparison of rates in the parabolic setting. Columns: paper; scheme; $\beta$; $g\in L^r$ in time; error $E_n$. First row: present; splitting; $(0,1]$.
For non-parabolic problems no systematic results seem to be available on uniform convergence in time. In [Wan15] uniform convergence with explicit rates has been obtained for a nonlinear wave equation with the splitting scheme. The fact that the underlying semigroup is a group allows one to write
$$\int_0^t S(t-s)g_s\,dW_s = S(t)\int_0^t S(-s)g_s\,dW_s,$$
and uniform convergence can be obtained from standard maximal estimates for martingales. In [FTT10] the authors obtain uniform convergence results in case the semigroup admits a dilation to a group. Our results do not rely on the above identity and therefore are applicable in the case of arbitrary contractive $C_0$-semigroups, and the convergence holds with the same rate. Even more is true: for arbitrary $C_0$-semigroups and general numerical schemes the same convergence rates can be obtained up to a logarithmic factor.

Maximal inequalities for random stochastic convolutions
In this section we consider the time-dependent problem
$$du_t = A(t)u_t\,dt + g_t\,dW_t, \qquad u_0 = 0, \tag{6.1}$$
with random operators $A(t)$. More precisely, we assume that $(A(t,\omega))_{(t,\omega)\in[0,T]\times\Omega}$ is an adapted family of closed operators acting in $X$ which satisfy suitable conditions, to be made precise below, guaranteeing the generation of an adapted evolution family. We will assume throughout that $W$ is an adapted $H$-cylindrical Brownian motion on $\Omega$ and that $g:[0,T]\times\Omega\to\gamma(H,X)$ is progressively measurable; recall that this is equivalent to the requirement that $g(h):[0,T]\times\Omega\to X$ is progressively measurable for all $h\in H$. Many of the results of this section are expected to extend to more general martingales.
6.1. The forward stochastic integral. In analogy with the non-random case one expects that (6.1) admits a mild solution given as before by the stochastic convolution process $\int_0^t S(t,s)g_s\,dW_s$. This stochastic integral, however, cannot be defined as an Itô stochastic integral because the random variables $S(t,s)x$ are only assumed to be $\mathcal{F}_t$-measurable rather than $\mathcal{F}_s$-measurable, and consequently the integrand will not be progressively measurable in general.
To overcome this problem we use the forward stochastic integral, introduced and studied by Russo and Vallois [RV93] in the scalar-valued setting. Following [LN98, PV14, PV15] we define its vector-valued analogue as follows. Fix an orthonormal basis $(h_k)_{k\ge 1}$ of $H$. For processes $\Phi\in L^0(\Omega; L^2(0,T;\gamma(H,X)))$ and $n = 1, 2, \dots$ one defines approximating random variables $I^-(\Phi,n)$ as in these references. The process $\Phi$ is forward stochastically integrable if the sequence $(I^-(\Phi,n))_{n\ge 1}$ converges in probability. If this is the case, the limit is independent of the choice of orthonormal basis and is called the forward stochastic integral of $\Phi$. We write $I^-(\Phi) = \int_0^T \Phi_s\,dW^-_s$ for this limit. Notice that $\Phi$ is not assumed to be progressively measurable. It is easy to see that if $\Phi$ is a finite rank step process, then $\Phi$ is forward integrable. In case $\Phi$ is progressively measurable and integrable in the Itô sense, the forward stochastic integral exists and coincides with the Itô integral (see [PV15, Proposition 3.2]). In order to apply the forward integral to our problem we make the following hypothesis.

Hypothesis 6.1. The family $(S(t,s,\omega))_{0\le s\le t\le T,\ \omega\in\Omega}$ is an adapted $C_0$-evolution family of contractions on $X$, i.e.,
(i) $(S(t,s,\omega))_{0\le s\le t\le T}$ is a $C_0$-evolution family of contractions for every $\omega\in\Omega$;
(ii) $S(t,s,\cdot)x$ is strongly $\mathcal{F}_t$-measurable for all $0\le s\le t\le T$ and $x\in X$.
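For orientation, the regularised sums behind the forward integral can be sketched as follows. The normalisation is the one commonly used for the vector-valued Russo-Vallois integral and is assumed here rather than quoted; in particular, the convention that $W$ is extended beyond $T$ so that $W(s+1/n)$ is defined is an assumption of this sketch.

```latex
% Assumed form of the regularised sums (normalisation as in the
% Russo-Vallois construction with \varepsilon = 1/n; not quoted verbatim).
I^-(\Phi,n) := n \sum_{k=1}^{n} \int_0^T \Phi(s)h_k \,
   \bigl( W(s+1/n)h_k - W(s)h_k \bigr)\, ds,
\qquad
\int_0^T \Phi_s \, dW^-_s := \lim_{n\to\infty} I^-(\Phi,n)
   \quad \text{in probability}.
```

For a scalar Brownian motion this reduces to the Russo-Vallois regularisation $\varepsilon^{-1}\int_0^T \Phi_s (W_{s+\varepsilon} - W_s)\,ds$ with $\varepsilon = 1/n$; no adaptedness of $\Phi$ enters the formula, which is what makes it usable for integrands such as $S(t,s)g_s$.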
We have the following sufficient condition for forward integrability (see [PV15, Corollary 5.3], which extends to the current setting).
Proposition 6.2. Suppose that Hypothesis 6.1 holds, with $X$ a 2-smooth Banach space, and let $g:[0,T]\times\Omega\to\gamma(H,Y)$ be a finite rank adapted step process. Then the process $(S(t,s)g_s)_{s\in[0,t]}$ is forward integrable on $[0,t]$ and almost surely we have
$$\int_0^t S(t,s)g_s\,dW^-_s = S(t,0)\int_0^t g_s\,dW_s + \int_0^t \partial_s S(t,s)\int_s^t g_r\,dW_r\,ds. \tag{6.2}$$
Moreover, the process $\bigl(\int_0^t S(t,s)g_s\,dW^-_s\bigr)_{t\in[0,T]}$ has a continuous modification.
The right-hand side of (6.2) is well defined by the hypothesis and the assumption that $g$ takes values in $Y$. By the almost sure pathwise continuity of $\int_0^{\cdot} g_s\,dW_s$, the forward integral in (6.2) admits a continuous modification.
Remark 6.3. In the setting where $S$ is generated by an adapted family $(A(t))_{t\in[0,T]}$ satisfying suitable parabolicity assumptions, the right-hand side of (6.2) is called the pathwise mild solution of (6.1). Pathwise mild solutions were introduced and extensively studied in [PV14]. In the parabolic case, $\partial_s S(t,s)$ typically extends to a bounded operator on $X$ and $\|\partial_s S(t,s)\| \le C(t-s)^{-1}$, where $C$ depends on $\omega\in\Omega$. Since $\int_0^{\cdot} g_r\,dW_r$ is almost surely Hölder continuous under $L^p(0,T)$-integrability assumptions on $g$ with $p>2$, the right-hand side of (6.2) exists pathwise as a Bochner integral.
It is quite difficult to prove estimates for the forward integral directly. A major advantage of using the right-hand side of (6.2) is that one can obtain estimates using only Itô and Bochner integrals.
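The integration by parts behind this reformulation has an exact discrete counterpart (Abel summation), which the following minimal sketch checks numerically. Everything in it is a hypothetical scalar toy model chosen for illustration: a deterministic evolution family $S(T,s) = \exp(-\int_s^T a(r)\,dr)$, a deterministic integrand $g$, and a simulated Brownian path. In this adapted scalar setting the forward and Itô sums coincide, so the code illustrates only the algebraic identity, not the non-adapted case.

```python
import math
import random

def a(s):
    # Hypothetical coefficient; -a(s) plays the role of A(s).
    return 1.0 + math.sin(s)

def g(s):
    # Hypothetical scalar integrand.
    return math.cos(3.0 * s)

def discrete_mild_solution_identity(n=1000, T=1.0, seed=0):
    """Return both sides of the discrete analogue of the pathwise formula."""
    rng = random.Random(seed)
    dt = T / n
    grid = [i * dt for i in range(n + 1)]

    # S[i] approximates S(T, s_i), built backwards from S(T, T) = 1.
    S = [0.0] * (n + 1)
    S[n] = 1.0
    for i in range(n - 1, -1, -1):
        S[i] = S[i + 1] * math.exp(-a(grid[i]) * dt)

    # v[i] is the discrete stochastic integral of g up to time s_i.
    v = [0.0] * (n + 1)
    for i in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))
        v[i + 1] = v[i] + g(grid[i]) * dW

    # Left-hand side: Riemann-Ito sum for the stochastic convolution at time T.
    lhs = sum(S[i] * (v[i + 1] - v[i]) for i in range(n))
    # Right-hand side: S(T,0) v_T plus the increments of S against (v_T - v_s).
    rhs = S[0] * v[n] + sum((S[i] - S[i - 1]) * (v[n] - v[i])
                            for i in range(1, n))
    return lhs, rhs

if __name__ == "__main__":
    lhs, rhs = discrete_mild_solution_identity()
    print(abs(lhs - rhs))
```

The increments `S[i] - S[i-1]` stand in for $\partial_s S(t,s)\,ds$, so the second sum is the discrete version of the Bochner integral on the right-hand side of (6.2); only ordinary sums of the path `v` appear there.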
6.2. The maximal inequality. We will now extend the maximal estimate of Theorem 4.1 to random evolution families, replacing the Itô stochastic integral of that theorem by the forward stochastic integral. The precise sense in which the forward integral constitutes a solution of the problem (6.1) will be addressed subsequently in Theorem 6.6. Even without the supremum on the left-hand side, the estimate in Theorem 6.4 is new.
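Theorem 6.4. Suppose that Hypothesis 6.1 holds, with $X$ a 2-smooth Banach space, and let $g:[0,T]\times\Omega\to\gamma(H,Y)$ be a finite rank adapted step process. Then for all $0<p<\infty$ we have
$$\mathbb{E}\sup_{t\in[0,T]}\Big\|\int_0^t S(t,s)g_s\,dW^-_s\Big\|^p \le C_{p,D}^p\,\|g\|^p_{L^p(\Omega;L^2(0,T;\gamma(H,X)))},$$
where the constant $C_{p,D}$ only depends on $p$ and $D$. For $2\le p<\infty$ the inequality holds with $C_{p,D} = 10D\sqrt{p}$.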
Proof. The proof is similar to that of Theorem 4.1, but with some extra technicalities which justify a detailed presentation.
Define the process $(v_t)_{t\in[0,T]}$ by
$$v_t := \int_0^t K(t,s)g_s\,dW^-_s, \tag{6.3}$$
this forward integral being well defined since the integrand is a finite rank step process. For $t\in[0,r_1]$ the above integral coincides with the Itô integral since $K(t,s,\cdot)$ is strongly $\mathcal{F}_0$-measurable. By (iii), for $r_{j-1}\le s\le t<r_j$ we have
$$v_t = S^{\delta}(t,s)v_s + \int_s^t K(t,r)g_r\,dW_r, \tag{6.4}$$
where the stochastic integral is again an Itô integral since the random variable $K(t,r,\cdot)$ does not depend on $r\in[s,t]\subseteq[r_{j-1},r_j)$ by (i) and is strongly $\mathcal{F}_{r_{j-1}}$-measurable by (iv) and the inclusion $\mathcal{F}_{(t-\delta)^+}\subseteq\mathcal{F}_{r_{j-1}}$ (using that $(t-\delta)^+\le r_{j-1}$). Properties (i) and (ii) imply that $v$ has a modification with continuous paths. Working with such a modification, we will first prove that for all $2\le p<\infty$ one has
$$\Big\|\sup_{t\in[0,T]}\|v_t\|\Big\|_{L^p(\Omega)} \le 10D\sqrt{p}\,\|g\|_{L^p(\Omega;L^2(0,T;\gamma(H,X)))}.$$
Step 3. We will next show that
$$\lim_{\delta\downarrow 0}\int_0^t S^{\delta}(t,s)g_s\,dW^-_s = \int_0^t S(t,s)g_s\,dW^-_s$$
in $L^0(\Omega; C([0,T];X))$. This will be done by providing an alternative formula for $\int_0^t S^{\delta}(t,s)g_s\,dW^-_s$ in which we can let $\delta\downarrow 0$. Here it will be important that $g$ takes values in $Y$.
Fix $t\in(0,T]$. Since $\|\partial_s(S(t,s)y)\|_X \le C\|y\|_Y$ with a constant $C$ independent of $0<s<t\le T$, it follows from Proposition 6.2 that the forward stochastic convolution integral $u_t := \int_0^t S(t,s)g_s\,dW^-_s$ is well defined. Letting $\delta\downarrow 0$, by the piecewise strong continuity of $t\mapsto\partial_s S(t,s)$ on $Y$ and dominated convergence we obtain that $v^{\delta}(t)\to u(t)$ almost surely. By dominated convergence one also obtains that $u$ has a continuous modification. To prove the maximal estimate for this modification it suffices to show that for any finite set $\pi\subseteq[0,T]$,
$$\Big\|\sup_{t\in\pi}\|u_t\|\Big\|_{L^p(\Omega)} \le 10D\sqrt{p}\,\|g\|_{L^p(\Omega;L^2(0,T;\gamma(H,X)))}.$$
Step 4. The case 0 < p < 2 follows again by using Corollary 3.6 instead of Theorem 3.1, or by an extrapolation argument involving Lenglart's inequality.
If the embedding $Y\hookrightarrow X$ is dense we can use the maximal inequality of the theorem to see that for all $0<p<\infty$ the mapping $g\mapsto\int_0^t S(t,s)g_s\,dW^-_s$ has a unique extension to a continuous linear operator $J_p: L^p_{\mathcal{P}}(\Omega; L^2(0,T;\gamma(H,X)))\to L^p(\Omega; C([0,T];X))$. Moreover, by a standard localisation argument, $J_p$ has a unique extension to a continuous linear operator $J: L^0_{\mathcal{P}}(\Omega; L^2(0,T;\gamma(H,X)))\to L^0(\Omega; C([0,T];X))$. It is not guaranteed, however, that for general $g\in L^0_{\mathcal{P}}(\Omega; L^2(0,T;\gamma(H,X)))$ the process $Jg$ is given by a forward stochastic convolution again, nor is this clear if we replace $L^0$ and $J$ by $L^p$ and $J_p$. The same problem occurs if we use the right-hand side in the identity in Proposition 6.2.
Since $J_p$ satisfies the same estimate as in Theorem 6.4, we immediately obtain an extension of the exponential tail estimate of Corollary 4.2 in the current setting. As in Remark 4.3, under more restrictive conditions on the random evolution family, but with a better bound on the variance $\sigma^2$, a similar result was obtained in [NV20, Remark 5.8].
The next theorem addresses the question in what sense J p g and Jg "solve" the problem (6.1). Some additional assumptions are needed to establish the precise relation between the random evolution family S and the random operator A.
Hypothesis 6.5. Hypothesis 6.1 is satisfied. Furthermore, the random operator family $A:[0,T]\times\Omega\to\mathcal{L}(Y,X)$ has the property that $Ay$ is strongly progressively measurable for all $y\in Y$. In addition, the following conditions hold:
(i) For almost all $\omega\in\Omega$ we have $S(t,\cdot,\omega)y\in W^{1,1}(0,t;X)$ for all $t\in[0,T]$ and $y\in Y$, and for almost all $s\in[0,t]$ we have $\partial_s S(t,s)y = -S(t,s)A(s)y$ and $\|S(t,s)A(s)y\|_X\le C\|y\|_Y$, where $C:\Omega\to[0,\infty)$ is independent of $y\in Y$ and $0\le s<t\le T$.
(ii) For almost all $\omega\in\Omega$ we have $S(\cdot,s,\omega)y\in W^{1,1}(s,T;X)$ for all $s\in[0,T]$ and $y\in Y$, and for almost all $t\in[s,T]$ we have $\partial_t S(t,s)y = A(t)S(t,s)y$ and $\|A(t)S(t,s)y\|_X\le C\|y\|_Y$, where $C:\Omega\to[0,\infty)$ is independent of $y\in Y$ and $0\le s<t\le T$.
(iii) There exists a dense subspace $F\subseteq X^*$ such that $F\subseteq D(A(t,\omega)^*)$ for all $(t,\omega)\in[0,T]\times\Omega$, and almost surely the mapping $t\mapsto\langle x, A(t,\omega)^*x^*\rangle$ belongs to $L^\infty(0,T)$ for all $x\in X$ and $x^*\in F$.
In the proof below we will combine (iii) with the observation that if $f:(0,T)\to X$ is integrable and $g:(0,T)\to X^*$ has the property that $\langle x,g\rangle\in L^\infty(0,T)$ for all $x\in X$, then the function $t\mapsto\langle f(t),g(t)\rangle$ is integrable and
$$\int_0^T|\langle f(t),g(t)\rangle|\,dt \le \|f\|_{L^1(0,T;X)}\sup_{\|x\|\le 1}\|\langle x,g\rangle\|_\infty,$$
the supremum on the right-hand side being finite by a closed graph argument. Indeed, this estimate is clear for simple functions $f$ and the general case follows by approximation.
Under the above hypothesis a process $u\in L^0_{\mathcal{P}}(\Omega; L^1(0,T;X))$ is called a weak solution of (6.1) if for all $x^*\in F$, a.s. for all $t\in[0,T]$,
$$\langle u_t,x^*\rangle = \int_0^t\langle u_s, A(s)^*x^*\rangle\,ds + \int_0^t g_s^*x^*\,dW_s.$$
In many situations weak solutions are known to be unique. However, we will not address this issue here.
Theorem 6.6. Suppose that Hypothesis 6.5 holds, with X a 2-smooth Banach space, and assume in addition that the embedding Y ֒→ X is dense. Then for every g ∈ L 0 P (Ω; L 2 (0, T ; γ(H, X))) the process Jg is a weak solution to (6.1). Proof. We proceed in three steps.
Step 1. First let $g:[0,T]\times\Omega\to\mathcal{L}(H,Y)$ be an adapted finite rank step process and write $v^g_t = \int_0^t g\,dW$. From Proposition 6.2, Theorem 6.4 and Hypothesis 6.5(i) it is immediate that
$$\mathbb{E}\sup_{t\in[0,T]}\|u^g_t\|^p \le C_{p,D}^p\,\|g\|^p_{L^p(\Omega;L^2(0,T;\gamma(H,X)))}, \tag{6.8}$$
and
$$u^g_t = S(t,0)v^g_t - \int_0^t S(t,s)A(s)(v^g_t - v^g_s)\,ds.$$
$\langle S(t,r)A(r)v^g_r, x^*\rangle\,dr$, which gives the required identity.
Step 2. Let $g\in L^p_{\mathcal{P}}(\Omega; L^2(0,T;\gamma(H,X)))$ with $0<p<\infty$ and choose a sequence of $Y$-valued adapted finite rank step processes $(g^{(n)})_{n\ge 1}$ such that $g^{(n)}\to g$ in $L^p(\Omega; L^2(0,T;\gamma(H,X)))$. Then from (6.8) applied to $g^{(n)} - g^{(m)}$ we obtain that $(u^{g^{(n)}})_{n\ge 1}$ is a Cauchy sequence and therefore converges to some $u$ in $L^p(\Omega; C([0,T];X))$. By Step 1, $u^{g^{(n)}}$ is a weak solution and thus
$$\langle u^{g^{(n)}}_t, x^*\rangle = \int_0^t\langle u^{g^{(n)}}_s, A(s)^*x^*\rangle\,ds + \int_0^t (g^{(n)}_s)^*x^*\,dW_s.$$
Letting $n\to\infty$ in this identity we conclude that $u^g$ is a weak solution. The maximal inequality is obtained by applying (6.8) with $g^{(n)}$ and letting $n\to\infty$.
Remark 6.7. In [LN98, Proposition 5.3], restrictive conditions in terms of Malliavin differentiability of $S$ are given under which the forward stochastic integral $u_t = \int_0^t S(t,s)g_s\,dW^-_s$ exists, has a continuous modification, and is a weak solution. Inspection of the proof shows that if one sets $u^{(n)}_t := I^-(1_{[0,t]}S(t,\cdot)g^{(n)})$, one needs that $\sup_{t\in[0,T]}\|u_t - u^{(n)}_t\|_{L^1(\Omega;X)}\to 0$. Although this is likely to hold in many situations, such considerations can be avoided by using the right-hand side of (6.2).
Remark 6.8. Theorem 5.1 extends mutatis mutandis to random evolution families. The only required change is to use the forward integral in the proof and to apply Theorem 6.4 instead of Theorem 4.1. To obtain explicit decay rates under the assumption that $g$ has spatial smoothness, i.e., $g$ takes values in a Banach space $Y$ continuously embedded in $X$, one requires estimates for $\|S(s,\sigma_n(s)) - I\|_{\mathcal{L}(Y,X)}$. In some applications (e.g. [Paz83, Section 5.2]) such estimates are available.