Some approximation results for mild solutions of stochastic fractional order evolution equations driven by Gaussian noise

We investigate the quality of space approximation of a class of stochastic integral equations of convolution type with Gaussian noise. Such equations arise, for example, when considering mild solutions of stochastic fractional order partial differential equations but also when considering mild solutions of classical stochastic partial differential equations. The key requirement for the equations is a smoothing property of the deterministic evolution operator which is typical in parabolic type problems. We show that if one has access to nonsmooth data estimates for the deterministic error operator together with its derivative of a space discretization procedure, then one obtains error estimates in pathwise Hölder norms with rates that can be read off the deterministic error rates. We illustrate the main result by considering a class of stochastic fractional order partial differential equations and space approximations performed by spectral Galerkin methods and finite elements. We also improve an existing result on the stochastic heat equation.

the deterministic error rates. We illustrate the main result by considering a class of stochastic fractional order partial differential equations and space approximations performed by spectral Galerkin methods and finite elements. We also improve an existing result on the stochastic heat equation.

Introduction
Let H be a real separable Hilbert space and let W H be a H -cylindrical Wiener process on a complete, filtered probability space ( , F, (F t ) t≥0 , P) with respect to the filtration (F t ) t≥0 . To be more precise, we assume that (F t ) t≥0 satisfies the usual conditions, which are, (F t ) t≥0 is right-continuous and F 0 contains all P-nullsets of F. Let H i , i = 0, 1, 2 be real separable Hilbert spaces to be specified later on but they are typically associated with fractional powers of a linear operator. We consider integral equations of the form   A typical example where the integral Eq. (1) arises is when defining mild solutions of fractional order equations of the form [23], see Sect. 3.1 for more details. Here, A is a densely defined, possibly unbounded, non-negative operator on the Hilbert space H 0 , α ∈ (0, 2), β > 1 2 and κ > 0. The restriction β > 1 2 is needed otherwise the stochastic integral does not make sense even for constant G as for β ≤ 1 2 the function t → t β−1 is not square integrable on [0, T ]. For α ∈ (0, 1) (and U 1 = 0 in this case), Eq. (2) becomes a fractional stochastic heat equation, for α ∈ (1, 2) Eq. (2) becomes a fractional stochastic wave equation.
Time fractional stochastic heat type equations might be used to model phenomena with random effects with thermal memory [48]. In its simplest form, the fractional stochastic heat equation has the form Eq. (3) corresponds to (2) with β = κ = 1, α ∈ (0, 1) and U 1 = 0. Time fractional stochastic wave type equations may be used to model random forcing effects in viscoelastic materials which exhibit a simple power-law creep behaviour [13,38]. The simplest form of the fractional stochastic wave equation takes the form where and U 1 is the initial data forU . Equation (4) corresponds to (2) with β = κ = 1 and α ∈ (1, 2).
In both cases the parameters β and κ can be used to model the time-regularity of the stochastic, respectively, the deterministic feedback. For example, when β < 1, then the driving process is rougher than the Wiener process, while if β > 1, it is smoother. It is important to note that while the parameter choice in (2) corresponding to α = β = κ = 1 is the standard stochastic heat equation, the parameter choice β = κ = 1 and α = 2 does not result in the standard stochastic wave equation but in something much more irregular as in this case the noise will driveU and notÜ . The standard stochastic wave equation would correspond to the choice β = κ = 2 and α = 2. This case is not covered by our paper, since in our setting crucial estimates has constants blowing up as α → 2. In particular, the constants in the fundamental regularity estimates (20) and (21) will blow up, and therefore we cannot say anything for the limiting case α = 2 by taking α → 2.
Our aim is to approximate stochastic integro-differential equations of the type (1) and derive error estimates in pathwise Hölder norms in time. To this end we consider approximations of (1) given by the following integral equation

t − s)G(s, U n (s)) dW H (s).
Here, the approximation U n can typically be a spatial approximation derived via a spectral Galerkin or a continuous finite element method. The main purpose of our work is to derive rates of convergence of the strong error over C γ ([0, T ]; H 0 ); that is, we derive error estimates for U − U n in L p ( ; C γ ([0, T ]; H 0 )). Here, for a function f : [0, T ] → E, where E is a Banach space, the Hölder seminorm is defined by In particular, we derive a rate of convergence of the strong error over C([0, T ]; H 0 ) (the space of H 0 -valued continuous functions on [0, T ] equipped with the supremum norm); that is, an error estimate for U −U n in L p ( ; C([0, T ]; H 0 )). These error estimates are derived given that one has access to deterministic nonsmooth error estimates for S i − S i n and d dt (S i − S i n ). The main point is that the rate of convergence in these norms can be directly read off the deterministic error rates. While traditionally error estimates for d dt (S i − S i n ) are seldom considered they are not out of reach in many cases (see, for example, [55,Theorem 3.4], for finite elements for parabolic problems). We demonstrate by two examples how to obtain such estimates for fractional order equations both for spectral Galerkin and for a standard continuous finite element method. In general, when S i are resolvent families for certain parabolic integro-differential problems arising, for example, in viscoelasticity, [3,13,38,52], these nonsmooth data estimates are direct consequences of the smoothing property of the resolvent family of the linear deterministic problem, at least for the spectral Galerkin method.
Our motivation for considering estimates in such norms is twofold. Firstly, estimates with respect to the L p ( ; C([0, T ]; H 0 ))-norm are useful for using standard localization arguments [29,51] in order to extend approximation results for equations with globally Lipschitz continuous nonlinearities to results for equations with nonlinearities that are only Lipschitz continuous on bounded sets. We refer to [18,Section 4] for further details in the semigroup case. Secondly, as Remark 2.10 shows, the processes U and U n can be viewed as random variables in L p ( ; C γ ([0, T ]; H 0 )) and therefore it natural to consider the approximation error in the corresponding norm. For further applications of approximations in Hölder norms we refer to [18] and [4].
Finally, we would like to emphasize that the derivation of error estimates in the L p ( ; C([0, T ]; H 0 ))-norm is usually a nontrivial task even when the operator family S 1 is a semigroup. This is because, in general, the stochastic convolution appearing in (1) fails to be a semimartingale and hence Doob's maximal inequality cannot be applied to obtain estimates with respect to the L p ( ; C([0, T ]; H 0 ))-norm. In case the operator family S 1 is a semigroup one may employ the factorization method of Da Prato, Kwapien and Zabczyk [12] directly to obtain such estimates, for an instance, see, for example, [37]. However, when the semigroup property does not hold; that is, there is a nontrivial memory effect in the equation, then even this approach fails.

The state of the art
For analytical results, such as existence, uniqueness and regularity of various stochastic Volterra-type integro-differential equations driven by Wiener noise we refer to [3, 5, 8-10, 13, 14, 32-34] and for results concerning asymptotic behaviour of solutions to [7,24]. The particular case of fractional order equations driven by Wiener noise are considered in [11, 21-23, 35, 43, 48]. Various integro-differential equations driven by Lévy noise are analysed in [20,30] with the particular case of fractional order equations in [6]. Finally a class of linear Volterra integro-differential equations driven by fractional Brownian motion are investigated in [53,54].
The main purpose of our work is to derive rates of convergence for space approximations. Here, we consider the strong error over To our knowledge all existing work on the numerical analysis of stochastic fractional order differential equations are considering a much weaker error measure; that is, the H 0 (mainly for p = 2), see, for example, [26,27,31,38,40,56], or the weak error [1,31,39,41]. For similar works in the setting of abstract evolution equations without memory kernel we refer to [16,17], and [18].

The structure of the paper
The paper is organized as follows. In Sect. 2 we introduce the abstract setting and the assumptions which we will use. Here we illustrate the applicability of our setting by several examples. In Lemma 2.9 we state and prove a basic existence and uniqueness result for the solution of (1) and specify the time-regularity of the solution in Remark 2.10. Our main abstract approximation result estimating the difference U − U n in L p ( ; C γ ([0, T ]; H 0 )) is contained in Theorem 2.11 while in L p ( ; C([0, T ]; H 0 )) in Corollary 2.12. In Sect. 3 we apply Theorem 2.11 and Corollary 2.12 to two typical space discretization schemes. In particular, in Subsection 3.1 we consider the general fractional-order Eq. (2) (with time-independent coefficients, for simplicity) and its spectral Galerkin approximation and apply Theorem 2.11 and Corollary 2.12 to obtain rates of convergence depending on the parameters in the equation. In Examples 3.6, 3.7, and 3.8 we state the results in some simplified settings for the stochastic heat equation, the fractional stochastic heat and wave equations, respectively. In Subsection 3.2 we consider a fractional stochastic wave equation and its finite element approximation and apply Theorem 2.11 and Corollary 2.12 again to obtain rates of convergence. Here, in Remark 3.13, we point out in which way the stochastic heat equation fits in our abstract framework and we also show that using the setup of the paper one may remove some unnecessary smoothness assumption on G which was present in [16,Proposition 4.2]. Finally, in Sect. 4, we present some numerical experiments for a fractional stochastic wave equation to verify the theoretical rates obtained in Subsections 3.1 and 3.2. In particular, in Subsection 4.1, we present some numerical results for the spectral Galerkin approximation and space-time white noise, while in Subsection 4.2, we describe some experiments for the finite element method and trace-class noise.

Notation
We denote by R + 0 the set {t ∈ R : t ≥ 0}. For Banach spaces V and W we denote by L(V , W ) the space of bounded linear operators from V into W endowed with the norm and we denote the norm by · L(V ) . For an operator valued function S : for an orthonormal basis (e n ) ⊂ V . Let H be a real, separable, infinite-dimensional Hilbert space and let A : D(A) ⊆ H → H , be an unbounded, self-adjoint, and positive definite operator with compact inverse. For ξ ∈ R one defines the fractional power A ξ of A via the standard spectral functional calculus of A. For ξ ≥ 0 we equip We Let ( , F, P) be a probability space and let L p ( ; E) denote the space of random variables X :

The abstract approximation result
Let H and H 0 be two real separable Hilbert spaces. Let W H be a H -cylindrical Wiener process on a complete, filtered probability space ( , F, (F t ) t≥0 , P) with respect to the filtration (F t ) t≥0 , the latter satisfying the usual conditions. We consider the following integral equation To specify the assumptions on the coefficients F and G, let us fix two real separable Hilbert spaces H 1 and H 2 . Later on, we will see in the examples that these spaces will be interpolation spaces associated with a linear operator. and and (3) the H 0 -valued process {X 0 (t)} t∈[0,T ] is predictable and, for some p > 2, We note that the t-dependence of F and G may be weakened considerably and they may also be stochastic. Concerning the families S i we suppose the following.

Assumption 2.2
We assume that the linear operator families S i are strongly continuously differentiable as operators from H i to H 0 on (0, T ), i = 1, 2. Furthermore, we assume that (2) there exists a function s 2 ∈ L 1 ([0, T ]; R + 0 ) and a constant 0 < γ 2 < 1 such that We consider an approximation of (5) given by the following integral equation where {X 0 n (t)} t∈[0,T ] is H 0 -predictable and X 0 n ∈ L p ( ; L ∞ ([0, T ]; H 0 )). Concerning the approximation we make the following assumptions.
We assume that the linear operator families S i n are strongly continuously differentiable as operators from H i to H 0 on (0, T ), i = 1, 2. Furthermore, we assume that (2) there exists a function h 2 ∈ L 1 ([0, T ]; R + 0 ) such that for all n ∈ N we have Note that Assumptions 2.2 and 2.3 imply that for some C > 0, for all n ∈ N, for all x ∈ H 1 , t ∈ (0, T ), and for all x ∈ H 2 , t ∈ (0, T ).
for t > 0 and ξ ≥ 0. Then Assumption 2.2-(1) is satisfied for any 0 < γ 1 < 1 2 for s 1 (t) = Mt γ 1 −1 and Assumption 2.2-(2) is satisfied for any 0 < γ 2 < 1 for s 2 (t) = Mt γ 2 −1 . The simplest example of an approximation procedure that we have in mind is the spectral Galerkin method. We define a family of finite-dimensional subspaces {H n : n ∈ N} of H 0 by H n = span{e 1 , e 2 , . . . , e n } and define the orthogonal projection where (· , ·) H 0 denotes the inner product of H 0 . It is easy to see that The approximating operators become S 1 n (t) = S 2 n (t) = P n S(t) = S(t)P n := S n (t).
Using eigenfunctions and eigenvalues of A we can write For ν ≥ 0, we set r 1 (n) = r 2 (n) = λ −ν n+1 . We then have 1 Thus, Assumption 2.3 is satisfied with

Example 2.5 (Fractional stochastic heat Equation)
Here we consider the fractional stochastic heat Eq. (3) with mild solution given by (17) with parameters α ∈ (0, 1), β = κ = 1, u 0 = U 0 and u 1 = 0, where the operator family S α,β is defined by (18) via its Laplace transform. We use the setting of the previous example for A, F and G; that is, consider the global Lipschitz case. In this case we have S 1 (t) = S 2 (t) = S α,1 (t) with smoothing properties specified in Lemma 3.1. In particular, we have for ξ ∈ [0, 1] and t > 0. Then, Assumption 2.2-(1) is satisfied for any 0 < γ 1 < 1 2 for s 1 (t) = Mt γ 1 −1 and Assumption 2.2-(2) is satisfied for any 0 < γ 2 < 1 for s 2 (t) = Mt γ 2 −1 . The approximating operators in this case become Thus, Assumption 2.3 is satisfied with It is important to note that the additional restriction max(ν 1 , ν 2 ) ≤ 1 applies as S α,1 has only the limited smoothing properties shown in Lemma 3.1. This implies that the rates improve as α decreases to γ i , however, remain constant once we have α ≤ γ i . Example 2.6 (Fractional stochastic wave equation) Here we consider the fractional stochastic wave equation (4) with mild solution given by (17) with parameters α ∈ (1, 2), β = κ = 1, u 0 = U 0 and u 1 = U 1 , where the operator family S α,β is defined by (18) via its Laplace transform. We use the setting of the previous example for A, F and G; that is, consider the global Lipschitz case. In this case we have The approximating operators in this case become Thus, Assumption 2.3 is satisfied with We see here that the rate deteriorates with increasing α.
We will often make use of the following results on the Hölder regularity of deterministic and stochastic convolutions.
for all x ∈ Y 1 . Suppose, moreover, that there exists a function g ∈ L 1 ([0, T ]; R + 0 ) and a constant θ ∈ (0, 1) such that for all t ∈ (0, T ) we have Then, (a) the convolution and hence * is well defined almost surely. In [17,Proposition 3.6] it is shown, that under the assumptions of the theorem, almost surely, there isC > 0, depending only on θ , such that The estimate in (b) follows by taking the pth power and expected value of both sides of the inequality. Finally the estimate in (c) follows from the estimate in (b) by noting that ( * )(0) = 0. Lemma 2.8 Let Y 1 and Y 2 be real separable Hilbert spaces. Let T > 0 and suppose that the process : Then, (a) the stochastic convolution process whereC only depends on γ, θ, p and T ; (c) the modification of whereC only depends on γ, θ, p and T .
As Y 1 is a Hilbert space, the statement in (a) follows from [17, Lemma 3.2] by noting that in this case where the latter denotes the space of gamma radonifying operators from Then, by [17,Corollary 3.4], there exists a modification of and a constant C depending on γ, η, p such that whereC depends on γ, p , T and η, with the latter ultimately depending on γ, θ and p . We used [17,Corollary 3.4] in the first inequality and (10) in the second. Finally the estimate in (c) follows from the estimate in (b) by noting that ( )(0) = 0.
Next we state a basic existence and uniqueness result.

Proof
The proof is fairly standard as it uses Banach's fixed point theorem and therefore we only sketch a proof. Let T > 0 and set For λ > 0, later to be chosen appropriately, we endow X T with the norm Note, the latter definition is equivalent to the natural norm of In the first inequality above we used the fact that L i (U )(0) = 0. In the second inequality, for i = 1, we used Lemma 2.7 with Y 1 = H 2 , Y 2 = H 0 and the linear growth of F while, for i = 2, we used Lemma 2.8 with see the estimate of I 1 in the proof of Theorem 2.11 for more details. We similarly have that see the estimate of I 3 in the proof of Theorem 2.11 for more details. Thus, by choosing A similar argument shows, by defining the corresponding mapping L n in an obvious way, the existence and uniqueness of U n with the terms (7) and (8)) yielding a uniform in n contraction constant 0 < M < 1 of L n . The estimate (11) follows from the following simple estimate where λ is large enough so that 0 < M < 1 and similarly for U n .

Remark 2.10 (Regularity) Observe that under the assumptions of Lemma 2.9, if also
Indeed, we have that U = X 0 + L 2 (U ) + L 1 (U ) hence the claim for U follows from the second inequality in (12) and (11); with an analogous argument showing the claim for U n .
Now we are ready to present our main result.

Theorem 2.11 Let p > 2 and let Assumption 2.1, Assumption 2.2, and Assumption 2.3 be satisfied with
Then there exists a constant C > 0, depending on γ, γ 1 , γ 2 , p and T , such that for all n ∈ N we have where the rate functions r i , i = 1, 2, are introduced in Assumption 2.3.
Proof Let t ∈ [0, T ] and fix a parameter λ > 0. Then Note that, by Remark 2.10, we have that In the following we estimate term by term. Estimate of I 1 : To estimate I 1 , we will use Lemma 2.7 by setting Due to Assumption 2.2-(2), we know that the assumptions of Lemma 2.7 are satisfied with g(t) = (2 + λt)e −λt s 2 (t) and θ = γ 2 . Indeed, we have thaṫ Therefore, as γ 2 < 1, Thus, we can infer by Lemma 2.7 The Lipschitz continuity of F then gives Estimate of I 2 : To estimate I 2 we will use again Lemma 2.7 by setting and (t) := e −λt F(t, U n (t)). We have that ∈ L p ( ; L ∞ ([0, T ]; H 2 )) since F is of linear growth in the second variable, uniformly in t ∈ [0, T ], and since we also have that U n ∈ L p ( ; L ∞ ([0, T ]; H 0 )) by Lemma 2.9. Assumption 2.3-(2) implies that the assumptions of Lemma 2.7 are satisfied for all n ∈ N with g(t) = (2 + λt)e −λt h 2 (t) and θ = γ 2 as a similar calculation as in the case of I 1 shows. Therefore, we can infer from Lemma 2.7 that for all n ∈ N, The linear growth of F gives for all n ∈ N 1 r 2 (n)

Estimate of I 3 :
Here, we will apply Lemma 2.8 with We have that the process is predictable as G is Lipschitz continuous and U , U n are predictable by Lemma 2.9 and also that ∈ L p ( ; again by Lemma 2.9. Due to Assumption 2.2-(1) the assumptions of Lemma 2.8, with g(t) = (2 + λt)e −λt s 1 (t) and θ = γ 1 are satisfied, as a calculation similar to that in the case of I 1 shows. Hence, it follows from Lemma 2.8, that The Lipschitz continuity of G then gives By the same calculation used for estimating I 1 in (14), where s 2 is replaced by s 1 , we get

Estimate of I 4 :
To estimate I 4 again we will use Lemma 2.8 by setting and (t) := e −λt G(t, U n (t)). The process is predictable as G is Lipschitz continuous and U n is predictable by Lemma 2.9 and ∈ L p ( ; L ∞ ([0, T ]; L H S (H , H 1 ))) as G is of linear growth in the second variable, uniformly in [0, T ], and U n ∈ L p ( ; L ∞ ([0, T ]; H 0 )) by Lemma 2.9. Assumption 2.3-(1) implies that the assumption of Lemma 2.8 are satisfied for all n ∈ N with g(t) = (2 + λt)e −λt h 1 (t) and θ = γ 1 , as a calculation similar to that in the case of I 1 shows. Hence, we can infer that for all n ∈ N, The linear growth of G then gives, for all n ∈ N, In this way we get Note that, by Lemma 2.9 and by our assumption X 0 n L p ( ;L ∞ ([0,T ];H 0 )) < K for all n ∈ N, there exists a constant C > 0 such that Furthermore, using Lebesgue's Dominated Convergence Theorem, it follows that T 0 (2 + λt)e −λt (s 1 (t) + s 2 (t)) dt → 0 as λ → ∞, and, hence, with λ > 0 large enough, we may absorb the last term on the right hand side of (15) into the left hand side to conclude that Remembering that e λ (0) = e(0) a straightforward calculation shows that A similar calculation implies that, remembering that e 0 (0) = e(0), and the proof is complete in view of (16).
We end this section with the special important case γ = 0. Proof In a completely analogous fashion and using the same notation as in the proof of Theorem 2.11 using this time specifically item (c) from Lemma 2.7 and 2.8 one concludes that

Corollary 2.12 Let p > 2 and let Assumptions 2.1 -2.3 be satisfied with
Now the proof can be completed the same way as that of Theorem 2.11.

Applications
In this section we give two typical instances where our abstract results are applicable.
Proof It is shown in [23,Lemma 5.4] that for all x ∈ H 0 , where the contour is given by where ρ > 0, φ > π 2 and αφ < π. Therefore, as A ξ is a closed operator, one has that S α,β (t)x ∈ D(A ξ ) and It is shown in the proof of [23,Lemma 5.4] that and hence (22) holds and To show (21) note that, by [23,Lemma 5.4] the function t → S α,β (t)x can be extended analytically to a sector in the right half-plane for all x ∈ H 0 . In particular, the function t → S α,β (t)x is differentiable for t > 0. Hence, we have provided that for every t > 0 there is > 0 and K = K (t, ) > 0 such that In a completely analogous fashion as in the case of estimate (23), we get and thus (24) holds and which finishes the proof.

Remark 3.2
We would like to point out two crucial points concerning (20) and (21). First, unless α = β = 1, which correspond the heat equation, the estimates do not hold for ξ > 1. Furthermore, the constant M in the estimates blows up as α → 2 as in this case we must have φ → π/2 and hence we integrate on a path with infinite line segments approaching the imaginary axis.
Proof Estimate (28) follows from Theorem 2.11 using Assumption 3.3 and estimates (25) and (27). To estimate the initial terms first note that as S α,1 (0) = I and S α,2 (0) = 0 it follows that It is shown in [23,Lemma 5.4] that if x ∈ D(A ν+ γ α ) with ν + γ α ≤ 1, then the function v(t) := S α,1 (t)x − x admits a fractional derivative of order γ defined by A straightforward calculation shows that (see also, [57, Vol. II, p. 138], [15]) Therefore, Here, for the equality in the second row of the calculation, we used the fact that the

(t)x admits a fractional derivative in D(A ν ) and
Then, similarly as above we conclude that and the proof of (a) is complete. To show the claims in (b) first note that estimate (29) follows from Corollary 2.12 by choosing γ 1 sufficiently close to 1 2 − 1 p and γ 2 sufficiently close to 1 together with Assumption 3.3 and estimates (25) and (27). To estimate the initial terms first note that, by (20) with ξ = 0 and β = 1, using also the fact that S α,1 and A −ν commutes for ν ∈ [0, 1], for all t ≥ 0. Therefore, For the second initial term, let first ν ≤ 1 α . Then, by (20) with ξ = ν and β = 2, On the other hand, if 1 ≥ ν > 1 α , then by (20) with ξ = 1 α and β = 2, In summary, if u 1 ∈ L p ( ; D(A max(0,ν− 1 α ) )), then which finishes the proof of item (b) and hence that of the theorem.

Remark 3.5
In general, we may observe that larger values of the parameters β and κ allow for higher rates of convergence in Theorem 3.4 as larger values of these parameters correspond to better time-regularity of the stochastic, respectively, deterministic feedback. Note also, that the operator family S α,2 , which is S α,β with β = 2 has a stronger smoothing effect than S α,1 , which is S α,β with β = 1, as Lemma 3.1 shows. This explains why the regularity requirement in Theorem 3.4 is stricter on u 0 than that on u 1 for the same rate of convergence. for γ < min( 1 2 − 1 p − γ 1 , 1 − γ 2 ) and ν = min(ν 1 , ν 2 ) where ν 1 < γ 1 − δ G and ν 2 < γ 2 − δ F , provided that u 0 ∈ L p ( ; D(A ν+γ )). Furthermore, D(A ν )). This is consistent with [16,Proposition 3.1].

Example 3.8 (Fractional stochastic wave equation)
The simplest fractional stochastic wave equation, considered also in Example 2.6 in the standard global Lipschitz setting, corresponds to parameters β = κ = 1, α ∈ (1, 2). In this case, Theorem 3.4, yields . We see that the rates deteriorate with increasing α.

Finite element approximation of a stochastic fractional wave equation
In this section we give a more concrete example to demonstrate how the abstract framework can be used for finite element approximation. Let D ⊂ R d be a bounded convex polygonal domain and let A = − be the Dirichlet Laplacian in (H 0 , · Let p > 2 and consider a fractional stochastic wave equation [13,38] given by where W H is a cylindrical Wiener process in H = H 0 and Q : H → H is a linear, symmetric, positive semidefinite, trace class operator on H with an orthonormal basis of eigenfunctions {e k : k ∈ N} with e k L ∞ (D) ≤ M for all k = 1, 2, . . . . The initial data U 0 ∈ L p ( ; H 0 ) for some p > 2 is assumed to be F 0 -measurable. The kernel b is the Riesz kernel given by

S(t − s)G(U (s)) dW H (s). (31)
Here the resolvent family {S(t)} t≥0 is a strongly continuous family of bounded linear operators on H 0 , which is strongly differentiable on (0, ∞) such that the function t → S(t)x is the unique solution oḟ see [52,Corollary 1.2]. Remark 3. 9 In connection with Subsection 3.1, in particular (19), by integrating (32) from 0 to t, one sees that in fact S(t) = S α+1,1 (t).
The resolvent family S has the following smoothing properties [47]: For spatial approximation of (30) we consider a standard continuous finite element method. Let {T h } 0<h<1 denote a family of triangulations of D, with mesh size h > 0 and consider finite element spaces {V h } 0<h<1 , where V h ⊂ H 1 0 (D) consists of continuous piecewise linear functions vanishing at the boundary of D. We introduce the "discrete Laplacian" (see, for example, [55, page 10]) where (·, ·) H 0 denotes the inner product of H 0 , and the orthogonal projection We consider the approximated problem with mild solution given by Similarly to the resolvent family {S(t)} t≥0 , the resolvent family {S h (t)} t≥0 is a strongly continuous family of bounded linear operators on V h , which is strongly differentiable on (0, ∞) such that for χ ∈ V h the V h -valued function t → S h (t)χ is the unique solution ofu Let E h (t) := S(t) − S h (t)P h denote the deterministic error operator. We have the following error bounds.
Proof The error bound (36) is shown in [40,Proposition 3.3]. The error estimate (37) for β = 1 is proved in [46, Theorem 2.1] while for β ∈ [0, 1) it follows immediately using also the stability estimate E h (t) ≤ C; the latter is a consequence of (36) with β = 0. Thus, we have to prove (38). It is shown in [46,Eq. (2.2)] that the Laplace transform E(z) of E h satisfies the error estimate in a symmetric sectorial region containing the right half-plane. We write .
We can now prove an error estimate in Hölder norms.

Theorem 3.11
Let U and U h be given by (31) and ( (45) holds. If U 0 ∈ L p ( ; D(A β(1+ ) )) for some > 0, then and thus s 1 ∈ L 1 ((0, T ]; R + 0 ). Furthermore, by Proposition 3.10, for 0 < h < 1. We have that h 1 ∈ L 1 ((0, T ]; R + 0 ) if and only if −β(α + 1) − 1 + γ 1 > −1; that is, when β < γ 1 α+1 . Then, the error bound in (43) follows from Theorem 2.11 for each fixed 0 < h < 1 with 1 n = h −2β E h , r 1 (n) = h 2β for all n ∈ N and 2 n = 0, r 2 (n) = h 2β for all n ∈ N using (47) and (48) and noting that the function h 1 in (48) is independent of h and then so is the constant C in the error estimate (13) of Theorem 2.11. To show (44) note that, by a standard finite element estimate, we have Furthermore, for x ∈ H 0 , by choosing χ = P h x in (35), we see that the function t → S h (t)P h x is the unique solution of (35); that is, and therefore it follows that where we used the stability estimate S h (t)P h L(H 0 ) ≤ C, t ≥ 0. Using also (41) we conclude, by interpolation that Therefore, using (33) and (50) with μ = 1 1+α , it follows that Here we also used that, by the self-adjointness of A −1 , A h and P h , where the last inequality holds for δ ∈ [0, 1] for quasi-uniform meshes, see, for example, the proof of Theorem 4.4 (iv) in [36]. Thus, using interpolation in Hölder spaces and the smooth data estimate from (36), for all β < γ 1 α+1 and the proof of (44) is complete in view of (43) and (49). Next, the estimate (45) follows from Corollary 2.12 in view of (47) and (48). Finally, using (36), we immediately conclude that which finishes the proof of (46) in view of (45) and the proof of the theorem is complete.

Remark 3.12
If U 0 is deterministic, then we may take p arbitrarily large in Theorem 3.11 (similarly, in Theorem 3.4 in case u 0 and u 1 are deterministic). We also point out that the estimate on e 0 C γ ([0,T ];H 0 ) in the proof of Theorem 3.11 is not sharp in terms of the regularity of the initial data. This follows from the fact that we estimate the γ -Hölder norm by interpolation and not directly and hence more regularity on U 0 is assumed than what is necessary. However, a sharp, direct estimate on e 0 C γ ([0,T ];H 0 ) is not available in the finite element literature, and a derivation would be beyond the scope of this paper.

Remark 3.13 (Stochastic heat equation)
Here we briefly comment on the stochastic heat equation which also fits in our abstract framework. Suppose that F and G are as above and S(t) := e −t A is the heat semigroup and S h (t) := e −t A h P h , t ≥ 0. In this case the well-known error estimates, see [55,Chapter 3], hold for 0 < h < 1. Note that these are essentially (36)(37)(38) for α = 0. Then, similarly as in the proof of Theorem 3.11 we get, for p > 2, 0 < γ 1 < 1 2 − 1 p , γ < 1 2 − 1 p − γ 1 and β < γ 1 that the error estimate This result is consistent with [16,Proposition 4.2] but less smoothness on the noise is assumed here; that is, we may take δ G = 0.
This is consistent with Theorem 3.11 as in this case we may first take U 0 = 0 and hence take p in (45) arbitrarily large and then add the estimate for the initial term.

Numerical experiments
In this section, we will illustrate our theoretical results by some numerical experiments. The underlying equation we consider is the fractional stochastic wave Eq. (30), where D = [0, 1], F = 0, (U ) = I , and A = − is the Laplacian with Dirichlet boundary conditions in H 0 = (L 2 (D), · ) with inner product denoted by (·, ·). In particular, we will implement the numerical solution for the following equation: where W H is a H -cylindrical Wiener process with H = H 0 , b(t) = t α−1 / (α), α ∈ (0, 1), and Q : H → H is symmetric, bounded, and positive semidefinite.
In Subsection 4.1, we apply the spectral Galerkin method based on the eigenvalues λ k = k 2 π 2 and the orthonormal basis of corresponding eigenfunctions {e k : k ∈ N}. For the driving noise we take space time white noise; that is, Q = I . In particular, we take W H to be given by the formal series W H (t, x) . . } is a family of mutually independent standard scalar Brownian motions. To perform the integration in time, we use the Mittag-Leffler Euler integrator (MLEI) method, developed for semilinear problems in [38]. In the present linear setting this method is exact, that is, no additional time-discretization error is introduced and we may simulate the spatially approximated process exactly on a timegrid.
In Subsection 4.2, we approximate the solution of (51) by finite elements. We consider a Wiener process which is of trace class given by where β is a scalar Brownian motion and 1 [0,0.5] is the characteristic function of the interval [0, 0.5]. That is, the Fourier expansion of the driving Wiener process contains a single term only and thus its covariance operator is of rank 1 and hence trace class. The motivation for the particular choice of the Wiener process is to consider trace class noise which does not possess additional spatial smoothness. This is needed so that we do not observe higher convergence rate, due to additional regularity, in the numerical experiments than predicted by the theory for trace class noise. To perform the time integration we implement a Lubich Convolution Quadrature (LCQ) method, for details see [44,45]. This method was successfully applied to a similar problem of the third author in [40]. The LCQ method is easier to implement than the MLEI in case of finite elements and a correlated noise.

The spectral Galerkin method and the MLEI-method
The mild solution of (51) with space time white noise can be written as where, as shown in [38], the resolvent family {S(t)} t≥0 can be represented as where E ρ (z), ρ > 0, is the one parameter Mittag-Leffler function (MLF) defined by For more details about Mittag-Leffler function and their application, we refer to the paper [49] and the book [25]. Moreover, in order to implement the MLF, we use the Matlab function mlf.m, see [50]. For the discretization in space, we introduce the finite dimensional subspaces H N = span{e k : k = 1, 2, . . . , N } of H and the orthogonal projection P N : H → H N given by Using (54) we then get This way we obtain for the approximationŪ N m of U (t m ) given by (53)

by the Galerkin methodŪ
with initial valueŪ N 0 = P N U 0 . Let us defineŪ N m,k bȳ whereŪ N 0,k = (U (0), e k ) and To simulate the stochastic convolution process let us observe that is a M-dimensional Gaussian random variable with zero mean and covariance matrix Thus, N can be represented as K χ , where χ is an M-dimensional standard Gaussian random variable and K is the solution of equation K K T = R (see Theorem 2.2 of [28]); the equation K K T = R can be solved by the Cholesky factorization. In our numerical experiment, we simulated 100 sample paths to verify the rate of convergence in the L 2 ( ; C γ ([0, T ]; H 0 ))-norm for different γ and α. According to Example 3.8 we expect theoretical rate of ν < γ 1 1+α −δ G in the L p ( ; C γ ([0, T ]; H 0 ))norm for appropriately smooth and integrable initial data U 0 , where γ < 1 2 − 1 p − γ 1 and p > 2. Note that the parameter α in Example 3.8 corresponds to α + 1 in the present example, see Remark 3.9. Taking into account that λ N = N 2 π 2 and that Q = I and hence δ G > 1 4 we obtain a rate in N of almost 2( 1 2 − 1 p − γ )/(1 + α) − 1 2 . Note that since U 0 is a deterministic eigenfunction of A and thus U 0 ∈ L p ( ; D(A s )) for any p > 2 and s ≥ 0, we may bound the L 2 ( ; C γ ([0, T ]; H 0 ))-norm by the  Here, one may observe that if γ decreases, then the rate of convergence increases. Moreover, Figures 1-3 also show that the numerical rate of convergence is close to the theoretical rate.

The finite element method and the LCQ-method
We first perform a time discretization of (51) with Q 1 2 W H given by (52) by the first order LCQ-method, for more details see e.g. [40]. To describe the first order LCQ method, let = {0 = t 0 < t 1 < t 2 , · · · < t M = 1} be an equidistant partition of the time interval This is a first order quadrature; that is, it has an approximation order of O( t).
The above system can be rewritten in the following form  In our numerical experiment, we used 500 sample paths to verify the dependence of the rate of convergence in the L 2 ( ; C γ ([0, T ]; H 0 ))-norm on γ and α. The theoretical rate of convergence is almost (1−2γ )/(1+α) according to Theorem 3.11. Note again that, similarly to the previous example, since U 0 is a deterministic eigenfunction of A, ; that is, we take h = 1 N i , i = 1, 2, . . . , 6. Then, in order to measure the error, we computed a reference solution with a mesh size h = 1 2 11 . In Fig. 4, we present the error of the numerical approximation in the L 2 ( ; C γ ([0, T ]; H 0 ))-norm for α = 0.2 with varying values of γ (see also Fig. 5 and Fig. 6 for α = 0.25 and α = 0.3, respectively). Similarly to Sect. 4.1, in this section we also compute the numerical rate of convergence according to (56) for γ = 0, 0.025, 0.05, 0.075, 0.1. Here, one may observe that if γ decreases, then the rate of convergence again increases. Moreover, the figures also show that the numerical rate of convergence is close to the theoretical rate.
Acknowledgements The authors would like to thank the anonymous referees for their careful reading of the manuscript and for their useful comments that helped them to improve the presentation of the paper significantly.
Funding Open access funding provided by Pázmány Péter Catholic University.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 56. Wu, X., Yan, Y., Yan, Y.: An analysis of the L1 scheme for stochastic subdiffusion problem driven by integrated space-time white noise. Appl. Numer. Math. 157, 69-87 (2020) 57. Zygmund, A.: Trigonometric series, vol. I, II (2nd edition). Cambridge University Press, New York (1959) Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.