Iterated Stochastic Integrals in Infinite Dimensions - Approximation and Error Estimates

Higher order numerical schemes for stochastic partial differential equations that do not possess commutative noise require the simulation of iterated stochastic integrals. In this work, we extend the algorithms derived by Kloeden, Platen, and Wright (1992) and by Wiktorsson (2001) for the approximation of two-times iterated stochastic integrals involved in numerical schemes for finite dimensional stochastic ordinary differential equations to an infinite dimensional setting. These methods clear the way for new types of approximation schemes for SPDEs without commutative noise. More precisely, we analyze two algorithms to approximate two-times iterated integrals with respect to an infinite dimensional $Q$-Wiener process in the case of a trace class operator $Q$, given the increments of the $Q$-Wiener process. Error estimates in the mean-square sense are derived and discussed for both methods. In contrast to the finite dimensional setting, which is contained as a special case, the optimal approximation algorithm cannot be uniquely determined but depends on the covariance operator $Q$. This difference arises because the stochastic process is infinite dimensional.


Introduction
In order to obtain higher convergence rates in the approximation of stochastic differential equations, we generally need to incorporate the information contained in iterated integrals. However, these integrals cannot, in general, be simulated directly and therefore have to be replaced by approximations. We illustrate this issue in a finite dimensional setting first, although we are concerned with the approximation of iterated Itô integrals in infinite dimensions in this work.
In the numerical approximation of stochastic ordinary differential equations (SODEs) that do not possess commutative noise, iterated stochastic integrals have to be simulated to achieve a high order of convergence, see [3], [8]. One example of such a higher order scheme is the Milstein scheme developed in [6], which we present below to illustrate the issue. For some fixed $d, K \in \mathbb{N}$, we consider a $d$-dimensional SODE of type
$$ \mathrm{d}X_t = a(X_t)\,\mathrm{d}t + \sum_{i=1}^{K} b^i(X_t)\,\mathrm{d}\beta^i_t, \quad X_0 = x_0, \quad t \in [0, T], \tag{1} $$
driven by $K$ independent real-valued Brownian motions $(\beta^i_t)_{t \ge 0}$, $i \in \{1, \dots, K\}$. For an equidistant discretization $t_m = m h$ with step size $h = \frac{T}{M}$, $M \in \mathbb{N}$, the Milstein scheme reads
$$ Y_{m+1} = Y_m + a(Y_m)\, h + \sum_{i=1}^{K} b^i(Y_m)\, \Delta\beta^i_m + \sum_{i,j=1}^{K} \big(b^j\big)'(Y_m)\, b^i(Y_m) \int_{t_m}^{t_{m+1}} \int_{t_m}^{s} \mathrm{d}\beta^i_r \,\mathrm{d}\beta^j_s $$
for $m \in \{0, \dots, M-1\}$, using the notation $Y_m = Y_{t_m}$ and $\Delta\beta^i_m = \beta^i_{t_{m+1}} - \beta^i_{t_m}$. Under suitable assumptions, we obtain the error estimate
$$ \Big( \mathrm{E}\big[ \| X_{t_m} - Y_m \|^2 \big] \Big)^{1/2} \le C\, h \tag{2} $$
for all $m \in \{0, \dots, M\}$ and some constant $C > 0$, that is, strong order of convergence $1$, see [3]. If SODE (1) does not possess commutative noise, see [3] for details, the Milstein scheme cannot be simplified and one has to approximate the iterated stochastic integrals involved in the method. We denote these iterated Itô integrals by
$$ I_{(i,j)}(t, t+h) = \int_{t}^{t+h} \int_{t}^{s} \mathrm{d}\beta^i_r \,\mathrm{d}\beta^j_s, \qquad I_{(i,j)}(h) := I_{(i,j)}(0, h), $$
for some $t \ge 0$, $h > 0$, and for all $i, j \in \{1, \dots, K\}$, where $K \in \mathbb{N}$ is the number of independent Brownian motions driving the SODE. The research by Kloeden, Platen and Wright [4] and by Wiktorsson [8] suggests different methods for an approximation of these integrals; the main ideas are outlined in Section 2. We denote by $\bar{I}^{(D)}_{(i,j)}(h)$ the approximation of $I_{(i,j)}(h)$ with the algorithm derived in [4] for $i, j \in \{1, \dots, K\}$, $D, K \in \mathbb{N}$, $h > 0$. In [4], the authors proved that for all $i, j \in \{1, \dots, K\}$ and $h > 0$, it holds
$$ \mathrm{E}\Big[ \big| I_{(i,j)}(h) - \bar{I}^{(D)}_{(i,j)}(h) \big|^2 \Big] \le C\, \frac{h^2}{D} \tag{3} $$
for some constant $C > 0$, where $D \in \mathbb{N}$ denotes the index of the summand at which the series representation of the stochastic double integral is truncated to obtain the approximation $\bar{I}^{(D)}_{(i,j)}(h)$. If we use the algorithm derived in [8] instead, we denote the approximation of $I_{(i,j)}(h)$ by $\hat{I}^{(D)}_{(i,j)}(h)$ for all $i, j \in \{1, \dots, K\}$, $h > 0$. This scheme employs the same series representation as proposed in [4] but additionally incorporates an approximation of the truncated term.
The error resulting from this scheme is estimated as
$$ \mathrm{E}\Big[ \sum_{i,j=1}^{K} \big| I_{(i,j)}(h) - \hat{I}^{(D)}_{(i,j)}(h) \big|^2 \Big] \le C\, \frac{h^2 K^2 (K-1)}{D^2} \tag{4} $$
for some constant $C > 0$, where $D$ is again the index of the summand at which the series is truncated to obtain the approximation and $K$ is the number of independent Brownian motions, see [8]. For fixed $h$ and $K$, both approximations converge in the mean-square sense as $D$ goes to infinity, with a different order of convergence, however. In the numerical approximation of SODEs, the integer $D$ is determined such that the overall order of convergence in the time step is not distorted. For the Milstein scheme, for example, error estimate (2) is considered, that is, a strong order of convergence of $1$ can be achieved. Therefore, $D \ge C h^{-1}$ is chosen for the method derived in [4], whereas $D \ge C h^{-1/2}$ is selected for the algorithm developed in [8], see also [3, Cor. 10.6.5]. This shows that if we decrease the step size $h$, the value for $D$ has to increase faster for the scheme developed in [4]. Note that the error estimate (4) depends on the number of Brownian motions $K$ as well. As this number is fixed in the setting of finite dimensional SODEs, this factor is not crucial but simply a constant. Therefore, the algorithm proposed by Wiktorsson [8] is superior to the one derived in [4] in terms of the computational effort when a given order of convergence in the step size $h$ is to be achieved.
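To make the finite dimensional construction concrete, the following sketch simulates $\bar{I}^{(D)}_{(i,j)}(h)$ for all pairs $(i,j)$ from given increments via the truncated Fourier series of the Brownian bridge. This is a minimal illustration in the spirit of [4]: the function name is ours, and the sign convention for the $a_0$-terms follows one standard form of the expansion, which may differ from the normalization used later in this paper.

```python
import numpy as np

def kpw_iterated_integrals(dW, h, D, rng):
    """Approximate I_(i,j)(h) for all i, j from the increments dW of K
    Brownian motions over [0, h], truncating the Fourier series at D."""
    K = len(dW)
    r = np.arange(1, D + 1)
    std = np.sqrt(h / (2.0 * np.pi**2 * r**2))   # Var(a_r) = Var(b_r) = h / (2 pi^2 r^2)
    a = rng.standard_normal((K, D)) * std        # cosine coefficients a_{i,r}
    b = rng.standard_normal((K, D)) * std        # sine coefficients   b_{i,r}
    a0 = -2.0 * a.sum(axis=1)                    # a_{i,0} = -2 sum_r a_{i,r} (truncated)
    I = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            if i == j:
                I[i, j] = 0.5 * (dW[i]**2 - h)   # exact on the diagonal
            else:
                levy = np.pi * np.sum(r * (a[i] * b[j] - b[i] * a[j]))
                I[i, j] = (0.5 * dW[i] * dW[j]
                           - 0.5 * (dW[j] * a0[i] - dW[i] * a0[j])
                           + levy)
    return I

rng = np.random.default_rng(0)
h, K, D = 0.01, 3, 50
dW = rng.standard_normal(K) * np.sqrt(h)
I = kpw_iterated_integrals(dW, h, D, rng)
# consistency check: I_(i,j) + I_(j,i) = dW_i dW_j, with dW_i^2 - h on the diagonal
print(np.max(np.abs(I + I.T - np.outer(dW, dW) + np.diag(np.full(K, h)))))
```

The identity $\bar{I}_{(i,j)}(h) + \bar{I}_{(j,i)}(h) = \Delta\beta^i\,\Delta\beta^j$ holds exactly for the approximation (the antisymmetric parts cancel term by term) and serves as a cheap consistency check.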
The same issue arises in the context of higher order numerical schemes designed for infinite dimensional stochastic differential equations that need not have commutative noise. There, we also have to approximate the involved iterated stochastic integrals in order to implement the scheme. This time, however, the stochastic process is infinite dimensional, in general. In this work, we aim at devising numerical algorithms for the simulation of iterated integrals which arise, for example, in the approximation of the mild solution of stochastic partial differential equations (SPDEs) of type
$$ \mathrm{d}X_t = \big( A X_t + F(X_t) \big)\,\mathrm{d}t + B(X_t)\,\mathrm{d}W_t, \quad t \in (0, T], \quad X_0 = \xi, \tag{5} $$
where the commutativity condition from [2],
$$ B'(y)\big( B(y) u, v \big) = B'(y)\big( B(y) v, u \big) \tag{6} $$
for all $y \in H_\beta$, $u, v \in U_0$, is not assumed to hold. Here, $H_\beta = D((-A)^\beta)$ denotes a separable Hilbert space for some $\beta \in [0, 1)$. The operators $A$, $F$, $B$, and the initial value $\xi$ are assumed to fulfill the conditions imposed for the existence of a unique mild solution, see [1], and are not specified further. The spaces are introduced in Section 2 and $(W_t)_{t \ge 0}$ denotes a $Q$-Wiener process taking values in some separable Hilbert space $U$ for some trace class operator $Q$. In order to approximate the mild solution of SPDEs of type (5) with a higher order scheme, we need to simulate iterated stochastic integrals of the form
$$ \int_{t}^{t+h} \Psi \Big( \int_{t}^{s} \Phi \,\mathrm{d}W_r \Big)\, \mathrm{d}W_s \tag{7} $$
for $t \ge 0$, $h > 0$, and some operators $\Psi$, $\Phi$ specified in Section 2. These terms arise if condition (6) is not fulfilled, for example, in the Milstein scheme for SPDEs [2]. In this Milstein scheme, it holds $\Psi = B'(Y_t)$ and $\Phi = B(Y_t)$ for some $B : H \to L_{HS}(U_0, H)$ and an approximation $Y_t \in H_\beta$ with $t \ge 0$ and $\beta \in [0, 1)$, where $L_{HS}(U_0, H)$ denotes the space of all Hilbert-Schmidt operators from $U_0$ to $H$. For more details, we refer to [2].
We want to emphasize that the algorithms developed for the approximation of iterated stochastic integrals in the setting of SODEs are designed for some fixed finite number $K$ of driving Brownian motions and that the approximation error (4) even involves this number $K$ as a constant. In contrast, when approximating the solution of SPDEs driven by an infinite dimensional $Q$-Wiener process, this number corresponds to the dimension of the finite dimensional subspace onto which the $Q$-Wiener process is projected. Thus, the dimension $K$ of the approximation subspace has to increase, in general, to attain higher accuracy, i.e., $K$ is no longer a constant; see the error estimate of the Milstein scheme for SPDEs in [2], for example. Therefore, this aspect has to be taken into account in order to identify an appropriate approximation algorithm. In the following, we derive two algorithms for the approximation of iterated integrals of type (7) based on the methods developed for the finite dimensional setting by Kloeden, Platen, and Wright [4] and by Wiktorsson [8]. These algorithms make it possible for the first time to implement higher order schemes for SPDEs that do not possess commutative noise, and they include the algorithms that can be used for finite dimensional SODEs as a special case. We show that the algorithm that is superior in the setting of an infinite dimensional $Q$-Wiener process cannot be uniquely determined in general but depends on the covariance operator $Q$. In the analysis of the approximation error, we need to incorporate the eigenvalues of the covariance operator $Q$. For the algorithm based on the approach by Kloeden, Platen, and Wright [4], we obtain an estimate in the mean-square sense similar to (3), see Corollary 2.2. For the method derived in the work of Wiktorsson [8], we can prove, by different means, two differing error estimates for the case of infinite dimensional $Q$-Wiener processes.
One is the same, apart from constants, as estimate (4). Moreover, the fact that we integrate with respect to a $Q$-Wiener process with a trace class operator $Q$ allows for an alternative proof to the one given in [8]. The resulting estimate allows a possibly superior convergence in $K$; this depends on the rate of decay of the eigenvalues of $Q$. Details can be found in Theorem 2.4 and Theorem 2.6.

Approximation of Iterated Stochastic Integrals
Throughout this work, we fix the following setting and notation. Let $H$ and $U$ be separable real-valued Hilbert spaces. In the following, let $(\Omega, \mathcal{F}, P, (\mathcal{F}_t)_{t \ge 0})$ be a probability space, let $(W_t)_{t \ge 0}$ denote a $U$-valued $Q$-Wiener process with respect to $(\mathcal{F}_t)_{t \ge 0}$, where $Q$ is a trace class covariance operator, and let $U_0 := Q^{1/2} U$. We define $L(U, H)_{U_0} := \{ T|_{U_0} : T \in L(U, H) \}$, which is a dense subset of the space of Hilbert-Schmidt operators $L_{HS}(U_0, H)$ [7]. Moreover, we assume that the operators $\Phi$ and $\Psi$ in (7) fulfill assumptions (A1) and (A2) for some $\alpha \in (0, \infty)$. The parameter $\alpha$ determines the rate of convergence for the approximation of the $Q$-Wiener process, see Theorem 2.1 or Theorem 2.4. Note that assumption (A1), needed to prove the convergence of the approximation algorithms for iterated integrals in Theorem 2.1, Theorem 2.4, and Theorem 2.6, is less restrictive than the condition imposed on the operator $B$ in SPDE (5) to obtain the error estimate for some numerical scheme to approximate its mild solution, e.g., in [5]. For the Milstein scheme in [2], however, assumption (A2) does not need to be fulfilled for its error analysis to hold true.
If we are interested in the approximation of, for example, the mild solution of (5), a combination of the error estimate for a numerical scheme that yields this process and the error from the approximation of the iterated integrals has to be analyzed. In this case, we impose weaker assumptions instead. For the convergence results in this case, we refer to Corollary 2.2 and Corollary 2.5, which have to be combined with estimates for the respective numerical scheme. These weaker conditions are sufficient since, in the proofs, the $Q$-Wiener process is approximated before the iterated integral is compared to its approximation.
Let $Q \in L(U)$ be a nonnegative and symmetric trace class operator with eigenvalues $\eta_j$ and corresponding eigenfunctions $\tilde{e}_j$ for $j \in \mathcal{J}$, where $\mathcal{J}$ is some countable index set. The eigenfunctions $\{\tilde{e}_j : j \in \mathcal{J}\}$ constitute an orthonormal basis of $U$. Then, the $Q$-Wiener process admits the series representation
$$ W_t = \sum_{j \in \mathcal{J}} \sqrt{\eta_j}\, \beta^j_t\, \tilde{e}_j, \quad t \ge 0. \tag{8} $$
Here, $(\beta^j_t)_{t \ge 0}$ with $j \in \{k \in \mathcal{J} : \eta_k > 0\}$ are independent real-valued Brownian motions. As the $Q$-Wiener process $(W_t)_{t \ge 0}$ is an infinite dimensional stochastic process, it has to be projected to some finite dimensional subspace by truncating the series (8) such that it can be simulated in a numerical scheme. For $K \in \mathbb{N}$, we denote by $(W^K_t)_{t \ge 0}$ the projected $Q$-Wiener process, which is defined as
$$ W^K_t = \sum_{j \in \mathcal{J}_K} \sqrt{\eta_j}\, \beta^j_t\, \tilde{e}_j, \quad t \ge 0, \tag{9} $$
for some finite index set $\mathcal{J}_K \subset \mathcal{J}$ with $|\mathcal{J}_K| = K$. This expression allows us to write the iterated integral with respect to the projected $Q$-Wiener process $(W^K_t)_{t \ge 0}$ for any $t \ge 0$ and $h > 0$ as
$$ I^Q_{(i,j)}(t, t+h) = \int_{t}^{t+h} \int_{t}^{s} \mathrm{d}\langle W_r, \tilde{e}_i \rangle_U \,\mathrm{d}\langle W_s, \tilde{e}_j \rangle_U $$
for $i, j \in \mathcal{J}_K$. Therefore, we aim at devising a method to approximate the iterated stochastic integrals $I^Q_{(i,j)}(t, t+h)$ for all $i, j \in \mathcal{J}_K$. Below, we introduce two such algorithms and analyze as well as discuss their convergence properties. For simplicity of notation, we assume, without loss of generality, that $\mathcal{J}_K = \{1, \dots, K\}$ and that $\eta_j > 0$ for all $j \in \mathcal{J}_K$.
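As a brief illustration, the coefficients $\langle W_{t+h} - W_t, \tilde{e}_j \rangle_U$ of the projected process (9) are independent $N(0, \eta_j h)$ random variables and can be sampled directly. The eigenvalue sequence below is a hypothetical choice ($\eta_j = j^{-2}$, which is trace class), and the function name is ours:

```python
import numpy as np

def q_wiener_increments(eigenvalues, h, steps, rng):
    """Simulate the coefficients <W_{t+h} - W_t, e_j> of the truncated
    Q-Wiener process (9): independent N(0, eta_j * h) increments."""
    return rng.standard_normal((steps, len(eigenvalues))) \
        * np.sqrt(np.asarray(eigenvalues) * h)

eta = np.array([j**-2.0 for j in range(1, 11)])   # hypothetical eigenvalues
rng = np.random.default_rng(42)
dw = q_wiener_increments(eta, h=0.01, steps=10000, rng=rng)
print(dw.shape)  # (10000, 10)
```

The sample variances along the first axis should be close to $\eta_j h$, which is a quick sanity check on the scaling.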

Algorithm 1
In the following, we mainly adapt the method introduced by Kloeden, Platen, and Wright [4] to the setting of infinite dimensional stochastic processes. Here, we additionally have to take into account the error arising from the projection of the Q-Wiener process to a finite dimensional subspace.
For some $t \ge 0$, the coefficients $w^j_t := \langle W_t, \tilde{e}_j \rangle_U$ of the projected $Q$-Wiener process are independent, $N(0, \eta_j t)$ distributed, real-valued random variables for $j \in \mathcal{J}$. Thus, the increments $\Delta w^j_h := \langle W_{t+h} - W_t, \tilde{e}_j \rangle_U$ can be easily simulated since $\Delta w^j_h$ is $N(0, \eta_j h)$ distributed for $j \in \mathcal{J}$ and $h \ge 0$. Our goal is to obtain an approximation of the iterated integrals $I^Q_{(i,j)}(h)$ for all $i, j \in \mathcal{J}_K$, $K \in \mathbb{N}$, $h > 0$, given the realizations of the increments $\Delta w^j_h$ for $j \in \mathcal{J}_K$. The following derivation of the approximation method follows the representation in [4] closely. Below, let $K \in \mathbb{N}$ be arbitrarily fixed and let us introduce the scaled Brownian bridge process $(w^j_s - \frac{s}{h} w^j_h)_{0 \le s \le h}$ for $j \in \mathcal{J}_K$ and some $h \in (0, T]$. We consider its series expansion
$$ w^j_s - \frac{s}{h}\, w^j_h = \frac{a^j_0}{2} + \sum_{r=1}^{\infty} \Big( a^j_r \cos\Big( \frac{2 \pi r s}{h} \Big) + b^j_r \sin\Big( \frac{2 \pi r s}{h} \Big) \Big), \tag{10} $$
which converges in $L^2(\Omega)$. The coefficients are given by
$$ a^j_r = \frac{2}{h} \int_0^h \Big( w^j_s - \frac{s}{h}\, w^j_h \Big) \cos\Big( \frac{2 \pi r s}{h} \Big) \,\mathrm{d}s, \qquad b^j_r = \frac{2}{h} \int_0^h \Big( w^j_s - \frac{s}{h}\, w^j_h \Big) \sin\Big( \frac{2 \pi r s}{h} \Big) \,\mathrm{d}s $$
for all $j \in \mathcal{J}_K$, $r \in \mathbb{N}_0$, and all $0 \le s \le h \le T$, see also [4]. All coefficients $a^j_r$ and $b^j_r$ are independent and $N\big(0, \frac{\eta_j h}{2 \pi^2 r^2}\big)$ distributed for $r \in \mathbb{N}$ and $j \in \mathcal{J}_K$, and it holds $a^j_0 = -2 \sum_{r=1}^{\infty} a^j_r$. In contrast to [4], the distributions of the coefficients additionally depend on the eigenvalues $\eta_j$ of the covariance operator $Q$. In order to obtain an approximation of the scaled Brownian motion $(w^j_s)_{0 \le s \le h}$ for some $h \in (0, T]$, we truncate expression (10) at some integer $R \in \mathbb{N}$ and define
$$ w^{j,(R)}_s := \frac{s}{h}\, w^j_h + \frac{a^j_0}{2} + \sum_{r=1}^{R} \Big( a^j_r \cos\Big( \frac{2 \pi r s}{h} \Big) + b^j_r \sin\Big( \frac{2 \pi r s}{h} \Big) \Big). \tag{11} $$
In fact, we are interested in integration with respect to this process. According to Wong and Zakai [9, 10], or [3, Ch. 6.1], an integral with respect to process (11) converges to a Stratonovich integral as $R \to \infty$. We are, however, interested in the Itô stochastic integral. Following [3, p. 171], the Itô and Stratonovich versions of the double integral coincide for $i \ne j$, whereas for $i = j$ the Itô integral is known explicitly, $I^Q_{(i,i)}(h) = \frac{1}{2}\big( (\Delta w^i_h)^2 - \eta_i h \big)$. This implies that we only have to approximate $I^Q_{(i,j)}(h)$ for $i, j \in \mathcal{J}_K$ with $i \ne j$. Thus, we obtain the desired approximation of the Itô stochastic integral directly by integrating with respect to process (11). Without loss of generality, let $t = 0$.
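The stated distribution of the Fourier coefficients can be verified numerically by simulating Brownian bridge paths on a fine grid (with $\eta_j = 1$ for simplicity) and estimating the sample variance of $a_r$; the discretization scheme and variable names below are ours:

```python
import numpy as np

# Empirical check of the coefficient distributions in the series (10):
# compare the sample variance of a_r with h / (2 pi^2 r^2).
rng = np.random.default_rng(7)
h, N, n_paths = 1.0, 2000, 4000
t = np.linspace(0.0, h, N + 1)
dt = h / N
dW = rng.standard_normal((n_paths, N)) * np.sqrt(dt)
W = np.hstack([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)])
bridge = W - (t / h) * W[:, -1:]                 # Brownian bridge paths
emp, theo = [], []
for r in range(1, 4):
    cos_r = np.cos(2.0 * np.pi * r * t / h)
    # left Riemann sum for a_r = (2/h) int_0^h bridge(s) cos(2 pi r s / h) ds
    a_r = (2.0 / h) * (bridge[:, :-1] * cos_r[:-1]).sum(axis=1) * dt
    emp.append(a_r.var())
    theo.append(h / (2.0 * np.pi**2 * r**2))
print(np.round(emp, 4), np.round(theo, 4))
```

The two printed arrays should agree up to Monte Carlo and discretization error, confirming the $r^{-2}$ decay of the coefficient variances.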
By (10), we obtain the following expression for the iterated stochastic integral:
$$ I^Q_{(i,j)}(h) = \frac{1}{2}\, \Delta w^i_h\, \Delta w^j_h + A^Q_{(i,j)}(h) \tag{12} $$
for all $i, j \in \mathcal{J}_K$, $i \ne j$, and $h > 0$; here, we employed the series expansion of the Brownian bridge and the independence of the increments from its coefficients. Expression (12) involves scaled Lévy stochastic area integrals, which are defined as
$$ A^Q_{(i,j)}(h) := \frac{1}{2} \Big( \int_0^h w^i_s \,\mathrm{d}w^j_s - \int_0^h w^j_s \,\mathrm{d}w^i_s \Big) $$
for all $i, j \in \mathcal{J}_K$, $i \ne j$, $h > 0$. We approximate these terms instead of the iterated stochastic integrals, as proposed in [4] and [8]. Due to the relations $A^Q_{(i,j)}(h) = -A^Q_{(j,i)}(h)$ and $I^Q_{(i,j)}(h) + I^Q_{(j,i)}(h) = \Delta w^i_h \Delta w^j_h$, it suffices to consider the matrix $A^Q(h) = (A^Q_{(i,j)}(h))_{1 \le i,j \le K}$ in order to relate to the derivation in [8]. This representation entails random vectors $Z^Q_r = (Z^Q_{r1}, \dots, Z^Q_{rK})^T$ that are independent and identically $N(0_K, Q_K)$ distributed for all $r \in \mathbb{N}$, with $Q_K = \operatorname{diag}(\eta_1, \dots, \eta_K)$. As described above, we only need the entries with $i < j$. Therefore, we introduce the selection matrix $H_K$, which defines the integrals that have to be computed, compare to [8], and consider the vector $\tilde{A}^Q(h) := H_K \operatorname{vec}(A^Q(h)^T)$, which admits a series representation, denoted by (18), in terms of the vectors $Z^Q_r$. As expression (18) contains an infinite sum, we need to truncate it in order to compute this vector. For some $D \in \mathbb{N}$, this approximation is denoted as $\tilde{A}^{Q,(D)}(h)$ in (19), and we denote the remainder, the vector of tail sums, by $\tilde{R}^{Q,(D)}(h)$ in (20). Moreover, let $A^I(h)$ denote the matrix containing the standard Lévy stochastic area integrals that correspond to the case $Q_K = I_K$, i.e., $\eta_j = 1$ for all $j \in \mathcal{J}_K$. Therewith, we obtain the relationship $\tilde{A}^Q(h) = \tilde{Q}_K \tilde{A}^I(h)$, where $\tilde{A}^I(h) := H_K \operatorname{vec}(A^I(h)^T)$ and $\tilde{Q}_K$ is the diagonal matrix with entries $\sqrt{\eta_i \eta_j}$, $i < j$; here, we employed the scaling $w^j = \sqrt{\eta_j}\, \beta^j$ and the fact that we are interested in indices $i, j \in \mathcal{J}_K$ with $i < j$ only. We denote by $\tilde{A}^{I,(D)}(h)$ and $\tilde{R}^{I,(D)}(h)$, analogously to (19) and (20), the truncated part of $\tilde{A}^I(h)$ and its truncation error, respectively. Note that $\tilde{A}^I(h)$ and especially $\tilde{A}^{I,(D)}(h)$ correspond to the case $\eta_j = 1$ for all $j \in \mathcal{J}_K$, i.e., $Q_K = I_K$ in (18) and (19), respectively. This also corresponds to the setting in [4] if $\mathcal{J}$ is finite.
We summarize the representation above to formulate Algorithm 1 for some h > 0, t, t+h ∈ [0, T ], and D, K ∈ N:

1. For $j \in \mathcal{J}_K$, simulate the increments $\Delta w^j_h$ and the Fourier coefficients $a^j_r$, $b^j_r$ for $r \in \{1, \dots, D\}$.
2. Approximate $\tilde{A}^Q(h)$ as $\tilde{A}^{Q,(D)}(h)$ in (19).
3. Compute the approximations $\bar{I}^{Q,(D)}_{(i,j)}(h)$ for $i, j \in \mathcal{J}_K$ from $\tilde{A}^{Q,(D)}(h)$ and the increments via (12), where $e_i$ denotes the $i$-th unit vector.
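A compact sketch of Algorithm 1 (naming ours), which exploits the scaling $A^Q_{(i,j)}(h) = \sqrt{\eta_i \eta_j}\, A^I_{(i,j)}(h)$ between the $Q$-weighted and the standard Lévy areas; the series form of the truncated areas follows the standard expansion of [4] and may differ from this paper's normalization:

```python
import numpy as np

def algorithm1(eta, h, D, rng):
    """Sketch of Algorithm 1: simulate standard truncated Levy areas
    A^I_(i,j), rescale by sqrt(eta_i eta_j) (the diagonal matrix Q~_K),
    and assemble the iterated integrals I^Q_(i,j)(h)."""
    eta = np.asarray(eta, dtype=float)
    K = len(eta)
    r = np.arange(1, D + 1)
    std = np.sqrt(h / (2.0 * np.pi**2 * r**2))
    a = rng.standard_normal((K, D)) * std
    b = rng.standard_normal((K, D)) * std
    a0 = -2.0 * a.sum(axis=1)                      # truncated a_0 coefficient
    beta = rng.standard_normal(K) * np.sqrt(h)     # standard increments beta^j_h
    A = np.zeros((K, K))                           # standard Levy areas A^I
    for i in range(K):
        for j in range(K):
            if i != j:
                A[i, j] = (np.pi * np.sum(r * (a[i] * b[j] - b[i] * a[j]))
                           - 0.5 * (beta[j] * a0[i] - beta[i] * a0[j]))
    AQ = np.sqrt(np.outer(eta, eta)) * A           # A^Q = sqrt(eta_i eta_j) A^I
    dw = np.sqrt(eta) * beta                       # dw^j_h = sqrt(eta_j) beta^j_h
    IQ = 0.5 * np.outer(dw, dw) + AQ               # I^Q_(i,j) for i != j
    np.fill_diagonal(IQ, 0.5 * (dw**2 - eta * h))  # exact diagonal terms
    return dw, IQ

rng = np.random.default_rng(3)
eta = np.array([1.0, 0.25, 0.0625])                # hypothetical eigenvalues
dw, IQ = algorithm1(eta, h=0.01, D=100, rng=rng)
# check: I^Q_(i,j) + I^Q_(j,i) = dw_i dw_j, with dw_i^2 - eta_i h on the diagonal
print(np.max(np.abs(IQ + IQ.T - np.outer(dw, dw) + np.diag(eta * 0.01))))
```

As in the finite dimensional case, the pairwise identity holds exactly for the approximation, independently of the truncation index $D$.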
We obtain the following error estimate for this approximation method; the mean-square error converges with order 1/2 in D while the convergence in K is determined by the operator Q.
The first term results from the approximation of the Q-Wiener process by (W K t ) t≥0 , whereas the second term is due to the approximation of the iterated integral with respect to this truncated process by Algorithm 1.
Note that in the convergence analysis of numerical schemes for SPDEs, we compare the approximation of the iterated stochastic integrals to integrals with respect to $(W^K_t)_{t \ge 0}$, $K \in \mathbb{N}$, see, for example, the proofs in [2] and [5]; that is, the analysis involves the error estimate stated in Corollary 2.2 below. We want to emphasize that this estimate is independent of the integer $K$.
for some $C_Q > 0$ and all $h > 0$, $t, t+h \in [0, T]$, $D, K \in \mathbb{N}$, and $\mathcal{J}_K \subset \mathcal{J}$ with $|\mathcal{J}_K| = K$.
Proof. If we set η i = 0 for all i ∈ J \ J K , the result follows directly from Theorem 2.1.
Next, we outline an alternative algorithm to approximate integrals of type (7). In contrast to the method presented above, the vector of tail sums $\tilde{R}^{Q,(D)}(h)$ is not discarded but approximated and included in the computation.

Algorithm 2
The following derivation is based on the scheme developed by Wiktorsson [8] for SODEs. In the finite dimensional setting, the error estimate (4) depends on the number of Brownian motions $K$ in addition to the time step size $h$. This suggests that the computational cost involved in the simulation of the stochastic double integrals is much larger in the setting of SPDEs, as the number of independent Brownian motions is, in general, not finite, see also expression (8). The eigenvalues of the $Q$-Wiener process are, however, not incorporated in the error estimate (4). For example, if we assume $\eta_j \le C j^{-\rho_Q}$ for some $\rho_Q > 1$, $C > 0$, and $j \in \mathcal{J} \subset \mathbb{N}$, we obtain, for $\rho_Q \in (1, 3)$, an improved error estimate which depends on the rate of decay of the eigenvalues instead of some fixed exponent of $K$. This results from the fact that we integrate with respect to a $Q$-Wiener process in our setting, where $Q$ is a nonnegative, symmetric trace class operator. For $\rho_Q \ge 3$, we can show that the exponent of $K$ is bounded by $3$.
As before, we truncate the series (18) at some integer $D \in \mathbb{N}$ and obtain the approximation $\tilde{A}^{Q,(D)}(h)$ in (19). The vector of tail sums $\tilde{R}^{Q,(D)}(h)$ in (20), however, is not discarded but approximated by a multivariate normally distributed random vector instead, as described in [8] for $Q_K = I_K$ and $|\mathcal{J}| = K$. First, we determine the distribution of the tail sums; for $r \in \mathbb{N}$, we compute the covariance matrix of the $r$-th summand of (20), where $e_i$ denotes the $i$-th unit vector. This expression can be reformulated without using the operator $S_K$. Analogously to [8], by taking the expectation, we define the limit covariance matrix $\Sigma^Q_\infty$ in (23). Using the structure of $H_K$, expression (23) can be rewritten in a simplified form. This implies that, given $Z^Q = (Z^Q_r)_{r \in \mathbb{N}}$ and $\Delta w^Q_h$, the vector of tail sums $\tilde{R}^{Q,(D)}(h)$ is conditionally Gaussian distributed. Hence, given $Z^Q$ and $\Delta w^Q_h$, we can approximate the tail sums by simulating a conditionally standard Gaussian random vector $\Upsilon^{Q,(D)}$ and, therewith, obtain the approximate vector of tail sums in (27). It remains to examine how the covariance matrix evolves as $D \to \infty$. For $D \in \mathbb{N}$, we define the matrix $\Sigma^{Q,(D)}$ in (26). By the proof of Theorem 2.4 below, we get convergence of $\Sigma^{Q,(D)}$ to $\Sigma^Q_\infty$ in the Frobenius norm $\|\cdot\|_F$ as $D \to \infty$, see also [8].
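For the smallest nontrivial case $K = 2$, where the vector of tail sums is scalar, the conditional tail variance can be computed directly from the coefficient distributions $a^j_r, b^j_r \sim N\big(0, \frac{\eta_j h}{2 \pi^2 r^2}\big)$. The following sketch implements the resulting tail correction; the closed-form variance used below is our own derivation in the spirit of [8], not a formula quoted from the text.

```python
import numpy as np

def alg2_pair(eta1, eta2, h, D, n, rng):
    """Sketch of Algorithm 2 for a single pair (i, j) = (1, 2): truncated
    Fourier series plus a Gaussian approximation of the discarded tail,
    conditionally on the increments (n independent samples)."""
    r = np.arange(1, D + 1)
    c_D = np.pi**2 / 6.0 - np.sum(1.0 / r**2)    # c_D = sum_{r > D} 1/r^2
    dw1 = rng.standard_normal(n) * np.sqrt(eta1 * h)
    dw2 = rng.standard_normal(n) * np.sqrt(eta2 * h)
    s1 = np.sqrt(eta1 * h / (2.0 * np.pi**2 * r**2))
    s2 = np.sqrt(eta2 * h / (2.0 * np.pi**2 * r**2))
    a1 = rng.standard_normal((n, D)) * s1
    b1 = rng.standard_normal((n, D)) * s1
    a2 = rng.standard_normal((n, D)) * s2
    b2 = rng.standard_normal((n, D)) * s2
    head = (0.5 * dw1 * dw2
            + np.sum(dw2[:, None] * a1 - dw1[:, None] * a2
                     + np.pi * r * (a1 * b2 - b1 * a2), axis=1))
    # conditional variance of the discarded tail, given the increments
    tail_var = (h / (2.0 * np.pi**2)) * c_D * (eta1 * dw2**2 + eta2 * dw1**2
                                               + eta1 * eta2 * h)
    tail = np.sqrt(tail_var) * rng.standard_normal(n)
    return head + tail

rng = np.random.default_rng(1)
eta1, eta2, h = 1.0, 0.5, 0.1
I12 = alg2_pair(eta1, eta2, h, D=5, n=200000, rng=rng)
print(I12.var())  # should be close to eta1 * eta2 * h^2 / 2 = 0.0025
```

Since the Gaussian tail carries exactly the conditional variance discarded by the truncation, the sample variance of the output matches $\operatorname{Var}\big(I^Q_{(1,2)}(h)\big) = \frac{1}{2}\eta_1 \eta_2 h^2$ even for small $D$.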
Combining the above, we obtain an algorithm very similar to the one in [8]; steps 1, 2, and 4 equal Algorithm 1, and additionally, we approximate the vector of tail sums in step 3. For some $h > 0$, $t, t+h \in [0, T]$, and $D, K \in \mathbb{N}$, Algorithm 2 is defined as follows:
1. For $j \in \mathcal{J}_K$, simulate the increments $\Delta w^j_h$ and the Fourier coefficients $a^j_r$, $b^j_r$ for $r \in \{1, \dots, D\}$.
2. Approximate $\tilde{A}^Q(h)$ as $\tilde{A}^{Q,(D)}(h)$ in (19).
3. Approximate the vector of tail sums $\tilde{R}^{Q,(D)}(h)$ as in (27), simulating $\Upsilon^{Q,(D)}$ by means of some $V \sim N(0_K, I_K)$.

4. Compute the approximation $\operatorname{vec}\big( (\hat{I}^{Q,(D)}_{(i,j)}(h))_{i,j \in \mathcal{J}_K} \big)$ from $\tilde{A}^{Q,(D)}(h)$, the approximated tail sums, and the increments $\Delta w^j_h$, analogously to Algorithm 1.
Note that the matrix $\sqrt{\Sigma^Q_\infty}$ in step 3 is the Cholesky decomposition of $\Sigma^Q_\infty$. This expression, which is specified in the following theorem, can be obtained in closed form and does not have to be computed numerically.
Proof. For a proof, we refer to Section 3.
Now, we analyze the error resulting from Algorithm 2. In the following theorem, the first term is the same as in the error estimate of Algorithm 1, see Theorem 2.1. Due to the second term, the approximations converge with order $1$ in $D$, which is twice the order that Algorithm 1 attains. However, this expression depends on $K$ as well. Below, we state an alternative estimate in which the exponent of $K$ is not fixed but depends on the eigenvalues $\eta_j$, $j \in \mathcal{J}_K$, see Theorem 2.6. Which of the two algorithms is superior can only be determined in dependence on the operator $Q$.
for some $C_Q > 0$ and all $h > 0$, $t, t+h \in [0, T]$, $D, K \in \mathbb{N}$, and $\mathcal{J}_K \subset \mathcal{J}$ with $|\mathcal{J}_K| = K$.
Proof. For a proof, we refer to Section 3.
For completeness, we state the following error estimate. Again, this is the estimate that we employ when incorporating the approximation of the iterated integrals into a numerical scheme; see also the notes on Corollary 2.2.
Proof. The proof of this corollary is detailed in the proof of Theorem 2.4.
If we assume $\eta_j \le C j^{-\rho_Q}$ for $C > 0$, $\rho_Q > 1$, and all $j \in \{1, \dots, K\}$, we can improve the result in Theorem 2.4 in the case $\rho_Q < 3$. Precisely, we obtain an error term that involves a factor determined by the eigenvalues of $Q$ instead of the fixed power of $K$ in (4). The main difference is that the alternative proof works with the entries of the covariance matrices explicitly. A statement along the lines of Corollary 2.5 can be obtained analogously.
Theorem 2.6 (Convergence for Algorithm 2). Assume that $Q$ is a nonnegative and symmetric trace class operator and that assumptions (A1) and (A2) are fulfilled. Then, the stated estimate holds for some $C_Q > 0$ and all $h > 0$, $t, t+h \in [0, T]$, $D, K \in \mathbb{N}$, and $\mathcal{J}_K \subset \mathcal{J}$ with $|\mathcal{J}_K| = K$.
Proof. For a proof, we again refer to Section 3.
Remark 2.1. Note that if (W t ) t≥0 is a cylindrical Wiener process, we get the same estimate (4) as in the finite dimensional case.
In general, for $h = \frac{T}{M}$, we obtain convergence of this algorithm for $K, M \to \infty$ if we choose $D > (\min_{j \in \mathcal{J}_K} \eta_j)^{-1/2}\, h^{1-\theta}$ or, respectively, $D > K^2 (K-1)\, h^{1-\theta}$ for some $\theta > 0$. For Algorithm 1, we require $D > h^{2-2\theta}$ instead. However, we need a more careful choice of $D$ to maintain the order of convergence in the mean-square sense in $h$ for a given numerical scheme; we call this convergence rate $\gamma > 0$. Precisely, we have to choose $D \ge h^{1-2\gamma}$ for Algorithm 1 and $D$ of order $h^{(1-2\gamma)/2}$, scaled by the respective $K$- or $Q$-dependent factor from Theorem 2.4 or Theorem 2.6, for Algorithm 2.
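The rules of thumb above can be wrapped in a small helper; all constants are set to $1$, and the $K$-dependent factor for Algorithm 2 is an assumption modeled on the shape of estimate (4):

```python
import math

def choose_D(h, gamma, algorithm, K=None):
    """Pick a truncation index D that preserves a target mean-square
    order gamma of the outer scheme (hypothetical constants = 1)."""
    if algorithm == 1:
        # Algorithm 1: error ~ h^2 / D, so D >= h^{1 - 2*gamma}
        return max(1, math.ceil(h ** (1.0 - 2.0 * gamma)))
    # Algorithm 2 (assumed K-scaling from an estimate of shape (4)):
    # error ~ h^2 K^2 (K-1) / D^2, so D >= K sqrt(K-1) h^{(1 - 2*gamma)/2}
    return max(1, math.ceil(K * math.sqrt(K - 1) * h ** ((1.0 - 2.0 * gamma) / 2.0)))

for h in [0.5, 0.25, 0.125]:
    print(h, choose_D(h, 1.0, 1), choose_D(h, 1.0, 2, K=10))
```

For the Milstein rate $\gamma = 1$, this reproduces the familiar scalings $D \sim h^{-1}$ for Algorithm 1 and $D \sim h^{-1/2}$ (times the $K$-factor) for Algorithm 2.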

Convergence for Algorithm 1
Proof of Theorem 2.1. We determine the error resulting from the approximation of the iterated stochastic integral (7) by Algorithm 1, which also contains the projection of the $Q$-Wiener process in (9). Below, we employ error estimates of the form (28) several times, see also the proof in [2]; in their derivation, we use the series representation of $(W^K_s)_{s \in [0, T]}$, $K \in \mathbb{N}$, in the first step. We fix some arbitrary $h > 0$, $t, t+h \in [0, T]$, and $K \in \mathbb{N}$ throughout the proof and decompose the error into several parts, see (29). For now, we neglect the last term in (29) and estimate the other parts. By Itô's isometry, the properties (A1) and (A2) of the operators $\Phi$, $\Psi$, and estimate (28), we obtain a bound for all $D \in \mathbb{N}$. As in [4], we finally estimate the remaining series term and, in total, we obtain the claimed bound for this part.

Cholesky Decomposition
Proof of Theorem 2.3. It holds $\Sigma^Q_\infty = \tilde{Q}_K \Sigma^I_\infty \tilde{Q}_K^T$, where $\Sigma^I_\infty$ is given by (23) for $Q_K = I_K$. The idea in [8] is to show that the first term, which differs slightly in [8], is zero, which proves that the expression for $\Sigma^Q_\infty$ is correct. In the proof of Theorem 4.1 in [8], the author shows this by arguing via the eigenvalues and the minimal polynomial of the corresponding matrix equation. We do not repeat these ideas here but refer to [8] for further details.

Convergence for Algorithm 2
Proof of Theorem 2.4. We split the error term as in the proof of Theorem 2.1, see equation (29), and obtain the same expression (30) from the approximation of the $Q$-Wiener process by $(W^K_t)_{t \in [0, T]}$, $K \in \mathbb{N}$. Further, we get the estimate in (31) for all $h > 0$, $t, t+h \in [0, T]$, $K \in \mathbb{N}$. The following part also proves Corollary 2.5. Let $\|\cdot\|_F$ denote the Frobenius norm. With the expressions for $\tilde{R}^{Q,(D)}(h)$, where $\Sigma^{I,(D)}$, $\Sigma^I_\infty$ are given by (26) and (23) for $Q_K = I_K$, respectively, and the definition of the algorithm in (27), we obtain a bound for all $h > 0$, $t, t+h \in [0, T]$, $D, K \in \mathbb{N}$. Here, we used the fact that $\Upsilon^{Q,(D)}$ is conditionally standard Gaussian for $h > 0$, $D, L \in \mathbb{N}$ and that $\tilde{Q}_K$ is a diagonal matrix. Precisely, we work with $G := \sqrt{\Sigma^{I,(D)}} - \sqrt{\Sigma^I_\infty}$, $G = (g_{ij})_{1 \le i,j \le L}$, and $\Upsilon^{Q,(D)}$. In order to relate to the proof in [8], we rewrite the error in terms of these covariance matrices and, in total, we obtain the stated estimate.
Proof of Theorem 2.6. We split the error term as in the proofs of Theorem 2.1 and Theorem 2.4, see equation (29), and obtain the same expression (30) from the approximation of the $Q$-Wiener process by $(W^K_t)_{t \in [0, T]}$, $K \in \mathbb{N}$. Moreover, as in the previous proof, we get from (33) an expression involving the covariance matrices with $(i, j) \in \mathcal{I}_A$ and $i \ne j$. The off-diagonal entries of the matrix are indexed by $i, j, m, n \in \{1, \dots, K\}$ with $i < j$ and $m < n$. Therewith, it is easy to see the form of the diagonal entries, and for the off-diagonal entries, the analogous expressions hold with $k, l \in \{1, \dots, L\}$, $l \ne k$, and $i, j, m, n \in \{1, \dots, K\}$, $i < j$, $m < n$. Next, we employ the following lemma from [8] in order to rewrite (34). For simplicity, we assume $\eta_1 \ge \eta_2 \ge \dots \ge \eta_K$ for all $K \in \mathbb{N}$. We decompose $\Sigma^Q_\infty$ as $\Sigma^Q_\infty = 2 \eta_{K-1} \eta_K I_L + \bar{\Sigma}^Q_\infty$ to determine its smallest eigenvalue. The matrix $\bar{\Sigma}^Q_\infty$ is defined as follows: for the diagonal elements, we get the stated values with $k \in \{1, \dots, L\}$, $(i, j) \in \mathcal{I}_A$, and $h > 0$, and for the off-diagonal elements, we get $\bar{\Sigma}^Q_{\infty\,(k,l)} = \Sigma^Q_{\infty\,(k,l)}$ for all $k, l \in \{1, \dots, L\}$, $k \ne l$. As the matrix $\bar{\Sigma}^Q_\infty$ is symmetric and positive semidefinite, the smallest eigenvalue $\lambda_{\min}$ of $\Sigma^Q_\infty$ fulfills $\lambda_{\min} \ge 2 \eta_{K-1} \eta_K \ge 2 \eta_K^2$.
Below, we use the notation $c_D = \sum_{r=D+1}^{\infty} \frac{1}{r^2}$ for legibility. The matrices $\Sigma^{Q,(D)}$ and $\Sigma^Q_\infty$ are symmetric and positive definite. By Lemma 3.1 and the definitions of $\Sigma^{Q,(D)}$, $\Sigma^Q_\infty$ in (26) and (23), respectively, we obtain a bound from (34) for $h > 0$, $t, t+h \in [0, T]$, $D, K \in \mathbb{N}$. Following ideas from [8], we then compute the conditional expectation involved in this estimate. We insert the expressions detailed above for the conditional covariance matrices $H_K \Sigma^Q_{(V^Q_r) \,|\, Z^Q_r, \Delta w^Q_h} H_K^T$, $r \in \mathbb{N}$, and for $\Sigma^Q_\infty$, and split the sum into the diagonal and off-diagonal entries of the matrix. This yields, for $h > 0$, $t, t+h \in [0, T]$, $D, L \in \mathbb{N}$, the desired bound. The statement of the theorem follows by combining this estimate with (30).