On the stochastic Magnus expansion and its application to SPDEs

We derive the stochastic version of the Magnus expansion for linear systems of stochastic differential equations (SDEs). The main novelty with respect to the related literature is that we consider SDEs in the Itô sense, with progressively measurable coefficients, for which an explicit Itô-Stratonovich conversion is not available. We prove convergence of the Magnus expansion up to a stopping time τ and provide a novel asymptotic estimate of the cumulative distribution function of τ. As an application, we propose a new method for the numerical solution of stochastic partial differential equations (SPDEs) based on spatial discretization and application of the stochastic Magnus expansion. A notable feature of the method is that it is fully parallelizable. We also present numerical tests in order to assess the accuracy of the numerical schemes.


Introduction
The Magnus expansion (hereafter referred to as ME) is a classical tool to solve non-autonomous linear differential equations. Generalizations of the ME to Stratonovich SDEs are well-known and were proposed by several authors (see for instance [2], [3], [5], [32] and the references given in Section 1.2). In this paper we derive, for the first time to the best of our knowledge, the ME for Itô SDEs under general assumptions which do not guarantee an explicit Itô-Stratonovich conversion, namely progressively measurable stochastic coefficients. Our main results are the convergence of the stochastic ME up to a stopping time τ and a novel asymptotic estimate of the cumulative distribution function of τ. The latter improves some previous estimates obtained in purely Markovian settings and is based on an application of Morrey's inequality. We also explore possible applications to the numerical solution of stochastic partial differential equations (SPDEs).
In (1.1), as well as anywhere throughout the paper, we use Einstein summation convention to imply summation of terms containing W j , over the index j from 1 to q.
In the deterministic case, i.e. A^(j) ≡ 0, j = 1, . . . , q, (1.1) reduces to the matrix-valued ODE (1.2), which admits an explicit solution, in terms of the matrix exponential, in the time-homogeneous case. Namely, if B_t ≡ B, the unique solution to (1.2) reads as X_t = e^{tB} X_0. However, in the non-autonomous case, the ODE (1.2) does not admit an explicit solution.
In particular, if B_t is not constant, the solution X_t typically differs from e^{∫_0^t B_s ds}. This is due to the fact that, in general, B_t and B_s do not commute for t ≠ s. As it turns out, a representation of the solution in terms of a matrix exponential is still possible, at least for short times (see, for instance, the excellent survey paper [3] and the references given therein).
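This failure of the naive exponential can be seen numerically. The following sketch (Python, with illustrative matrices that are not taken from the paper) compares the true solution of dX_t = B_t X_t dt, computed by fine exponential stepping, with the exponential of the time integral of B:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative non-autonomous generator: B(t) and B(s) do not commute for t != s.
B0 = np.array([[0.0, 1.0], [0.0, 0.0]])
B1 = np.array([[0.0, 0.0], [1.0, 0.0]])
B = lambda t: B0 + t * B1

T, n = 1.0, 5000
dt = T / n
X = np.eye(2)                      # reference solution via fine exponential stepping
integral = np.zeros((2, 2))
for k in range(n):
    t = k * dt
    X = expm(B(t) * dt) @ X        # short-time propagator on each sub-interval
    integral += B(t) * dt

gap = np.linalg.norm(X - expm(integral))
print(gap)                         # visibly nonzero: exp of the integral is not the solution
```

The discrepancy is exactly the effect of the higher-order (commutator) terms that the Magnus expansion is designed to capture.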
In the stochastic case, when j = 1, B_t ≡ 0 and A is constant, i.e. A_t(ω) ≡ A, the Itô equation (1.1) reduces to dX_t = A X_t dW_t, whose explicit solution can be easily proved to be of the form (1.3), with Y_t = A W_t − (1/2) A² t. In general, when the matrices A_s, B_t, B_s with t ≠ s do not commute, an explicit solution to (1.1) is not known. For instance, in the non-commutative case, neither the equation (1.5) nor the analogous one admits an explicit solution, save some particular cases (see for instance the example in Section 3.3). Among the approximation tools that were developed in the literature to solve stochastic differential equations, including (1.1), some Magnus-type expansions that extend (1.3)-(1.4) were derived in different contexts. We now go on to describe our contribution to this stream of literature, and then to frame our results within the existing ones. In particular, a detailed comparison with existing stochastic MEs previously derived by other authors will be provided below, in the last subsection.

Description of the main results.
In this paper we derive a Magnus-type representation formula for the solution to the Itô SDE (1.1), which is (1.3) together with for τ suitably small, strictly positive stopping time. In analogy with the deterministic ME, the general term Y (n) can be expressed recursively, and contains iterated stochastic integrals of nested Lie commutators of the processes B, A (j) at different times.
In the case j = 1, the first two terms of the expansion read as follows. For example, in the case of SDE (1.5) the latter can be reduced further; notice that the resulting expressions do not contain stochastic integrals. In fact, in the general autonomous case, and if j = 1, all the iterated stochastic integrals in Y^(n) can be solved for any n (see Corollary 5.2.4 in [16]). Therefore, in this case the expansion becomes numerically computable by only approximating Lebesgue integrals, as opposed to stochastic Runge-Kutta schemes, which typically require the numerical approximation of stochastic integrals. As we shall see in the numerical tests in Section 3, this feature allows us to choose a sparser time-grid in order to save computation time. This feature is also preserved in some non-autonomous cases, as illustrated in Section 3.
A notable feature of the expansion is the possibility of parallelizing the computation of its terms. In contrast to standard iterative methods, which require the solution at a given time-step in order to proceed to the next step in the iteration, the discretization of the integrals in the terms Y^(n) can be done simultaneously for all the time steps. Consequently, this entails the possibility of parallelizing over all times in the time-grid and makes the numerical implementation of the stochastic ME well suited to GPU execution.
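As a minimal sketch of this point (Python, single Brownian motion, with the illustrative assumption that the leading term is Y^(1)(t) = (B − A²/2)t + A W_t, i.e. the logarithm of the stochastic exponential in the constant-coefficient case), the first Magnus term can be evaluated at every grid time in one vectorized operation, with no time-stepping recursion:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative constant matrices; the formula for Y1 below is an assumption
# consistent with the constant-coefficient stochastic exponential.
d = 3
A = 0.1 * rng.standard_normal((d, d))
B = 0.1 * rng.standard_normal((d, d))

T, n = 1.0, 1000
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(T / n), n)
W = np.concatenate([[0.0], np.cumsum(dW)])   # Brownian path on the grid

# shape (n+1, d, d): Y1 at ALL grid times in one broadcasted expression,
# instead of stepping through the grid as an iterative scheme would.
Y1 = (B - 0.5 * A @ A)[None] * t[:, None, None] + A[None] * W[:, None, None]
print(Y1.shape)
```

The same pattern applies to the higher-order terms: once the Brownian path is fixed, each Y^(n) is an explicit functional of the path and can be evaluated grid-point by grid-point in parallel.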
As it often happens when deriving convergent (either asymptotically or absolutely) expansions, a delicate issue is the (possibly random) time horizon up to which convergence holds. In the deterministic case, the convergence of the ME (1.4) to the exact logarithm of the solution to (1.2) was studied by several authors, who proved progressively sharper lower bounds on the maximum time t̄ such that the convergence to the exact solution is assured for any t ∈ [0, t̄]. To the best of our knowledge, the sharpest estimate was given in [26]. Our main result can be summarized as follows: let X be the unique strong solution to (1.1) (see Lemma 2.10). There exists a strictly positive stopping time τ ≤ T such that: (i) X_t admits a real logarithm Y_t for t < τ; (ii) the following representation holds P-almost surely: (1.9), where Y^(n) is the n-th term in the stochastic ME as defined in (2.16) and (2.20)-(2.22); (iii) the tail estimate (1.10) holds for the distribution of τ. The proof of (i) relies on the continuity of X together with a standard representation for the matrix logarithm. The key point in the proof of (ii) consists in showing that X_t^{ε,δ} and its logarithm Y_t^{ε,δ} are holomorphic as functions of (ε, δ), where X_t^{ε,δ} represents the solution of (1.1) when A^(j) and B are replaced by εA^(j) and δB, respectively. Once this is established, the representation (1.9) follows from observing that, by construction, the series in (1.9) is exactly the formal power series of Y_t^{ε,δ} at (ε, δ) = (1, 1). To prove the holomorphicity of X_t^{ε,δ} we follow the same approach typically adopted to prove regularity properties of stochastic flows. Namely, in Lemma 2.10 we state some maximal L^p and Hölder estimates (with respect to the parameters) for solutions to SDEs with random coefficients and combine them with the Kolmogorov continuity theorem. Finally, the proof of (iii) relies once more on the L^p estimates in Lemma 2.10 and on a Sobolev embedding theorem to obtain pointwise estimates w.r.t. the parameters (ε, δ) above. Theorem 1.1 has been used in the recent paper [34] (cf.
Lemma 1) where a semi-linear noncommutative Itô SDE is studied and Euler, Milstein and derivative-free numerical schemes are developed, together with a convergence analysis for those schemes.
In the last part of the paper we perform numerical tests with the Magnus expansion. In particular, Section 3.2 is devoted to the application of the stochastic ME to the numerical solution of parabolic stochastic partial differential equations (SPDEs). The idea is to discretize the SPDE only in space and then approximate the resulting linear matrix-valued SDE by truncating the series in (1.8)-(1.9). The goal here is to propose the application of stochastic MEs as novel approximation tools for SPDEs; we study the error of this approximating procedure only numerically, in a case where an explicit benchmark is available, and we defer the theoretical error analysis to further studies.

Review of the literature and comparison.
Stochastic generalizations of the ME were proposed by several authors. To the best of our knowledge, we identify mainly two streams of research.
The beginning of the first one can be traced back to the work [2], where the author derived exponential stochastic Taylor expansions (see also [1], [16] for general stochastic Taylor series) of the solution of a system of Stratonovich SDEs with values on a manifold M, i.e.
with B, A (j) being smooth, deterministic and autonomous vector fields on M. The stochastic flow of (1.11) is represented in terms of the exponential map of a stochastic vector field Y , i.e.
the vector field Y being expressed by an infinite series of iterated stochastic integrals multiplying nested commutators of the vector fields B, A (j) . This representation is proved up to a strictly positive stopping time and extends some previous results in [11], [31] for the commutative case and in [33], [19], [13] for the nilpotent case. Refinements of [2] were proved in [6] making the expansion of Y more explicit. Later, numerical methods based on these representations were proposed in [7] and [8]. Such techniques, known as Castell-Gaines methods, require the approximation of the solution to a time-dependent ODE besides the approximation of iterated stochastic integrals. Truncating the expansion of Y at a specified order, these schemes turn out to be asymptotically efficient in the sense of Newton [28].
If M = M^{d×d} and the vector fields are linear, then (1.11) reduces to the Stratonovich version of (1.1) with B, A^(j) constant matrices, and the representation of X given in [2] can be seen as a stochastic ME, in that the exponential map of Y reduces to the multiplication by a matrix exponential. In fact, in this case the expansion in [2] becomes explicit in terms of iterated stochastic integrals, and can be shown to coincide with the expansion in this paper by applying the Itô-Stratonovich conversion formula. In the very interesting paper [22], the authors study several computational aspects of numerical schemes stemming from the truncated ME, in which the iterated stochastic integrals are approximated by their conditional expectation. Besides showing that asymptotic efficiency holds for an arbitrary number of Brownian components, they compare the theoretical accuracy with that of analogous schemes based on Dyson (or Neumann) series, which are obtained by applying the stochastic Taylor expansion directly to the equation. They find that, although the theoretical accuracy of Magnus schemes is not superior, Magnus-based approximations seem more accurate than their Dyson counterparts in practice. They also discuss the computational cost deriving from approximating the iterated stochastic integrals and the matrix exponentiation, in relation to different features of the problem, such as the dimension and the number of Brownian motions, as well as the order of the numerical scheme.
The second stream of literature is explicitly aimed at extending the original Magnus results to stochastic settings and can be traced back to [5] where the ME is derived via formal arguments for a linear system of Stratonovich SDEs with deterministic coefficients. Clearly, in the autonomous case such expansion coincides with the one obtained by Ben Arous ( [2]), whereas in the non-autonomous case, B ≡ 0 and j = 1, it is formally equivalent to the deterministic ME (1.4) with all the Lebesgue integrals replaced by Stratonovich ones. The authors of [5] do not address the convergence of the ME, but rather study computational aspects of the resulting approximation, in particular in comparison with Runge-Kutta stochastic schemes.
The authors of [24] consider the Itô SDE (1.1) with constant coefficients, and propose to solve, via the Euler method, the SDE (2.14) for the logarithm of the solution. In [32] the ME for the Stratonovich version of (1.1) with deterministic coefficients is applied to solve non-linear SDEs; however, the error analysis of the truncated expansion seems flawed, since the fact that the Magnus series converges only up to a positive stopping time is overlooked. In [27], a general procedure for designing higher strong order methods for Itô SDEs on matrix Lie groups is outlined.
We now go on to discuss the contribution of this paper with respect to the existing literature. In the first place, Theorem 1.1 on the convergence of the ME requires very weak conditions on the coefficients, which are stochastic processes satisfying the sole assumption of progressive measurability. This is a novel aspect compared to the results in [2], [6], which surely cover a wider class of SDEs, but under the assumption of time-independent deterministic coefficients. We point out that this feature is also relevant in light of the fact that our result is stated for Itô SDEs as opposed to Stratonovich ones. Indeed, while this difference might appear as minor in the Markovian case, where a simple conversion formula exists (cf. [10] and [21]), it becomes substantial in the case of progressively measurable coefficients. We also point out that, even in the Markovian non-autonomous case, convergence issues were not discussed in [5] and [22].
Another novel aspect of our result concerns the estimate (1.10) for the cumulative distribution function of the stopping time τ up to which the Magnus series converges to the real logarithm of the solution: this kind of estimate was unknown even in the autonomous case. Theorem 11 in [2] (see also [6]) provides an asymptotic estimate for the truncation error of the logarithm, which in the linear case studied in this paper would read as an expansion with remainder R bounded in probability. Although this type of result holds for the general SDE (1.11), it is weaker than Theorem 1.1 in the linear case. In fact, it can be obtained from (1.10), but not the other way around.
A rigorous error analysis of the ME is left for future research, as well as applications to non-linear SDEs (see [32] for a recent attempt in this direction).
The rest of the paper is structured as follows. In Section 2 we derive the ME and prove Theorem 1.1. In particular, Section 2.1 contains the key Lemma 2.1, with representations for the first and second order differentials through which the terms Y^(n) in (1.8)-(1.9) will be defined, and some preliminary results that will be used to derive the expansion. Section 2.2 contains a formal derivation of (1.8)-(1.9). Section 2.3 is entirely devoted to the proof of Theorem 1.1.
In Section 3 we first introduce a numerical test for an SDE with constant, non-commuting coefficients. The formulas for the first three orders of the ME in this test will also be used

Itô-Stochastic ME
In this section we define the terms in the expansion (1.9) and prove Theorem 1.1.

Preliminaries
Let M^{d×d} be the vector space of (d × d) real-valued matrices. For the reader's convenience we recall the following notations. Throughout the paper we denote by [·, ·] the standard Lie bracket and by ∥·∥ the spectral norm on M^{d×d}. Also, we denote by β_k, k ∈ N_0, the Bernoulli numbers, defined as the derivatives of the function x ↦ x/(e^x − 1) computed at x = 0. For the sake of convenience we report the first three Bernoulli numbers: β_0 = 1, β_1 = −1/2, β_2 = 1/6. We now define the operators that we will use in the sequel. For a fixed Σ ∈ M^{d×d}, we let:
• ad_Σ^n : M^{d×d} → M^{d×d} be the iterated Lie-bracket operator, ad_Σ^n(M) = [Σ, ad_Σ^{n−1}(M)] with ad_Σ^0(M) = M; to ease notation we also set ad_Σ := ad_Σ^1;
• e^{ad_Σ} : M^{d×d} → M^{d×d} be the linear operator defined by the corresponding exponential series.
In the next lemma we provide explicit expressions for the first and second order differentials of the exponential map M^{d×d} ∋ M ↦ e^M. We recall that this map is smooth and, in particular, continuously twice differentiable. We point out that this result, though very basic, is novel and of independent interest (for instance, it was recently employed in [14]).
Proof. The first part of the statement, concerning the first order differential, is a classical result; its proof can be found in [3, Lemma 2] among other references.
We prove the second part. Fix M ∈ M^{d×d} and denote by ∂_M e^Σ the first order directional derivative of the exponential map at Σ in the direction M. By the first part, this derivative is given by (2.5). We now show that, for any M, N ∈ M^{d×d}, the second order directional derivative is given by (2.6). We use the definition (2.2) and exchange the differentiation and integration signs, employing the two expressions in (2.5) for the first-order differential. This, together with (2.7), proves (2.6).
To conclude, we prove equality (2.4). It is enough to observe that

Proposition 2.2 (Itô formula). Let Y be an M^{d×d}-valued Itô process. Then the differential of e^{Y_t} admits an explicit expression.
Proof. The statement follows from the multi-dimensional Itô formula (see, for instance, [29]), combined with Lemma 2.1 and applied to the exponential process e^{Y_t}.
We also have the following inversion formula for the operator L Σ .
For a proof of Lemma 2.3, we refer the reader to [3].
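The classical first-order formula recalled in Lemma 2.1, namely that the directional derivative of Σ ↦ e^Σ in direction M equals ∫_0^1 e^{sΣ} M e^{(1−s)Σ} ds, can be verified numerically. The sketch below (Python, random illustrative matrices) approximates the integral by midpoint quadrature and compares it with SciPy's Fréchet derivative of the matrix exponential:

```python
import numpy as np
from scipy.linalg import expm, expm_frechet

rng = np.random.default_rng(2)
d = 4
Sigma = 0.5 * rng.standard_normal((d, d))   # base point of the exponential map
M = rng.standard_normal((d, d))             # direction of differentiation

n = 2000
s = (np.arange(n) + 0.5) / n                # midpoint quadrature nodes on [0, 1]
integral = sum(expm(si * Sigma) @ M @ expm((1.0 - si) * Sigma) for si in s) / n

# SciPy computes the same directional (Frechet) derivative of expm directly.
frechet = expm_frechet(Sigma, M, compute_expm=False)
gap = np.linalg.norm(integral - frechet)
print(gap)                                  # small: only midpoint quadrature error
```

The agreement confirms the integral representation; the second-order formula of the lemma can be checked in the same spirit by differencing Fréchet derivatives.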

Formal derivation
In this section we perform formal computations to derive the terms Y^(n) appearing in the ME (1.9). Although such computations are heuristic at this stage, they are meant to provide the reader with an intuitive understanding of the principles that underlie the expansion procedure. Their validity will be established a posteriori in Section 2.3, within the proof of Theorem 1.1.
Let (Ω, F, P, (F_t)_{t≥0}) be a filtered probability space. Assume that, for any ε, δ ∈ R, the process X^{ε,δ} = (X_t^{ε,δ})_{t≥0} solves the Itô SDE (2.10) and that it admits an exponential representation. Inverting now (2.12)-(2.13), in accordance with (2.9), one obtains (2.14). We now assume that Y^{ε,δ} admits the representation (2.15) for a certain family (Y^{(r,n−r)})_{n,r∈N_0} of stochastic processes. In particular, setting (ε, δ) = (1, 1), the terms of the expansion can be collected according to any arbitrary choice, for the series will be proved to be absolutely convergent. The above choice for Y^(n) contains all the terms of equal order by weighing ε and δ in the same way. A different choice, which respects the probabilistic scaling, corresponds to weighing δ as ε², and would lead to a different definition of Y^(n). Remark 2.5. Observe that, if the function (ε, δ) ↦ Y_0^{ε,δ} is assumed to be continuous P-almost surely, then the initial condition in (2.14) implies that all the terms vanish at t = 0. We now plug (2.15) into (2.14) and collect all terms of equal order in ε and δ. Up to order 2 we obtain, for any t ≥ 0, the identities in (2.16), where we used, once more, Einstein summation convention to imply summation over the indexes i, j, and Remark 2.5 to set all the initial conditions equal to zero. Proceeding by induction, one can obtain a recursive representation for the general term Y^{(r,n−r)} in (2.15), where the terms σ_{r,n−r,j}, μ_{r,n−r} are defined recursively through the operators S in (2.21)-(2.22), and the terms in the ME (2.16) read as in (2.20). In particular, in some cases the ME coincides with the exact solution already with the first two terms.

Convergence analysis
In this section we prove Theorem 1.1. To avoid ambiguity, only in this section, we adopt the dedicated notation introduced below. We start with two preliminary lemmas.
Note that, again by (2.14), R^0 ≡ 0 P-a.s. Moreover, representation (2.15) implies continuity of ε ↦ Y_t^{ε,0} near ε = 0, which in turn implies the continuity of ε ↦ R_t^ε; thus lim_{ε→0} R_t^ε = 0. This in turn implies that, if λ is a real eigenvalue of M and v is one of its normalized eigenvectors, then the corresponding identity in the statement holds. We have one last preliminary lemma, containing some technical results concerning the solutions to (2.10). These are semi-standard, in that they can be inferred by combining and adapting existing results in the literature.
Up to modifications, (X_t^{ε,δ})_{ε,δ∈C, t∈[0,T]} is a continuous process satisfying, in particular: (iii) for any p ≥ 1 and h > 0 there exists a positive constant κ, only dependent on the problem data. Proof of Part (i): as X_0^{ε,δ} = I_d, by continuity the random time τ defined in (2.28) is strictly positive. Furthermore, again by continuity, the infimum defining τ can be taken over a countable, dense subset of Q_{t,h}, which implies that τ is a stopping time.
Notice that X ε,δ t (and therefore also Y ε,δ t ) is real for ε, δ ∈ R: in particular, Y t = Y 1,1 t is real and this proves Part (i).
By definition (2.28), we have (2.30), and therefore (1.10) follows by suitably estimating E[M_t²]. To prove such an estimate we will show, in the last part of the proof, that f_t belongs to the Sobolev space W^{1,2p}(B_h(0)) for any p ≥ 1, with the bound (2.31), where the positive constant C depends only on A^(1)_T, . . . , A^(q)_T, B_T, d, T, h and p. Since f_t ∈ W^{1,2p}(B_h(0)) and B_h(0) ⊆ R⁴, by Morrey's inequality (cf., for instance, Corollary 9.14 in [4]), for any p > 2 we have (2.32), where c_0 is a positive constant, dependent only on p and h (in particular, c_0 is independent of ω). Combining (2.31) with (2.32), for a fixed p > 2 we obtain an estimate which, combined with (2.30), proves (1.10).
To conclude, we are left with the proof of (2.31). First, we have (2.33), where we used the estimate (2.26) of Lemma 2.10 in the last inequality. Fix now t ∈ ]0, T]. Note that the arg max below does exist, in that the process g_s(ε, δ) := ∥X_s^{ε,δ} − I_d∥ is continuous; this yields the key inequality in which the estimate (2.27) of Lemma 2.10 is used in the last step. This, together with (2.33), proves (2.31) and concludes the proof.

Numerical tests and applications to SPDEs
We present here some numerical tests in order to confirm the accuracy of the approximate solutions to (1.1) stemming from the truncation of the series (1.9). We also show how this approximation can be applied to approximate the solutions to stochastic partial differential equations (SPDEs) of parabolic type.
We consider two examples of SDEs (one in Section 3.1 and one in Section 3.3), for which we compute the first three terms of the ME given by (2.16)-(2.20) and present numerical experiments to test the accuracy of the approximate solutions to (1.1) stemming from them. In both cases we consider j = 1 in (1.1) and replace A^(1) with A to shorten notation. The first example features constant matrices A and B. In the second one we consider B ≡ 0 and a deterministic upper triangular A_t. For each numerical test we implement the exponential of the truncated ME up to order n = 1, 2 and 3,
and compare it with a benchmark solution to (1.1). In Section 3.2 we turn our attention to the application of the ME to the numerical resolution of SPDEs. In particular, in the numerical tests we will make use of the ME for constant matrices discussed in Section 3.1.
Error and notations. Throughout this section we will employ the following tags and error measure: Err_t denotes a discretization of the time-averaged relative error on the interval [0, t]. This is a way to measure the error on the whole trajectory, as opposed to the error at a specific given time.
Then we use Monte Carlo simulation, with M independent realizations of the discretized Brownian trajectories, to approximate the distribution of Err t .
The matrix norm above is the Frobenius norm. In the following tests, m1, m2 and m3 will always play the role of X app , exact always the role of X ref , whereas euler will be either X app or X ref depending on whether exact is available or not.
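A hypothetical reconstruction of this error measure (the precise discretization used in the paper is elided here, so the formula below is an assumption): a time-averaged relative error in the Frobenius norm, averaged afterwards over Monte Carlo samples of the Brownian path.

```python
import numpy as np

def err_t(X_app, X_ref):
    """Time-averaged relative Frobenius error over a trajectory.

    X_app, X_ref: arrays of shape (N, d, d), one matrix per grid time.
    """
    num = np.linalg.norm(X_app - X_ref, ord='fro', axis=(1, 2))
    den = np.linalg.norm(X_ref, ord='fro', axis=(1, 2))
    return np.mean(num / den)

# Synthetic sanity check with made-up trajectories.
rng = np.random.default_rng(3)
X_ref = rng.standard_normal((50, 2, 2)) + 3.0 * np.eye(2)
X_app = X_ref + 0.01 * rng.standard_normal((50, 2, 2))
print(err_t(X_app, X_ref))        # small relative error by construction
```

In practice one would evaluate `err_t` once per simulated Brownian path and report the Monte Carlo mean, matching the role of Err_t in the tables of this section.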
We used for the calculations Matlab R2021a with Parallel Computing Toolbox running on Windows 10 Pro, on a machine with the following specifications: processor Intel(R) Core(TM) i7-8750H @ 2.20 GHz, 2x32 GB (Dual Channel) Samsung SODIMM RAM @ 2667 MHz, and a NVIDIA GeForce RTX 2070 with Max-Q Design (8 GB GDDR6 RAM). Also, we will make use of the Matlab built-in routine expm for the computation of the matrix exponential. As it turns out, this represents the most expensive step in the implementation of the Magnus approximation. However important, the pursuit of optimized methods for matrix exponentiation is an extensive topic of separate interest, which goes beyond the goals of this paper. Therefore, here we will limit ourselves to pointing out, separately, the computational times for the approximations of the logarithm and of the matrix exponential.
In the implementation we simulate the Brownian motion first and use it as an input for each scheme to be able to compare the trajectories of each scheme amongst each other.

Example: constant A and B.
With a slight abuse of notation, we consider A_t ≡ A and B_t ≡ B. Recall that, if A and B do not commute, there is in general no closed-form solution to (1.1). The first three terms of the ME are given in (3.2). We point out that, in this case, all the stochastic integrals appearing in the ME can be solved in terms of Lebesgue integrals by using Itô's formula. Therefore, in order to discretize Y^(n) it is not necessary to approximate stochastic integrals. This allows us to use a sparser time grid compared to the Euler method, for which the discretization of stochastic integrals is necessary. In particular, the theoretical speed of convergence with respect to the time-step is of order √∆ for the Euler-Maruyama scheme and of order ∆ for the deterministic Euler scheme, which is the one used to discretize the Lebesgue integrals in the Magnus expansion above. In the following numerical tests, we discretize in time with mesh ∆ equal to 10^-4 for euler and equal to √∆ = 10^-2 for m1, m2 and m3. Note that, as confirmed by the results in Table 1, choosing a finer time-discretization for euler (our reference method here) is essential in order to make it comparable with m3. Furthermore, in the example of Section 3.3, where an explicit solution is available, we show (see Tables 7 and 9) that choosing a sparser time-grid (say ∆ = 10^-3) the Euler-Maruyama method incurs a significant loss of precision.
It is also clear that the implementation is totally parallelizable, in that Y (1) , Y (2) and Y (3) do not depend on each other and thus they can be computed in parallel. More importantly, the discretization of the integrals in each Y (n) can be parallelized as the latter are explicit and not implicitly defined through a differential equation.
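A short-time comparison of this kind can be sketched as follows (Python; the matrices are random and normalized by their spectral norms as in this example, while the first-order Magnus approximation is taken, as an assumption on the elided formulas, as exp((B − A²/2)t + A W_t), which agrees with the solution up to commutator terms):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
d = 2
A = rng.standard_normal((d, d)); A /= np.linalg.norm(A, 2)   # spectral normalization
B = rng.standard_normal((d, d)); B /= np.linalg.norm(B, 2)

T, n = 0.05, 5000                      # short horizon, fine Euler grid
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
X = np.eye(d)
for k in range(n):                     # fine-grid Euler-Maruyama reference
    X = X + (B @ X) * dt + (A @ X) * dW[k]

# First-order Magnus approximation: one matrix exponential, one Brownian increment.
m1 = expm((B - 0.5 * A @ A) * T + A * dW.sum())
gap = np.linalg.norm(X - m1)
print(gap)                             # small for short times
```

Note the asymmetry in cost: the Magnus approximation needs only W_T and a single call to `expm`, while the Euler scheme must traverse the whole fine grid.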
We choose A and B at random and normalize them by their spectral norms; the results below refer to this choice. In Figure 1 we plot one realization of the trajectories of the top-left component (X_t)_11, computed with the methods above, up to time t = 0.75. In Table 1 we report the errors and in Table 2 the computational times. For the Magnus methods we separate the time to compute the approximate logarithm from the one to compute the matrix exponential.
Remark 3.1. We can see from Table 2 that the computational times of the schemes are not directly comparable, owing to the different time grids dictated by the different rates of convergence.
In our numerical experiments we already use 6 CPU cores to parallelize the computation of the matrix exponential on the CPU, while we use one GPU to compute the Magnus logarithm.
For euler we speed up the matrix multiplications in each iteration by using pagefun on a GPU to parallelize over all samples. As for m1, m2 and m3, if we were to increase the number of CPU cores to, say, 12, we could expect a further reduction of the matrix-exponentiation times reported in Table 2. Moreover, if only the terminal value of the solution is needed, the matrix exponential has to be evaluated at the final time only. In this particular experiment it would mean that we can divide the computational time of the matrix exponentiation by approximately ∆^-1 = 10^2 without increasing the CPU core count. Hence, the Magnus methods would require approximately only 0.04 seconds, plus effects from distributing the memory to the different processors. The euler method, in contrast, does not benefit from this because, as an iterative method, it must fully evaluate the trajectories.
Such situations are not uncommon; for example, in mathematical finance the price of a European call option depends only on the terminal time of the underlying process, giving the Magnus methods a tremendous advantage even without increasing CPUs or GPUs. We will illustrate such a situation in Example 3.2 together with Table 3. In calibration procedures, such as fitting a model to data at a few points in time, the Magnus methods also excel for the same reason.
Example 3.2. In this example we want to demonstrate the benefit, explained in Remark 3.1, of using the Magnus methods compared to iterative schemes, such as the Euler method, when calculating the first, second and third element-wise moments of the terminal value of a matrix-valued SDE. Precisely, we evaluate E[((X_t)_ij)^k] for i, j = 1, . . . , d and k = 1, 2, 3. We will keep the same parameters as in Table 1.

Table 3: A and B constant. Computational times and values of first, second and third moment at the terminal time t = 1 for m1, m2, m3, using ∆ = 10^-2, and euler, using ∆ = 10^-4, obtained with 10^3 samples.

The results of this example are summarized in Table 3. In this table, columns 2-4 contain the values of the element-wise moments at the terminal time of the solution to the SDE with constant coefficients, starting with the upper left corner of the solution matrix, then the upper right, lower left and lower right, respectively. In the last column we present the computational times in seconds.
The values of the moments do not differ significantly between euler and m3, and remarkably the Magnus methods are roughly 35 times as fast in this particular example. We stress again that, on a coarser time-grid, euler would not match the accuracy of m3.
In the interesting paper [12] a non-linear extension in the case of commuting A and B can be found and applications to SPDEs via space discretizations are discussed, which is the same approach we take in the next subsection with the ME.

Applications to SPDEs
The aim of this subsection is to apply the previously derived ME to the numerical solution of parabolic stochastic partial differential equations (SPDEs). We derive an approximation scheme for the general case of variable coefficients, which we only test in the case of the stochastic heat equation (Example 3.4), for which an exact solution is available.

Stochastic Cauchy problem and fundamental solution
Let (Ω, F, P, (F_t)_{t≥0}) be a filtered probability space endowed with a real Brownian motion W. We consider the stochastic Cauchy problem (3.3), where L_t is an elliptic second-order linear operator and G_t is a first-order linear operator. The coefficients (a, b, c, g, σ) are random fields indexed by (t, x) ∈ [0, ∞[×R and the initial datum ϕ is a random field on R. A classical solution to (3.3) is understood here as a predictable and almost-surely continuous random field u = u_t(x) over [0, ∞[×R, such that u_t ∈ C²(R) a.s. for any t > 0 and the equation holds. There is a vast literature on SPDEs and problems of the form (3.3), under suitable measurability, regularity and boundedness assumptions on the coefficients and on the initial datum: see, for instance, [18], [25], [9], [30] and the references therein.
Note that, in analogy with deterministic PDEs, the solution of the Cauchy problem (3.3) can be written, in some cases, as a convolution of the initial datum with a stochastic fundamental solution p(t, x; 0, ξ), a random field that solves the SPDE in (3.3) with respect to the variables (t, x) and which approximates a Dirac delta centered at ξ as t approaches 0.

Finite-difference Magnus scheme
We employ the stochastic ME to develop an approximation scheme for the Cauchy problem (3.3). Our goal here is only to hint at the possibility that the stochastic ME is a useful tool for the numerical solution of SPDEs. Therefore, we keep the exposition at a heuristic level and postpone the rigorous study of the problem for further research.
The idea is to apply finite-difference space-discretization for the operators L and G, and then ME to solve the resulting linear (matrix-valued) Itô SDE. We fix a bounded interval [a, b] and use the following notation: for a given d ∈ N, we denote by ς d a mesh of d + 2 and for any random field f (x), x ∈ R, we denote by f d = (f d 0 , . . . , f d d+1 ) the random vector whose components correspond to f evaluated at the points of the mesh, namely Following the classical centered finite-difference discretization, we approximate the spatial derivatives in each point as to obtain the system of Itô SDEs for i = 1, · · · , d, where L d t and G d t are now the operators acting as By imposing some boundary conditions, for instance    In doing this, we shall keep in mind that the difference between the latter quantities can be decomposed into two errors, namely: the one between X d and its approximation, and the one between X d and I d . In turn, the latter is the result of both space-discretization and the error that stems by imposing null boundary conditions (see (3.5)). In particular, this last error cannot be reduced by refining the space-grid. Therefore, the analysis should be restricted , respectively. The matrix norm above is the Frobenius norm. The role of X d,app will be played by the time-discretization of the truncated ME (1.8)-(1.9). In particular, we will denote by m1 and m3 the discretized first and third-order MEs of X d , respectively. We will not consider the second-order Magnus approximation m2 as it appears less stable than the others. Note that, being A d t and B d t constant matrices, the first three terms of the ME are given explicitly by (3.2).
In the numerical experiments we set $a = -2$, $b = 2$, $a = 0.2$, $\sigma = 0.15$. (3.11) Setting the parameter $\kappa$ in (3.10), which determines the number of rows taken into account to assess the error, as $\kappa = d/2$, we study the expectation of $\mathrm{Err}^d_t$ up to $t = 0.5$. This choice of $\kappa$ and $t$ allows us to study the error in a region that is suitably away from the boundary. Indeed, choosing $\kappa$ as above implies that $x^d_i$ in (3.8) ranges roughly from $-1$ to $1$. On the other hand, the standard-deviation parameter associated with the Gaussian density (3.9) at $t = 0.5$ is roughly $0.30$, while the mean parameter is $0.15 \times W_{0.5}$, whose standard deviation is in turn roughly $0.10$. Therefore, both $(I^{d,\kappa}_t)_{i,1}$ and $(I^{d,\kappa}_t)_{i,d}$ are likely to be very close to zero, thus meeting the null boundary condition implied by (3.5).
In Tables 4, 5 and 6 we report the approximate values of $\mathbb{E}[\mathrm{Err}^d_t]$ for $d = 50$, $100$ and $200$, respectively. These were obtained via simulation of $50$ trajectories of $W$ with time step-size $\Delta = 10^{-4}$. Let us now inspect Tables 4-6 in more detail. As a reminder, these results were obtained by using the exact solution as a reference, which is available in this particular example. In Table 4 it is noticeable that euler and m3 can exhibit worse results for small times compared to m1. This is due to the coarse space approximation with only $52$ space grid points. Increasing the number $d$ of grid points improves the error of euler and m3 for all displayed times, as can be seen in Tables 5 and 6 by comparing each column for the same final time. Finally, notice that the third-order Magnus expansion has the same magnitude of error as the euler scheme for all final times.
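Since $A^d$ and $B^d$ are constant, m1 amounts to a single matrix exponential per sampled path, which is what makes the scheme fully parallelizable across Monte Carlo samples. A minimal sketch, assuming (the display (3.2) is not reproduced here) that the first Itô-Magnus term for the SDE $dX_t = B X_t\,dt + A X_t\,dW_t$ with constant matrices takes the form $Y^{(1)}_t = A\,W_t + (B - \tfrac{1}{2}A^2)\,t$, with the $-\tfrac{1}{2}A^2$ term playing the role of the Itô correction:

```python
import numpy as np
from scipy.linalg import expm

def magnus1(A, B, t, W_t, X0):
    """First-order stochastic Magnus approximation for the Ito SDE
    dX = B X dt + A X dW with constant matrices (A multiplies dW,
    B multiplies dt, as in (1.1)): X_t ~ expm(Y1) X0 with
    Y1 = A W_t + (B - A^2/2) t."""
    Y1 = A * W_t + (B - 0.5 * (A @ A)) * t
    return expm(Y1) @ X0

# when A and B commute (e.g. both diagonal), m1 is exact:
A = np.diag([0.2, 0.1])
B = np.diag([0.3, -0.1])
X = magnus1(A, B, t=0.5, W_t=0.25, X0=np.eye(2))
```

Only the terminal value $W_t$ of the Brownian path enters, so each sample requires one evaluation of `expm`, independent of the time horizon.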

Example: $j = 1$, $B \equiv 0$ and $A_t$ upper triangular.
We now test the ME on an SDE with time-dependent coefficients and known explicit solution: we take $A_t$ as in (3.12). In this case (1.1) admits an explicit solution, which can be obtained by means of Itô's formula, and the first three terms of the ME can be written accordingly. Again, all the stochastic integrals appearing in the ME can be reduced to Lebesgue integrals by means of Itô's formula, which allows us to use once more a sparser time grid compared to the Euler method and the discretized exact solution. In the following numerical tests, we discretize in time with mesh $\Delta = 10^{-4}$ for exact and $\Delta = 10^{-2}$ for m1, m2 and m3. For euler we run two experiments, with mesh $10^{-4}$ and $10^{-3}$.
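Since (3.12) is not reproduced above, the comparison can be sketched with a placeholder upper-triangular coefficient (hypothetical, not necessarily that of (3.12)). For $B \equiv 0$ the first Itô-Magnus term is $Y^{(1)}_t = \int_0^t A_s\,dW_s - \tfrac{1}{2}\int_0^t A_s^2\,ds$; below, both integrals are approximated by left-point sums on the same Brownian path used by the Euler scheme.

```python
import numpy as np
from scipy.linalg import expm

def A(t):
    # placeholder upper-triangular coefficient (hypothetical; the actual
    # matrix of (3.12) is not reproduced here)
    return np.array([[1.0, t], [0.0, 2.0]])

def euler_vs_m1(T, n, rng):
    """Euler-Maruyama and first-order Magnus m1 for dX = A_t X dW_t,
    X_0 = I, driven by the same Brownian path with n steps."""
    dt = T / n
    X = np.eye(2)            # Euler iterate
    Y1 = np.zeros((2, 2))    # running first Magnus term
    for k in range(n):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt))
        X = X + A(t) @ X * dW                        # Euler-Maruyama step
        Y1 += A(t) * dW - 0.5 * (A(t) @ A(t)) * dt   # left-point Ito sums
    return X, expm(Y1)

X_euler, X_m1 = euler_vs_m1(T=0.01, n=1000, rng=np.random.default_rng(0))
```

For short horizons the two approximations nearly coincide on a common path; the point of m1 is that, in the paper's setting, the integrals in $Y^{(1)}$ reduce to Lebesgue integrals and tolerate a much sparser grid than Euler.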
Note that euler serves here as an alternative approximation and that choosing a finer time-discretization for euler and exact (our reference method here) is again essential in order to make them comparable with m3.
In Table 7 we report the accuracy results, and in Figure 3 we plot the empirical CDF of $\mathrm{Err}_t$. It is clear from the results that the time-step size $\Delta = 10^{-3}$ is not small enough for euler to yield accurate results. Also note that m3 outperforms euler with $\Delta = 10^{-4}$ up to $t = 0.75$.
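The empirical CDF in Figure 3 is the standard step function of the sampled errors; for completeness, a minimal construction (the sample values themselves are those of the experiment and are not reproduced here):

```python
import numpy as np

def empirical_cdf(samples):
    """Empirical CDF of a sample: F_n(x) = #{i : s_i <= x} / n,
    evaluated at the sorted sample points."""
    x = np.sort(np.asarray(samples, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y
```

Plotting `x` against `y` as a step function yields the kind of curves shown in Figures 3 and 5, one per method, from the $10^3$ sampled values of $\mathrm{Err}_t$.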
The computational time for $10^3$ sampled trajectories, up to time $t = 1$, given in Table 8, is approximately $0.7$ seconds for exact, $4.6$ seconds for euler and $0.6$ seconds for either m1, m2 or m3. The latter figure, however, breaks down as follows: nearly $0.05$ seconds to compute the ME and nearly $0.55$ seconds to compute the matrix exponential with the Matlab function expm. Let us recall Remark 3.1 and note that the computation of the logarithm reduces to the evaluation of the terms $Y^{(1)}$, $Y^{(2)}$ and $Y^{(3)}$.
As the results above show, the accuracy of the ME quickly deteriorates as time increases. This is largely due to the fact that the spectral norm $\Vert A_t \Vert = \frac{1}{\sqrt{2}}\sqrt{t^2 + \sqrt{t^4 + 10 t^2 + 9} + 5}$ is an increasing function of $t$. This behavior should not come as a surprise, since the proof of Theorem 1.1 already uncovered the relation between the convergence time $\tau$ and the spectral norms of $A_t$ and $B_t$. Such a relation is also consistent with the convergence condition (1.7) that holds in the deterministic case. In order to assess numerically the impact of the spectral norm of $A_t$ on the quality of the Magnus approximation, we now repeat the experiments on the equation obtained by normalizing $A_t$ in (3.12) with respect to $\Vert A_t \Vert$. As it turns out, the accuracy of m1, m2 and m3 improves considerably with this normalization. Note that, in this case, (1.1) no longer admits a closed-form solution, while the representation of the terms $Y^{(1)}$, $Y^{(2)}$ and $Y^{(3)}$ in the ME is omitted, as it becomes rather tedious to write. In Figure 4 we plot one realization of the trajectories of the top-right component $(X_t)_{12}$, computed with all the methods above, up to time $t = 10$. In this case we did not plot a diagonal component of the solution because the diagonal components are exact for m2 and m3, up to discretization errors of the Lebesgue integrals. Table 9 and Figure 5 are analogous to Table 7 and Figure 3 and are obtained again with $10^3$ independent samples. The computational times, reported in Table 8, are comparable with those of the non-normalized case. The same can be said about the accuracy results reported in Table 9, which are comparable with those in Table 7.

Figure 3: $B \equiv 0$ and $A_t$ as in (3.12). Empirical CDF of $\mathrm{Err}_t$, at $t = 0.75$, for euler, m1, m2, m3, with exact as benchmark solution, obtained with $10^3$ samples.
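Normalizing by the spectral norm amounts to dividing $A_t$, at each time, by its largest singular value, which is exactly what `np.linalg.norm(·, 2)` returns. A sketch, again with a hypothetical upper-triangular matrix (not necessarily that of (3.12)), together with the closed-form largest singular value for this particular choice:

```python
import numpy as np

def A(t):
    # hypothetical upper-triangular coefficient (illustrative only)
    return np.array([[1.0, t], [0.0, 2.0]])

def A_normalized(t):
    """A_t divided by its spectral norm (largest singular value),
    so that the rescaled coefficient has unit spectral norm."""
    M = A(t)
    return M / np.linalg.norm(M, 2)

def spectral_norm_closed_form(t):
    """Closed-form largest singular value of the 2x2 matrix above,
    obtained from the eigenvalues of A_t A_t^T."""
    return np.sqrt((5.0 + t**2 + np.sqrt(t**4 + 10.0 * t**2 + 9.0)) / 2.0)
```

The closed form shows explicitly that the norm grows without bound in $t$, which is the mechanism behind the deteriorating accuracy of the non-normalized experiments.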

Conflicts of interests
The authors have no relevant financial or non-financial interests to disclose.

Data availability
All data generated or analysed during this study are included in this published article. In particular, the code to produce the numerical experiments is available at

Figure 5: $B \equiv 0$ and $A_t$ as in (3.12), normalized by its spectral norm. Empirical CDF of $\mathrm{Err}_t$, at $t = 0.75$, for euler, m1, m2, m3, with exact as benchmark solution, obtained with $10^3$ samples.