1 Introduction

The analysis and numerical computation of Zakai equations and other types of stochastic partial differential equations (SPDEs) have been studied extensively in recent years. A general form of the Zakai equation (see [1, 17]) is given by

$$\begin{aligned} \mathrm {d}v(t,x)= & {} \bigg (\frac{1}{2}\sum _{i,j=1}^d\frac{\partial ^2}{\partial x_i\partial x_j}\big [a_{ij}(x)v(t,x)\big ] - \sum _{i=1}^d\frac{\partial }{\partial x_i}\big [b_i(x)v(t,x)\big ]\bigg )\,\mathrm {d}t\nonumber \\&- \sum _{l=1}^m\sum _{i=1}^d \frac{\partial }{\partial x_i}(\gamma _{i,l}(x)v(t,x))\,\mathrm {d}M_t^l, \end{aligned}$$
(1.1)

where \(M_t = (M_t^1,\ldots ,M_t^m)'\) is an m-dimensional standard Brownian motion, a is a \(d\times d\) matrix-valued function, b is an \(\mathbb {R}^d\)-valued function, and \(\gamma = (\gamma _{i,l}(x))\) is a \(d\times m\) matrix-valued function. This Zakai equation arises from a nonlinear filtering problem: given an m-dimensional observation process M and a d-dimensional signal process Z, the goal is to estimate the conditional distribution of Z given M. If Z satisfies

$$\begin{aligned} Z_t = Z_0 + \int _0^t\beta (Z_s)\,\mathrm {d}s + \int _0^t\sigma (Z_s)\,\mathrm {d}B_s + \int _0^t\gamma (Z_s)\,\mathrm {d}M_s, \end{aligned}$$
(1.2)

where B is a d-dimensional standard Brownian motion independent of M, \(\sigma \) is a \(d\times d\) matrix-valued function, and \(\beta \) is an \(\mathbb {R}^d\)-valued function, then under appropriate conditions, the conditional distribution function of Z given M has a density \(v(t,\cdot )\in L_2(\mathbb {R}^d)=\{f:\int _{\mathbb {R}^d}|f(x)|^2\,\mathrm {d}x<\infty \}\) almost surely (see Corollary 3.1 in [29]), and, from Theorem 3.1 in [29], v satisfies (1.1) in a weak sense with

$$\begin{aligned} a = \sigma \sigma ^\top + \gamma \gamma ^\top ,\qquad b = \beta . \end{aligned}$$
(1.3)

Moreover, the solution v to (1.1) can be interpreted as the density—if it exists—of the limit empirical measure \(\nu _t = \lim _{N\rightarrow \infty } N^{-1} \sum _{i=1}^N \delta _{Z_t^i}\) for

$$\begin{aligned} Z_t^i = Z_0 + \int _0^t\beta (Z^i_s)\,\mathrm {d}s + \int _0^t\sigma (Z^i_s)\,\mathrm {d}B^i_s + \int _0^t\gamma (Z^i_s)\,\mathrm {d}M_s, \end{aligned}$$
(1.4)

where \(B^i\), \(i=1,\ldots ,N\), are independent Brownian motions, independent of M, and the remaining coefficients are as above.
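For illustration, the particle system (1.4) can be simulated directly by an Euler–Maruyama discretisation with one common noise path for M and idiosyncratic paths for the \(B^i\). The following Python sketch is our own illustration (the function name simulate_particles and all parameter values are hypothetical); it produces samples from the empirical measure approximating \(\nu _T\):

```python
import numpy as np

def simulate_particles(beta, sigma, gamma, z0, T, n_steps, n_particles, m_dim, rng):
    """Euler-Maruyama simulation of the particle system (1.4).

    beta(z) -> (d,), sigma(z) -> (d, d), gamma(z) -> (d, m_dim).
    All particles see the same increments of the observation noise M,
    while each particle has its own idiosyncratic Brownian motion B^i.
    """
    k = T / n_steps
    d = len(z0)
    Z = np.tile(np.asarray(z0, dtype=float), (n_particles, 1))    # shape (n_particles, d)
    for _ in range(n_steps):
        dM = rng.normal(scale=np.sqrt(k), size=m_dim)              # common noise increment
        dB = rng.normal(scale=np.sqrt(k), size=(n_particles, d))   # idiosyncratic increments
        for i in range(n_particles):
            Z[i] += beta(Z[i]) * k + sigma(Z[i]) @ dB[i] + gamma(Z[i]) @ dM
    return Z   # samples of Z_T^i, i.e. the empirical measure at time T given the path of M

# Constant-coefficient example leading to (1.7), with hypothetical parameter values:
rng = np.random.default_rng(0)
mu_x, mu_y, rho_x, rho_y, rho_xy = 0.1, -0.2, 0.3, 0.2, 0.5
beta  = lambda z: np.array([mu_x, mu_y])
sigma = lambda z: np.diag([np.sqrt(1 - rho_x), np.sqrt(1 - rho_y)])
gamma = lambda z: np.array([[np.sqrt(rho_x), 0.0],
                            [np.sqrt(rho_y) * rho_xy, np.sqrt(rho_y * (1 - rho_xy**2))]])
particles = simulate_particles(beta, sigma, gamma, [0.0, 0.0], T=1.0,
                               n_steps=200, n_particles=2000, m_dim=2, rng=rng)
```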

There are two major approaches to the numerical approximation of the Zakai equation. One is to simulate the particle system (1.4) by a Monte Carlo method, for instance as in [10,11,12, 17]. The other approach is to solve the Zakai SPDE directly by spatial approximation methods and time stepping schemes, coupled again with Monte Carlo sampling, which is the subject of this paper.

Within this second class of methods, several schemes were developed in earlier works for different types of SPDEs, including finite differences [14, 18,19,20], finite elements [27, 37], and stochastic Taylor schemes [24, 25], but these were restricted to types of SPDEs not including Zakai equations of the type (1.1).

More recently, methods have been developed and analysed for parabolic SPDEs of the generic form

$$\begin{aligned} \mathrm {d}v = \mathscr {L}v\,\mathrm {d}t + G(v)\,\mathrm {d}M_t, \end{aligned}$$
(1.5)

where \(\mathscr {L}\) is a second order elliptic differential operator, and G is a functional mapping v onto a linear operator from martingales M into a suitable function space.

Under suitable regularity, for equations of type (1.5), mean-square convergence of order 1/2 is shown for an Euler semi-discretisation in [30] for square-integrable (not necessarily continuous), infinite-dimensional martingale drivers. In contrast, [3] allows only for continuous martingales but proves convergence of higher order in space and up to 1 in time, in \(L^p\) and almost surely, for a Milstein scheme and spatial Galerkin approximation of sufficiently high order; this is extended to advection-diffusion equations with possibly discontinuous martingales in [2].

Giles and Reisinger [16] use an explicit Milstein finite difference approximation to the solution of the following one-dimensional SPDE, a special case of (1.1) for \(d=1\) and constant coefficients,

$$\begin{aligned} \mathrm {d}v = -\mu \frac{\partial v}{\partial x}\,\mathrm {d}t + \frac{1}{2}\frac{\partial ^2 v}{\partial x^2}\,\mathrm {d}t - \sqrt{\rho }\frac{\partial v}{\partial x}\,\mathrm {d}M_t,\qquad (t,x)\in (0,T)\times \mathbb {R}, \end{aligned}$$
(1.6)

where \(T>0\), M is a standard Brownian motion, and \(\mu \) and \(0\le \rho <1\) are real-valued parameters. This is extended in [35] to an approximation of (1.6) with an implicit method on the basis of the \(\sigma \)\(\theta \) time-stepping scheme, where the finite variation parts of the double stochastic integral are taken implicit. This is further applied in [36] to Multi-index Monte Carlo estimation of expectations of a functional of the solution.

A finite difference scheme for a filtered jump-diffusion process resulting in a stochastic integro-differential equation is studied in [13], where convergence of order 1 in space and 1/2 in time, in \(L_2\) and \(L_\infty \) in space, is proven for an Euler time stepping scheme.

The theoretical results in this paper are an extension of those in [16] to the multi-dimensional case. They are more specific than those in [13] in that we analyse only the case of constant coefficient local SPDEs. In contrast to [2, 4], we consider only finite-dimensional Brownian motions, as is relevant in our applications. But we specifically include the case of Dirac initial data and extend the results to a practically attractive, semi-implicit alternating direction implicit factorisation in the context of the Milstein scheme.

We want to allow for Dirac initial data because they correspond to the natural situation where all particles in (1.4) start from the same initial position, or a filtering problem with known current state \(Z_0\) in (1.2).

Specifically, we study first the two-dimensional stochastic partial differential equation

$$\begin{aligned} \mathrm {d}v= & {} -\mu _x\frac{\partial v}{\partial x}\,\mathrm {d}t -\mu _y\frac{\partial v}{\partial y}\,\mathrm {d}t + \frac{1}{2}\bigg (\frac{\partial ^2 v}{\partial x^2} + 2\sqrt{\rho _x\rho _y}\rho _{xy}\frac{\partial ^2v}{\partial x\partial y} + \frac{\partial ^2 v}{\partial y^2} \bigg )\,\mathrm {d}t \nonumber \\&- \sqrt{\rho _x}\frac{\partial v}{\partial x}\,\mathrm {d}M_t^x - \sqrt{\rho _y}\frac{\partial v}{\partial y}\,\mathrm {d}M_t^y, \end{aligned}$$
(1.7)

for \(x,y\in \mathbb {R},\ 0<t\le T\), where \(\mu _x,\mu _y\) and \(0\le \rho _x,\rho _y< 1\), \(-1\le \rho _{xy}\le 1\) are real-valued parameters, subject to the Dirac initial data

$$\begin{aligned} v(0,x,y) = \delta (x-x_0)\otimes \delta (y-y_0), \end{aligned}$$
(1.8)

with \(x_0\) and \(y_0\) given. It is derived from the special case where the signal process \(Z = (X,Y)'\) satisfies (1.2) with

$$\begin{aligned} \beta = \begin{bmatrix} \mu _x \\ \mu _y \end{bmatrix},\quad \sigma = \begin{bmatrix} \sqrt{1-\rho _x}&\quad 0 \\ 0&\quad \sqrt{1-\rho _y} \end{bmatrix},\quad \gamma = \begin{bmatrix} \sqrt{\rho _x}&\quad 0 \\ \sqrt{\rho _y}\rho _{xy}&\quad \sqrt{\rho _y(1-\rho _{xy}^2)} \end{bmatrix}. \end{aligned}$$

A classical result states that, for a class of SPDEs including (1.7), with initial condition in \(L_2\), there exists a unique solution \(v\in L_2(\varOmega \times (0,T), \mathscr {F}, L_2(\mathbb {R}^2))\) (see [28]). This does not include Dirac initial data (1.8), but in fact, the solution to (1.7) and (1.8) can be found analytically, similar to the heat equation, by solving the SDE (3.2) below (compared to an ODE in the case of the heat equation) in the Fourier space and transforming back, which yields a smooth (in x and y) function

$$\begin{aligned} v(T,x,y) = \frac{\exp \Big (-\frac{\big (x-x_0-\mu _x T-\sqrt{\rho _x}M_T^x\big )^2}{2(1-\rho _x)T}-\frac{\big (y-y_0-\mu _y T-\sqrt{\rho _y}M_T^y\big )^2}{2(1-\rho _y)T}\Big )}{2\pi \sqrt{(1-\rho _x)(1-\rho _y)}\,T}\,. \end{aligned}$$
(1.9)

The availability of a closed-form solution in this case helps us check the validity of our numerical scheme and its convergence rate, whereas the scheme itself is more widely applicable.
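For reference, (1.9) is straightforward to evaluate for a given realisation \((M_T^x, M_T^y)\) of the driving noise; a minimal Python sketch (the function name and default parameter values are ours) reads:

```python
import numpy as np

def exact_density(x, y, T, M_x_T, M_y_T, x0=0.0, y0=0.0,
                  mu_x=0.0, mu_y=0.0, rho_x=0.3, rho_y=0.2):
    """Closed-form solution (1.9) of (1.7)-(1.8) for one realisation of (M_T^x, M_T^y)."""
    ex = (x - x0 - mu_x * T - np.sqrt(rho_x) * M_x_T) ** 2 / (2 * (1 - rho_x) * T)
    ey = (y - y0 - mu_y * T - np.sqrt(rho_y) * M_y_T) ** 2 / (2 * (1 - rho_y) * T)
    return np.exp(-ex - ey) / (2 * np.pi * np.sqrt((1 - rho_x) * (1 - rho_y)) * T)
```

Evaluating this on the finite difference grid, with the same Brownian increments as used in the scheme, gives a reference solution against which convergence rates can be measured.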

For the SPDE (1.7), we consider both explicit and implicit Milstein schemes. We study their mean-square stability and the strong convergence of the second moment of the error, which in turn yields a bound on the expected error. The advantage over the simpler Euler scheme is that the strong convergence order is improved from 1/2 to 1. As expected, we find that the explicit scheme is stable in the mean-square sense only under a strong CFL-type condition on the timestep, \(k\le C h^2\) for timestep k, mesh size h, and a constant C, while the implicit scheme is mean-square stable under the very mild and somewhat unusual CFL condition \(k \le C |\log h|^{-1}\) (provided also some constraints on \(\rho _x,\rho _y,\rho _{xy}\)).

We therefore focus on the implicit scheme, for which we prove first order convergence in the timestep and second order in the spatial mesh size. The analysis is made more difficult by the Dirac initial datum compared to, say, \(L_2\) initial data. We adapt the approach used in [7] for the heat equation by studying the convergence for different wave number regions in Fourier space and then assemble the contributions to the error by the inverse transform.

Furthermore, we use an Alternating Direction Implicit (ADI) scheme to approximately factorise the discretisation matrix for the implicit elliptic part in (1.7). This concept is well established for PDEs (see, e.g., [9, 23, 34]). It is well known that in the multi-dimensional case standard implicit schemes result in sparse banded linear systems, which cannot be solved by direct elimination at a computational cost which scales linearly with the number of unknowns, as is possible in the one-dimensional, tridiagonal case. An alternative to advanced iterative linear solvers such as multigrid methods is to reduce the large sparse linear system approximately to a sequence of tridiagonal linear systems, which are computationally easier to handle, by ADI factorisation. To our knowledge, the present work is the first application of ADI to SPDEs. We show that the ADI approximation is also mean-square stable under the same conditions as the original implicit scheme and has the same convergence order.

We note that published analysis of ADI schemes for parabolic PDEs in the presence of mixed spatial derivative terms is currently restricted to constant coefficients (through the use of von Neumann stability analysis; see e.g. [39]). Notwithstanding this, the empirical evidence overwhelmingly suggests that the conclusions drawn there extend to most cases of variable coefficients.

We give a natural extension of the proposed scheme to the SPDE (1.1). In that case, additional iterated stochastic integrals (the Lévy area) appear in the Milstein approximation. The efficient and accurate simulation of these integrals has been studied in the context of SDEs in [15, 26] and invariably leads to relatively complicated schemes. As the computational effort in the context of the SPDE (1.1) is dominated by the matrix computations from the finite difference scheme, it is sufficient to approximate the stochastic integrals \(\int _t^{t+k} (W_s-W_t) \, \mathrm{d}B_s\), for correlated Brownian motions W and B, by simple Euler integration with step \(k^2\), without adversely affecting the convergence.
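A possible realisation of this sub-stepping is sketched below (our own illustration; the function name levy_area_euler and the specific choice of the substep count are assumptions, the only requirement from the text being a substep of size of order \(k^2\)). The Brownian increments over \([t,t+k]\) are returned as well, since the same increments must be reused in the remaining terms of the Milstein step:

```python
import numpy as np

def levy_area_euler(k, rho, rng):
    """Left-point Euler approximation of I = int_t^{t+k} (W_s - W_t) dB_s for
    Brownian motions W, B with correlation rho, on a sub-grid of step ~ k**2."""
    n_sub = max(int(np.ceil(1.0 / k)), 1)            # so that k / n_sub is of order k**2
    dt = k / n_sub
    dZ1 = rng.normal(scale=np.sqrt(dt), size=n_sub)
    dZ2 = rng.normal(scale=np.sqrt(dt), size=n_sub)
    dW = dZ1
    dB = rho * dZ1 + np.sqrt(1.0 - rho**2) * dZ2
    W_left = np.concatenate(([0.0], np.cumsum(dW)[:-1]))   # W_{t_j} - W_t at left endpoints
    return float(np.sum(W_left * dB)), float(dW.sum()), float(dB.sum())
```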

As a specific application, we approximate the equation

$$\begin{aligned} \begin{aligned} \mathrm {d}u&= \bigg [\kappa _1 u - \Big (r_1 - \frac{1}{2}y - \xi _1\rho _3\rho _{1,1}\rho _{2,1}\Big ) \frac{\partial u}{\partial x} - \Big ( \kappa _1(\theta _1-y) - \xi _1^2 \Big )\frac{\partial u}{\partial y} \\&\quad + \frac{1}{2}y \frac{\partial ^2 u}{\partial x^2} + \xi _1\rho _3\rho _{1,1}\rho _{2,1}y\frac{\partial ^2 u}{\partial x\partial y} + \frac{\xi _1^2}{2}y\frac{\partial ^2 u}{\partial y^2} \bigg ]\,\mathrm {d}t \\&\quad - \rho _{1,1}\sqrt{y}\frac{\partial u}{\partial x}\,\mathrm {d}W_t - \xi _1\rho _{2,1}\frac{\partial }{\partial y}(\sqrt{y}u)\,\mathrm {d}B_t, \end{aligned} \end{aligned}$$
(1.10)

taken from [21], with the scheme presented in this paper. Although our analysis (based on Fourier transforms) does not directly apply in this case, the scheme preserves first order convergence in time and second order convergence in space in our numerical tests.

The sharp estimates we derive give a precise description of the error for Dirac initial data. This is achieved by a Fourier analysis technique originating with Carter and Giles [7], who estimated the error arising from explicit and implicit approximations of the constant-coefficient 1-d convection–diffusion equation with Dirac initial data.

Moreover, the stability and error analysis in the constant coefficient case provide an accurate description of the local behaviour at \((t,x)=(0,x_0)\) in the variable coefficient case. This connection is established by “freezing” the coefficients at the point \((0,x_0)\) in the more general SPDE (1.1), such that we obtain

$$\begin{aligned} \mathrm {d}v_0(t,x)&= \bigg (\frac{1}{2}\sum _{i,j=1}^d a_{ij}(x_0) \frac{\partial ^2}{\partial x_i\partial x_j}v_0(t,x) - \sum _{i=1}^d b_i(x_0)\frac{\partial }{\partial x_i}v_0(t,x)\bigg )\,\mathrm {d}t \nonumber \\&\quad - \sum _{l=1}^m\sum _{i=1}^d \gamma _{i,l}(x_0)\frac{\partial }{\partial x_i}v_0(t,x)\,\mathrm {d}M_t^l. \end{aligned}$$
(1.11)

It can be seen by elementary calculus that (1.11) in the case \(d=2\) can be brought into the form (1.7) by a linear coordinate transformation, noting that \(a- \gamma \gamma ^\top \) is symmetric positive definite by (1.3). This reduction of a broader class of problems to a test equation with a known solution is in the spirit of the well-established paradigm of stability analysis for ODEs described in [22].

We give a numerical justification for this reduction in Sect. 6, by presenting numerical tests for the SPDEs (1.10) and (1.11) to compare the error at \((x_0,y_0)\) between the original SPDE and SPDE with frozen coefficients at \((x_0,y_0)\). We find that the behaviour is extremely close, especially for small time intervals, which confirms the broader usefulness of the method.

Summarising, the novel contributions of this paper are as follows. We

  • Give a rigorous stability and error analysis for a Milstein finite difference scheme for the SPDE (1.7) in terms of \(L_2\) in probability, pointwise as well as \(L_2\) in space, deriving sharp leading order error terms;

  • Derive pointwise errors for Dirac initial data, which reveal a mild instability for large implicit timesteps and small spatial mesh sizes in this case, not seen in previous studies for \(L_2\) data;

  • Extend the analysis to an alternating direction implicit (ADI) factorisation, which, to our knowledge, is the first application of an ADI scheme to stochastic PDEs;

  • Propose a modification for the more general equation (1.1) through sub-simulation of the Lévy area, which is empirically shown to be of first order.

The rest of this article is structured as follows. We define the approximation schemes in Sect. 2. Then we analyse the mean-square stability and \(L_2\)-convergence in Sects. 3 and 4 in the constant coefficient case of (1.7). Section 5 shows numerical experiments confirming the above findings. Section 6 extends the scheme to variable coefficients as in (1.1) and presents tests for the example (1.10). Section 7 offers conclusions and directions for further research.

2 Approximation and main results

2.1 Semi-implicit Milstein finite difference scheme

First, we introduce the numerical scheme for the SPDE (1.7), repeated here for convenience,

$$\begin{aligned} \mathrm {d}v&= -\mu _x\frac{\partial v}{\partial x}\,\mathrm {d}t -\mu _y\frac{\partial v}{\partial y}\,\mathrm {d}t + \frac{1}{2}\bigg (\frac{\partial ^2 v}{\partial x^2} + 2\sqrt{\rho _x\rho _y}\rho _{xy}\frac{\partial ^2v}{\partial x\partial y} + \frac{\partial ^2 v}{\partial y^2} \bigg )\,\mathrm {d}t \\&\quad - \sqrt{\rho _x}\frac{\partial v}{\partial x}\,\mathrm {d}M_t^x - \sqrt{\rho _y}\frac{\partial v}{\partial y}\,\mathrm {d}M_t^y, \end{aligned}$$

with Dirac initial data \(v(0,x,y) = \delta (x-x_0)\otimes \delta (y-y_0)\). We use a spatial grid with uniform spacings \(h_x,\,h_y>0\), and, for \(T>0\) fixed, N time steps of size \(k = T/N\). Let \(V_{i,j}^{n}\) be the approximation to \(v(nk,ih_x,jh_y)\), \(n=1,\ldots ,N\), \(i,j\in \mathbb {Z}\), and let \(i_0:=[x_0/h_x]\) and \(j_0:=[y_0/h_y]\) denote the closest integers to \(x_0/h_x\) and \(y_0/h_y\), respectively. We approximate \(v(0,x,y)\) by

$$\begin{aligned} V_{i,j}^0 = h_x^{-1}h_y^{-1}\delta _{(i_0,\,j_0)} = {\left\{ \begin{array}{ll} h_x^{-1}h_y^{-1}, &{}\quad i=i_0,\ j=j_0,\\ 0,&{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(2.1)

To improve the accuracy of the approximation of v in the present case of Dirac initial data, we subsequently choose \(h_x\) and \(h_y\) such that \(x_0/h_x\) and \(y_0/h_y\) are integers and therefore \(x_0\) and \(y_0\) are on the grid.
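In code, the scaled discrete Dirac (2.1) on a truncated grid can be set up as follows (a minimal sketch; the grid is centred at the origin and the function name is ours):

```python
import numpy as np

def dirac_initial_condition(x0, y0, h_x, h_y, n_x, n_y):
    """Discrete Dirac initial datum (2.1) on a (2*n_x+1) x (2*n_y+1) grid.

    Assumes x0/h_x and y0/h_y are integers, so that (x0, y0) lies on the grid,
    as chosen in the text.
    """
    x = np.arange(-n_x, n_x + 1) * h_x
    y = np.arange(-n_y, n_y + 1) * h_y
    V0 = np.zeros((x.size, y.size))
    i0, j0 = int(round(x0 / h_x)), int(round(y0 / h_y))
    V0[i0 + n_x, j0 + n_y] = 1.0 / (h_x * h_y)    # total mass h_x*h_y*sum(V0) = 1
    return x, y, V0
```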

Extending the implicit Euler scheme in [35] for the 1-d case, it is natural to take the drift term

$$\begin{aligned} -\mu _x\frac{\partial v}{\partial x} -\mu _y\frac{\partial v}{\partial y} + \frac{1}{2}\bigg (\frac{\partial ^2 v}{\partial x^2} + 2\sqrt{\rho _x\rho _y}\rho _{xy}\frac{\partial ^2v}{\partial x\partial y} + \frac{\partial ^2 v}{\partial y^2} \bigg ) \end{aligned}$$

implicit, and the terms driven by \(M^x\) and \(M^y\) explicit. We will prove later that in this way we obtain better stability (compare Proposition 3.2 to Theorem 3.1). For computational simplicity, in the following, we take the mixed derivative term therein explicit. This is also in preparation for the ADI splitting schemes we will study later.

Using such a semi-implicit Euler scheme, the SPDE (1.7) can be approximated by

$$\begin{aligned} V^{n+1}&= V^n - \frac{\mu _x k}{2h_x}D_xV^{n+1} - \frac{\mu _y k}{2h_y}D_yV^{n+1} + \frac{k}{2h_x^2}D_{xx}V^{n+1} + \frac{k}{2h_y^2}D_{yy}V^{n+1}\\&\quad + \sqrt{\rho _x\rho _y}\rho _{xy}\frac{k}{4h_xh_y}D_{xy}V^n - \frac{\sqrt{\rho _x k}Z_{n,x}}{2h_x}D_xV^{n} - \frac{\sqrt{\rho _y k}\widetilde{Z}_{n,y}}{2h_y}D_yV^{n}, \end{aligned}$$

where

$$\begin{aligned}&(D_xV)_{i,j} = V_{i+1,j}-V_{i-1,j},\qquad \qquad \qquad \qquad \quad (D_yV)_{i,j} = V_{i,j+1}-V_{i,j-1},\\&(D_{xx}V)_{i,j} = V_{i+1,j}-2V_{i,j}+V_{i-1,j},\qquad \, (D_{yy}V)_{i,j} = V_{i,j+1}-2V_{i,j}+V_{i,j-1},\\&(D_{xy}V)_{i,j} = V_{i+1,j+1}-V_{i-1,j+1}-V_{i+1,j-1}+V_{i-1,j-1}, \end{aligned}$$

and \(\widetilde{Z}_{n,y} = \rho _{xy}Z_{n,x} + \sqrt{1-\rho _{xy}^2}Z_{n,y}\), with \(Z_{n,x},Z_{n,y}\sim N(0,1)\) being independent normal random variables. To achieve a higher order of convergence, we introduce the Milstein scheme. Integrating (1.7) over the time interval \([nk, (n+1)k]\),

$$\begin{aligned} \begin{aligned} v\big (nk+k,x,y\big )&= v(nk,x,y) + \int _{nk}^{nk+k}\bigg (-\mu _x\frac{\partial v}{\partial x} -\mu _y\frac{\partial v}{\partial y} \\&\quad + \frac{1}{2}\frac{\partial ^2 v}{\partial x^2} + \sqrt{\rho _x\rho _y}\rho _{xy}\frac{\partial ^2v}{\partial x\partial y} + \frac{1}{2}\frac{\partial ^2 v}{\partial y^2}\bigg )\mathrm {d}s\\&\quad - \int _{nk}^{nk+k}\sqrt{\rho _x}\frac{\partial v}{\partial x}\,\mathrm {d}M^x_s -\int _{nk}^{nk+k}\sqrt{\rho _y}\frac{\partial v}{\partial y}\,\mathrm {d}M^y_s. \end{aligned} \end{aligned}$$

In the Euler scheme, we approximate all integrands by their value at time nk or \((n+1)k\), which is a zero-order expansion in time. By contrast, in the Milstein scheme, we use a first-order expansion for the stochastic integrals, such that we approximate \(v(s,x,y)\approx v(nk,x,y)\) for \(nk<s<(n+1)k\) in the first integral and

$$\begin{aligned} v(s,x,y)\approx & {} v(nk,x,y) - \sqrt{\rho _x}\frac{\partial v}{\partial x}(nk,x,y)(M_s^x - M_{nk}^x)\\&-\sqrt{\rho _y}\frac{\partial v}{\partial y}(nk,x,y)(M_s^y - M_{nk}^y) \end{aligned}$$

in the second and third. Writing \(t = nk\), it follows that

$$\begin{aligned}&- \int _t^{t+k}\sqrt{\rho _x}\frac{\partial v}{\partial x}(s,x,y)\,\mathrm {d}M^x_s -\int _t^{t+k}\sqrt{\rho _y}\frac{\partial v}{\partial y}(s,x,y)\,\mathrm {d}M^y_s\\&\quad \approx -\sqrt{\rho _x}\frac{\partial v}{\partial x}(t,x,y)\varDelta M_n^x - \sqrt{\rho _y}\frac{\partial v}{\partial y}(t,x,y)\varDelta M_n^y\\&\qquad + \; \rho _x\frac{\partial ^2v}{\partial x^2}(t,x,y)\int _t^{t+k}(M_s^x-M_t^x)\,\mathrm {d}M_s^x\\&\qquad + \rho _y\frac{\partial ^2v}{\partial y^2}(t,x,y)\int _t^{t+k}(M_s^y-M_t^y)\,\mathrm {d}M_s^y + \sqrt{\rho _x\rho _y}\frac{\partial ^2 v}{\partial x\partial y}(t,x,y)\\&\quad \bigg (\int _t^{t+k}(M_s^x-M_t^x)\,\mathrm {d}M_s^y + \int _t^{t+k}(M_s^y-M_t^y)\,\mathrm {d}M_s^x\bigg ), \end{aligned}$$

where

$$\begin{aligned} \varDelta M_n^x= M_{t+k}^x - M_t^x = \sqrt{k}Z_{n,x},\quad \varDelta M_n^y= M_{t+k}^y - M_t^y = \sqrt{k}\widetilde{Z}_{n,y}. \end{aligned}$$

From standard Itô calculus, we have

$$\begin{aligned}&\int _t^{t+k}(M_s^x-M_t^x)\,\mathrm {d}M_s^x = \frac{1}{2}\Big ((\varDelta M_n^x)^2-k\Big ),\qquad \quad \\&\int _t^{t+k}(M_s^y-M_t^y)\,\mathrm {d}M_s^y = \frac{1}{2}\Big ((\varDelta M_n^y)^2-k\Big ),\\&\int _t^{t+k}(M_s^x-M_t^x)\,\mathrm {d}M_s^y + \int _t^{t+k}(M_s^y-M_t^y)\,\mathrm {d}M_s^x = \varDelta M_n^x \varDelta M_n^y - \rho _{xy}k. \end{aligned}$$

We see that the deterministic mixed-derivative terms cancel, and we derive the implicit Milstein scheme as follows,

$$\begin{aligned} \begin{aligned}&\bigg (I + \frac{\mu _x k}{2h_x}D_x + \frac{\mu _y k}{2h_y}D_y - \frac{k}{2h_x^2}D_{xx} - \frac{k}{2h_y^2}D_{yy}\bigg )V^{n+1}\\&\quad = \bigg (I - \frac{\sqrt{\rho _x k}\,Z_{n,x}}{2h_x}D_x - \frac{\sqrt{\rho _y k}\,\widetilde{Z}_{n,y}}{2h_y}D_y + \frac{\rho _xk(Z_{n,x}^2-1)}{8h_x^2}D_x^2 \\&\qquad + \frac{\rho _yk(\widetilde{Z}_{n,y}^2-1)}{8h_y^2}D_y^2 + \frac{\sqrt{\rho _x\rho _y}\,k\,Z_{n,x}\widetilde{Z}_{n,y}}{4h_xh_y}D_{xy}\bigg )V^n. \end{aligned} \end{aligned}$$
(2.2)
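To make the structure of (2.2) concrete, the following Python sketch applies the explicit right-hand side operator to a grid function \(V^n\). It is our own illustration, not part of the analysis: the grid is truncated with periodic wrap-around at the boundary (an assumption acceptable only when the solution is negligible near the boundary), axis 0 corresponds to the x-index i and axis 1 to the y-index j, and Zx, Zy_t stand for \(Z_{n,x}\) and \(\widetilde{Z}_{n,y}\).

```python
import numpy as np

def milstein_rhs(V, k, h_x, h_y, rho_x, rho_y, Zx, Zy_t):
    """Right-hand side of the implicit Milstein scheme (2.2) applied to V = V^n."""
    sx = lambda s: np.roll(V, -s, axis=0)       # gives V_{i+s, j}
    sy = lambda s: np.roll(V, -s, axis=1)       # gives V_{i, j+s}
    Dx  = sx(1) - sx(-1)
    Dy  = sy(1) - sy(-1)
    Dx2 = sx(2) - 2 * V + sx(-2)                # D_x applied twice
    Dy2 = sy(2) - 2 * V + sy(-2)
    Dxy = (np.roll(sx(1), -1, axis=1) - np.roll(sx(-1), -1, axis=1)
           - np.roll(sx(1), 1, axis=1) + np.roll(sx(-1), 1, axis=1))
    return (V
            - np.sqrt(rho_x * k) * Zx   / (2 * h_x) * Dx
            - np.sqrt(rho_y * k) * Zy_t / (2 * h_y) * Dy
            + rho_x * k * (Zx**2 - 1.0)   / (8 * h_x**2) * Dx2
            + rho_y * k * (Zy_t**2 - 1.0) / (8 * h_y**2) * Dy2
            + np.sqrt(rho_x * rho_y) * k * Zx * Zy_t / (4 * h_x * h_y) * Dxy)
```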

To facilitate its implementation, we combine the scheme with an Alternating Direction Implicit (ADI) factorisation, which has been introduced in [34] for parabolic PDEs to approximately factorise the system matrix by matrices which correspond to derivatives in individual directions and which can thus more easily be inverted, while the consistency order is maintained. Applying this principle to the implicit terms on the left-hand side of (2.2),

$$\begin{aligned}&\bigg (I + \frac{\mu _x k}{2h_x}D_x + \frac{\mu _y k}{2h_y}D_y - \frac{k}{2h_x^2}D_{xx} - \frac{k}{2h_y^2}D_{yy}\bigg )V\\&\quad = \bigg (I + \frac{\mu _x k}{2h_x}D_x - \frac{k}{2h_x^2}D_{xx}\bigg )\bigg (I + \frac{\mu _y k}{2h_y}D_y - \frac{k}{2h_y^2}D_{yy}\bigg )V \\&\qquad - k^2\bigg (\frac{\mu _x}{2h_x}D_x - \frac{1}{2h_x^2}D_{xx}\bigg ) \bigg (\frac{\mu _y}{2h_y}D_y - \frac{1}{2h_y^2}D_{yy}\bigg )V. \end{aligned}$$

Since the second term on the right-hand side, which is proportional to \(k^2\), is of higher order than the overall scheme (which is of first order), it is justifiable in terms of the consistency order to neglect it. This yields the ADI scheme as follows,

$$\begin{aligned} \begin{aligned}&\bigg (I + \frac{\mu _x k}{2h_x}D_x - \frac{k}{2h_x^2}D_{xx}\bigg )\bigg (I + \frac{\mu _y k}{2h_y}D_y - \frac{k}{2h_y^2}D_{yy}\bigg )V^{n+1}\\&\quad =\bigg ( I - \frac{\sqrt{\rho _x k}Z_{n,x}}{2h_x}D_x - \frac{\sqrt{\rho _y k}\widetilde{Z}_{n,y}}{2h_y}D_y + \frac{\rho _xk(Z_{n,x}^2-1)}{8h_x^2}D_x^2 \\&\qquad + \frac{\rho _yk(\widetilde{Z}_{n,y}^2-1)}{8h_y^2}D_y^2 + \frac{\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}}{4h_xh_y}D_{xy}\bigg )V^n. \end{aligned} \end{aligned}$$
(2.3)

Note that there is no substantial benefit in second order accurate splitting schemes (such as Craig–Sneyd [9] or Hundsdorfer–Verwer [23]) as the overall order is limited to 1 by the Milstein approximation to the stochastic integral.

We approximate the second derivatives on the right-hand side by \(D_x^2\) and \(D_y^2\), but the results for \(D_{xx}\) and \(D_{yy}\) would be similar.
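The left-hand side of (2.3) factorises into two operators, each tridiagonal along one coordinate direction, so one implicit step reduces to a sweep of tridiagonal solves in x followed by a sweep in y. A minimal sketch of such a step is given below (our own implementation, assuming a truncated grid with homogeneous Dirichlet values outside the computational box; scipy.linalg.solve_banded performs the tridiagonal solves). Combined with the right-hand side function from the previous sketch, V_new = adi_step(milstein_rhs(...), ...) performs one full time step of (2.3).

```python
import numpy as np
from scipy.linalg import solve_banded

def adi_step(rhs, k, h_x, h_y, mu_x, mu_y):
    """Apply (A_x A_y)^{-1} to the explicit right-hand side of (2.3), where
    A_* = I + mu_* k/(2 h_*) D_* - k/(2 h_*^2) D_**; rhs has shape (n_x, n_y),
    axis 0 being the x-direction."""
    def banded(n, mu, h):
        lower = -mu * k / (2 * h) - k / (2 * h**2)    # coefficient of V_{.-1}
        diag  = 1.0 + k / h**2                        # coefficient of V_{.}
        upper =  mu * k / (2 * h) - k / (2 * h**2)    # coefficient of V_{.+1}
        ab = np.empty((3, n))
        ab[0, :], ab[1, :], ab[2, :] = upper, diag, lower
        return ab

    W = solve_banded((1, 1), banded(rhs.shape[0], mu_x, h_x), rhs)       # x-sweep
    return solve_banded((1, 1), banded(rhs.shape[1], mu_y, h_y), W.T).T  # y-sweep
```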

We can also use the explicit Milstein finite difference scheme to approximate the SPDE (1.7)

$$\begin{aligned} \begin{aligned} V^{n+1}&= \bigg ( I -\frac{\mu _x k + \sqrt{\rho _x k}Z_{n,x}}{2h_x}D_x - \frac{\mu _y k + \sqrt{\rho _y k}\widetilde{Z}_{n,y}}{2h_y}D_y + \frac{k}{2h_x^2}D_{xx} + \frac{k}{2h_y^2}D_{yy}\\&\quad + \frac{\rho _xk(Z_{n,x}^2-1)}{8h_x^2}D_x^2 + \frac{\rho _yk(\widetilde{Z}_{n,y}^2-1)}{8h_y^2}D_y^2 + \frac{\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}}{4h_xh_y}D_{xy}\bigg )V^n, \end{aligned} \end{aligned}$$
(2.4)

but, as we will see, this scheme is stable only under a restrictive condition on the timestep.

2.2 Main convergence results

The following theorems describe the mean-square stability and convergence of the implicit finite difference scheme (2.2) and the ADI scheme (2.3). We make the following assumption:

Assumption 2.1

Let \(0\le \rho _x,\rho _y < 1,\, -1\le \rho _{xy}\le 1\) such that

$$\begin{aligned}&2\rho _x^2(1+2|\rho _{xy}|) < 1, \end{aligned}$$
(2.5a)
$$\begin{aligned}&2\rho _y^2(1+2|\rho _{xy}|) < 1, \end{aligned}$$
(2.5b)
$$\begin{aligned}&2\rho _x\rho _y(3\rho _{xy}^2 + 2|\rho _{xy}|+1 ) < 1. \end{aligned}$$
(2.5c)

In Sect. 3 we show that Assumption 2.1 is a sufficient condition for stability of the schemes (2.2) and (2.3). If \(\rho _{xy}=0\), these conditions reduce to \(2\rho _x^2< 1\) and \(2\rho _y^2< 1\), which is analogous to the condition for mean-square stability in the 1-dimensional case in [35]. In the worst case, \(|\rho _{xy}|=1\), sufficient conditions are \(\rho _x,\rho _y< 1/\sqrt{6}\) and \(\rho _x\rho _y< 1/12\).
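For concreteness, the conditions (2.5a)–(2.5c) are trivial to check numerically; a small helper (ours, with example values chosen for illustration only) might read:

```python
def satisfies_assumption_2_1(rho_x, rho_y, rho_xy):
    """Sufficient mean-square stability conditions (2.5a)-(2.5c) of Assumption 2.1."""
    return (2 * rho_x**2 * (1 + 2 * abs(rho_xy)) < 1
            and 2 * rho_y**2 * (1 + 2 * abs(rho_xy)) < 1
            and 2 * rho_x * rho_y * (3 * rho_xy**2 + 2 * abs(rho_xy) + 1) < 1)

# satisfies_assumption_2_1(0.3, 0.2, 0.5)  -> True
# satisfies_assumption_2_1(0.8, 0.8, 1.0)  -> False
```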

First, we recall the setting of our numerical schemes. For \(T>0\) fixed, we discretise [0, T] by N steps, and let \(k=T/N\) be the timestep. We discretise space \(\mathbb {R}^2\) with mesh sizes \(h_x\) and \(h_y\).

The following theorem shows the convergence of the implicit Milstein difference scheme (2.2). The constant \(\theta \in (0,1)\) therein is determined by the parameters \(\rho _x,\rho _y\) and \(\rho _{xy}\). The proof of Lemma 4.5 gives an explicit value; though not sharp, this is sufficient to highlight the divergence for \(h_x, h_y \rightarrow 0\) when k is fixed.

Theorem 2.1

Let \(T>0\), \(k=T/N\), \(h_x>0\) and \(h_y>0\) be mesh sizes. Then, under Assumption 2.1, there exists \(\theta \in (0,1)\), independent of \(h_x,h_y\) and k, such that the implicit Milstein finite difference scheme (2.2) has the error expansion

$$\begin{aligned} \begin{aligned} V_{i,j}^N-v(T,x_i,y_j)&= k\,E_1(T,x_i,y_j) + h_x^2\,E_2(T,x_i,y_j) \\&\quad + h_y^2\,E_3(T,x_i,y_j) + \theta ^{N}h_x^{-2} \,E_4(T,x_i,y_j)\\&\quad + \theta ^{N}h_y^{-2} \,E_5(T,x_i,y_j) \\&\quad + o(k,h_x^2,h_y^2,\theta ^{N}h_x^{-2},\theta ^{N}h_y^{-2})\,R(T,x_i,y_j), \end{aligned} \end{aligned}$$
(2.6)

where \(x_i = ih_x\), \(y_j = jh_y\), \(E_1,\ldots ,E_5,\) and R are random variables with bounded first and second moments, all independent of \(h_x\), \(h_y\) and k.

Proof

See Sect. 4. \(\square \)

Remark 2.1

In the setting of Theorem 2.1 with \(\widehat{h} = \min \{h_x,h_y\}\), if

$$\begin{aligned} \theta ^{\frac{T}{k}} \le 2^{-C_0}\cdot \widehat{h}^{4+\beta }, \end{aligned}$$

for some \(\beta , C_0>0\) independent of \(h_x\), \(h_y\) and k, or, equivalently,

$$\begin{aligned} k\le \frac{T\log _2(\theta ^{-1})}{C_0+(4+\beta )\log _2(\widehat{h}^{-1})}, \end{aligned}$$
(2.7)

then the implicit Milstein scheme (2.2) has the error expansion

$$\begin{aligned} V_{i,j}^N-v(T,x_i,y_j)= & {} k\,E_1(T,x_i,y_j) + h_x^2\,E_2(T,x_i,y_j)\\&+\, h_y^2\,E_3(T,x_i,y_j) + o(k,h_x^2,h_y^2)\,R(T,x_i,y_j). \end{aligned}$$
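In practice, (2.7) translates into an explicit upper bound on the timestep once \(\theta \), \(\beta \) and \(C_0\) are fixed; a small helper (ours, with placeholder constants rather than values derived in the paper) is:

```python
import numpy as np

def max_timestep(T, h_x, h_y, theta, beta=1.0, C0=1.0):
    """Largest k allowed by condition (2.7); theta, beta, C0 are placeholder constants."""
    h_min = min(h_x, h_y)
    return T * np.log2(1.0 / theta) / (C0 + (4.0 + beta) * np.log2(1.0 / h_min))
```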

Corollary 2.1

Under the conditions of Theorem 2.1 and Remark 2.1, the error of the implicit Milstein scheme (2.2) at time T satisfies, for all \(i,j \in \mathbb {Z}\),

$$\begin{aligned} \sqrt{\mathbb {E}\big [| V_{i,j}^N-v(T,ih_x,jh_y)|^2\big ]}= O(h_x^2) + O(h_y^2) + O(k). \end{aligned}$$
(2.8)

For the ADI discretisation scheme (2.3), a similar convergence result holds.

Theorem 2.2

Under the conditions of Remark 2.1, the error of the ADI scheme (2.3) has the same order as for the implicit Milstein scheme,

$$\begin{aligned} V_{i,j}^N-v(T,x_i,y_j)= & {} k\,E_1(T,x_i,y_j) + h_x^2\,E_2(T,x_i,y_j)\\&+\, h_y^2\,E_3(T,x_i,y_j) + o(k,h_x^2,h_y^2)\,R(T,x_i,y_j), \end{aligned}$$

where \(E_1\), \(E_2\), \(E_3\) and R are random variables with bounded first and second moments.

Proof

See Sect. 4. \(\square \)

Theorems 2.1 and 2.2 state the convergence pointwise in space and \(L_2\) in probability. The error, discrete \(L_2\) in space and (continuous) \(L_2\) in probability, is defined as

$$\begin{aligned}&\sqrt{\mathbb {E}\Bigg [\ \Bigg |\sqrt{\sum _{i,j}\Big [\big | V_{i,j}^N-v(T,ih_x,jh_y)\big |^2\Big ]h_xh_y}\ \Bigg |^2\ \Bigg ]} \nonumber \\&\quad = \sqrt{\sum _{i,j}\mathbb {E}\Big [\big | V_{i,j}^N-v(T,ih_x,jh_y)\big |^2\Big ]h_xh_y}, \end{aligned}$$
(2.9)

where the term within the outer \(|\cdot |\) is the discrete spatial \(L_2\) norm. Applying Parseval’s theorem to (2.9), we get:

Corollary 2.2

Under the conditions of Theorem 2.1, the discrete \(L_2\) error in space and \(L_2\) in probability of the implicit Milstein scheme (2.2) at time T, defined in (2.9), satisfies,

$$\begin{aligned}&\sqrt{\sum _{i,j}\mathbb {E}\Big [\big | V_{i,j}^N-v(T,ih_x,jh_y)\big |^2\Big ]h_xh_y}\nonumber \\&\quad = O(h_x^2) + O(h_y^2) + O(k) + O(\theta ^N h_x^{-1/2}h_y^{-1/2}). \end{aligned}$$
(2.10)

If the initial condition lies in \(L_2\), then

$$\begin{aligned} \sqrt{\sum _{i,j}\mathbb {E}\Big [\big | V_{i,j}^N-v(T,ih_x,jh_y)\big |^2\Big ]h_xh_y}= O(h_x^2) + O(h_y^2) + O(k). \end{aligned}$$

Proof

See Sect. 4. \(\square \)
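In numerical experiments, the quantity (2.9), and hence the rates in (2.10), can be estimated by Monte Carlo over the driving paths of M: compute the discrete spatial \(L_2\) norm of the error per path and average its square. A minimal sketch (ours) is:

```python
import numpy as np

def discrete_l2_error(V_samples, v_samples, h_x, h_y):
    """Monte Carlo estimate of (2.9): V_samples, v_samples have shape
    (n_paths, n_x, n_y) and hold, per path of M, the numerical solution
    and the exact solution (1.9) on the grid."""
    sq = np.sum((V_samples - v_samples) ** 2, axis=(1, 2)) * h_x * h_y  # ||.||^2 per path
    return float(np.sqrt(np.mean(sq)))
```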

3 Fourier analysis of mean-square stability

Recall the SPDE (1.7),

$$\begin{aligned} \mathrm {d}v= & {} \left[ -\mu _x\frac{\partial v}{\partial x} - \mu _y\frac{\partial v}{\partial y} + \frac{1}{2}\bigg (\frac{\partial ^2 v}{\partial x^2} + 2\sqrt{\rho _x\rho _y}\rho _{xy}\frac{\partial ^2v}{\partial x\partial y} + \frac{\partial ^2 v}{\partial y^2} \bigg )\right] \mathrm {d}t\nonumber \\&\quad - \sqrt{\rho _x}\frac{\partial v}{\partial x}\mathrm {d}M_t^x - \sqrt{\rho _y}\frac{\partial v}{\partial y}\mathrm {d}M_t^y . \end{aligned}$$
(3.1)

Define the Fourier transform pair

$$\begin{aligned} \widetilde{v}(t,\xi ,\eta )&= \int _{-\infty }^\infty \int _{-\infty }^\infty v(t,x,y)\mathrm {e}^{-\mathrm {i}\xi x -\mathrm {i}\eta y}\,\mathrm {d}x\,\mathrm {d}y,\\ v(t,x,y)&=\frac{1}{4\pi ^2}\int _{-\infty }^\infty \int _{-\infty }^\infty \widetilde{v}(t,\xi ,\eta )\mathrm {e}^{\mathrm {i}\xi x + \mathrm {i}\eta y}\,\mathrm {d}\xi \,\mathrm {d}\eta . \end{aligned}$$

The Fourier transform of (3.1) yields

$$\begin{aligned} \mathrm {d}\widetilde{v} {=} -\bigg (\big (\mathrm {i}\mu _x\xi + \mathrm {i}\mu _y\eta + \frac{1}{2}\xi ^2 + \sqrt{\rho _x\rho _y}\rho _{xy}\xi \eta + \frac{1}{2}\eta ^2 \big )\,\mathrm {d}t + \mathrm {i}\sqrt{\rho _x}\xi \,\mathrm {d}M_t^x + \mathrm {i}\sqrt{\rho _y}\eta \,\mathrm {d}M_t^y \bigg )\widetilde{v}, \end{aligned}$$
(3.2)

subject to the initial data \(\widetilde{v}(0) = \mathrm {e}^{-\mathrm {i}\xi x_0 -\mathrm {i}\eta y_0}.\) For the remainder of the analysis, we take \(\mu _x = \mu _y =0\). This does not alter the results (see Remark 2.3 in [35] for the 1d case).

The solution to (3.2) is

$$\begin{aligned} \widetilde{v}(t) = X(t)\mathrm {e}^{-\mathrm {i}\xi x_0 -\mathrm {i}\eta y_0}, \end{aligned}$$
(3.3)

where

$$\begin{aligned} X(t) = \exp \bigg (-\frac{1}{2}(1-\rho _x)\xi ^2t -\frac{1}{2}(1-\rho _y)\eta ^2t -\mathrm {i}\xi \sqrt{\rho _x}M_t^x - \mathrm {i}\eta \sqrt{\rho _y}M_t^y\bigg ). \end{aligned}$$
(3.4)

From this we see that \(\mathbb {E}\big [|X(t)|^2\big ]\) goes to zero exponentially as \(t\rightarrow \infty \) for every fixed \((\xi ,\eta )\ne (0,0)\); consequently, \(\mathbb {E}\big [\Vert \tilde{v}(t,\cdot ,\cdot )\Vert _{L_2}^2\big ]\) and, by isometry, \(\mathbb {E}\big [\Vert {v}(t,\cdot ,\cdot )\Vert _{L_2}^2\big ]\) also tend to zero. The latter holds not only for Dirac initial data, but also for initial data in \(L_2\). We therefore say that \(v=0\) is a mean-square stable equilibrium solution.

For the numerical solution, we can use a discrete-continuous Fourier decomposition (note that we approximate \(v(nk,ih_x,jh_y)\) by \(V_{i,j}^n\))

$$\begin{aligned} V_{i,j}^0 = \frac{1}{4\pi ^2 h_xh_y}\int _{-\pi }^{\pi }\int _{-\pi }^{\pi } \widetilde{V}^0(u,v)\mathrm {e}^{\mathrm {i}\big ((i-i_0)u + (j-j_0)v\big )}\,\mathrm {d}u\,\mathrm {d}v, \end{aligned}$$

where \(i_0 = x_0/h_x\), \(j_0 = y_0/h_y\), and

$$\begin{aligned} \widetilde{V}^0(u,v) = h_xh_y\sum _{i=-\infty }^\infty \sum _{j=-\infty }^\infty \ V_{i,j}^0\mathrm {e}^{\mathrm {i}\big (-(i-i_0)u-(j-j_0)v\big )}. \end{aligned}$$

From (2.1), \(V_{i,j}^0 = h_x^{-1}h_y^{-1}\delta _{(i_0,\,j_0)}\), we have \(\widetilde{V}^0(u,v) = 1\) for all \((u,v)\in \mathbb {R}^2\). Similarly, for the n-th time step,

$$\begin{aligned} \begin{aligned} V_{i,j}^n&= \frac{1}{4\pi ^2 h_xh_y}\int _{-\pi }^{\pi }\int _{-\pi }^{\pi } \widetilde{V}^n(u,v)\mathrm {e}^{\mathrm {i}\big ((i-i_0)u + (j-j_0)v\big )}\,\mathrm {d}u\,\mathrm {d}v\\&= \frac{1}{4\pi ^2 }\int _{-\frac{\pi }{h_y}}^{\frac{\pi }{h_y}}\int _{-\frac{\pi }{h_x}}^{\frac{\pi }{h_x}} \widetilde{V}^n(\xi ,\eta )\mathrm {e}^{\mathrm {i}\big ((i-i_0)\xi h_x + (j-j_0)\eta h_y\big )}\,\mathrm {d}\xi \,\mathrm {d}\eta . \end{aligned} \end{aligned}$$
(3.5)

In the last step, we integrate by substitution, \(\xi =u/h_x,\ \eta =v/h_y\).

By analogy with the theoretical solution \(\widetilde{v}(t) = X(t)\widetilde{v}(0)\), we make the ansatz

$$\begin{aligned} \widetilde{V}^n(\xi ,\eta ) = X_n(\xi ,\eta )\widetilde{V}^0(\xi ,\eta ), \end{aligned}$$
(3.6)

but as \(\widetilde{V}^0(\xi ,\eta )=1\) we simply have \(\widetilde{V}^n(\xi ,\eta ) = X_n(\xi ,\eta )\). We can regard \(X_n(\xi ,\eta )\) as the numerical approximation to X(nk) in (3.4).

We say that the equilibrium solution zero of the scheme is asymptotically mean-square stable, provided for any \((\xi ,\eta ) \in [-\pi /h_x,\pi /h_x]\times [-\pi /h_y,\pi /h_y]\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {E}\left[ |X_n(\xi ,\eta )|^2\right] = 0. \end{aligned}$$
(3.7)

This concept has been defined in the context of systems of SDEs in Definition 2.2, 3., in [5], and we apply it here to a fixed wave number in the Fourier domain. A generalisation to SPDEs is analysed in [31] (see Definition 2.1 therein). We will show convergence in \(L_2\) (for fixed T) directly under the same conditions; see [35] for mean-square stability and convergence of a 1-d parabolic SPDE. If (3.7) holds without any restriction between \(h_x\), \(h_y\) and k, we call the scheme unconditionally stable. This leads to three conditions summarised in Assumption 2.1, as shown by the following.

Theorem 3.1

The implicit Milstein finite difference scheme (2.2) is unconditionally stable in the mean-square sense of (3.7) provided Assumption 2.1 holds.

Proof

By inserting (3.5) and (3.6) in (2.2), we have

$$\begin{aligned} \begin{aligned} X_{n+1}(\xi ,\eta )&= \frac{1}{1-(a_x+a_y)k}\bigg (1 -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y}\\&\quad + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) \\&\quad + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}\bigg )X_n(\xi ,\eta ), \end{aligned} \end{aligned}$$
(3.8)

where

$$\begin{aligned} a_x&= -\frac{2\sin ^2\frac{\xi h_x}{2}}{h_x^2},\qquad b_x = -\frac{\sin ^2\xi h_x}{2h_x^2},\qquad c_x = \frac{\sin \xi h_x}{h_x},\qquad d = -\frac{\sin \xi h_x\sin \eta h_y}{h_xh_y}, \end{aligned}$$
(3.9a)
$$\begin{aligned} a_y&= -\frac{2\sin ^2\frac{\eta h_y}{2}}{h_y^2},\qquad b_y = -\frac{\sin ^2\eta h_y}{2h_y^2},\qquad c_y = \frac{\sin \eta h_y}{h_y}. \end{aligned}$$
(3.9b)

Given the time-homogeneity of (3.8) and noting that the bracketed term in (3.8) and \(X_n\) are independent, \(\mathbb {E}|X_{n+1}|^2/\mathbb {E}|X_n|^2\) is independent of n. Hence, to ensure mean-square stability, it is necessary and sufficient that for any \((\xi ,\eta )\) we have

$$\begin{aligned} \mathbb {E}\left| \frac{1 -\mathrm {i}c_x\sqrt{\rho _xk}\,Z_{n,x} - \mathrm {i}c_y\sqrt{\rho _yk}\,\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}\,k\,Z_{n,x}\widetilde{Z}_{n,y}}{1-k(a_x+a_y)} \right| ^2 < 1. \end{aligned}$$

This is equivalent to

$$\begin{aligned} \begin{aligned}&\mathbb {E}\left[ \left( 1+b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y} \right) ^2 \right. \\&\qquad \left. + \left( c_x\sqrt{\rho _xk}Z_{n,x} + c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} \right) ^2 \right] \\&\quad < \left( 1-k(a_x+a_y)\right) ^2. \end{aligned} \end{aligned}$$
(3.10)

Note that \(\widetilde{Z}_{n,y} = \rho _{xy}Z_{n,x} + \sqrt{1-\rho _{xy}^2}Z_{n,y},\) with \(Z_{n,x},Z_{n,y}\sim N(0,1)\) being independent normal random variables, hence

$$\begin{aligned} \mathbb {E}\big [Z_{n,x}^2\widetilde{Z}_{n,y}^2\big ] = 1+2\rho _{xy}^2,\quad \mathbb {E}\big [Z_{n,x}\widetilde{Z}_{n,y}\big ] = \rho _{xy},\quad \mathbb {E}\big [Z_{n,x}^3\widetilde{Z}_{n,y}\big ] = \mathbb {E}\big [Z_{n,x}\widetilde{Z}_{n,y}^3\big ] = 3\rho _{xy}\;, \end{aligned}$$

and

$$\begin{aligned}&\mathbb {E}\left[ \left( 1 + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y} \right) ^2\right] \\&\quad = 1 + b_x^2\rho _x^2k^2\,\mathbb {E}\big [(Z_{n,x}^2-1)^2\big ] + b_y^2\rho _y^2k^2\,\mathbb {E}\big [(\widetilde{Z}_{n,y}^2-1)^2\big ]+ d^2\rho _x\rho _yk^2\,\mathbb {E}\big [Z_{n,x}^2\widetilde{Z}_{n,y}^2\big ]\\&\qquad + 2b_x\rho _xk\,\mathbb {E}\big [Z_{n,x}^2-1\big ] + 2b_y\rho _yk\,\mathbb {E}\big [\widetilde{Z}_{n,y}^2-1\big ] + 2d\sqrt{\rho _x\rho _y}k\,\mathbb {E}\big [Z_{n,x}\widetilde{Z}_{n,y}\big ]\\&\qquad + 2b_xb_y\rho _x\rho _yk^2\,\mathbb {E}\big [(Z_{n,x}^2-1)(\widetilde{Z}_{n,y}^2-1) \big ] + 2b_xd\rho _x\sqrt{\rho _x\rho _y}k^2\,\mathbb {E}\big [(Z_{n,x}^2-1)Z_{n,x}\widetilde{Z}_{n,y} \big ]\\&\qquad + 2b_yd\rho _y\sqrt{\rho _x\rho _y}k^2\,\mathbb {E}\big [(\widetilde{Z}_{n,y}^2-1)Z_{n,x}\widetilde{Z}_{n,y} \big ]\\&\quad = 1 + 2b_x^2\rho _x^2k^2 + 2b_y^2\rho _y^2k^2 + d^2\rho _x\rho _y(1+2\rho _{xy}^2)k^2 + 2d\sqrt{\rho _x\rho _y}\rho _{xy}k + 4b_xb_y\rho _x\rho _y\rho _{xy}^2k^2\\&\qquad + 4b_xd\rho _x\sqrt{\rho _x\rho _y}\rho _{xy}k^2 + 4b_yd\rho _y\sqrt{\rho _x\rho _y}\rho _{xy}k^2, \end{aligned}$$

and \(\mathbb {E}\left[ \left( c_x\sqrt{\rho _xk}Z_{n,x} + c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} \right) ^2\right] = c_x^2\rho _xk + c_y^2\rho _yk + 2c_xc_y\sqrt{\rho _x\rho _y}\rho _{xy}k.\)

Note that \(c_xc_y + d=0,\ 4b_xb_y = d^2\), and

$$\begin{aligned} b_x&= \cos ^2\frac{\xi h_x}{2} a_x,\quad b_y = \cos ^2\frac{\eta h_y}{2}a_y,\quad c_x^2 = -2\cos ^2\frac{\xi h_x}{2} a_x,\\ c_y^2&= -2\cos ^2\frac{\eta h_y}{2}a_y,\quad d^2 = 4 \cos ^2\frac{\xi h_x}{2}\cos ^2\frac{\eta h_y}{2} a_x a_y. \end{aligned}$$

Therefore

$$\begin{aligned}&\mathbb {E}\left[ \Big ( 1 + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y} \Big )^2 \right. \\&\qquad \left. + \Big ( c_x\sqrt{\rho _xk}Z_{n,x} + c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} \Big )^2 \right] \\&\quad = 1 + 2\cos ^4\frac{\xi h_x}{2} a_x^2\rho _x^2k^2 + 2\cos ^4\frac{\eta h_y}{2} a_y^2\rho _y^2k^2 \\&\qquad + 4 \cos ^2\frac{\xi h_x}{2}\cos ^2\frac{\eta h_y}{2}\rho _x\rho _y(1+3\rho _{xy}^2) a_x a_y k^2\\&\qquad + 4d\sqrt{\rho _x\rho _y}\rho _{xy}k^2\big (\cos ^2\frac{\xi h_x}{2} a_x\rho _x + \cos ^2\frac{\eta h_y}{2}a_y\rho _y\big ) \\&\qquad -2\cos ^2\frac{\xi h_x}{2}a_x \rho _xk -2\cos ^2\frac{\eta h_y}{2}a_y\rho _y k\\&\quad \le 1 + 2\cos ^4\frac{\xi h_x}{2} a_x^2\rho _x^2k^2 + 2\cos ^4\frac{\eta h_y}{2} a_y^2\rho _y^2k^2 \\&\qquad + 4 \cos ^2\frac{\xi h_x}{2}\cos ^2\frac{\eta h_y}{2}\rho _x\rho _y(1+3\rho _{xy}^2) a_x a_y k^2\\&\qquad +4\vert \rho _{xy}\vert k^2 \big (\cos ^2\frac{\xi h_x}{2} a_x\rho _x + \cos ^2\frac{\eta h_y}{2}a_y\rho _y\big )^2\\&\qquad -2\cos ^2\frac{\xi h_x}{2}a_x \rho _xk -2\cos ^2\frac{\eta h_y}{2}a_y\rho _y k\\&\quad \le 1 + 2\rho _x^2(1+2\vert \rho _{xy}\vert )a_x^2k^2 + 2\rho _y^2(1+2\vert \rho _{xy}\vert )a_y^2k^2 \\&\qquad + 4 \rho _x\rho _y(3\rho _{xy}^2 + 2\vert \rho _{xy}\vert + 1) a_x a_y k^2\\&\qquad -2\cos ^2\frac{\xi h_x}{2}a_x \rho _xk -2\cos ^2\frac{\eta h_y}{2}a_y\rho _y k, \end{aligned}$$

and \(\left( 1-k(a_x+a_y)\right) ^2 = 1 + k^2a_x^2 - 2ka_x + k^2a_y^2 - 2ka_y + 2k^2a_xa_y. \)

One sufficient condition for (3.10) to hold is

$$\begin{aligned}&2\rho _x^2(1+2\vert \rho _{xy}\vert )a_x^2k^2 - 2\cos ^2\frac{\xi h_x}{2}a_x \rho _xk< k^2a_x^2 - 2ka_x,\\&2\rho _y^2(1+2\vert \rho _{xy}\vert )a_y^2k^2 -2\cos ^2\frac{\eta h_y}{2}a_y\rho _y k < k^2a_y^2 - 2ka_y,\\&4 \rho _x\rho _y(3\rho _{xy}^2 + 2\vert \rho _{xy}\vert + 1) a_x a_y k^2 \le 2k^2a_xa_y. \end{aligned}$$

Replacing \(a_x,a_y\) with their expressions in (3.9), the above is equivalent to

$$\begin{aligned}&\frac{k}{h_x^2}\Big (2\rho _x^2(1+2|\rho _{xy}|)-1\Big ) \sin ^2\frac{\xi h_x}{2}+ \rho _{x}\cos ^2 \frac{\xi h_x}{2}< 1,\\&\frac{k}{h_y^2}\Big (2\rho _y^2(1+2|\rho _{xy}|)-1\Big ) \sin ^2\frac{\eta h_y}{2}+ \rho _{y}\cos ^2 \frac{\eta h_y}{2} < 1,\\&2 \rho _x\rho _y(3\rho _{xy}^2 + 2\vert \rho _{xy}\vert + 1) \le 1. \end{aligned}$$

A sufficient condition for this is that \(\rho _x,\rho _y,\rho _{xy}\) satisfy (2.5). \(\square \)
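The argument above can also be checked numerically: the one-step mean-square amplification factor \(\mathbb {E}|X_{n+1}|^2/\mathbb {E}|X_n|^2\) of (3.8) is available in closed form from the Gaussian moments used in the proof and can be scanned over the admissible wave numbers. The sketch below (our own verification code, with an illustrative parameter set satisfying Assumption 2.1) evaluates this factor from (3.9):

```python
import numpy as np

def amplification_factor(xi, eta, k, h_x, h_y, rho_x, rho_y, rho_xy):
    """E|X_{n+1}|^2 / E|X_n|^2 for the implicit Milstein scheme, from (3.8)-(3.9)."""
    ax = -2 * np.sin(xi * h_x / 2) ** 2 / h_x**2
    ay = -2 * np.sin(eta * h_y / 2) ** 2 / h_y**2
    bx = -np.sin(xi * h_x) ** 2 / (2 * h_x**2)
    by = -np.sin(eta * h_y) ** 2 / (2 * h_y**2)
    cx = np.sin(xi * h_x) / h_x
    cy = np.sin(eta * h_y) / h_y
    d  = -np.sin(xi * h_x) * np.sin(eta * h_y) / (h_x * h_y)
    r  = np.sqrt(rho_x * rho_y)
    num = (1
           + 2 * bx**2 * rho_x**2 * k**2 + 2 * by**2 * rho_y**2 * k**2
           + d**2 * rho_x * rho_y * (1 + 2 * rho_xy**2) * k**2
           + 2 * d * r * rho_xy * k
           + 4 * bx * by * rho_x * rho_y * rho_xy**2 * k**2
           + 4 * bx * d * rho_x * r * rho_xy * k**2
           + 4 * by * d * rho_y * r * rho_xy * k**2
           + cx**2 * rho_x * k + cy**2 * rho_y * k
           + 2 * cx * cy * r * rho_xy * k)
    return num / (1 - k * (ax + ay)) ** 2

# Scan the admissible wave numbers for an illustrative parameter set:
h, k = 0.05, 0.1
xi = np.linspace(-np.pi / h, np.pi / h, 201)
XI, ETA = np.meshgrid(xi, xi)
amp = amplification_factor(XI, ETA, k, h, h, 0.3, 0.2, 0.5)
print(amp.max())   # <= 1, with values close to 1 only near the zero wave number
```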

Now we prove the stability of the ADI scheme.

Proposition 3.1

Under Assumption 2.1, mean-square stability (3.7) also holds for the ADI scheme (2.3).

Proof

By insertion in (2.3), we have

$$\begin{aligned} \begin{aligned} X_{n+1}(\xi ,\eta )&= \frac{1}{(1-a_xk)(1-a_yk)}\bigg (1 -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y}\\&\quad + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) \\&\quad + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}\bigg )X_n(\xi ,\eta ). \end{aligned} \end{aligned}$$
(3.11)

As in the proof of Theorem 3.1, we need \(\mathbb {E}|X_{n+1}|^2/ \mathbb {E}|X_n|^2 <1\) to achieve stability. Since \(|1-(a_x+a_y)k|\le |(1-a_xk)(1-a_y k)|\) for \(a_x,a_y\le 0\), the stability also holds for the ADI scheme. \(\square \)

Proposition 3.2

The explicit Milstein (finite difference) scheme (2.4) is stable in the mean-square sense provided

$$\begin{aligned} \frac{k}{h_x^2}&\le \big (2+2\rho _x^2 + 2\rho _x\rho _y +\big (3\rho _x+\rho _y+4\rho _x^2 +4\rho _x\rho _y\big )|\rho _{xy}| + 6\rho _x\rho _y\rho _{xy}^2 \big )^{-1}, \end{aligned}$$
(3.12a)
$$\begin{aligned} \frac{k}{h_y^2}&\le \big ( 2+2\rho _y^2 + 2\rho _x\rho _y +\big (\rho _x+3\rho _y+4\rho _y^2 +4\rho _x\rho _y\big )|\rho _{xy}| + 6\rho _x\rho _y\rho _{xy}^2\big )^{-1}. \end{aligned}$$
(3.12b)

Proof

By insertion in (2.4), we have

$$\begin{aligned} \begin{aligned} X_{n+1}(\xi ,\eta )&= \bigg (1 + (a_x+a_y)k -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) \\&\qquad + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}\bigg )X_n(\xi ,\eta ). \end{aligned} \end{aligned}$$
(3.13)

To ensure mean-square stability in this case, similar to before, we need

$$\begin{aligned}&\mathbb {E}\Big [\Big |1 + (a_x+a_y)k -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1)\\&\quad + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}\Big |^2\Big ]< 1. \end{aligned}$$

For brevity, we write \(u = |\sin \frac{\xi h_x}{2}|\) and \(v = |\sin \frac{\eta h_y}{2}|\); then we have

$$\begin{aligned}&\mathbb {E}\Big [\left| 1 + (a_x+a_y)k -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} - \mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) \right. \\&\qquad \left. + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x} \widetilde{Z}_{n,y}\right| ^2\Big ] \\&\quad = 1 + \big (a_x^2k^2 + 2a_xk + 2b_x^2\rho _x^2k^2 + c_x^2\rho _xk\big ) + \big (a_y^2k^2 + 2a_yk + 2b_y^2\rho _y^2k^2 + c_y^2\rho _yk\big ) \\&\qquad + \big (2a_xa_yk^2 + \rho _x\rho _y(1+3\rho _{xy}^2)d^2k^2\big ) \\&\qquad + 2d(a_x+2b_x\rho _x + a_y+2b_y\rho _y)\sqrt{\rho _x\rho _y}\rho _{xy}k^2\\&\quad \le 1- 4\frac{k}{h_x^2}u^2\Big (1-\rho _x(1-u^2)-\frac{k}{h_x^2}u^2(1+2\rho _x^2)\Big )\\&\qquad - 4\frac{k}{h_y^2}v^2\Big ( 1-\rho _y(1-v^2)-\frac{k}{h_y^2}v^2(1+2\rho _y^2) \Big )\\&\qquad +8\frac{k^2}{h_x^2h_y^2}u^2v^2\Big (1+2\rho _x\rho _y(1+3\rho _{xy}^2)\Big ) + 8|\rho _{xy}|\Big (\frac{k}{h_x^2}u^2\rho _ x + \frac{k}{h_y^2}v^2\rho _y\Big )\\&\qquad \times \, \Big (\frac{k}{h_x^2}u^2(1+2\rho _x) + \frac{k}{h_y^2}v^2(1+2\rho _y)\Big )\\&\quad \le 1- 4\frac{k}{h_x^2}u^2\Big [1-\rho _x(1-u^2)-\frac{k}{h_x^2}u^2\Big (2+2\rho _x^2 + 2\rho _x\rho _y \\&\qquad +\big (3\rho _x+\rho _y+4\rho _x^2 +4\rho _x\rho _y\big )|\rho _{xy}| + 6\rho _x\rho _y\rho _{xy}^2 \Big )\Big ]\\&\qquad - 4\frac{k}{h_y^2}v^2\Big [ 1-\rho _y(1-v^2)-\frac{k}{h_y^2}v^2 \Big ( 2+2\rho _y^2 + 2\rho _x\rho _y +\big (\rho _x+3\rho _y+4\rho _y^2 \\&\qquad + 4\rho _x\rho _y\big )|\rho _{xy}| + 6\rho _x\rho _y\rho _{xy}^2\Big ) \Big ]\\&\quad< 1,\qquad \text {for all }0\le u,v < 1. \end{aligned}$$

This leads to the two sufficient conditions in (3.12). \(\square \)

It follows that if \(\rho _{xy}=0\), the stability conditions are

$$\begin{aligned}&\frac{k}{h_x^2}\le (2+2\rho _x^2+2\rho _x\rho _y)^{-1}, \\&\frac{k}{h_y^2}\le (2+2\rho _y^2+2\rho _x\rho _y)^{-1}. \end{aligned}$$

So it is sufficient that \(k/h_x^2\le 1/6\), and \(k/h_y^2\le 1/6\). If \(|\rho _{xy}|=1\), the worst case in (3.12), the stability conditions are

$$\begin{aligned}&\frac{k}{h_x^2}\le (2+3\rho _x+\rho _y+6\rho _x^2+12\rho _x\rho _y)^{-1},\\&\frac{k}{h_y^2}\le (2+\rho _x+3\rho _y+6\rho _y^2+12\rho _x\rho _y)^{-1}. \end{aligned}$$

So it is sufficient to ensure \(k/h_x^2\le 1/24\), and \(k/h_y^2\le 1/24\).
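The bounds (3.12) are again easy to evaluate for given parameters; a small helper (ours) returning the admissible values of \(k/h_x^2\) and \(k/h_y^2\) reads:

```python
def explicit_timestep_bounds(rho_x, rho_y, rho_xy):
    """Upper bounds on k/h_x^2 and k/h_y^2 from the sufficient conditions (3.12)."""
    a = abs(rho_xy)
    bound_x = 1.0 / (2 + 2 * rho_x**2 + 2 * rho_x * rho_y
                     + (3 * rho_x + rho_y + 4 * rho_x**2 + 4 * rho_x * rho_y) * a
                     + 6 * rho_x * rho_y * rho_xy**2)
    bound_y = 1.0 / (2 + 2 * rho_y**2 + 2 * rho_x * rho_y
                     + (rho_x + 3 * rho_y + 4 * rho_y**2 + 4 * rho_x * rho_y) * a
                     + 6 * rho_x * rho_y * rho_xy**2)
    return bound_x, bound_y

# explicit_timestep_bounds(0.0, 0.0, 0.0) -> (0.5, 0.5); the worst case
# rho_x = rho_y = |rho_xy| = 1 gives (1/24, 1/24), as stated above.
```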

4 Fourier analysis of \(L_2\)-convergence

Extending the analysis in [7] for the standard 1-d (deterministic) heat equation to our 2-d SPDE, we compare the numerical solution to the exact solution in Fourier space first by splitting the Fourier domain into two wave number regions. Assume p is a constant satisfying \(0<p<\frac{1}{4}\). Then we define the low wave number region by

$$\begin{aligned} \varOmega _{\text {low}} = \big \{(\xi ,\eta ):\vert \xi \vert \le \min \{h_x^{-2p},\,k^{-p}\}\text { and } \vert \eta \vert \le \min \{h_y^{-2p},\,k^{-p}\}\big \}, \end{aligned}$$

and the high wave number region by

$$\begin{aligned} \varOmega _{\text {high}}= & {} \big \{(\xi ,\eta ):\vert \xi \vert>\min \{h_x^{-2p},\,k^{-p}\}\text { or } \vert \eta \vert \\&>\min \{h_y^{-2p},\,k^{-p}\}\big \}\cap [-\pi h_x^{-1},\pi h_x^{-1}]\times [-\pi h_y^{-1},\pi h_y^{-1}]. \end{aligned}$$

Note that both \(X_n\) and X(nk) are functions of \(\xi \) and \(\eta \). The idea of the convergence proof is that \(X_n\) is a good approximation to X(nk) in the low wave number region, and that both are damped exponentially in the high wave number region.

Lemma 4.1

For \((\xi ,\eta )\in \varOmega _{\text {low}}\), we have

$$\begin{aligned} X_N - X(T)= & {} X(T)\cdot \Big (h_x^2f_1(\xi ) + h_y^2\,f_2(\eta ) + k\,f_3(\xi ,\eta ) \\&+ o(k,h_x^2,h_y^2\,)\cdot \varphi (T,h_x\xi ,h_y\eta )\Big ), \end{aligned}$$

where \(f_1(\xi ),f_2(\eta ),f_3(\xi ,\eta ),\varphi (T,h_x\xi ,h_y\eta )\) are random variables such that, after multiplication by X(T), their integrals over \(\varOmega _{\text {low}}\) have first and second moments bounded independently of N.

Proof

See Sect. 4.1. \(\square \)

Lemma 4.2

Under Assumption 2.1, there exists \(C>0\) independent of \(h_x\), \(h_y\), and k, such that

$$\begin{aligned} \sqrt{\mathbb {E}\bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\bigg ]} \le C h_x^{-2}\theta ^{N} + C h_y^{-2}\theta ^{N}, \end{aligned}$$

where \(0<\theta <1\) is independent of \(h_x\), \(h_y\), and k.

Proof

See Sect. 4.2. \(\square \)

The following proof of Theorem 2.1 now deduces mean-square convergence of the implicit finite difference scheme (2.2).

Proof of Theorem 2.1

By Lemma 4.1 and Lemma 4.2, the inverse Fourier transform gives

$$\begin{aligned}&V_{i,j}^N - v(T,ih_x,jh_y)\\&\quad = \frac{1}{4\pi ^2}\iint _{\varOmega _{\text {low}}\cup \varOmega _{\text {high}}}\big (X_N - X(T)\big )\mathrm {e}^{\mathrm {i}\big ((i-i_0)\xi h_x + (j-j_0)\eta h_y\big )}\,\mathrm {d}\xi \,\mathrm {d}\eta + o(k) \\&\quad =k\,E_1(T,x_i,y_j) + h_x^2\,E_2(T,x_i,y_j) + h_y^2\,E_3(T,x_i,y_j) + \theta ^{N}h_x^{-2} \,E_4(T,x_i,y_j)\\&\qquad + \theta ^{N}h_y^{-2} \,E_5(T,x_i,y_j) + o(k,h_x^2,h_y^2,\theta ^{N}h_x^{-2},\theta ^{N}h_y^{-2})\,R(T,x_i,y_j), \end{aligned}$$

where \(x_i = ih_x\), \(y_j = jh_y\), \(E_1,\ldots ,E_5,\) and R are random variables with bounded first and second moments, \(N = T/k\), and \(0<\theta <1\) is independent of \(h_x\), \(h_y\) and k. \(\square \)

Next we give a proof of Corollary 2.2, the \(L_2\) convergence in space and probability of the implicit finite difference scheme (2.2).

Proof of Corollary 2.2

We apply Parseval’s theorem to \(V_{i,j}^N-v(T,ih_x,jh_y)\) and its Fourier transform. It follows

$$\begin{aligned}&\sum _{i,j}\Big | V_{i,j}^N-v(T,ih_x,jh_y)\Big |^2h_xh_y \\&\quad = \iint |\widetilde{v}(0,\xi ,\eta )|^2|X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta + O(h_x^4) + O(h_y^4). \end{aligned}$$

In Lemma 4.1, we have proved \(|X(T,\xi ,\eta ) - X_N(\xi ,\eta )| = |X(T)|\big (O(h_x^2) + O(h_y^2) + O(k)\big )\) for \((\xi ,\eta )\) in the low wave number region. In Lemma 4.2, we have proved \(|X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2 \le C \theta ^N\), for some \(C>0\) and \(\theta \in (0,1)\), for \((\xi ,\eta )\) in the high wave number region.

Since, for the Dirac initial datum, \(|\widetilde{v}(0,\xi ,\eta )| = 1\), we have

$$\begin{aligned}&\iint |X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta \\&\quad = \iint _{\varOmega _{\text {low}}} |X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta + \iint _{\varOmega _{\text {high}}} |X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta \\&\quad = O(h_x^4) + O(h_y^4) + O(k^2) + O(h_x^{-1}h_y^{-1}\theta ^N), \end{aligned}$$

and taking square roots (renaming \(\sqrt{\theta }\) as \(\theta \)) yields (2.10).

For initial data in \(L_2\), \( \iint |\widetilde{v}(0,\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta < \infty , \) and therefore

$$\begin{aligned} \iint _{\varOmega _{\text {high}}} |\widetilde{v}(0,\xi ,\eta )|^2|X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta = o(k^r)\quad \text {for any }r>0, \end{aligned}$$

and consequently

$$\begin{aligned} \iint |X(T,\xi ,\eta ) - X_N(\xi ,\eta )|^2\,\mathrm {d}\xi \,\mathrm {d}\eta = O(h_x^4) + O(h_y^4) + O(k^2), \end{aligned}$$

from which the second assertion follows.

\(\square \)

4.1 Low wave number region (proof of Lemma 4.1)

For the low wave number region, we consider the case where both \(\xi \) and \(\eta \) are small. It follows from (3.4) that the exact solution \(X(t_{n+1})\), given \(X(t_n)\), is

$$\begin{aligned} X(t_{n+1}) = X(t_n)\exp \bigg (-\frac{1}{2}(1-\rho _x)\xi ^2k -\frac{1}{2}(1-\rho _y)\eta ^2k -\mathrm {i}\xi \sqrt{\rho _xk}Z_{n,x} - \mathrm {i}\eta \sqrt{\rho _yk}\widetilde{Z}_{n,y}\bigg ), \end{aligned}$$

where \(M_{t_{n+1}}^x-M_{t_n}^x\equiv \sqrt{k}Z_{n,x},\ M_{t_{n+1}}^y-M_{t_n}^y\equiv \sqrt{k}\widetilde{Z}_{n,y}\) are the Brownian increments.

Now we consider \(X_n\), the numerical approximation of X(nk). Let

$$\begin{aligned} X_{n+1} = C_n\,X_n, \end{aligned}$$

where

$$\begin{aligned} C_n = \exp \bigg (-\frac{1}{2}(1-\rho _x)\xi ^2k -\frac{1}{2}(1-\rho _y)\eta ^2k -\mathrm {i}\xi \sqrt{\rho _xk}Z_{n,x} - \mathrm {i}\eta \sqrt{\rho _yk}\widetilde{Z}_{n,y} + e_n\bigg ), \end{aligned}$$
(4.1)

and \(e_n\) is the logarithmic error between the numerical solution and the exact solution introduced during \([nk,(n+1)k]\). Aggregating over N time steps, at \(t_N = kN = T\),

$$\begin{aligned} X_N = X(T)\exp \bigg (\sum _{n=0}^{N-1}e_n\bigg ), \end{aligned}$$
(4.2)

where

$$\begin{aligned} X(T)= & {} \exp \bigg (-\frac{1}{2}(1-\rho _x)\xi ^2T -\frac{1}{2}(1-\rho _y)\eta ^2T\\&-\mathrm {i}\xi \sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} - \mathrm {i}\eta \sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y}\bigg ) \end{aligned}$$

is the exact solution at time T.

From (4.1), we have

$$\begin{aligned} e_n = \log C_n + \frac{1}{2}(1-\rho _x)\xi ^2k + \frac{1}{2}(1-\rho _y)\eta ^2k + \mathrm {i}\xi \sqrt{\rho _xk}Z_{n,x} + \mathrm {i}\eta \sqrt{\rho _yk}\widetilde{Z}_{n,y}, \end{aligned}$$

hence

$$\begin{aligned} \sum _{n=0}^{N-1}e_n= & {} \sum _{n=0}^{N-1}\log C_n +\frac{1}{2}(1-\rho _x)\xi ^2T + \frac{1}{2}(1-\rho _y)\eta ^2T + \mathrm {i}\xi \sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} \\&+ \mathrm {i}\eta \sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y}. \end{aligned}$$

From (3.8), \(C_n\) has the form

$$\begin{aligned} C_n= \frac{1 -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}}{1-(a_x+a_y)k}, \end{aligned}$$

where

$$\begin{aligned}&a_x = -\frac{\xi ^2}{2}\cdot \frac{\sin ^2\frac{\xi h_x}{2}}{\left( \frac{\xi h_x}{2}\right) ^2} = -\frac{\xi ^2}{2} + \frac{\xi ^4}{24}h_x^2 + O(h_x^4\xi ^6),\qquad \\&a_y = -\frac{\eta ^2}{2}\cdot \frac{\sin ^2\frac{\eta h_y}{2}}{\left( \frac{\eta h_y}{2}\right) ^2}= -\frac{\eta ^2}{2} + \frac{\eta ^4}{24}h_y^2 + O(h_y^4\eta ^6),\\&b_x = -\frac{\xi ^2}{2}\cdot \frac{\sin ^2\xi h_x}{\xi ^2h_x^2}= -\frac{\xi ^2}{2} + \frac{\xi ^4}{6}h_x^2 + O(h_x^4\xi ^6),\qquad \\&b_y = -\frac{\eta ^2}{2}\cdot \frac{\sin ^2\eta h_y}{\eta ^2h_y^2}= -\frac{\eta ^2}{2} + \frac{\eta ^4}{6}h_y^2 + O(h_y^4\eta ^6),\\&c_x = \xi \cdot \frac{\sin \xi h_x}{\xi h_x} = \xi -\frac{\xi ^3}{6}h_x^2 + O(h_x^4\xi ^5),\qquad \qquad \quad \\&c_y = \eta \cdot \frac{\sin \eta h_y}{\eta h_y} = \eta - \frac{\eta ^3}{6}h_y^2 + O(h_y^4\eta ^5),\\&d = -\xi \eta \cdot \frac{\sin \xi h_x\sin \eta h_y}{\xi h_x\eta h_y}. \end{aligned}$$

Note that \(c_xc_y+d=0,\, b_x+\frac{1}{2}c_x^2=0,\, b_y+\frac{1}{2}c_y^2=0\), then one can derive by Taylor expansion (by lengthy, but elementary calculations),

$$\begin{aligned}&\log C_n = -\mathrm {i}c_x\sqrt{\rho _x k}Z_{n,x} - \mathrm {i}c_y\sqrt{\rho _y k}\widetilde{Z}_{n,y} \\&\qquad +\, (a_x+a_y-b_x\rho _x -b_y\rho _y)k + \left( b_x+\frac{1}{2}c_x^2\right) \rho _xkZ_{n,x}^2 \\&\qquad + \left( b_y+\frac{1}{2}c_y^2\right) \rho _yk\widetilde{Z}_{n,y}^2 + (c_xc_y+d)\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}\\&\qquad +\, O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\cdot \mathrm {i}\phi _1(Z_{n,x},\widetilde{Z}_{n,y})\\&\qquad +\, O\big ((|\xi |+|\eta |)^4k^2\big )\cdot \phi _2(Z_{n,x},\widetilde{Z}_{n,y}) + o\big ((|\xi |+|\eta |)^4k^2\big )\\&\quad = -\mathrm {i}c_x\sqrt{\rho _x k}Z_{n,x} - \mathrm {i}c_y\sqrt{\rho _y k}\widetilde{Z}_{n,y} + (a_x+a_y-b_x\rho _x -b_y\rho _y)k \\&\qquad +\, O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\cdot \mathrm {i} \phi _1(Z_{n,x},\widetilde{Z}_{n,y}) + O\big ((|\xi |+|\eta |)^4 k^2\big )\\&\qquad \cdot \phi _2(Z_{n,x},\widetilde{Z}_{n,y})+ o\big ((|\xi |+|\eta |)^4k^2\big ), \end{aligned}$$

where \(\phi _1(\cdot ,\cdot )\) is an odd and \(\phi _2(\cdot ,\cdot )\) an even degree polynomial. Therefore

$$\begin{aligned} \sum _{n=0}^{N-1}e_n&= \sum _{n=0}^{N-1}\log C_n+\frac{1}{2}(1-\rho _x)\xi ^2T + \frac{1}{2}(1-\rho _y)\eta ^2T \\&\quad + \mathrm {i}\xi \sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} + \mathrm {i}\eta \sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y}\\&= \mathrm {i}(\xi -c_x)\sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} + \mathrm {i}(\eta -c_y)\sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y} \\&\quad + \Big (a_x+a_y-b_x\rho _x-b_y\rho _y + \frac{1-\rho _x}{2}\xi ^2 + \frac{1-\rho _y}{2}\eta ^2\Big )T \\&\quad + O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\cdot \mathrm {i}\sum _{n=0}^{N-1}\phi _1(Z_{n,x},\widetilde{Z}_{n,y})\\&\quad + O\big ((|\xi |+|\eta |)^4 k^2\big )\cdot \sum _{n=0}^{N-1}\phi _2(Z_{n,x},\widetilde{Z}_{n,y})+ o\big ((|\xi |+|\eta |)^4k\big ), \end{aligned}$$

so we have

$$\begin{aligned} \exp \left( \sum _{n=0}^{N-1}e_n\right)&= \exp \left( \Big (a_x+a_y-b_x\rho _x-b_y\rho _y + \frac{1-\rho _x}{2}\xi ^2 + \frac{1-\rho _y}{2}\eta ^2\Big )T\right) \\&\quad \cdot \exp \bigg ( \mathrm {i}(\xi -c_x)\sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} + \mathrm {i}(\eta -c_y)\sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y} \\&\quad + O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\cdot \mathrm {i}\sum _{n=0}^{N-1}\phi _1(Z_{n,x},\widetilde{Z}_{n,y})\\&\quad + O\big ((|\xi |+|\eta |)^4 k^2\big )\cdot \sum _{n=0}^{N-1}\phi _2(Z_{n,x},\widetilde{Z}_{n,y})+ o\big ((|\xi |+|\eta |)^4k\big ) \bigg ). \end{aligned}$$

Here

$$\begin{aligned}&\exp \bigg (\Big (a_x+a_y-b_x\rho _x-b_y\rho _y + \frac{1-\rho _x}{2}\xi ^2 + \frac{1-\rho _y}{2}\eta ^2\Big )T\bigg )\\&\quad = 1 + \frac{\xi ^4}{24}h_x^2(1-4\rho _x)T + \frac{\eta ^4}{24}h_y^2(1-4\rho _y)T + O(\xi ^6h_x^4) + O(\eta ^6h_y^4), \end{aligned}$$

and

$$\begin{aligned}&\exp \bigg ( \mathrm {i}(\xi -c_x)\sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} + \mathrm {i}(\eta -c_y)\sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y} \\&\qquad + O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\cdot \mathrm {i}\sum _{n=0}^{N-1}\phi _1(Z_{n,x},\widetilde{Z}_{n,y})\\&\qquad + O\big ((|\xi |+|\eta |)^4 k^2\big )\cdot \sum _{n=0}^{N-1}\phi _2(Z_{n,x},\widetilde{Z}_{n,y}) \bigg )\\&\quad = 1 + \mathrm {i}(\xi -c_x)\sqrt{\rho _xk}\sum _{n=0}^{N-1}Z_{n,x} + \mathrm {i}(\eta -c_y)\sqrt{\rho _yk}\sum _{n=0}^{N-1}\widetilde{Z}_{n,y} \\&\qquad - \frac{1}{2}(\xi -c_x)^2\rho _xk\bigg (\sum _{n=0}^{N-1}Z_{n,x}\bigg )^2\\&\qquad - \frac{1}{2}(\eta -c_y)^2\rho _yk\bigg (\sum _{n=0}^{N-1}\widetilde{Z}_{n,y}\bigg )^2 + O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\cdot \mathrm {i}\sum _{n=0}^{N-1}\widehat{\phi }_1(Z_{n,x},\widetilde{Z}_{n,y})\\&\qquad + O\big ((|\xi |+|\eta |)^4 k^2\big )\cdot \sum _{n=0}^{N-1}\widehat{\phi }_2(Z_{n,x},\widetilde{Z}_{n,y})+ o\big ((|\xi |+|\eta |)^4k\big ), \end{aligned}$$

where \(\widehat{\phi }_1(\cdot ,\cdot )\) is a polynomial of odd degree, \(\widehat{\phi }_2(\cdot ,\cdot )\) a polynomial of even degree, and

$$\begin{aligned} \begin{aligned} \mathbb {E}\bigg [X(T)\sum _n\widehat{\phi }_1(Z_{n,x},\widetilde{Z}_{n,y})\bigg ]&= O(k^{-\frac{1}{2}})\exp \bigg (-\frac{1}{2}\big (\xi ^2+\eta ^2+2\xi \eta \sqrt{\rho _x\rho _y}\rho _{xy}\big )T\bigg ),\\ \mathbb {E}\bigg [X(T)\sum _n\widehat{\phi }_2(Z_{n,x},\widetilde{Z}_{n,y})\bigg ]&= O(k^{-1})\exp \bigg (-\frac{1}{2}\big (\xi ^2+\eta ^2+2\xi \eta \sqrt{\rho _x\rho _y}\rho _{xy}\big )T\bigg ),\\ \mathbb {E}\bigg |X(T)\sum _n\widehat{\phi }_1(Z_{n,x},\widetilde{Z}_{n,y})\bigg |^2&= O(k^{-1})\exp \bigg (-\big (\xi ^2+\eta ^2+2\xi \eta \sqrt{\rho _x\rho _y}\rho _{xy}\big )T\bigg ),\\ \mathbb {E}\bigg |X(T)\sum _n\widehat{\phi }_2(Z_{n,x},\widetilde{Z}_{n,y})\bigg |^2&= O(k^{-2})\exp \bigg (-\big (\xi ^2+\eta ^2+2\xi \eta \sqrt{\rho _x\rho _y}\rho _{xy}\big )T\bigg ). \end{aligned} \end{aligned}$$

Hence, in the low wave number region, we have

$$\begin{aligned} X_N- X(T)&=X(T)\cdot \bigg (\exp \Big (\sum _{n=0}^{N-1}e_n\Big )-1\bigg ) \\&= X(T)\cdot \bigg \{\frac{\mathrm {i}}{6}\sqrt{\rho _x}\xi ^3h_x^2M_{T}^x+ \frac{1}{24}(1-4\rho _x)\xi ^4h_x^2T\\&\quad + \frac{\mathrm {i}}{6}\sqrt{\rho _y}\eta ^3h_y^2\widetilde{M}_{T}^y + \frac{1}{24}(1-4\rho _y)\eta ^4h_y^2T + O\big ((|\xi |+|\eta |)^3k\sqrt{k}\big )\\&\quad \cdot \mathrm {i}\sum _{n=0}^{N-1}\widehat{\phi }_1(Z_{n,x},\widetilde{Z}_{n,y}) + O\big ((|\xi |+|\eta |)^4 k^2\big )\cdot \sum _{n=0}^{N-1}\widehat{\phi }_2(Z_{n,x},\widetilde{Z}_{n,y})\\&\quad + o(k,h_x^2,h_y^2)\cdot \varphi (T,h_x\xi ,h_y\eta )\bigg \}, \end{aligned}$$

where \(\varphi (T,h_x\xi ,h_y\eta )\) is a random variable with bounded moments.

Remark 4.1

We can derive the exact leading order term by taking the inverse Fourier transform. For instance, the leading order error in \(h_x\) is

$$\begin{aligned} \bigg (-\frac{1}{6}\sqrt{\rho _x}M_{T}^x \frac{\partial ^3}{\partial x^3} v(T,x,y) + \frac{1}{24}(1-4\rho _x)T\frac{\partial ^4}{\partial x^4} v(T,x,y)\bigg )\cdot h_x^2, \end{aligned}$$

and similarly for \(h_y\) (replacing ‘x’ by ‘y’); the leading order error in k can be found by the same technique, but the resulting expression is significantly lengthier and hence omitted.

4.2 High wave number region (proof of Lemma 4.2)

Now we consider the case when either \(\xi \) or \(\eta \) is large.

First, we derive an upper bound for \(\mathbb {E}\big [\big |X_N(\xi ,\eta )\big |^2\big ]\). To simplify the proof, we take \(h_x=h_y=h\); the case \(h_x\ne h_y\) is similar. Write \(\lambda = \frac{k}{h^2}\).

Lemma 4.3

For \((\xi ,\eta )\notin \varOmega _\text {low}\),

$$\begin{aligned} \mathbb {E}\big [\big |X_N(\xi ,\eta )\big |^2\big ] \le |X_0|^2\bigg (1-4\beta \frac{\lambda \big (\sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\big ) + \lambda ^2\big (\sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\big )^2}{\Big (1 + 2\lambda \big (\sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\big )\Big )^2}\bigg )^N, \end{aligned}$$

where

$$\begin{aligned} \beta= & {} \min \big \{1-\rho _x, 1-\rho _y, 1-2\rho _x^2(1+2|\rho _{xy}|), 1-2\rho _y^2(1+2|\rho _{xy}|),\\&1-2\rho _x\rho _y(1+2|\rho _{xy}|+3\rho _{xy}^2)\big \} \in (0,1). \end{aligned}$$

Proof

By (3.8), we have \(X_N = X_0\prod _{n=0}^{N-1} C_n\), where

$$\begin{aligned} C_n = \frac{1 -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}}{1-(a_x+a_y)k}, \end{aligned}$$
$$\begin{aligned} a_x&= -\frac{2\sin ^2\frac{\xi h_x}{2}}{h_x^2},\qquad b_x = -\frac{\sin ^2\xi h_x}{2h_x^2},\qquad c_x = \frac{\sin \xi h_x}{h_x},\qquad d = -\frac{\sin \xi h_x\sin \eta h_y}{h_xh_y},\\ a_y&= -\frac{2\sin ^2\frac{\eta h_y}{2}}{h_y^2},\qquad b_y = -\frac{\sin ^2\eta h_y}{2h_y^2},\qquad c_y = \frac{\sin \eta h_y}{h_y}. \end{aligned}$$

Then

$$\begin{aligned} \mathbb {E}\big [ \big |C_n\big |^2\big ]&\le 1- \frac{4\frac{k}{h^2}\sin ^2\frac{\xi h}{2}\left( 1-\rho _x\cos ^2\frac{\xi h}{2}+\frac{k}{h^2}\sin ^2\frac{\xi h}{2}\left( 1-2\rho _x^2(1+2|\rho _{xy}|)\right) \right) }{\left[ 1+2\frac{k}{h^2}\Big (\sin ^2\frac{\xi h}{2}+\sin ^2\frac{\eta h}{2}\Big )\right] ^2}\\&\qquad - \frac{4\frac{k}{h^2}\sin ^2\frac{\eta h}{2}\left( 1-\rho _y\cos ^2\frac{\eta h}{2}+\frac{k}{h^2}\sin ^2\frac{\eta h}{2}\left( 1-2\rho _y^2(1+2|\rho _{xy}|)\right) \right) }{\left[ 1+2\frac{k}{h^2}\Big (\sin ^2\frac{\xi h}{2}+\sin ^2\frac{\eta h}{2}\Big )\right] ^2}\\&\qquad - \frac{8\frac{k^2}{h^4}\sin ^2\frac{\xi h}{2}\sin ^2\frac{\eta h}{2}\left( 1-2\rho _x\rho _y(1+2|\rho _{xy}|+3\rho _{xy}^2)\right) }{\left[ 1+2\frac{k}{h^2}\Big (\sin ^2\frac{\xi h}{2}+\sin ^2\frac{\eta h}{2}\Big )\right] ^2}. \end{aligned}$$

Denote \( \lambda = \frac{k}{h^2},\ a = \sin ^2\frac{\xi h}{2},\ b = \sin ^2\frac{\eta h}{2}\). It follows that

$$\begin{aligned} \mathbb {E}\big [ \big |C_n\big |^2\big ]&\le 1-\frac{4\lambda a\Big ( 1-\rho _x\cos ^2\frac{\xi h}{2} + \lambda a\big (1-2\rho _x^2(1+2|\rho _{xy}|)\big ) \Big )}{\big (1+2\lambda (a+b)\big )^2}\\&\quad -\frac{4\lambda b\Big ( 1-\rho _y\cos ^2\frac{\eta h}{2} + \lambda b\big (1-2\rho _y^2(1+2|\rho _{xy}|)\big ) \Big )}{\big (1+2\lambda (a+b)\big )^2}\\&\quad -\frac{8\lambda ^2 ab\Big (1-2\rho _x\rho _y(1+2|\rho _{xy}|+3\rho _{xy}^2)\Big )}{\big (1+2\lambda (a+b)\big )^2}. \end{aligned}$$

By Assumption 2.1,

$$\begin{aligned}&0\le \rho _x< 1,\quad 0\le \rho _y<1,\quad 0< 1-2\rho _x^2(1+2|\rho _{xy}|)\le 1,\\&0< 1-2\rho _y^2(1+2|\rho _{xy}|)\le 1,\quad 0< 1-2\rho _x\rho _y(1+2|\rho _{xy}|+3\rho _{xy}^2)\le 1, \end{aligned}$$

we write

$$\begin{aligned} \beta&= \min \big \{1-\rho _x, 1-\rho _y, 1-2\rho _x^2(1+2|\rho _{xy}|), \\&\quad 1-2\rho _y^2(1+2|\rho _{xy}|), 1-2\rho _x\rho _y(1+2|\rho _{xy}|+3\rho _{xy}^2)\big \} \in (0,1),\\ d&= a+b = \sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}. \end{aligned}$$

Consequently,

$$\begin{aligned} \mathbb {E}\big [ \big |C_n\big |^2\big ]&\le 1-\frac{4\lambda a\big (\beta + \lambda a \beta \big ) + 4\lambda b\big (\beta + \lambda b \beta \big ) + 8\lambda ^2 ab\beta }{\big (1+2\lambda (a+b)\big )^2} \\&= 1- \frac{4 \beta \big (\lambda d+\lambda ^2 d^2\big )}{\big (1+2\lambda d\big )^2}. \end{aligned}$$

Therefore we have

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ]= & {} \big |X_0\big |^2\prod _{n=0}^{N-1} \mathbb {E}\big [\big |C_n\big |^2\big ] \nonumber \\\le & {} |X_0|^2\bigg (1-4\beta \frac{\lambda \big (\sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\big ) + \lambda ^2\big (\sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\big )^2}{\Big (1 + 2\lambda \big (\sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\big )\Big )^2}\bigg )^N.\nonumber \\ \end{aligned}$$
(4.3)

\(\square \)
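The per-step bound underlying (4.3) can also be checked numerically. The following minimal Monte Carlo sketch (our own code, not the paper's; the parameter values are assumed to satisfy Assumption 2.1, and \(\widetilde{Z}_{n,y}\) is generated as \(\rho _{xy}Z_{n,x}+\sqrt{1-\rho _{xy}^2}Z_{n,y}\)) estimates \(\mathbb {E}[|C_n|^2]\) for one wave number pair and compares it with the corresponding factor in the bound of Lemma 4.3.

```python
# Minimal Monte Carlo sketch (assumptions: admissible correlation parameters,
# a single fixed wave number pair) checking the per-step bound of Lemma 4.3.
import numpy as np

rng = np.random.default_rng(0)
rho_x, rho_y, rho_xy = 0.2, 0.2, 0.45       # assumed to satisfy Assumption 2.1
h, k = 2.0**-4, 2.0**-10
lam = k / h**2
xi, eta = 30.0, 50.0                        # one high-wave-number mode, |.| <= pi/h

a_x = -2*np.sin(xi*h/2)**2 / h**2
a_y = -2*np.sin(eta*h/2)**2 / h**2
b_x = -np.sin(xi*h)**2 / (2*h**2)
b_y = -np.sin(eta*h)**2 / (2*h**2)
c_x = np.sin(xi*h) / h
c_y = np.sin(eta*h) / h
d   = -np.sin(xi*h)*np.sin(eta*h) / h**2

M = 10**6
Zx = rng.standard_normal(M)
Zy_tilde = rho_xy*Zx + np.sqrt(1 - rho_xy**2)*rng.standard_normal(M)

num = (1 - 1j*c_x*np.sqrt(rho_x*k)*Zx - 1j*c_y*np.sqrt(rho_y*k)*Zy_tilde
       + b_x*rho_x*k*(Zx**2 - 1) + b_y*rho_y*k*(Zy_tilde**2 - 1)
       + d*np.sqrt(rho_x*rho_y)*k*Zx*Zy_tilde)
Cn = num / (1 - (a_x + a_y)*k)

beta = min(1-rho_x, 1-rho_y,
           1-2*rho_x**2*(1+2*abs(rho_xy)),
           1-2*rho_y**2*(1+2*abs(rho_xy)),
           1-2*rho_x*rho_y*(1+2*abs(rho_xy)+3*rho_xy**2))
s = np.sin(xi*h/2)**2 + np.sin(eta*h/2)**2
bound = 1 - 4*beta*(lam*s + (lam*s)**2) / (1 + 2*lam*s)**2

print(np.mean(np.abs(Cn)**2), "<=", bound)  # sample estimate vs. Lemma 4.3 factor
```

For admissible parameters the sample mean should lie below the bound, up to Monte Carlo error.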

Then we consider two scenarios: \(0<\lambda <1\) and \(\lambda \ge 1\). For \(0<\lambda <1\),

$$\begin{aligned} \varOmega _{\text {high}} {=} \left\{ (\xi ,\eta ):\vert \xi \vert>h^{-2p},\text { or }\ \vert \eta \vert >h^{-2p}\right\} \cap [-\pi h^{-1},\pi h^{-1}]\times [-\pi h^{-1},\pi h^{-1}]. \end{aligned}$$

Lemma 4.4

For \(0<\lambda <1\) (i.e., \(k<h^2\)),

$$\begin{aligned} \mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ] = o(h^r),\quad \forall r>0. \end{aligned}$$

Proof

Note that

$$\begin{aligned} X(T) = \exp \bigg (-\frac{1}{2}(1-\rho _x)\xi ^2T -\frac{1}{2}(1-\rho _y)\eta ^2T -\mathrm {i}\xi \sqrt{\rho _x}M_T^x - \mathrm {i}\eta \sqrt{\rho _y}M_T^y\bigg ). \end{aligned}$$

Then

$$\begin{aligned}&\mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ]\\&\quad < 4\pi ^2 h^{-2}\mathbb {E}\Bigg [\,\iint _{\varOmega _{\text {high}}}\bigg |X(T,\xi ,\eta )-X_N(\xi ,\eta )\bigg |^2\,\mathrm {d}\xi \mathrm {d}\eta \,\Bigg ]\\&\quad \le 8\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2+ \big |X(T,\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad = 8\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta + f_0(k), \end{aligned}$$

where

$$\begin{aligned} f_0(k)&= 8\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}}\mathrm {e}^{-(1-\rho _x)\xi ^2T - (1-\rho _y)\eta ^2T}\,\mathrm {d}\xi \mathrm {d}\eta \\&= 32\pi ^2 h^{-2} \int _{0}^{\pi /h}\int _{h^{-2p}}^{\pi /h} \mathrm {e}^{-(1-\rho _x)\xi ^2T - (1-\rho _y)\eta ^2T}\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad + 32\pi ^2 h^{-2} \int _{h^{-2p}}^{\pi /h} \int _{0}^{\pi /h}\mathrm {e}^{-(1-\rho _x)\xi ^2T - (1-\rho _y)\eta ^2T}\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad -32\pi ^2 h^{-2} \int _{h^{-2p}}^{\pi /h}\int _{h^{-2p}}^{\pi /h} \mathrm {e}^{-(1-\rho _x)\xi ^2T - (1-\rho _y)\eta ^2T}\,\mathrm {d}\xi \mathrm {d}\eta \\&\le C\cdot h^{-2+2p}\big (\mathrm {e}^{-(1-\rho _x)Th^{-2p}}+\mathrm {e}^{-(1-\rho _y)Th^{-2p}}\big ) = o(h^r),\quad \forall r>0, \end{aligned}$$

where the second equality uses the symmetry of the integrand in \(\xi \) and \(\eta \).

Denote \(d = \sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\). From (4.3) and \(\lambda =k/h^2\),

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ]&\le |X_0|^2\bigg (1- \frac{4 \beta \big (d+\lambda d^2\big )}{\big (1+2\lambda d\big )^2}\cdot \frac{T}{Nh^2}\bigg )^N\\&< |X_0|^2\exp \Big (- \frac{4 \beta \big (d+\lambda d^2\big )T}{\big (1+2\lambda d\big )^2}\cdot h^{-2}\Big ). \end{aligned}$$

In this case, as at least one of \(|\xi |\) and \(|\eta |\) belongs to \((h^{-2p},\pi /h]\), we have

$$\begin{aligned} d = a+b = \sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2} \ge \sin ^2\frac{h^{1-2p}}{2} = \frac{h^{2-4p}}{4} - \frac{h^{4-8p}}{48} + O(h^{5-10p}). \end{aligned}$$

Therefore, since \(\lambda <1\) and \(d\le 2\) give \((1+2\lambda d)^2\le 25\), and since \(4d\ge \frac{1}{2}h^{2-4p}\) for all sufficiently small \(h\) by the expansion above,

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ]< & {} |X_0|^2\exp \Big (- \frac{4 \beta \big (d+\lambda d^2\big )T}{\big (1+2\lambda d\big )^2}\cdot h^{-2}\Big ) \\\le & {} |X_0|^2\exp \Big (-\frac{4}{25}\beta dTh^{-2} \Big ) \le |X_0|^2\exp \Big (-\frac{1}{50}\beta Th^{-4p} \Big ) = o(h^r). \end{aligned}$$

As a result,

$$\begin{aligned}&8\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad < 32\pi ^4|X_0|^2 h^{-4}\exp \Big (-\frac{1}{50}\beta Th^{-4p} \Big ) = o(h^r),\quad \forall r>0. \end{aligned}$$

Therefore, for all \(r>0\),

$$\begin{aligned}&\mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ]\\&\quad < 8\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta + f_0(k) = o(h^r). \end{aligned}$$

\(\square \)

For \(\lambda \ge 1\), we further separate the domain \(\varOmega _{\text {high}}\) into a middle wave region

$$\begin{aligned} \varOmega _{\text {high}}^1 = \left\{ (|\xi |,|\eta |)\in \big [k^{-p},k^{-\frac{1}{2}}\big ]\times \big [0,k^{-\frac{1}{2}}\big ]\cup \big [0,k^{-\frac{1}{2}}\big ] \times \big [k^{-p},k^{-\frac{1}{2}}\big ] \right\} , \end{aligned}$$
(4.4)

and a high wave region

$$\begin{aligned} \varOmega _{\text {high}}^2 = \left\{ (|\xi |,|\eta |)\in \left[ k^{-\frac{1}{2}},\pi /h\right] \times [0,\pi /h]\cup [0,\pi /h]\times \left[ k^{-\frac{1}{2}},\pi /h\right] \right\} , \end{aligned}$$
(4.5)

such that \(\varOmega _{\text {high}} = \varOmega _{\text {high}}^1\cup \varOmega _{\text {high}}^2\).

Lemma 4.5

For \(\lambda \ge 1\) (i.e., \(k\ge h^2\)), there exists \(\theta \in (0,1)\) independent of h and k such that

$$\begin{aligned} \mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ] \le C h^{-4}\theta ^{N}. \end{aligned}$$

Proof

$$\begin{aligned}&\quad \mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ]\\&\le 2\,\mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}^1}X(T,\xi ,\eta )-X_N(\xi ,\eta )\, \mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ]\\&\quad + 2\,\mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}^2}X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ]\\&< 8 k^{-1}\mathbb {E}\Bigg [\,\iint _{\varOmega _{\text {high}}^1}\bigg |X(T,\xi ,\eta )-X_N(\xi ,\eta )\bigg |^2\,\mathrm {d}\xi \mathrm {d}\eta \Bigg ]\\&\quad + 8\pi ^2 h^{-2}\mathbb {E}\Bigg [\,\iint _{\varOmega _{\text {high}}^2}\bigg |X(T,\xi ,\eta )-X_N(\xi ,\eta )\bigg |^2\,\mathrm {d}\xi \mathrm {d}\eta \,\Bigg ]\\&\le 16k^{-1} \iint _{\varOmega _{\text {high}}^1}\mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2 + \big |X(T,\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad + 16\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}^2} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2+ \big |X(T,\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta \\&= 16 k^{-1} \iint _{\varOmega _{\text {high}}^1}\mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2 \Big ]\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad + 16\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}^2} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta + f_1(k), \end{aligned}$$

where

$$\begin{aligned} f_1(k)&= 16 k^{-1}\iint _{\varOmega _{\text {high}}^1}\mathrm {e}^{-(1-\rho _x)\xi ^2T - (1-\rho _y)\eta ^2T}\,\mathrm {d}\xi \mathrm {d}\eta \\&\quad + 16\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}^2}\mathrm {e}^{-(1-\rho _x)\xi ^2T - (1-\rho _y)\eta ^2T}\,\mathrm {d}\xi \mathrm {d}\eta \\&\le C\cdot k^{p-1}\exp \big (-\beta Tk^{-p}\big ) + C\cdot kh^{-2}\exp \big (-2\beta Tk^{-1}\big ). \end{aligned}$$

Denote \(d = \sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2}\). From (4.3),

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ]&\le |X_0|^2\bigg (1- \frac{4 \beta \big (\lambda d+\lambda ^2 d^2\big )}{\big (1+2\lambda d\big )^2}\bigg )^N. \end{aligned}$$

For \(\lambda \ge 1\), and \((\xi ,\eta )\in \varOmega _{\text {high}}^1\),

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ]&\le |X_0|^2\bigg (1- \frac{4 \beta \big (d+\lambda d^2\big )}{\big (1+2\lambda d\big )^2}\cdot \frac{T}{Nh^2}\bigg )^N\\&< |X_0|^2\exp \Big (- \frac{4 \beta \big (d+\lambda d^2\big )T}{\big (1+2\lambda d\big )^2}\cdot h^{-2}\Big ). \end{aligned}$$

In this case, as at least one of \(|\xi |\) and \(|\eta |\) belongs to \((k^{-p},k^{-\frac{1}{2}}]\), we have

$$\begin{aligned} d = a+b = \sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2} \ge \sin ^2\frac{k^{-p}h}{2} = \frac{k^{-2p}h^2}{4} - \frac{k^{-4p}h^4}{48} + O(k^{-5p}h^5). \end{aligned}$$

Moreover, on \(\varOmega _{\text {high}}^1\) we have \(|\xi |,|\eta |\le k^{-\frac{1}{2}}\), so that \(\lambda d\le \frac{1}{2}\) and \((1+2\lambda d)^2\le 4\). Hence, for all sufficiently small \(k\),

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ]< |X_0|^2\exp \Big (- \frac{4 \beta \big (d+\lambda d^2\big )T}{\big (1+2\lambda d\big )^2}\cdot h^{-2}\Big ) \le |X_0|^2\exp \big (-\beta dTh^{-2} \big )\le |X_0|^2\exp \Big (-\frac{1}{8}\beta Tk^{-2p} \Big ). \end{aligned}$$

Since \(|\varOmega _{\text {high}}^1|<4k^{-1}\),

$$\begin{aligned} 16 k^{-1} \iint _{\varOmega _{\text {high}}^1} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta < 64|X_0|^2 k^{-2}\exp \Big (-\frac{1}{8}\beta Tk^{-2p} \Big ). \end{aligned}$$

For \(\lambda \ge 1\), and \((\xi ,\eta )\in \varOmega _{\text {high}}^2\), \(d = \sin ^2\frac{\xi h}{2} + \sin ^2\frac{\eta h}{2} \in [\sin ^2\frac{1}{2\sqrt{\lambda }},2]\),

$$\begin{aligned} \max _{d} \bigg (1- \frac{4 \beta \big (\lambda d+\lambda ^2 d^2\big )}{\big (1+2\lambda d\big )^2}\bigg )= & {} 1 - \beta \min _{d}\bigg ( 1 - \frac{1}{\big (1+2\lambda d\big )^2}\bigg )\\= & {} 1 - \beta \bigg ( 1 - \max _{d}\frac{1}{\big (1+2\lambda d\big )^2}\bigg ). \end{aligned}$$

As

$$\begin{aligned} \max _{d}\frac{1}{\big (1+2\lambda d\big )^2}&= \frac{1}{\big (1+2\lambda d\big )^2}\bigg |_{d = \sin ^2\frac{1}{2\sqrt{\lambda }}} \\&= \frac{1}{\big (1+2\lambda \sin ^2\frac{1}{2\sqrt{\lambda }}\big )^2}\le \frac{1}{\big (1+2 \sin ^2\frac{1}{2}\big )^2} < 0.5, \end{aligned}$$

we have

$$\begin{aligned} 1- \frac{4 \beta \big (\lambda d+\lambda ^2 d^2\big )}{\big (1+2\lambda d\big )^2}&\le 1-\beta \Big (1-\frac{1}{\big (1+2\lambda \sin ^2\frac{1}{2\sqrt{\lambda }}\big )^2}\Big )\\&= 1-\beta +\frac{\beta }{\big (1+2\lambda \sin ^2\frac{1}{2\sqrt{\lambda }}\big )^2}<1-\frac{1}{2}\beta . \end{aligned}$$

Denote \(\theta _0:=1-\frac{1}{2}\beta \in (0,1)\), then

$$\begin{aligned} \mathbb {E}\big [\big |X_N\big |^2\big ] \le |X_0|^2\bigg (1- \frac{4 \beta \big (\lambda d+\lambda ^2 d^2\big )}{\big (1+2\lambda d\big )^2}\bigg )^N\le |X_0|^2\cdot \theta _0^N. \end{aligned}$$

So

$$\begin{aligned} 16\pi ^2 h^{-2} \iint _{\varOmega _{\text {high}}^2} \mathbb {E}\Big [\big |X_N(\xi ,\eta )\big |^2\Big ]\,\mathrm {d}\xi \mathrm {d}\eta < 64\pi ^4|X_0|^2h^{-4}\theta _0^N. \end{aligned}$$

Hence

$$\begin{aligned} \begin{aligned}&\mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ] \\&\quad \le C\,k^{p-1}\exp \big (-\beta Tk^{-p}\big ) + C\, kh^{-2}\exp \big (-2\beta Tk^{-1}\big ) \\&\qquad + 64|X_0|^2 k^{-2}\exp \Big (-\frac{1}{8}\beta Tk^{-2p} \Big ) + 64\pi ^4|X_0|^2h^{-4}\theta _0^N. \end{aligned} \end{aligned}$$
(4.6)

As the first three terms in (4.6) are of higher order than \(h^{-4}\theta _0^{N}\), we have, for \(\lambda \ge 1\),

$$\begin{aligned} \mathbb {E}\Bigg [\,\bigg |\iint _{\varOmega _{\text {high}}} X(T,\xi ,\eta )-X_N(\xi ,\eta )\,\mathrm {d}\xi \mathrm {d}\eta \bigg |^2\,\Bigg ] \le C h^{-4}\theta _0^{N}. \end{aligned}$$

Letting \(\theta = \sqrt{\theta _0}\) the result follows. \(\square \)

4.3 Convergence of the ADI scheme (proof of Theorem 2.2)

Theorem 2.2 states that the error of the ADI method (2.3) has the same order as that of the implicit Milstein scheme (2.2). We now give the proof.

Proof of Theorem 2.2

Let

$$\begin{aligned} X_{n+1} = C_n\,X_n, \end{aligned}$$

where

$$\begin{aligned} C_n \equiv \exp \bigg (-\frac{1}{2}(1-\rho _x)\xi ^2k -\frac{1}{2}(1-\rho _y)\eta ^2k -\mathrm {i}\xi \sqrt{\rho _xk}Z_{n,x} - \mathrm {i}\eta \sqrt{\rho _yk}\widetilde{Z}_{n,y} + e_n\bigg ), \end{aligned}$$

and \(e_n\) is the logarithmic error between the numerical solution and the exact solution introduced during \([nk,(n+1)k]\). From (3.11), \(C_n\) has the form

$$\begin{aligned} C_n = \frac{1 -\mathrm {i}c_x\sqrt{\rho _xk}Z_{n,x} -\mathrm {i}c_y\sqrt{\rho _yk}\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}kZ_{n,x}\widetilde{Z}_{n,y}}{(1-a_xk)(1-a_yk)}. \end{aligned}$$

In the low wave number region, the numerical solution is close to the exact solution, and Taylor expansion gives

$$\begin{aligned} X_N - X(T)&= X(T)\cdot \bigg \{\frac{\mathrm {i}}{6}\sqrt{\rho _x}\xi ^3h_x^2M_{T}^x + \frac{1}{24}(1-4\rho _x)\xi ^4h_x^2T \\&\quad + \frac{\mathrm {i}}{6}\sqrt{\rho _y}\eta ^3h_y^2\widetilde{M}_{T}^y + \frac{1}{24}(1-4\rho _y)\eta ^4h_y^2T \\&\quad + \mathrm {i} k\sqrt{k}\sum _{n=0}^{N-1}\widehat{\phi }_1(Z_{n,x},\widetilde{Z}_{n,y})+ k^2\sum _{n=0}^{N-1}\widehat{\phi }_2(Z_{n,x},\widetilde{Z}_{n,y}) + o(k,h_x^2,h_y^2)\bigg \}. \end{aligned}$$

In the high wave region, we have

$$\begin{aligned}&X_N = X_0\\&\prod _{n=0}^{N-1}\frac{1 -\mathrm {i}c_x\sqrt{\rho _xk}\,Z_{n,x} - \mathrm {i}c_y\sqrt{\rho _yk}\,\widetilde{Z}_{n,y} + b_x\rho _xk(Z_{n,x}^2-1) + b_y\rho _yk(\widetilde{Z}_{n,y}^2-1) + d\sqrt{\rho _x\rho _y}\,k\,Z_{n,x}\widetilde{Z}_{n,y}}{(1-a_xk)(1-a_yk)}. \end{aligned}$$

Then

$$\begin{aligned} \lim _{N\rightarrow \infty }\mathbb {E}[X_N]&= X_0\exp \bigg (-\frac{1}{2}\big (\xi ^2u+\eta ^2v + \frac{1}{2}\xi ^2\eta ^2uvk\\&\quad +2\xi \eta \frac{\sin \xi h_x\sin \eta h_y}{\xi h_x\eta h_y}\sqrt{\rho _x\rho _y} \rho _{xy}\big )\,T\bigg ),\\ \lim _{N\rightarrow \infty }\mathbb {E}[|X_N|^2]&\le |X_0|^2\exp \bigg (-\frac{1}{2}\xi ^2uT\big (1-\rho _x+\frac{1}{4}\xi ^2uk (1-2\rho _x(1+\rho _{xy}))\big )\\&\quad -\frac{1}{2}\eta ^2vT\big (1-\rho _y+\frac{1}{4}\eta ^2vk(1-2\rho _y(1+\rho _{xy}))\big )\\&\quad -\frac{1}{4}\xi ^2\eta ^2uvkT\big (1-\rho _{xy}(1+\rho _{xy} + 3\rho _{xy}^2)\big )\bigg ), \end{aligned}$$

where \(u = \sin ^2 \frac{h_x\xi }{2}/(\frac{h_x\xi }{2})^2,\ v = \sin ^2 \frac{h_y\eta }{2}/(\frac{h_y\eta }{2})^2.\) By the same reasoning as for the implicit scheme (2.2), the contribution from the high wave number region is of higher order than \(h_x^2\) and \(h_y^2\) under condition (2.7). The inverse Fourier transform then gives the result. \(\square \)

5 Numerical tests

In this section, we illustrate the stability and convergence results from the previous section by way of empirical tests.

Unless stated otherwise, we choose parameters \(T=1,\ x_0= y_0 = 2,\ \mu _x = \mu _y = 0.0809,\ \rho _x = \rho _y = 0.2\), \(\rho _{xy} = 0.45\). For the computations, we truncate the domain to \([-8,12]\times [-8,12]\), chosen large enough such that the effect of zero Dirichlet boundary conditions on the solution is negligible.

Figure 1a shows the numerical solution for one Brownian path, with \(h_x = h_y = 2^{-4},\,k = 2^{-10}\). Figure 1b plots the pointwise error between the Milstein-ADI approximation (2.3) and the analytic solution (1.9).

Fig. 1

Numerical approximation to the SPDE (1.7) with the Milstein-ADI approximation (2.3) for one Brownian path, and the pointwise error to the analytic solution (1.9)

Figure 2 verifies the \(L_2\)-convergence order in h and k from (2.10). We approximate the error by

$$\begin{aligned}&\Big (\sum _{i,j}h_xh_y E_L \big [|V_{i,j}^N - v(T,x_i,y_j)|^2\big ]\Big )^{1/2} \nonumber \\&\quad = \Big (\sum _{i,j}\frac{h_x h_y}{L} \sum _{l=1}^L |V_{i,j}^N(M^{(l)}) - v(T,x_i,y_j; M^{(l)})|^2\Big )^{1/2}, \end{aligned}$$
(5.1)

where \(M^{(l)}\) are independent Brownian motions and \(E_L\) is the empirical mean with \(L=100\) samples.
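The estimator (5.1) can be evaluated directly once the numerical and exact solutions have been computed on the same L Brownian paths. A minimal sketch (our own code, with an assumed array layout, not the code used for the figures) is:

```python
# Minimal sketch of the error estimator (5.1); V_samples and v_samples are
# assumed to hold V_{i,j}^N(M^(l)) and v(T, x_i, y_j; M^(l)) on the same
# L Brownian paths, as arrays of shape (L, Nx, Ny).
import numpy as np

def l2_error(V_samples, v_samples, hx, hy):
    diff2 = np.abs(V_samples - v_samples)**2   # |V_{i,j}^N - v(T,x_i,y_j)|^2 per path
    return np.sqrt(hx * hy * diff2.mean(axis=0).sum())
```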

Here, Fig. 2a shows the convergence in \(h = h_x = h_y\) with \(k =2^{-12}\) small enough, which demonstrates second order convergence in h. Figure 2b shows the convergence in k with \(h = h_x = h_y=2^{-6}\) small enough to ensure sufficient accuracy of the spatial approximation. One can clearly observe first order convergence in k.

Fig. 2

Verification of the \(L_2\)-convergence order in h and k from (2.10), with coarsest level \(h = 1,\ k=1/4\) and finest level \(h=2^{-5},\ k=2^{-10}\), using 100 Monte Carlo samples

In Fig. 3, we illustrate the dependence of the approximation error in the \(L_2\)-norm on the correlation parameters. The error increases as a function of \(\rho _x\) and \(\rho _y\). The error for \(\rho _x=\rho _y\le 0.3\) (see Fig. 3a) varies between roughly \(10^{-3}\) and \(3 \cdot 10^{-3}\), the error being smallest for \(\rho _x=\rho _y= 0\) (the PDE case), and largest for large \(\rho _x=\rho _y\) and \(\rho _{xy}\) between 0.1 and 0.4. For larger \(\rho _x\) and \(\rho _y\), the error increases sharply. The stability region from Assumption 2.1 is marked in dark blue, which shows that stable results are obtained even outside the region where mean-square stability is proven. We found problems only for \(\rho _x=\rho _y \ge 0.8\). This discrepancy is partly due to the fact that Assumption 2.1 is sufficient, but not necessary, as some of the estimates are not sharp. Figure 3b shows a similar behaviour when varying \(\rho _x\) and \(\rho _y\) independently for fixed \(\rho _{xy}\).

Fig. 3

\(L_2\) error in space as function of correlation parameters for fixed \(h_x=h_y=2^{-3}\) and \(k=2^{-9}\), for a fixed path. The dark blue areas correspond to the stability region from Assumption 2.1

Figure 4 shows the singular behaviour of the solution for large k and small h, as predicted by Theorem 2.1. Figure 5 investigates the behaviour of the error in this regime further, with \(h_x=h_y=h\) in Fig. 5a, and \(h_y=2^{-1}\) fixed, \(h_x=h\) in Fig. 5b, letting h go to zero. We calculate the \(L_2\) error in space, and compare different scenarios under the same Brownian path. The top black line shows the error with \(k=2^{-2}\) fixed, and \(\rho _x=\rho _y=0.6,\ \rho _{xy}=0.1\). We can see that as h goes to zero, the error diverges with rate \(h^{-1/2}\) (choosing \(h_y=2^{-1}\) fixed enables more refinements in \(h_x\) to show the asymptotic behaviour better). The blue line second from top plots the error with \(k=2^{-4}\) fixed instead. Note that in this case the error will eventually diverge for h going to zero, but this is not visible yet for this level of h. The next red line plots the error for \(\rho _x = \rho _y =0\), with \(k=2^{-2}\) fixed. Then the SPDE (1.7) becomes a PDE and divergence does not appear. Finally, the bottom blue dotted line plots the error for the SPDE (1.7) with initial condition

$$\begin{aligned} v(0,x,y) = \frac{1}{2\pi \sqrt{(1-\rho _x)(1-\rho _y)}}\,\exp \Big (-\frac{(x-x_0-\mu _x)^2 }{2(1-\rho _x)}-\frac{(y-y_0-\mu _y )^2}{2(1-\rho _y)}\Big ). \end{aligned}$$
(5.2)

For this smooth initial condition, the solution does not diverge for large k and small h. Hence, this verifies that the divergence is a result only of the interplay of singular data and stochastic terms, as shown in Corollary 2.2. We emphasise that the instability is so mild that it is only visible in artificial numerical tests, while for reasonably small k, in particular for \(k\sim h^2\) as would be chosen in practice, no instabilities occur.
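For reference, the smooth initial condition (5.2) used for the bottom curve can be evaluated on the grid as follows (a minimal sketch, our own code, using the parameter values of this section):

```python
# Minimal sketch: evaluate the smooth initial condition (5.2) on the
# truncated domain [-8,12]^2 with h_x = h_y = 2**-4.
import numpy as np

rho_x, rho_y = 0.2, 0.2
x0, y0, mu_x, mu_y = 2.0, 2.0, 0.0809, 0.0809
x = np.arange(-8, 12 + 1e-12, 2.0**-4)
y = np.arange(-8, 12 + 1e-12, 2.0**-4)
X, Y = np.meshgrid(x, y, indexing='ij')

v0 = (1.0 / (2*np.pi*np.sqrt((1-rho_x)*(1-rho_y)))
      * np.exp(-(X - x0 - mu_x)**2 / (2*(1-rho_x))
               - (Y - y0 - mu_y)**2 / (2*(1-rho_y))))
```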

Fig. 4

Unstable solution for the Milstein-ADI scheme (2.3) for one Brownian path, \(k=2^{-2}\) and \(h_x=h_y=2^{-9}\), \(\rho _x=\rho _y=0.6,\ \rho _{xy}=0.1\)

Fig. 5

Comparison of the \(L_2\) error in space with fixed k and letting \(h\rightarrow 0\) under various scenarios: \(k = 2^{-2};\ k=2^{-4};\ k = 2^{-2}\) and \(\rho _x=\rho _y=0\,\); \(k = 2^{-2}\) and smooth initial condition (5.2), all with the same Brownian path

6 An extended scheme and tests for a more general SPDE

To approximate the general Zakai SPDE (1.1),

$$\begin{aligned} \mathrm {d}v(t,x)= & {} \bigg (\frac{1}{2}\sum _{i,j=1}^d\frac{\partial ^2}{\partial x_i\partial x_j}\big [a_{ij}(x)v(t,x)\big ] - \sum _{i=1}^d\frac{\partial }{\partial x_i}\big [b_i(x)v(t,x)\big ]\bigg )\,\mathrm {d}t \\&- \sum _{l=1}^m\sum _{i=1}^d \frac{\partial }{\partial x_i}(\gamma _{i,l}(x)v(t,x))\,\mathrm {d}M_t^l, \end{aligned}$$

by the Milstein scheme, we use in the last term the approximation

$$\begin{aligned} v(s) \approx v(t) - \sum _{l=1}^m\sum _{i=1}^d \frac{\partial }{\partial x_i}\Big (\gamma _{i,l}(x)v(t,x)\Big )\,\big (M_l(s) - M_l(t)\big ), \end{aligned}$$

then

$$\begin{aligned}&-\sum _{l=1}^m\sum _{i=1}^d \int _t^{t+k} \frac{\partial }{\partial x_i}\Big (\gamma _{i,l}(x)v(s,x)\Big )\,\mathrm {d}M_l(s)\\&\quad =-\sum _{l=1}^m\sum _{i=1}^d \int _t^{t+k} \frac{\partial }{\partial x_i} \Big (\gamma _{i,l}(x)\Big [ v(t) - \sum _{p=1}^m\sum _{j=1}^d \frac{\partial }{\partial x_j}\Big (\gamma _{j,p}(x)v(t,x)\Big )\,\big (M_p(s) - M_p(t)\big ) \Big ]\Big )\,\mathrm {d}M_l(s) \\&\quad = - \sum _{l=1}^m\sum _{i=1}^d \frac{\partial }{\partial x_i}\Big (\gamma _{i,l}(x)v(t,x)\Big )\,\varDelta M_l^n \\&\qquad + \sum _{l=1}^m\sum _{p=1}^m \sum _{i=1}^d \frac{\partial }{\partial x_i}\Big (\gamma _{il}(x)\sum _{j=1}^d\frac{\partial }{\partial x_j}\big ( \gamma _{jp}(x)v(t,x) \big ) \Big )\int _t^{t+k}\big (M_p(s)-M_p(t)\big )\,\mathrm {d}M_l(s). \end{aligned}$$

The corresponding ADI implicit Milstein scheme is

$$\begin{aligned}&\prod _{i=1}^d \bigg (I + \frac{k}{2h_i} D_i b_i(X) - \frac{1}{2} \frac{k}{h_i^2}D_{ii}a_{ii}(X)\bigg )V^{n+1}\\&\quad = \bigg \{I + \frac{1}{2} \sum _{i\ne j} \frac{k}{4h_ih_j}D_{i}D_j a_{ij}(X) - \sum _{l=1}^m \varDelta M_l^n\sum _{i=1}^d \frac{1}{2h_i} D_i \gamma _{i,l}(X) \\&\qquad + \sum _{l=1}^m\sum _{p=1}^m \Big (\int _{nk}^{(n+1)k}\big (M_p(s)-M_p(t)\big )\,\mathrm {d}M_l(s)\Big )\\&\qquad \times \sum _{i=1}^d \Big (\frac{1}{2h_i}D_i \gamma _{il}(X)\sum _{j=1}^d \frac{1}{2h_j}D_j\gamma _{jp}(X)\Big )\bigg \}V^n. \end{aligned}$$

Here, \(\{D_i\}_{1\le i\le d}\) are first order difference operators, and \(\{D_{ij}\}_{1\le i,j\le d}\) are second order difference operators, X is the vector of mesh points ordered the same way as V, and, by slight abuse of notation, we denote by \(a(X), b(X), \gamma (X)\) the diagonal matrices such that each element of the diagonal corresponds to the function evaluated at the corresponding mesh point.
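To make the roles of the operators concrete, the following minimal Python sketch (our own illustration, not the paper's implementation; constant coefficients and zero Dirichlet boundaries are assumptions made only for this example) assembles the one-dimensional stencils \(D_i\), \(D_{ii}\) as sparse matrices and performs the banded solve corresponding to one directional factor \(\big (I + \frac{k}{2h_i} D_i b_i - \frac{1}{2}\frac{k}{h_i^2}D_{ii}a_{ii}\big )\) of the ADI scheme above.

```python
# Minimal sketch: one directional factor of the ADI scheme, with constant
# coefficients and zero Dirichlet boundaries (both assumed for illustration).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def difference_operators(n):
    """D: central first-difference stencil u_{i+1}-u_{i-1};
       D2: second-difference stencil u_{i+1}-2u_i+u_{i-1}."""
    D  = sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n), format='csc')
    D2 = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format='csc')
    return D, D2

n, h, k = 64, 2.0**-4, 2.0**-10
a_ii, b_i = 1.0, 0.1                     # frozen (constant) coefficients, assumed values
D, D2 = difference_operators(n)

# one ADI factor: (I + k/(2h) b D - k/(2h^2) a D2) w = rhs
A = sp.identity(n, format='csc') + (k/(2*h))*b_i*D - (k/(2*h**2))*a_ii*D2
rhs = np.ones(n)
w = spla.spsolve(A, rhs)                 # a banded (tridiagonal) solve per direction
```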

Notice that the presence of an iterated Itô integral \(\int _{t}^{t+k} (M_p(s) - M_p(t)) \,\mathrm {d}M_l(s)\), called the Lévy area, is common in multi-dimensional Milstein schemes. It has been proved in [8, 33] that it is not possible to achieve a better order of strong convergence than for the Euler scheme by using solely the discrete increments of the driving Brownian motions. An efficient algorithm for the approximate simulation of the Lévy area has been proposed in [38], building on earlier work in [15, 26]; it is based on an approximation of the distribution of the tail-sum in a truncated infinite series representation derived from the characteristic functions of these integrals, and achieves a complexity of \(\varepsilon ^{-3/2}\) for sampling a single path with strong error \(\varepsilon \). In [32], Malham and Wiese proposed another algorithm for the simulation of the Lévy area conditioned on the endpoints, representing the Lévy area by an infinite weighted sum of independent logistic random variables, which further reduces the complexity to \(\varepsilon ^{-1}|\log (\varepsilon ^{-1})|\).

However, the algorithms mentioned above are fairly complex. Instead, we further divide the interval \([t,t+k]\) into \(O(k^{-1})\) substeps and perform a simple Euler approximation of the Lévy area, while the SPDE itself is always advanced with the timestep k. Overall, this still leads to first order convergence in time and second order in space. To balance the leading order error terms, the optimal choice is \(O(k) = O(h^2) = O(\varepsilon )\). The estimation of the Lévy area then increases the computation time per time step by \(O(k^{-1}) = O(\varepsilon ^{-1})\), whereas the matrix computations per time step also cost \(O(\varepsilon ^{-1})\), and hence the order of the total complexity does not change.

Moreover, the path simulation including the Lévy areas can be performed separately beforehand using vectorisation, leading to further speed-up.
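As an illustration of this sub-stepping, the following minimal sketch (our own code; equidistant substeps and two independent driving Brownian motions are assumed, and all names are ours) generates the increments over one SPDE time step of length k together with a simple Euler (left-point) approximation of \(\int _t^{t+k}\big (M_p(s)-M_p(t)\big )\,\mathrm {d}M_l(s)\).

```python
# Minimal sketch: Euler sub-stepping approximation of the iterated integral
# over one SPDE time step of length k, using n_sub substeps.
import numpy as np

def levy_area_euler(k, n_sub, rng):
    """Return the increments of M_p and M_l over [t, t+k] and a left-point
    Euler approximation of int_t^{t+k} (M_p(s)-M_p(t)) dM_l(s)."""
    dt = k / n_sub
    dMp = np.sqrt(dt) * rng.standard_normal(n_sub)
    dMl = np.sqrt(dt) * rng.standard_normal(n_sub)
    Mp_path = np.concatenate(([0.0], np.cumsum(dMp)))   # M_p(s)-M_p(t) on the subgrid
    I_pl = np.sum(Mp_path[:-1] * dMl)                   # Ito (left-point) Euler sum
    return dMp.sum(), dMl.sum(), I_pl

rng = np.random.default_rng(1)
k = 2.0**-8
print(levy_area_euler(k, n_sub=int(1/k), rng=rng))      # O(k^{-1}) substeps, as in the text
```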

Now we apply this method to an SPDE from [21],

$$\begin{aligned} \mathrm {d}u= & {} \bigg [\kappa _1 u - \Big (r_1 - \frac{1}{2}y - \xi _1\rho _3\rho _{1,1}\rho _{2,1}\Big ) \frac{\partial u}{\partial x} - \Big ( \kappa _1(\theta _1-y) - \xi _1^2 \Big )\frac{\partial u}{\partial y} + \frac{1}{2}y \frac{\partial ^2 u}{\partial x^2} \nonumber \\&\quad + \xi _1\rho _3\rho _{1,1}\rho _{2,1}y\frac{\partial ^2 u}{\partial x\partial y} + \frac{\xi _1^2}{2}y\frac{\partial ^2 u}{\partial y^2} \bigg ]\,\mathrm {d}t - \rho _{1,1}\sqrt{y}\frac{\partial u}{\partial x}\,\mathrm {d}W_t - \xi _1\rho _{2,1}\frac{\partial }{\partial y}(\sqrt{y}u)\,\mathrm {d}B_t,\nonumber \\ \end{aligned}$$
(6.1)

with Dirac initial datum \(u(0,x,y) = \delta (x-x_0)\otimes \delta (y-y_0)\). This SPDE models the limit empirical measure of a large portfolio of defaultable assets in which the asset value processes are modelled by Heston-type stochastic volatility models with common and idiosyncratic factors in both the asset values and the variances, and default is triggered by hitting a lower boundary.

As before, we implement the SPDE (6.1) with a Milstein ADI scheme:

$$\begin{aligned} \begin{aligned} A_x\,A_y\,U^{n+1}&= \bigg ( (1+\kappa _1\,k)I - \frac{k}{4h_x}\xi _1\rho _{1,1}\rho _{2,1}\rho _3D_x - \frac{\sqrt{k}Z_{n,x}}{2h_x}\rho _{1,1}\sqrt{Y}D_x \\&\quad + \frac{k(Z_{n,x}^2-1)}{8h_x^2}\rho _{1,1}^2YD_{x}^2 + \frac{k}{4h_x}\xi _1\rho _{1,1}\rho _{2,1}Z_{n,x}\widetilde{Z}_{n,y} \Big (D_x +\frac{1}{h_y}Y D_{xy}\Big ) \\&\quad + \frac{1}{4h_x}\xi _1\rho _{2,1}\rho _{1,1} \Big (\int _t^{t+k}(W_s-W_t)\,\mathrm {d}B_s\Big ) D_x\bigg )U^n\\&\quad - \frac{\sqrt{k}\widetilde{Z}_{n,y}}{2h_y}\xi _1\rho _{2,1}D_y(\sqrt{Y}U^{n})\\ {}&\quad + \frac{k(\widetilde{Z}^2_{n,y}-1)}{8h_y^2}\xi _1^2\rho _{2,1}^2D_y\Big (\sqrt{Y}\big (D_y(\sqrt{Y}U^{n})\big )\Big ), \end{aligned} \end{aligned}$$
(6.2)

where the notation for Y follows the same principle as above for X, \(\widetilde{Z}_{n,y} = \rho _{xy}Z_{n,x} + \sqrt{1-\rho _{xy}^2}Z_{n,y}\), with \(Z_{n,x},Z_{n,y}\sim N(0,1)\) independent normal random variables, and

$$\begin{aligned} A_x&= I + \frac{k}{2h_x}\Big (r_1 - \xi _1\rho _3\rho _{1,1}\rho _{2,1}\Big )D_x - \frac{k}{4h_x}YD_{x} - \frac{k}{2h_x^2}YD_{xx},\\ A_y&= I + \frac{k}{2h_y}(\kappa _1\theta _1 - \xi _1^2)D_{y} - \frac{k}{2h_y}\kappa _1YD_{y} - \frac{k}{2h_y^2}\xi _1^2 Y D_{yy}. \end{aligned}$$

Furthermore, we also implement the SPDE (6.1) with coefficients frozen at \((x_0,y_0)\),

$$\begin{aligned} \begin{aligned} \mathrm {d}u&= \bigg [\kappa _1 u - \Big (r_1 - \frac{1}{2}y_0 - \xi _1\rho _3\rho _{1,1}\rho _{2,1}\Big ) \frac{\partial u}{\partial x} - \Big ( \kappa _1(\theta _1-y_0) - \xi _1^2 \Big )\frac{\partial u}{\partial y} + \frac{1}{2}y_0 \frac{\partial ^2 u}{\partial x^2} \\&\quad + \xi _1\rho _3\rho _{1,1}\rho _{2,1}y_0\frac{\partial ^2 u}{\partial x\partial y} + \frac{\xi _1^2}{2}y_0\frac{\partial ^2 u}{\partial y^2} \bigg ]\,\mathrm {d}t \\ {}&\quad - \rho _{1,1}\sqrt{y_0}\frac{\partial u}{\partial x}\,\mathrm {d}W_t - \xi _1\rho _{2,1}\frac{\partial }{\partial y}(\sqrt{y_0}u)\,\mathrm {d}B_t, \end{aligned} \end{aligned}$$
(6.3)

with Dirac initial datum \(u(0,x,y) = \delta (x-x_0) \otimes \delta (y-y_0)\). We can apply a Milstein ADI scheme similar to (2.2),

$$\begin{aligned} \begin{aligned}&\bigg (I + \frac{k}{2h_x}\Big (r_1 - \xi _1\rho _3\rho _{1,1}\rho _{2,1} - \frac{1}{2}y_0\Big )D_x - \frac{k}{2h_x^2}y_0D_{xx}\bigg )\\ {}&\bigg (I + \frac{k}{2h_y}(\kappa _1(\theta _1-y_0) - \xi _1^2)D_{y} - \frac{k}{2h_y^2}\xi _1^2 y_0 D_{yy}\bigg ) U^{n+1}\\&\quad =\bigg ((1+\kappa _1\,k)I - \frac{\sqrt{k}Z_{n,x}}{2h_x}\rho _{1,1}\sqrt{y_0}D_x + \frac{k(Z_{n,x}^2-1)}{8h_x^2}\rho _{1,1}^2y_0D_{x}^2\\ {}&\qquad - \frac{\sqrt{k}\widetilde{Z}_{n,y}}{2h_y}\xi _1\rho _{2,1}\sqrt{y_0}D_y+ \frac{k(\widetilde{Z}_{n,y}^2-1)}{8h_y^2}\xi _1^2\rho _{2,1}^2y_0D_{y}^2 \\&\qquad + \frac{k\,Z_{n,x}\widetilde{Z}_{n,y}}{4h_xh_y}\xi _1\rho _{1,1}\rho _{2,1}D_{xy} \bigg )U^{n}. \end{aligned} \end{aligned}$$
(6.4)

We will test the behaviour of (6.3) at \((x_0,y_0)\) and compare with the SPDE (1.7).

We choose parameters \(T = 1,\, x_0 = 2,\, y_0 = 1.4,\, r_1 = 0.05,\, \xi _1 = 0.5,\, \theta _1 = 0.4,\, \kappa _1 = 2,\, \rho _{1,1}= 0.3\), \(\rho _{2,1}=0.2\), \(\rho _{3}=0.5\). We truncate the domain to \([-3,7]\times [0,1.5]\) for the scheme (6.2), and \([-3,7]\times [-1,1.5]\) for the scheme (6.4), sufficiently large in this setting.

Figure 6a shows the density for a single Brownian path, with \(k=2^{-8}\), \(h_x = 5/16\), and \(h_y = 1/80\).

Figure 6b compares the computational cost under different time-stepping schemes: the Milstein scheme, a “modified” Milstein scheme, and the Euler scheme. Here, for the Milstein scheme we approximate the Lévy area by sub-timestepping as explained above, while in the “modified” Milstein scheme we drop \(\int _t^{t+k} (W_s-W_t)\,\mathrm {d}B_s\) but keep the one-dimensional iterated integrals as they are known analytically. We expect that the latter will lead to a worse convergence in time (for non-zero \(\xi _1, \rho _{2,1}, \rho _{1,1}\)), which is verified in Fig. 7b.

In Fig. 6b, from a coarsest mesh with \(h_x = 0.625\), \(h_y = 0.025\), and \(k = 0.25\), we keep decreasing the time-step k by a factor of 4, and the spatial mesh width by a factor of 2. This shows that the cost, measured by time elapsed in simulating one path, increases by a factor of 16, demonstrating that these three schemes result in the same order of complexity.

Fig. 6

Single path realisation of the density of the SPDE (6.1), and the computational cost (CPU time in seconds)

Figure 7 verifies the \(L_2\) convergence order in h and k. In the absence of an exact solution, we compute a proxy to the error in \(h_x,\ h_y\) by

$$\begin{aligned}&\bigg (\sum _{i,j}h_xh_y E_L\Big [\big |V_{2i,2j}^N\Big (k,\frac{h_x}{2},\frac{h_y}{2}\Big ) - V_{i,j}^N(k,h_x,h_y)\big |^2\Big ]\bigg )^{1/2}\\&\quad = \Big (\sum _{i,j} \frac{h_xh_y}{L} \sum _{l=1}^{L} \Big |V_{2i,2j}^N\Big (k,\frac{h_x}{2},\frac{h_y}{2};M^{(l)}\Big ) - V_{i,j}^N(k,h_x,h_y; M^{(l)})\Big |^2\Big )^{1/2}, \end{aligned}$$

where \(V_{i,j}^N(k,h_x,h_y)\) is the numerical approximation to \(v(T,ih_x,jh_y)\) with mesh widths \(h_x\) and \(h_y\), \(V_{2i,2j}^N(k,h_x/2,h_y/2)\) uses a fine mesh with \(h_x/2\) and \(h_y/2\), and \(E_L\) is the empirical mean with L samples of the Brownian path M. Both solutions share the same Brownian path and the same time step k; thus the error contribution from the time discretisation cancels to leading order and we should observe the correct convergence order in \(h_x\) and \(h_y\). We could alternatively solve on a very fine mesh and treat the result as the exact solution to test the convergence as in (5.1). It is clear from the construction that the observed convergence order would be the same. However, since the computation of a highly accurate solution is extremely expensive, and such a numerical reference solution would introduce a bias, we do not perform this test here. Here, Fig. 7a shows the convergence in \(h = h_x = h_y\) with \(k =2^{-4}\) fixed and \(L = 100\), which demonstrates second order convergence in h.
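A minimal sketch of this two-grid proxy (our own code, with an assumed array layout; the fine solution is restricted to the coarse nodes by striding) is:

```python
# Minimal sketch of the two-grid error proxy: V_coarse has shape (L, Nx, Ny)
# on the mesh (h_x, h_y); V_fine has shape (L, 2*Nx-1, 2*Ny-1) on the mesh
# (h_x/2, h_y/2), computed with the same paths and the same time step.
import numpy as np

def two_grid_error(V_coarse, V_fine, hx, hy):
    V_fine_at_coarse_nodes = V_fine[:, ::2, ::2]   # V_{2i,2j}^N(k, h_x/2, h_y/2)
    diff2 = np.abs(V_fine_at_coarse_nodes - V_coarse)**2
    return np.sqrt(hx * hy * diff2.mean(axis=0).sum())
```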

Similarly, we study the error in k by using the difference between two solutions with the same mesh size, the same Brownian path, but different time steps,

$$\begin{aligned}&\Big (\sum _{i,j}h_xh_yE_L\big [|V_{i,j}^N(k,h_x,h_y) - V_{i,j}^{2N}(k/2,h_x,h_y)|^2\big ]\Big )^{1/2}\\&\quad = \Big (\sum _{i,j}\frac{h_xh_y}{L} \sum _{l=1}^{L} \Big |V_{i,j}^N\Big (k,h_x,h_y;M^{(l)}\Big ) - V_{i,j}^{2N}(k/2,h_x,h_y; M^{(l)})\Big |^2\Big )^{1/2}. \end{aligned}$$

Figure 7b shows the convergence in k with fixed \(h_x = 5/8,\ h_y = 1/40\), under the Milstein scheme, the “modified” Milstein scheme, and the Euler scheme. The timestep k decreases by a factor of 4 from one level to the next. We also plot two blue dashed lines with slopes 1/2 and 1 as reference. One can clearly observe first order convergence in k for the Milstein scheme, and half order convergence in k for the Euler scheme. As for the “modified” Milstein scheme, although it appears to converge with first order on coarse levels (because the terms which converge with first order dominate at this level of accuracy), the asymptotic order is seen to be lower.

Fig. 7

Convergence order in \((h_x,h_y)\) and k for the schemes (6.2) and (6.4), in \(L_2\) in space and probability, using 100 paths

In Fig. 7, we also compare the convergence of the scheme (6.4) for the frozen coefficient SPDE (6.3) with the scheme (6.2) for the SPDE (6.1). We can see that the scheme (6.4) has second order convergence in h, and first order convergence in k, as expected.

We now illustrate more carefully the behaviour at \((x_0,y_0)\) over time. Figure 8a shows one path realisation of the solution \(U(t,x_0,y_0)\) with the scheme (6.2) as a function of t, and \(\widehat{U}(t,x_0,y_0)\) with the scheme (6.4). The mesh sizes and time step are \(h_x = 5\times 2^{-8},\ h_y = 0.2\times 2^{-8},\ k=4^{-8}\). Note that the plots start from \(t=0.015\) in Fig. 8a and \(t=0.03\) in Fig. 8b rather than zero, because of the Dirac initial datum; however, even under this singular initial condition, \(U(t,x_0,y_0)\) and \(\widehat{U}(t,x_0,y_0)\) are very close over an initial time period. Figure 8b shows the error of the schemes (6.2) and (6.4) for one realisation of the path, where we use \(h_x = 5/16,\ h_y = 1/80,\ k=1/256\) for “coarse” solutions \(U_c\) and \(\widehat{U}_c\), and we treat the fine approximations U and \(\widehat{U}\) as the exact solution. The errors for the SPDEs (6.1) and (6.3) track each other closely.

In Fig. 9, we use another approach to show the error by taking the difference between the solutions with mesh sizes and time steps \((h_x,h_y,k)\) and \((h_x/2,h_y/2,k/4)\). We show the error both for one path, as well as \(L_2\) in probability \((E_L[|U_f(t,x_0,y_0)-U_c(t,x_0,y_0)|^2])^{1/2}\), where \(E_L\) is the empirical mean with \(L=100\) Monte Carlo samples.

Fig. 8

Single path realisation of the solution at \((x_0,y_0)\) over time, of SPDE (6.1) with variable coefficients and (6.3) with frozen coefficients

Fig. 9

Numerical error at \((x_0,y_0)\) obtained by taking the difference between the solutions with \((h_x,h_y,k)\) and \((h_x/2,h_y/2,k/4)\) for the schemes (6.2) and (6.4), where \(h_x = 5/32,\ h_y = 1/160,\ k=1/1024\)

7 Conclusions

We studied a two-dimensional parabolic SPDE arising from a filtering problem. We proved mean-square stability and convergence for a semi-implicit Milstein discretisation scheme in terms of \(L_2\) in probability, and pointwise as well as \(L_2\) in space (as defined previously). To reduce the complexity, we also implemented an ADI version of the scheme, and provided corresponding convergence results.

Further research is needed to analyse almost sure convergence, which is of interest for filtering applications and does not follow directly from our analysis.

Another open question is a complete analysis of the numerical approximation of initial-boundary value problems (as opposed to problems posed on \(\mathbb {R}^d\)) for the considered SPDE, when the regularity at the boundary is lost. For example, for the 1-d SPDE with constant coefficients on the half-line,

$$\begin{aligned} \mathrm {d}v&= -\mu \frac{\partial v}{\partial x}\,\mathrm {d}t + \frac{1}{2}\frac{\partial ^2 v}{\partial x^2}\,\mathrm {d}t - \sqrt{\rho }\frac{\partial v}{\partial x}\,\mathrm {d}M_t,\qquad (t,x)\in (0,T)\times \mathbb {R}^+,\\ v(t,0)&=0, \end{aligned}$$

with initial condition \(v(0,\cdot )\in H^1\), the second derivative can be unbounded, i.e., \(v(t,\cdot )\notin H^2\). This and more general forms have been studied in [6, 28]. In such cases, the assumptions on the Galerkin approximations made in the papers mentioned previously, such as [2, 4], are not established in the literature, and hence a new approach to the numerical analysis needs to be developed.