1 Introduction

Recently, deep learning-based algorithms for solving high-dimensional partial differential equations (PDEs) have been actively proposed (see [2, 3] for instance). Moreover, a number of papers have appeared that provide mathematical justification for deep learning-based spatial approximations, demonstrating that deep neural networks overcome the curse of dimensionality in approximating high-dimensional PDEs. For the related literature, see [4,5,6, 11, 19] for example. In particular, these works treat specific forms of PDEs such as high-dimensional heat equations or Kolmogorov PDEs with constant diffusion and nonlinear drift coefficients. Also, the integral kernels are assumed to have explicit forms in order to justify the spatial approximations of solutions to high-dimensional PDEs.

However, most high-dimensional PDEs may not admit explicit integral forms in practice. In other words, the integral forms of the solutions themselves must be approximated by some method.

In the current paper, we give a new spatial approximation that combines an asymptotic expansion method with a deep learning-based algorithm for solving high-dimensional PDEs without the curse of dimensionality. More precisely, we follow the approach given in [40] and the related literature such as [8, 17, 18, 23, 24, 26, 27, 30, 32, 33, 35, 38, 39, 41, 43]. In particular, we provide a uniform error estimate for the asymptotic expansion of solutions of Kolmogorov PDEs with nonlinear coefficients, motivated by the works [2, 11, 31]. For a solution to a d-dimensional Kolmogorov PDE with a small parameter \(\lambda \), namely \(u_{\lambda }:[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) given by \(u_\lambda (t,x)=E[f(X_t^{\lambda ,x})]\) for \((t,x) \in [0,T] \times {\mathbb {R}}^d\), where \(\{ X_t^{\lambda ,x}\}_{t\ge 0}\) is a d-dimensional diffusion process starting from x, we justify the following spatial approximation on the cube \([a,b]^d\):

$$\begin{aligned} u_\lambda (t,\cdot )&\approx E[f(\bar{X}_t^{\lambda , \cdot }) {{\mathcal {M}}}_t^{\lambda , \cdot }] \qquad \hbox {(``high-dimensional asymptotic expansion'')} \end{aligned}$$
(1.1)
$$\begin{aligned}&\approx {{\mathcal {R}}}(\phi )(\cdot ), \qquad \hbox {(``deep neural network approximation'')} \end{aligned}$$
(1.2)

by applying an appropriate neural network \(\phi \). Here, for \(t>0\) and \(x \in {\mathbb {R}}^d\), \(\bar{X}_t^{\lambda , x}\) is a certain Gaussian random variable and \({{\mathcal {M}}}_t^{\lambda ,x}\) is a stochastic weight for the expansion derived via Malliavin calculus. In order to choose the network \(\phi \), an analysis of “products of neural networks" and a dimension analysis of the asymptotic expansion with Malliavin calculus are crucial in our approach. We show a precise error estimate for the approximation (1.1) and prove that the complexity of the neural network grows at most polynomially in the dimension d and the reciprocal of the precision \(\varepsilon \) of the approximation (1.2). Moreover, we give an explicit form of the asymptotic expansion in (1.1) and show numerical examples to demonstrate the effectiveness of the proposed scheme for high-dimensional Kolmogorov PDEs.

The organization of the paper is as follows. Section 2 is dedicated to notation, definitions and preliminary results on deep learning and Malliavin calculus. Section 3 provides the main result, namely, the deep learning-based asymptotic expansion for solving Kolmogorov PDEs. The proof is given in Sect. 4. Section 5 introduces the deep learning implementation. Various numerical examples are shown in Sect. 6. Useful lemmas on Malliavin calculus and ReLU calculus, as well as the sample code, are collected in the Appendix.

2 Preliminaries

We first prepare notation. For \(d \in {\mathbb {N}}\) and for a vector \(x \in {\mathbb {R}}^d\), we denote by \(\Vert x \Vert \) the Euclidean norm. Also, for \(k,\ell \in {\mathbb {N}}\) and for a matrix \(A \in {\mathbb {R}}^{k \times \ell }\), we denote by \(\Vert A \Vert \) the Frobenius norm. For \(d \in {\mathbb {N}}\), let \(I_d\) be the identity matrix. For \(m,k,\ell \in {\mathbb {N}}\), let \(C({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) (resp., \(C([0,T] \times {\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\)) be the set of continuous functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\) (resp., \(f: [0,T] \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\)) and \(C_{Lip}({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) be the set of Lipschitz continuous functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\). Also, we define \(C^\infty _b({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) as the set of smooth functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\) with bounded derivatives of all orders. For a multi-index \(\alpha \), let \(|\alpha |\) be the length of \(\alpha \). For a bounded function \(f:{\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\), we define \(\Vert f \Vert _{\infty }=\textstyle {\sup _{x \in {\mathbb {R}}^{m}}} \Vert f(x) \Vert \). For \(m,k,\ell \in {\mathbb {N}}\) and a function \(f \in C_{Lip}({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\), we denote by \(C_{Lip}[f]\) its Lipschitz constant. For \(d \in {\mathbb {N}}\) and a smooth function \(f:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), we define \(\partial _i f=\textstyle {\frac{\partial }{\partial x_i}f}\) for \(i=1,\ldots ,d\); moreover, we define \(\partial ^\alpha f=\partial _{\alpha _1}\cdots \partial _{\alpha _k}f\) for \(\alpha =(\alpha _1,\ldots ,\alpha _k) \in \{1,\ldots ,d \}^k\), \(k \in {\mathbb {N}}\). For \(a,b \in {\mathbb {R}}\), we may write \(a \vee b=\max \{ a,b \}\).

2.1 Deep neural networks

Let us prepare notation and definitions for deep neural networks. Let \({{\mathcal {N}}}\) be the set of deep neural networks (DNNs):

$$\begin{aligned} {{\mathcal {N}}}=\cup _{L \in {\mathbb {N}} \cap [2,\infty )} \cup _{(N_0,N_1,\ldots ,N_L)\in {\mathbb {N}}^{L+1}} \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}, \end{aligned}$$
(2.1)

where \({{\mathcal {N}}}_L^{N_0,N_1,\ldots ,N_L}={\times }_{\ell =1}^{L} ({\mathbb {R}}^{N_\ell \times N_{\ell -1}} \times {\mathbb {R}}^{N_\ell })\).

Let \(\varrho \in C({\mathbb {R}},{\mathbb {R}})\) be an activation function, and for \(k\in {\mathbb {N}}\), define \(\varrho _{k}(x)=(\varrho (x_1),\ldots ,\varrho (x_k))\), \(x \in {\mathbb {R}}^k\).

We define \({{\mathcal {R}}}:{{\mathcal {N}}} \rightarrow \cup _{m,n\in {\mathbb {N}}} C({\mathbb {R}}^m,{\mathbb {R}}^n)\), \({{\mathcal {C}}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\), \({{\mathcal {L}}}: {{\mathcal {N}}} \rightarrow {\mathbb {N}}\), \(\textrm{dim}_{\textrm{in}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\) and \(\textrm{dim}_{\textrm{out}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\) as follows:

For \(L \in {\mathbb {N}} \cap [2,\infty )\), \(N_0,\ldots ,N_L \in {\mathbb {N}}\), \(\psi =((W_1,B_1),\ldots ,(W_L,B_L)) \in \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}\), let \({{\mathcal {L}}}(\psi )=L\), \(\textrm{dim}_{\textrm{in}}(\psi )=N_0\), \(\textrm{dim}_{\textrm{out}}(\psi )=N_L\), \(\mathcal{C}(\psi )=\textstyle {\sum _{\ell =1}^L} N_{\ell }(N_{\ell -1}+1)\), and

$$\begin{aligned} {{\mathcal {R}}}(\psi )(\cdot )={{\mathcal {A}}}_{W_L,B_L} \circ \varrho _{N_{L-1}} \circ {{\mathcal {A}}}_{W_{L-1},B_{L-1}} \circ \cdots \circ \varrho _{N_{1}} \circ {{\mathcal {A}}}_{W_{1},B_{1}} (\cdot ) \in C({\mathbb {R}}^{N_0},{\mathbb {R}}^{N_L}), \end{aligned}$$
(2.2)

where \({{\mathcal {A}}}_{W_k,B_k}(x)=W_kx+B_k\), \(x \in {\mathbb {R}}^{N_{k-1}}\), \(k=1,\ldots ,L\).
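For concreteness, the following is a minimal NumPy sketch (an illustration only, not one of the networks constructed in Sect. 3) of the realization map \({{\mathcal {R}}}\) in (2.2) and the complexity \({{\mathcal {C}}}\); the layer sizes and random parameters below are arbitrary assumptions.

```python
import numpy as np

def realize(psi, x, rho=lambda z: np.maximum(z, 0.0)):
    """R(psi)(x) for psi = ((W_1,B_1),...,(W_L,B_L)) as in (2.2); rho is the activation (ReLU here)."""
    for W, B in psi[:-1]:
        x = rho(W @ x + B)          # hidden layers: activation after each affine map
    W_L, B_L = psi[-1]
    return W_L @ x + B_L            # no activation after the last affine map

def complexity(psi):
    """C(psi) = sum_l N_l (N_{l-1} + 1): total number of weights and biases."""
    return sum(W.size + B.size for W, B in psi)

# an arbitrary network in N_3^{4,8,8,1} with random parameters
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
psi = [(rng.standard_normal((dims[l], dims[l - 1])), rng.standard_normal(dims[l]))
       for l in range(1, len(dims))]
print(complexity(psi), realize(psi, np.ones(4)))
```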

2.2 Malliavin calculus

We prepare basic notation and definitions on Malliavin calculus following Bally [1], Ikeda and Watanabe [16], Malliavin [25], Malliavin and Thalmaier [26] and Nualart [29].

Let \(\Omega ^d=\{ \omega : [0,T] \rightarrow {\mathbb {R}}^d; \ \omega \ \hbox {is continuous}, \ \omega (0)=0 \}\), \(H^d=L^2([0,T],{\mathbb {R}}^d)\) and let \(\mu ^d\) be the Wiener measure on \((\Omega ^d,\mathcal {B}(\Omega ^d))\), where \(\mathcal {B}(\Omega ^d)\) is the Borel \(\sigma \)-field induced by the topology of the uniform convergence on [0, T]. We call \((\Omega ^d,H^d,\mu ^d)\) the d-dimensional Wiener space. For a Hilbert space V with the norm \(\Vert \cdot \Vert _{V}\) and \(p \in [1,\infty )\), the \(L^p\)-space of V-valued Wiener functionals is denoted by \(L^p(\Omega ^d,V)\), that is, \(L^p(\Omega ^d,V)\) is a real Banach space of all \(\mu ^d\)-measurable functionals \(F: \Omega ^d \rightarrow V\) such that \(\Vert F \Vert _p =E [\Vert F \Vert _V^p]^{1/p}< \infty \) with the identification \(F = G\) if and only if \(F(\omega )=G(\omega )\), a.s. When \(V={\mathbb {R}}\), we write \(L^p(\Omega ^d)\). For a real separable Hilbert space V and \(F: \Omega ^d \rightarrow V\), we write \(\Vert F \Vert _{p,V}=E [\Vert F\Vert _V^p]^{1/p}\), in particular, \(\Vert F \Vert _{p}\) when \(V={\mathbb {R}}\). Let \(B^d=\{B^d_t\}_t\) be a coordinate process defined by \(B^d_t(\omega )=\omega (t)\), \(\omega \in \Omega ^d\), i.e. \(B^d\) is a d-dimensional Brownian motion, and \(B^d(h)\) be the Wiener integral \(\textstyle {B^d(h)=\sum _{j=1}^d \int _{0}^{T} {h}^{j}(s) dB_s^{d,j}}\) for \(h\in H^d\).

Let \({\mathscr {S}}(\Omega ^d)\) denote the class of smooth random variables of the form \(F=f( B^d(h_{1}),\ldots ,B^d(h_{n}) )\) where \(f\in C_{b}^{\infty } ( {\mathbb {R}}^{n},{\mathbb {R}}) \), \(h_{1},\ldots ,h_{n}\in H^d\), \(n\ge 1\). For \(F\in {\mathscr {S}}(\Omega ^d)\), we define the derivative DF as the \(H^d\)-valued random variable \(\textstyle {DF=\sum _{j=1}^{n}\partial _{j}f( B^d(h_{1}),\ldots ,B^d(h_{n}) ) h_{j}}\), which is regarded as the stochastic process:

$$\begin{aligned} D_{i,t}F=\textstyle {\sum \limits _{j=1}^{n}}\partial _{j}f( B^d(h_{1}),\ldots ,B^d(h_{n}) ) {h}^i_{j}(t), \ \ i=1,\ldots ,d, \ \ t \in [0,T]. \end{aligned}$$
(2.3)

For \(F \in {\mathscr {S}}(\Omega ^d)\) and \(j \in {\mathbb {N}}\), we set \(D^j F\) as the \((H^d)^{\otimes j}\)-valued random variable obtained by the j-times iteration of the operator D. For a real separable Hilbert space V, consider \({\mathscr {S}}_V\) of V-valued smooth Wiener functionals of the form \(\textstyle {F = \sum _{i=1}^\ell F_i v_i}\), \(v_i \in V\), \(F_i \in {\mathscr {S}}(\Omega ^d)\), \(i \le \ell \), \(\ell \in {\mathbb {N}}\). Define \(\textstyle {D^j F = \sum _{i=1}^\ell D^j F_i \otimes v_i}\), \(j \in {\mathbb {N}}\). Then for \(j \in {\mathbb {N}}\), \(D^j\) is a closable operator from \({\mathscr {S}}_V\) into \(L^p(\Omega ^d,(H^d)^{\otimes j} \otimes V)\) for any \(p \in [1,\infty )\) (see p. 31 of Nualart [29]). For \(k \in {\mathbb {N}}\), \(p \in [1,\infty )\), we define \(\textstyle {\Vert F \Vert ^p_{k,p,V}=E [\Vert F \Vert _V^p] + \sum _{j=1}^k E [ \Vert D^j F \Vert _{(H^d)^{\otimes j} \otimes V}^p ]}\), \(F \in {\mathscr {S}}_V\). Then, the space \({\mathbb {D}}^{k,p}(\Omega ^d,V)\) is defined as the completion of \({\mathscr {S}}_V\) with respect to the norm \(\Vert \cdot \Vert _{k,p,V}\). Moreover, let \({\mathbb {D}}^\infty (\Omega ^d,V)\) be the space of smooth Wiener functionals in the sense of Malliavin \({\mathbb {D}}^\infty (\Omega ^d,V) = \cap _{p\ge 1} \cap _{k\in {\mathbb {N}}} {\mathbb {D}}^{k,p}(\Omega ^d,V)\). We write \({\mathbb {D}}^{k,p}(\Omega ^d)\), \(k \in {\mathbb {N}}\), \(p \in [1,\infty )\) and \({\mathbb {D}}^\infty (\Omega ^d)\), when \(V={\mathbb {R}}\). Let \(\delta \) be an unbounded operator from \(L^2(\Omega ^d,H^d)\) into \(L^2(\Omega ^d)\) such that the domain of \(\delta \), denoted by \(\textrm{Dom}(\delta )\), is the set of \(H^d\)-valued square integrable random variables u such that \(|E [\langle DF,u \rangle _{H^d}]| \le c\Vert F \Vert _{1,2}\) for all \(F \in {\mathbb {D}}^{1,2}(\Omega ^d)\) where c is some constant depending on u, and if \(u \in \textrm{Dom}(\delta )\), there exists \(\delta (u) \in L^2(\Omega ^d)\) satisfying

$$\begin{aligned} E [\langle DF,u \rangle _{H^d}]=E[ F \delta (u) ] \end{aligned}$$
(2.4)

for any \(F \in {\mathbb {D}}^{1,2}(\Omega ^d)\). For \(u=(u^1,\ldots ,u^d) \in \textrm{Dom}(\delta )\), \(\delta (u)=\textstyle {\sum _{i=1}^d} \delta ^{i}(u^i)\) is called the Skorohod integral of u, and it holds that \(E[\textstyle {\int _0^T} D_{i,s}Fu^i_s ds]=E[F \delta ^i(u^i) ]\), \(i=1,\ldots ,d\) for all \(F \in {\mathbb {D}}^{1,2}\) (see Proposition 6 of Bally [1]). For all \(k \in {\mathbb {N}} \cup \{ 0 \}\) and \(p>1\), the operator \(\delta \) is continuous from \({\mathbb {D}}^{k+1,p}(\Omega ^d,H^d)\) into \({\mathbb {D}}^{k,p}(\Omega ^d)\) (see Proposition 1.5.7 of Nualart [29]). For \(G \in {\mathbb {D}}^{1,2}(\Omega ^d)\) and \(h \in \textrm{Dom}(\delta )\) such that \(Gh \in L^{2}(\Omega ^d,H^d)\), it holds that

$$\begin{aligned} \delta ^i(Gh^i)=G\delta ^i(h^i)-{\int _0^T} D_{i,s}Gh^i_sds, \quad i=1,\ldots ,d, \end{aligned}$$
(2.5)

and in particular, if \(h \in \textrm{Dom}(\delta )\) is an adapted process, \(\delta ^i(h^i)\) is given by the Itô integral, i.e. \(\delta ^i(h^i)=\textstyle {\int _0^T} h^i_s dB_s^{d,i}\) for \(i=1,\ldots ,d\) (e.g. see Section 3.1.1 of Bally [1], Proposition 1.3.3 and Proposition 1.3.11 of Nualart [29]).
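For example, taking \(G=B_t^{d,1}\), \(h^1=\textbf{1}_{[0,t]}\) and \(h^i=0\) for \(i\ne 1\) in (2.5) gives

$$\begin{aligned} \delta ^1(B_t^{d,1}\textbf{1}_{[0,t]})=B_t^{d,1}\int _0^t dB_s^{d,1}-\int _0^T D_{1,s}B_t^{d,1}\textbf{1}_{[0,t]}(s)ds=(B_t^{d,1})^2-t, \end{aligned}$$

so iterated Skorohod integrals of deterministic integrands reduce to polynomials of Brownian motion; this elementary computation is exactly the mechanism behind the weights constructed in Sect. 3.1.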

For \(F=(F^1,\ldots ,F^d) \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\), define the Malliavin covariance matrix of F, \(\sigma ^F=(\sigma ^F_{ij})_{1 \le i,j \le d}\), by \(\textstyle {\sigma ^F_{ij}=\langle DF^i,DF^j \rangle _{H^d}=\sum _{k=1}^d \int _0^T D_{k,s}F^i D_{k,s}F^j ds}\), \(1\le i,j \le d\). We say that \(F\in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) is nondegenerate if the matrix \(\sigma ^F\) is invertible a.s. and satisfies \(\Vert ( \det \sigma ^F)^{-1}\Vert _p < \infty \) for all \(p>1\). Malliavin’s theorem claims that if \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\) is nondegenerate, then F has a smooth density \(p^{F}(\cdot )\). Malliavin calculus is further refined by Watanabe’s theory. Let \(\mathcal {S}({\mathbb {R}}^d)\) be the Schwartz space of rapidly decreasing functions and \(\mathcal {S}'({\mathbb {R}}^d)\) be the dual of \(\mathcal {S}({\mathbb {R}}^d)\), i.e. \(\mathcal {S}'({\mathbb {R}}^d)\) is the space of Schwartz tempered distributions. For a tempered distribution \({{\mathcal {T}}} \in \mathcal {S}'({\mathbb {R}}^d)\) and a nondegenerate Wiener functional in the sense of Malliavin \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\), \({{\mathcal {T}}}(F)={{\mathcal {T}}} \circ F\) is well-defined as an element of the space of Watanabe distributions \({\mathbb {D}}^{-\infty }(\Omega ^d)\), that is the dual space of \({\mathbb {D}}^{\infty }(\Omega ^d)\) (e.g. see p. 379, Corollary of Ikeda and Watanabe [16], Theorem of Chapter III 6.2 of Malliavin [25], Theorem 7.3 of Malliavin and Thalmaier [26]). Also, for \(G \in {\mathbb {D}}^{\infty }(\Omega ^d)\), a (generalized) expectation \(E[\mathcal{T}(F)G]\) is understood as a pairing of \({{\mathcal {T}}}(F)\in {\mathbb {D}}^{-\infty }(\Omega ^d)\) and \(G\in {\mathbb {D}}^{\infty }(\Omega ^d)\), namely \({}_{{\mathbb {D}}^{-\infty }}\langle {{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^{\infty }}\), and it holds that

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }}\langle {{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^\infty }={}_{{{\mathcal {S}}}'}\langle \mathcal{T},E[G|F=\cdot ] p^F(\cdot ) \rangle {}_{{{\mathcal {S}}}} \end{aligned}$$
(2.6)

where \({}_{{{\mathcal {S}}}'}\langle \cdot , \cdot \rangle _{{{\mathcal {S}}}} \) is the bilinear form between \({{\mathcal {S}}}'({\mathbb {R}}^d)\) and \(\mathcal{S}({\mathbb {R}}^d)\), and \(E[ G | F= \xi ]\) is the conditional expectation of G conditioned on the set \(\{ \omega ; F(\omega )= \xi \}\) (e.g. see Chapter III 6.2.2 of Malliavin [25], (7.5) of Theorem 7.3 of Malliavin and Thalmaier [26]). In particular, we have \({}_{{\mathbb {D}}^{-\infty }}\langle \delta _y (F),1 \rangle {}_{{\mathbb {D}}^\infty }={}_{{{\mathcal {S}}}'}\langle \delta _y, p^F(\cdot ) \rangle {}_{{{\mathcal {S}}}}=p^F(y)\) for \(y \in {\mathbb {R}}^d\), and thus \(p^F\) is not only smooth but also in \(\mathcal {S}({\mathbb {R}}^d)\), i.e. a rapidly decreasing function (see Theorem 9.2 of Ikeda and Watanabe [16], Proposition 2.1.5 of Nualart [29]). For a nondegenerate \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\), \(G \in {\mathbb {D}}^\infty (\Omega ^d)\) and a multi-index \(\gamma =(\gamma _1,\ldots ,\gamma _k)\), there exists \(H_{\gamma }(F,G) \in {\mathbb {D}}^\infty (\Omega ^d)\) such that

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \langle \partial ^{\gamma }{{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^\infty }={}_{{\mathbb {D}}^{-\infty }}\langle \mathcal{T}(F),H_{\gamma }(F,G) \rangle {}_{{\mathbb {D}}^\infty } \end{aligned}$$
(2.7)

for all \({{\mathcal {T}}} \in {{\mathcal {S}}}'({\mathbb {R}}^d)\) (e.g. see Chapter 4.4 and Theorem 7.3 of Malliavin and Thalmaier [26]), where \(H_{\gamma }(F,G)\) is given by \(H_{\gamma }(F,G)=H_{(\gamma _k)}(F,H_{(\gamma _1,\ldots ,\gamma _{k-1})}(F,G))\) with

$$\begin{aligned}&H_{(i)}(F,G)=\delta (\textstyle {\sum _{j=1}^d} (\sigma ^{F})^{-1}_{ij} DF^j G). \end{aligned}$$
(2.8)
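For example, in the one-dimensional case \(d=1\), for \(F=\sigma B_t^{1,1}\) with a constant \(\sigma \ne 0\) and \(G=1\), one has \(DF=\sigma \textbf{1}_{[0,t]}\) and \(\sigma ^F=\sigma ^2 t\), so that (2.8) gives

$$\begin{aligned} H_{(1)}(F,1)=\delta \Big ( \frac{\sigma \textbf{1}_{[0,t]}}{\sigma ^2 t} \Big )=\frac{B_t^{1,1}}{\sigma t}, \end{aligned}$$

and (2.7) reduces to the classical Gaussian integration by parts \(E[{{\mathcal {T}}}'(\sigma B_t^{1,1})]=E[{{\mathcal {T}}}(\sigma B_t^{1,1}) B_t^{1,1}/(\sigma t)]\) for smooth \({{\mathcal {T}}}\).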

3 Main result

Let \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\) and \(T>0\). For \(d \in {\mathbb {N}}\), consider the solution to the following stochastic differential equation (SDE) driven by a d-dimensional Brownian motion \(B^d=(B^{d,1},\ldots ,B^{d,d})\) on the d-dimensional Wiener space \((\Omega ^d,H^d,\mu ^d)\):

$$\begin{aligned} dX_t^{d,\lambda ,x}= \mu ^{\lambda }_{d}(X_t^{d,\lambda ,x})dt+ \sigma ^{\lambda }_{d}(X_t^{d,\lambda ,x})dB_t^{d}, \quad X_0^{d,\lambda ,x}=x \in {\mathbb {R}}^d, \end{aligned}$$
(3.1)

where \(\mu ^{\lambda }_{d}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) and \(\sigma ^{\lambda }_{d}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) are Lipschitz continuous functions depending on a parameter \(\lambda \in (0,1]\). The solution \(X_t^{d,\lambda ,x}=(X_t^{d,\lambda ,x,1},\ldots ,X_t^{d,\lambda ,x,d})\) is equivalently written in the integral form as:

$$\begin{aligned} X_t^{d,\lambda ,x,j}=x_j + \int _0^t \mu ^{\lambda ,j}_{d}(X_s^{d,\lambda ,x})ds+ \sum _{i=1}^d \int _0^t \sigma ^{\lambda ,j}_{d,i}(X_s^{d,\lambda ,x})dB_s^{d,i}, \quad X_0^{d,\lambda ,x,j}=x_j \in {\mathbb {R}},\nonumber \\ \end{aligned}$$
(3.2)

for \(j=1,\ldots ,d\). Furthermore, for a given appropriate continuous function \(f_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and for \(\lambda \in (0,1]\), we consider \(u_\lambda ^d \in C([0,T] \times {\mathbb {R}}^d,{\mathbb {R}})\) given by

$$\begin{aligned} u_\lambda ^d(t,x)=E[f_d(X_t^{d,\lambda ,x})] \end{aligned}$$
(3.3)

for \(t \in [0,T]\) and \(x \in {\mathbb {R}}^d \), which is a solution of the Kolmogorov PDE:

$$\begin{aligned} \partial _t u_\lambda ^d(t,x)={{\mathcal {L}}}^{d,\lambda } u_\lambda ^d(t,x), \end{aligned}$$
(3.4)

for all \((t,x) \in (0,T) \times {\mathbb {R}}^d\) and \(u_\lambda ^d(0,\cdot )=f_{d}(\cdot )\), where \({{\mathcal {L}}}^{d,\lambda }\) is the following second order differential operator:

$$\begin{aligned} {{\mathcal {L}}}^{d,\lambda }=\sum _{j=1}^d \mu _{d}^{\lambda ,j}(\cdot )\frac{\partial }{\partial x_j}+\frac{1}{2}\sum _{i,j_1,j_2=1}^d \sigma _{d,i}^{\lambda ,j_1}(\cdot ) \sigma _{d,i}^{\lambda ,j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1} \partial x_{j_2}}. \end{aligned}$$
(3.5)

Our purpose is to show a new spatial approximation scheme of \(u_\lambda ^d(t,\cdot )\) for \(t>0\) by using asymptotic expansion and deep neural network approximation. The main theorem (Theorem 1) is stated at the end of this section.
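Before introducing the scheme, we note that \(u_\lambda ^d(t,x)\) in (3.3) can always be estimated by a time-discretized Monte-Carlo simulation of (3.1). The following NumPy sketch is such a reference computation and is not part of the proposed scheme; the coefficients \(\mu ^{\lambda }_d(x)=\lambda \sin (x)\) (componentwise), \(\sigma ^{\lambda }_d(x)=\lambda I_d\), the function \(f_d\) and all parameter values are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def u_reference(x, t, lam, mu, sigma_mat, f, n_steps=200, n_paths=100_000):
    """Euler-Maruyama Monte-Carlo estimate of u_lambda^d(t,x) = E[f_d(X_t^{d,lam,x})] in (3.3)."""
    d = x.shape[0]
    dt = t / n_steps
    X = np.tile(x, (n_paths, 1))
    for _ in range(n_steps):
        dB = rng.standard_normal((n_paths, d)) * np.sqrt(dt)
        X = X + mu(X) * dt + dB @ sigma_mat(X).T   # sigma is state-independent in this illustration
    return f(X).mean()

# illustrative example: mu^lam(x) = lam*sin(x) (componentwise), sigma^lam(x) = lam*I_d
d, lam, t = 10, 0.5, 1.0
mu = lambda X: lam * np.sin(X)
sigma_mat = lambda X: lam * np.eye(d)
f = lambda X: np.maximum(X.mean(axis=-1), 0.0)
print(u_reference(np.full(d, 0.3), t, lam, mu, sigma_mat, f))
```

Such a full simulation only serves as a reference value for the spatial approximation constructed below.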

3.1 Asymptotic expansion

We first put the following assumptions on \(\{ \mu ^{\lambda }_{d} \}_{\lambda \in (0,1]}\), \(\{ \sigma ^{\lambda }_{d} \}_{\lambda \in (0,1]}\) and \(f_d\).

Assumption 1

(Assumptions for the family of SDEs and asymptotic expansion) Let \(C>0\). For \(d \in {\mathbb {N}}\), let \(\{ \mu ^{\lambda }_{d} \}_{\lambda \in (0,1]} \subset C_{Lip}({\mathbb {R}}^d,{\mathbb {R}}^{d})\) and \(\{ \sigma ^{\lambda }_{d} \}_{\lambda \in (0,1]} \subset C_{Lip}({\mathbb {R}}^d,{\mathbb {R}}^{d\times d})\) be families of functions, and \(f_d \in C_{Lip}({\mathbb {R}}^d,{\mathbb {R}})\) be a function satisfying

  1.

    there are \(V_{d,0} \in C_b^\infty ({\mathbb {R}}^d,{\mathbb {R}}^d)\) and \(V_{d}=(V_{d,1},\ldots ,V_{d,d}) \in C_b^\infty ({\mathbb {R}}^d,{\mathbb {R}}^{d\times d})\) such that (i) \(\mu ^\lambda _{d}=\lambda V_{d,0}\) and \(\sigma ^\lambda _{d}=\lambda V_{d}\) for all \(\lambda \in (0,1]\), (ii) \(C_{Lip}[V_{d,0}] \vee C_{Lip}[V_{d}]=C\) and \(\Vert V_{d,0}(0) \Vert \vee \Vert V_{d}(0) \Vert \le C\), (iii) \(\Vert \partial ^{\alpha } V_{d,i} \Vert _{\infty } \le C\) for any multi-index \(\alpha \) and \(i=0,1,\ldots ,d\);

  2.

    \(\textstyle {\sum _{i=1}^d} \sigma ^\lambda _{d,i}(x) \otimes \sigma ^\lambda _{d,i}(x) \ge \lambda ^2 I_{d}\) for all \(x \in {\mathbb {R}}^d\) and \(\lambda \in (0,1]\);

  3.

    \(C_{Lip}[f_d]= C\) and \(\Vert f_d(0) \Vert \le C\).

Remark 1

Assumption 1 justifies an asymptotic expansion under the uniform ellipticity condition for the solutions of the perturbed systems of PDEs. Assumption 1.3 is also useful for constructing deep neural network approximations for the family of PDE solutions.

From Assumption 1.1, we may write each SDE (3.1) for \(d \in {\mathbb {N}}\) as

$$\begin{aligned} dX_t^{d,\lambda ,x}= \lambda \sum _{i=0}^d V_{d,i}(X_t^{d,\lambda ,x})dB_t^{d,i}, \end{aligned}$$
(3.6)

with \(X_0^{d,\lambda ,x}=x \in {\mathbb {R}}^d\), where the notation \(dB_t^{d,0}=dt\) is used. We define

$$\begin{aligned} {\mathbb {B}}_t^{d,\alpha }=\int _{0<t_1<\cdots<t_k<t} dB_{t_1}^{d,\alpha _1}\cdots dB_{t_k}^{d,\alpha _k}, \ \ t\ge 0, \ \alpha \in \{0,1,\ldots ,d \}^k, \ k \in {\mathbb {N}}, \end{aligned}$$
(3.7)

and \(\textstyle {L_{d,0}=\sum _{j=1}^d V_{d,0}^{j}(\cdot )\frac{\partial }{\partial x_j}+\frac{1}{2}\sum _{i,j_1,j_2=1}^d V_{d,i}^{j_1}(\cdot ) V_{d,i}^{j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1}\partial x_{j_2}}}\), \(\textstyle {L_{d,i}=\sum _{j=1}^d V_{d,i}^{j}(\cdot )\frac{\partial }{\partial x_j}}\), \(i=1,\ldots ,d\). We define

$$\begin{aligned} \bar{X}_t^{d,\lambda ,x}=x+\lambda \sum _{i=0}^d V_{d,i}(x)B_t^{d,i}. \end{aligned}$$
(3.8)
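Note that, under Assumption 1.1, \(\bar{X}_t^{d,\lambda ,x}\) is a d-dimensional Gaussian random variable: writing \(V_d=(V_{d,1},\ldots ,V_{d,d})\) and recalling \(B_t^{d,0}=t\), we have \(\bar{X}_t^{d,\lambda ,x} \sim N(x+\lambda t V_{d,0}(x), \lambda ^2 t V_d(x)V_d(x)^{\top })\), so that expectations of the form \(E[f_d(\bar{X}_t^{d,\lambda ,x})\, \cdot \,]\) only require sampling a single d-dimensional Gaussian vector.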

Proposition 1

(Asymptotic expansion and the error bound) For \(m \in {\mathbb {N}} \cup \{ 0 \} \), there exists \(c >0\) such that for all \(d\in {\mathbb {N}}\), \(t>0\), \(\lambda \in (0,1]\),

$$\begin{aligned}&\sup _{x \in [a,b]^d}\Big |E [f_d(X_t^{d,\lambda ,x})]-\Big \{ E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})] \nonumber \\&\qquad + \sum _{j=1}^m \lambda ^j E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x}) \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)} H_{\gamma ^{(k)}} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } \hat{V}_{d,\alpha }^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big ]\Big \}\Big | \nonumber \\&\quad \le \ c d^c \lambda ^{m+1} t^{(m+1)/2}, \end{aligned}$$
(3.9)

where \(\hat{V}_{d,\alpha }^{e}(x)=L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{e}(x)\), \(e\in \{1,\ldots ,d \}\), \(\alpha =(\alpha _1,\ldots ,\alpha _r)\in \{0,1,\ldots ,d \}^r\), \(r \in {\mathbb {N}}\), and

$$\begin{aligned} \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)}=\sum _{k=1}^j \sum _{\beta ^{(k)}=(\beta _1,\ldots ,\beta _k) \ s.t. \ \beta _1+\cdots +\beta _k=j+k,\beta _i\ge 2}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\ldots ,d \}^k} \frac{1}{k!}, \quad j\ge 1. \end{aligned}$$
(3.10)

Proof

See Sect. 4. \(\square \)

The weights in the expansion terms in Proposition 1 can be represented by polynomials of Brownian motion. We show this through distribution theory on Wiener space. Let \(d \in {\mathbb {N}}\). For \(t \in (0,T]\) and \(\alpha =(\alpha _1,\ldots ,\alpha _k)\in \{0,1,\ldots ,d \}^k\), \(k \in {\mathbb {N}} \cap [2,\infty )\), let

$$\begin{aligned} \textbf{B}_t^{d,\alpha } =\delta ^{\alpha _k}(\textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k-1})})=B_t^{d,\alpha _k}\textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k-1})}-\int _0^t D_{\alpha _k,s}\textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k-1})} ds, \end{aligned}$$
(3.11)

with \(\textbf{B}_t^{d,(\alpha _1)}=B_t^{d,\alpha _1}\), which can be obtained by (2.5). For example, we have \(\textbf{B}_t^{d,(\alpha _1,\alpha _2)}=B_t^{d,\alpha _1}B_t^{d,\alpha _2}-t \textbf{1}_{\alpha _1=\alpha _2\ne 0}\) for \(\alpha =(\alpha _1,\alpha _2) \in \{0,1,\ldots ,d \}^2\). Let \(\sigma _\ell \in {\mathbb {R}}^d\), \(\ell =0,1,\ldots ,d\) and \(\Sigma \) be a matrix given by \(\Sigma _{i,j}=\textstyle {\sum _{\ell =1}^d} \sigma _\ell ^i \sigma _\ell ^j\), \(1\le i,j \le d\) and satisfying \(\det \Sigma >0\). Let \({{\mathcal {T}}} \in {{\mathcal {S}}}'({\mathbb {R}}^d)\). We show an efficient computation of \(\textstyle {{}_{{\mathbb {D}}^{-\infty }} \langle \mathcal{T} (\sum _{i=0}^d \sigma _i B_t^{d,i} ), H_{\gamma } (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } )\rangle {}_{{\mathbb {D}}^\infty }}\) in order to give a polynomial representation of the Malliavin weights in the expansion terms of the asymptotic expansion in Proposition 1. Note that we have

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }} \Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), H_{\gamma } \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \rangle {}_{{\mathbb {D}}^\infty } ={}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \partial ^\gamma {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), {\mathbb {B}}_t^{d,\alpha } \Bigg \rangle {}_{{\mathbb {D}}^\infty } \nonumber \\&\quad = {}_{{{\mathcal {S}}}'}\langle \partial ^\gamma {{\mathcal {T}}}(\sigma _0 B_t^{d,0}+\sigma \ \cdot ), E[{\mathbb {B}}_t^{d,\alpha }|B_t^d= \ \cdot \ ]p^{B_t^d}(\cdot ) \rangle _{{{\mathcal {S}}}}, \end{aligned}$$
(3.12)

by (2.7) and (2.6), where \(\sigma \) is the matrix \(\sigma =(\sigma _1,\ldots ,\sigma _d)\), and for \(y \in {\mathbb {R}}^d\), it holds that

$$\begin{aligned} E[{\mathbb {B}}_t^{d,\alpha }|B_t^d=y]p^{B_t^d}(y)={}_{{{\mathcal {S}}}'}\langle \delta _y, E[{\mathbb {B}}_t^{d,\alpha }|B_t^d= \ \cdot \ ]p^{B_t^d}(\cdot ) \rangle _{{{\mathcal {S}}}}={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y (B_t^{d} ), {\mathbb {B}}_t^{d,\alpha } \rangle {}_{{\mathbb {D}}^{\infty }}, \end{aligned}$$

by (2.6). Also, one has

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y (B_t^{d} ), {\mathbb {B}}_t^{d,\alpha } \rangle {}_{{\mathbb {D}}^\infty }&= {}_{{\mathbb {D}}^{-\infty }} \langle \partial ^{\alpha ^\star } \delta _y(B_t^{d} ),1 \rangle {}_{{\mathbb {D}}^\infty } \frac{1}{k!}t^{k} \nonumber \\&= {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(B_t^{d}), H_{\alpha ^\star }(B_t^{d},1) \rangle {}_{{\mathbb {D}}^{\infty }}\frac{1}{k!}t^{k} ={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(B_t^{d} ), \frac{1}{k!} \textbf{B}_t^{d,\alpha } \rangle {}_{{\mathbb {D}}^\infty }, \end{aligned}$$
(3.13)

by (2.5), (2.7) and (2.8), where \(\alpha ^\star \) is a multi-index such that \(\alpha ^{\star }=(\alpha ^{\star }_1,\ldots ,\alpha ^{\star }_{\ell (\alpha )})=(\alpha _{j_1}, \ldots ,\alpha _{j_{\ell (\alpha )}})\) satisfying \(\ell (\alpha )=\# \{ i; \alpha _i\ne 0 \}\) and \(\alpha _{j_i} \ne 0\), \(i=1,\ldots ,\ell (\alpha )\). Then, we have

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }}\Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), H_{\gamma } \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \rangle {}_{{\mathbb {D}}^\infty }\nonumber \\&\quad ={}_{{{\mathcal {S}}}'}\Bigg \langle \partial ^\gamma {{\mathcal {T}}}\Bigg (\sigma _0 B_t^{d,0}+\sigma \ \cdot \Bigg ),\frac{1}{k!} E[\textbf{B}_t^{d,\alpha } |B_t^d= \ \cdot \ ]p^{B_t^d}(\cdot ) \Bigg \rangle _{{{\mathcal {S}}}} \nonumber \\&\quad = {}_{{\mathbb {D}}^{-\infty }}\Bigg \langle \partial ^\gamma {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), \frac{1}{k!} \textbf{B}_t^{d,\alpha } \Bigg \rangle {}_{{\mathbb {D}}^{\infty }} = {}_{{\mathbb {D}}^{-\infty }}\Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ),\nonumber \\&\quad H_{\gamma } \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i}, \frac{1}{k!} \textbf{B}_t^{d,\alpha } \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^\infty } \nonumber \\&\quad {=} {}_{{\mathbb {D}}^{{-}\infty }} \Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i{=}0}^d \sigma _i B_t^{d,i} \Bigg ), \sum _{j_1,\ldots ,j_{|\gamma |},\beta _{1},\ldots ,\beta _{|\gamma |}=1}^d \frac{1}{t^{|\gamma |}} \prod _{q=1}^{|\gamma |} \Sigma _{\gamma _q,j_q}^{-1} \sigma _{\beta _{q}}^{j_q} \frac{1}{k!} \textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k},\beta _1,\ldots ,\beta _{|\gamma |})} \Bigg \rangle {}_{{\mathbb {D}}^\infty }, \end{aligned}$$
(3.14)

where we iteratively used (2.5), (2.6), (2.7) and (2.8). An explicit polynomial representation of the asymptotic expansion is derived through (3.14). For instance, the first order expansion (\(m=1\)) is given as follows:

(First order asymptotic expansion with Malliavin weight)

$$\begin{aligned}&E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big \{1 + \lambda \sum _{\ell =1}^d H_{(\ell )} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i}, \sum _{\alpha _1,\alpha _2=0}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) {\mathbb {B}}_t^{d,(\alpha _1,\alpha _2)} \Big ) \Big \} \Big ]\\&\quad = E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big ] + \lambda \sum _{\ell =1}^d \int _{{\mathbb {R}}^d} f_d(x+\lambda y) \sum _{\alpha _1,\alpha _2=0}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) \\&{}_{{\mathbb {D}}^{-\infty }} \Big \langle \delta _y\left( \sum _{i=0}^dV_{d,i}(x)B_t^{d,i}\right) , H_{(\ell )} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i}, {\mathbb {B}}_t^{d,(\alpha _1,\alpha _2)} \Big ) \Big \rangle {}_{{\mathbb {D}}^{\infty }} dy\\&\quad = E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big ] + \lambda \sum _{\ell =1}^d \int _{{\mathbb {R}}^d} f_d(x+\lambda y) \sum _{\alpha _1,\alpha _2=0}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) \\&{}_{{\mathbb {D}}^{-\infty }} \Big \langle \delta _y\left( \sum _{i=0}^dV_{d,i}(x)B_t^{d,i}\right) , \sum _{\alpha _3=1}^d \sum _{j=1}^d \frac{1}{2t} [A_d^{-1}]_{\ell j}(x) V^j_{d,\alpha _3}(x) \textbf{B}_t^{d,(\alpha _1,\alpha _2,\alpha _3)} \Big \rangle {}_{{\mathbb {D}}^{\infty }} dy\\&\quad = E\left[ f_{d}(\bar{X}_t^{d,\lambda ,x}) \left\{ 1 + \lambda \sum _{\ell ,j=1}^d \sum _{\alpha _1,\alpha _2=0}^d \sum _{\alpha _3=1}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) \frac{1}{2t} [A_d^{-1}]_{\ell j}(x) V^j_{d,\alpha _3}(x) \textbf{B}_t^{d,(\alpha _1,\alpha _2,\alpha _3)}\right\} \right] . \end{aligned}$$

Thus, the first order expansion is expressed with a Malliavin weight given by third order polynomials of Brownian motion, where \(A_d(x)=\textstyle {\sum _{i=1}^d} V_{d,i}(x) \otimes V_{d,i}(x)\) (see also Assumption 2.3 below). In general, we have the following representation.

Proposition 2

For \(m \in {\mathbb {N}}\), \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\), \(t \in (0,T]\) and \(x \in {\mathbb {R}}^d\), there exists a Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) such that

$$\begin{aligned}&E[f_{d}(\bar{X}_t^{d,\lambda ,x}) {{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d) ] \nonumber \\&\quad = E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big \{1 + \sum _{j=1}^m \lambda ^j \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)} H_{\gamma ^{(k)}} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } \hat{V}_{d,\alpha }^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big \} \Big ], \end{aligned}$$
(3.15)

and

$$\begin{aligned} {{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)=\textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h_{e}(x)\textrm{Poly}_e({B}_t^{d}) \end{aligned}$$
(3.16)

for some integers \(n(m)\in {\mathbb {N}}\) and \(p(e) \in {\mathbb {N}}\), \(e=1,\ldots ,n(m)\), polynomials \(\textrm{Poly}_e:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), continuous functions \(g_e: (0,T] \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), and continuous functions \(h_{e}:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\) constructed by some products of \(A^{-1}_{d}\), \(\{V_{d,i}\}_{0\le i \le d}\) and \(\{ \partial ^\alpha V_{d,i}\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) given in Assumption 1 of the form:

$$\begin{aligned} x \mapsto h_{e}(x)=c_e \textstyle {\prod \limits _{\ell =1}^{q_e}} L_{d,\alpha ^e_{\ell ,1}} \cdots L_{d,\alpha ^e_{\ell ,p^e_\ell -1}} {V}_{d,\alpha ^e_{\ell ,p^e_\ell }}^{\gamma ^e_{\ell }}(x) \textstyle {\sum \limits _{\xi ,\iota =1}^d} [A^{-1}_{d}]_{\gamma ^e_{\ell },\xi }(x)V_{d,\iota }^{\xi }(x) \end{aligned}$$
(3.17)

with some constants \(c_e \in (0,\infty )\), \(q_e \in {\mathbb {N}}\) and some multi-indices \((\gamma ^e_{1},\ldots ,\gamma ^e_{q_e}) \in \{1,\ldots ,d \}^{q_e}\) and \((\alpha ^e_{\ell ,1},\ldots ,\alpha ^e_{\ell ,p^e_\ell }) \in \{0,1,\ldots ,d \}^{p^e_\ell }\) with \(p^e_\ell \in {\mathbb {N}}\), \(\ell =1,\ldots ,q_e\), which satisfies that for \(p\ge 1\),

$$\begin{aligned} \sup _{(t,x)\in (0,T] \times [a,b]^d, \lambda \in (0,1]}\Vert \mathcal{M}^{m}_{d,\lambda }(t,x,B_t^d) \Vert _p \le cd^c \ \ \end{aligned}$$
(3.18)

for some constant \(c>0\) independent of d.

Proof

See Sect. 4. \(\square \)

Remark 2

(Remark on computation of Malliavin weights) Malliavin weights were first used in Fournie et al. [7] for sensitivity analysis in financial mathematics, especially for Monte-Carlo computation of “Greeks". A discretization scheme for probabilistic automatic differentiation using Malliavin weights is then analyzed in Gobet and Munos [10]. The computation of asymptotic expansions with Malliavin weights is developed in Takahashi and Yamada [35, 37], and is further extended to weak approximation of SDEs in Takahashi and Yamada [38]. Note that a PDE expansion is shown in Takahashi and Yamada [36] to partially connect it with the stochastic calculus approach. The computation method of the expansion with Malliavin weights is improved in Yamada [41], Yamada and Yamamoto [42], Naito and Yamada [27, 28], Iguchi and Yamada [17, 18], and Takahashi et al. [34], where the techniques of stochastic calculus are refined. The main advantages of the stochastic calculus approach are that (i) it provides an efficient computation scheme using Watanabe distributions on Wiener space as in (3.13) and (3.14), and (ii) it enables us to give precise bounds for approximations of expectations or the corresponding solutions of PDEs. Indeed, the computational effort of the expansions is much reduced in the sense that Itô’s iterated integrals are transformed into simple polynomials of Brownian motion, and the desired deep neural network approximation will be obtained in the next subsection through this approach.

3.2 Deep neural network approximation

In order to construct a deep neural network approximation of the asymptotic expansion as a function of the space variable, i.e. \(x \mapsto E[f_{d}(\bar{X}_t^{d,\lambda ,x}) \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) ]\), we impose the following additional assumptions.

Assumption 2

(Assumptions for deep neural network approximation) Suppose that Assumption 1 holds. There exist a constant \(\kappa >0\) and sets of networks \(\{ \psi _{\varepsilon ,d}^{V_{d,i}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}},i \in \{0,1,\ldots ,d\}} \subset {{\mathcal {N}}}\), \(\{ \psi _{\varepsilon ,d}^{\partial ^\alpha V_{d,i}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}},i \in \{0,1,\ldots ,d\},\alpha \in \{1,\ldots ,d\}^{{\mathbb {N}}}} \subset {{\mathcal {N}}}\), \(\{ \psi _{\varepsilon }^{A_d^{-1}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(\{ \psi _{\varepsilon }^{f_d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) such that

  1.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \({{\mathcal {C}}}(\psi _{\varepsilon ,d}^{V_{d,i}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(i=0,1,\ldots ,d\), \({{\mathcal {C}}}(\psi _{\varepsilon ,d}^{\partial ^\alpha {V}_{d,i}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \), \(\ell \in {\mathbb {N}}\), \({{\mathcal {C}}}(\psi _{\varepsilon }^{A_d^{-1}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), and \({{\mathcal {C}}}(\psi _{\varepsilon }^{f_d}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\);

  2.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\Vert V_{d,i}(x)-V_{d,i}^{\varepsilon }(x)\Vert \le \varepsilon \kappa d^\kappa \), \(i=0,1,\ldots ,d\), and \(\Vert \partial ^\alpha V_{d,i}(x)-V_{d,i,\alpha }^{\varepsilon }(x)\Vert \le \varepsilon \kappa d^\kappa \), \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \), \(\ell \in {\mathbb {N}}\), where \(V_{d,i}^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{V_{d,i}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d})\) and \(V_{d,i,\alpha }^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{\partial ^\alpha V_{d,i}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d})\);

  3.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\Vert A_d^{-1}(x)-A_{d,\varepsilon }^{-1}(x)\Vert \le \varepsilon \kappa d^\kappa \), where \(A_d^{-1}(\cdot )\) is the inverse matrix of \(A_d(\cdot ):=\textstyle {\sum _{i=1}^d} V_{d,i}(\cdot ) \otimes V_{d,i}(\cdot )\) and \(A_{d,\varepsilon }^{-1}={{\mathcal {R}}}(\psi _{\varepsilon }^{A_{d}^{-1}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d \times d})\), and for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(\textstyle {\sup _{x\in [a,b]^d}}\Vert A_{d,\varepsilon }^{-1}(x)\Vert \le \kappa d^\kappa \);

  4.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(|f_d(x)-f_d^{\varepsilon }(x)|\le \varepsilon \kappa d^\kappa \), where \(f_d^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{f_d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\).

Remark 3

Assumption 2 provides the deep neural network approximation of the asymptotic expansion with an appropriate complexity. Note that Assumptions 1.1, 1.3, 2.2 and 2.4 give that there exists \(\eta >0\) such that \(\textstyle {|f_d^{\varepsilon }(x)| \le \eta d^\eta (1+\Vert x \Vert )}\) for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), and \(\textstyle {\sup _{x\in [a,b]^d}}\Vert V_{d,i}^{\varepsilon }(x)\Vert \le \eta d^\eta \) for all \(i=0,1,\ldots ,d\), \(\textstyle {\sup _{x\in [a,b]^d}}\Vert V_{d,i,\alpha }^{\varepsilon }(x)\Vert \le \eta d^\eta \) for all \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \) with \(\ell \in {\mathbb {N}}\). In the following, Assumptions 2.2, 2.3 and 2.4 play an important role in the analysis of “products of neural networks" in the construction of the approximation with the asymptotic expansion.

Remark 4

In particular, Assumption 2.3 is satisfied for the cases \(A_d(x)=I_d\) and \(A_d(x)=s(d)I_d\) with a function \(s:{\mathbb {N}} \rightarrow {\mathbb {R}}\). For instance, the case \(A_d(x)=I_d\) corresponds to the d-dimensional heat equation when \(V_{d,0}\equiv 0\). Also, the SDEs with the diffusion matrix \(V_d=(1/\sqrt{d})I_d\) discussed in Section 5.1 and Section 5.2 of [9] and Section 5.2 of [13] are examples of (3.1) (or (3.6)). For those cases, the neural network approximations in Assumption 2 are not necessary, since \(V_{d,i}\), \(i=1,\ldots ,d\) and hence \(A_d\) do not depend on the state variable x, whence \(\textstyle {V_{d,i,\varepsilon }}\) and \(\textstyle {A^{-1}_{d,\varepsilon }}\) are \(V_{d,i}\) and \(A^{-1}_{d}\) themselves. Furthermore, in such cases (e.g. the high-dimensional heat equations) the asymptotic expansion will be simply obtained (usually as the Gaussian approximation), which are exactly reduced to the methods in Beck et al. [2] and Gonon et al. [11].

The main result of the paper is summarized as follows.

Theorem 1

(Deep learning-based asymptotic expansion overcomes the curse of dimensionality) Suppose that Assumptions 1 and 2 hold. Let \(m \in {\mathbb {N}}\). For \(d \in {\mathbb {N}}\), consider the SDE (3.1) on the d-dimensional Wiener space and let \(u_\lambda ^d \in C ([0,T] \times {\mathbb {R}}^d, {\mathbb {R}})\) given by (3.3) be a solution to the Kolmogorov PDE (3.4). Then we have

$$\begin{aligned} \sup _{x \in [a,b]^d}|u_{\lambda }^d(t,x)-E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]|=O(\lambda ^{m+1} t^{(m+1)/2}). \end{aligned}$$
(3.19)

Furthermore, for \(t \in (0,T]\) and \(\lambda \in (0,1]\), there exist \(\{ \phi ^{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) which depend only on \(a,b,C,m,\kappa ,t\) and \(\lambda \), such that for all \(\varepsilon \in (0,1)\) and \(d\in {\mathbb {N}}\), we have \({{\mathcal {R}}}(\phi ^{\varepsilon ,d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\), \({{\mathcal {C}}}(\phi ^{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\) and

$$\begin{aligned} \sup _{x \in [a,b]^d}|E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-{{\mathcal {R}}}(\phi ^{\varepsilon ,d})(x)|\le \varepsilon . \end{aligned}$$
(3.20)

Proof

See Sect. 4. \(\square \)

We provide the weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) with \(m=0,1\) in Theorem 1 for our scheme (the expression for general m will be given in Sect. 4 below). That is, for \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\), \(t>0\) and \(x \in {\mathbb {R}}^d\),

$$\begin{aligned} {{\mathcal {M}}}^0_{d,\lambda }(t,x,B_t^d)&=1, \end{aligned}$$
(3.21)
$$\begin{aligned} {{\mathcal {M}}}^1_{d,\lambda }(t,x,B_t^d)&=1+\lambda \sum _{\alpha _1,\alpha _2=0}^d \sum _{\alpha _3=1}^d \sum _{\ell ,j=1}^d \frac{1}{2t} L_{d,\alpha _1}V_{d,\alpha _2}^{\ell }(x) [A_d^{-1}]_{\ell j}(x) V^j_{d,\alpha _3}(x) \nonumber \\&\quad \times \{ B_t^{d,\alpha _1}B_t^{d,\alpha _2}B_t^{d,\alpha _3}-t B_t^{d,\alpha _1} \textbf{1}_{\alpha _2=\alpha _3\ne 0}-t B_t^{d,\alpha _2} \textbf{1}_{\alpha _1=\alpha _3\ne 0}-t B_t^{d,\alpha _3} \textbf{1}_{\alpha _1=\alpha _2\ne 0} \}, \end{aligned}$$
(3.22)

where

$$\begin{aligned} L_{d,0}&=\sum _{j=1}^d V_{d,0}^{j}(\cdot )\frac{\partial }{\partial x_j}+\frac{1}{2}\sum _{i,j_1,j_2=1}^d V_{d,i}^{j_1}(\cdot ) V_{d,i}^{j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1}\partial x_{j_2}}, \end{aligned}$$
(3.23)
$$\begin{aligned} L_{d,i}&=\sum _{j=1}^d V_{d,i}^{j}(\cdot )\frac{\partial }{\partial x_j}, \ \ i=1,\ldots ,d. \end{aligned}$$
(3.24)

Hence, the weight for \(m=0\), i.e. \(\mathcal{M}^0_{d,\lambda }(t,x,B_t^d)=1\), provides a simple (but coarse) Gaussian approximation, and the Malliavin weight for \(m=1\) works as a correction term for the Gaussian approximation. The derivation is provided in the next section.
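To illustrate how (3.21)–(3.22) are used in practice, the following NumPy sketch (an illustration under simplifying assumptions, not the authors' implementation) evaluates the \(m=1\) approximation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^1_{d,\lambda }(t,x,B_t^d)]\) by plain Monte-Carlo for the illustrative model \(V_{d,0}(x)=\sin (x)\) (componentwise) and constant diffusion fields \(V_{d,i}=e_i\) already used in the reference sketch before Sect. 3.1. Since the diffusion fields are state-independent, only the \(\alpha _2=0\) terms of (3.22) survive, and the required derivatives \(L_{d,\alpha _1}V_{d,0}^{\ell }\) are available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

d, lam, t = 10, 0.5, 1.0
V = np.eye(d)                                  # constant diffusion fields V_{d,1},...,V_{d,d} (columns)
A_inv = np.linalg.inv(V @ V.T)                 # A_d^{-1} (here the identity)

V0 = lambda x: np.sin(x)                       # drift field V_{d,0} (illustrative)
JV0 = lambda x: np.diag(np.cos(x))             # Jacobian (d V0^l / d x_j)
lap_V0 = lambda x: (V @ V.T).diagonal() * (-np.sin(x))  # sum_{j1,j2} (VV^T)_{j1 j2} d^2 V0^l/dx_{j1}dx_{j2}
f = lambda X: np.maximum(X.mean(axis=-1), 0.0)          # illustrative f_d

def expansion_m1(x, n_paths=200_000):
    """Monte-Carlo estimate of E[f_d(Xbar_t) M^1_{d,lam}(t,x,B_t)], cf. (3.8) and (3.22)."""
    B = rng.standard_normal((n_paths, d)) * np.sqrt(t)   # B_t^{d,1},...,B_t^{d,d}; recall B_t^{d,0} = t
    Xbar = x + lam * V0(x) * t + lam * (B @ V.T)         # Gaussian approximation (3.8)
    # L_{d,alpha1} V0^l(x) for alpha1 = 0 and alpha1 = 1,...,d (only alpha2 = 0 contributes here)
    L0V0 = JV0(x) @ V0(x) + 0.5 * lap_V0(x)              # shape (d,):  l -> L_{d,0} V0^l
    LiV0 = JV0(x) @ V                                     # shape (d,d): [l, alpha1]
    g = A_inv @ V                                          # shape (d,d): [l, alpha3] = sum_j A^{-1}_{lj} V^j_{alpha3}
    # Malliavin weight (3.22); with alpha2 = 0 and B^0 = t the Brownian polynomial reduces to
    #   t*B^{alpha1}*B^{alpha3} - t^2 * 1{alpha1 = alpha3 != 0}
    M = np.ones(n_paths)
    coeff0 = L0V0 @ g                                      # alpha1 = 0 contribution
    M += lam / (2 * t) * t**2 * (B @ coeff0)
    C = LiV0.T @ g                                         # C[alpha1, alpha3] for alpha1, alpha3 >= 1
    M += lam / (2 * t) * (t * np.einsum('ni,ij,nj->n', B, C, B) - t**2 * np.trace(C))
    return np.mean(f(Xbar) * M)

x0 = np.full(d, 0.3)
print("m=1 asymptotic expansion:", expansion_m1(x0))   # compare with the Euler-Maruyama reference above
```

In the scheme of Theorem 1, this expectation is in turn replaced by a deep neural network realization \({{\mathcal {R}}}(\phi ^{\varepsilon ,d})(x)\), cf. (1.2) and step (3) of the proof outline in Sect. 4.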

4 Proofs of Propositions 1, 2 and Theorem 1

We give the proofs of Propositions 1, 2 and Theorem 1. Before providing full proofs, we show their brief outlines below.

  • Proposition 1 (Asymptotic expansion)

    • take a family of uniformly non-degenerate functionals \(F_t^{d,\lambda ,x}=(X_t^{d,\lambda ,x}-x)/\lambda \), \(\lambda \in (0,1]\), since the family \(X_t^{d,\lambda ,x}\), \(\lambda \in (0,1]\), itself degenerates when \(\lambda \downarrow 0\), and consider the expansion \(F_t^{d,\lambda ,x}=F_t^{d,0,x}+\cdots \) in \({\mathbb {D}}^\infty \).

    • expand \(\delta _y(F_t^{d,\lambda ,x}) \sim \delta _y(F_t^{d,0,x})+\cdots \) in \({\mathbb {D}}^{-\infty }\) and take expectation to obtain the expansion of the density \(p^{F_t^{d,\lambda ,x}}(y)=E[\delta _y(F_t^{d,\lambda ,x})] \sim E[\delta _y(F_t^{d,0,x})]+\cdots \) in \({\mathbb {R}}\).

    • derive precise expression of the right-hand side of \(E[f_d(X_t^{d,\lambda ,x})]=c_0^{d,\lambda ,t}+ c_1^{d,\lambda ,t}+\cdots +c_m^{d,\lambda ,t} +\textrm{Residual}^{d,\lambda ,t}_m\) by using Malliavin’s integration by parts.

    • give a precise estimate for \(\textrm{Residual}^{d,\lambda ,t}_m(x)\) (w.r.t \(\lambda \), t and the dimension d) uniformly in x by using the key inequality on Malliavin weight (Lemma 5 in Appendix A) which yields a sharp upper bound of \(\textrm{Residual}^{d,\lambda ,t}_m(x)\).

  • Proposition 2 (Representation and property of Malliavin weight)

    • use the formula (3.14) to prove that \(c_0^{d,\lambda ,t}+ c_1^{d,\lambda ,t}+\cdots +c_m^{d,\lambda ,t}\) above can be represented by an expectation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)]\) with a Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) constructed by polynomials of Brownian motion.

    • check that the moment of the Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) grows polynomially in d from the representation.

  • Theorem 1 (Deep learning-based asymptotic expansion overcomes the curse of dimensionality)

    • (0) for \(d \in {\mathbb {N}}\), first check that the expansion \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)]\) obtained in Propositions 1 and 2 gives an approximation for \(u_\lambda ^d(t,x)\) on the cube \([a,b]^d\) with a sharp asymptotic error bound.

    • (1) for an error precision \(\varepsilon \), construct an approximation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)] \approx E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)]\) on the cube \([a,b]^d\) by using stochastic calculus, where \(f^{\delta }_{d}\), \(\bar{X}_t^{d,\lambda ,x,\delta }\) and \({{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)\) are given by replacing \(\{V_{d,i}\}_i\), \(A_d^{-1}\), \(\{V_{d,i,\alpha }\}_{i,\alpha }\) with their neural network approximations \(\{V^\delta _{d,i}\}_i\), \(A_{d,\delta }^{-1}\), \(\{V_{d,i,\alpha ,\delta }\}_{i,\alpha }\) with \(\delta =(\varepsilon ^c d^{-c})\) for some \(c>0\) independent of \(\varepsilon \) and d.

    • (2) for an error precision \(\varepsilon \), construct a realization of the Monte-Carlo approximation \(E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)] \approx \textstyle {\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d})){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))}\) on the cube \([a,b]^d\) with a choice \(M=O(\varepsilon ^{-c} d^{c})\) for some \(c>0\) independent of \(\varepsilon \) and d, by using stochastic calculus.

    • (3) for an error precision \(\varepsilon \), construct a realization of the deep neural network approximation \(\textstyle {\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d})){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))} \approx {{\mathcal {R}}}(\phi _{\varepsilon ,d})(x)\) on the cube \([a,b]^d\) whose complexity is bounded by \({{\mathcal {C}}}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\) for some \(c>0\) independent of \(\varepsilon \) and d, where ReLU calculus (Lemma 9, 10, 12 in Appendix B) is essentially used.

    • apply (0), (1), (2) and (3) to obtain the main result.

In the proof, we frequently use an elementary result: \(\textstyle {\sup _{x \in [a,b]^d}} \Vert x \Vert \le d^{1/2} \max \{ |a|,|b| \}\), which is obtained in the proof of Corollary 4.2 of [11].

4.1 Proof of Proposition 1

For \(x\in {\mathbb {R}}^d\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), let \(F_t^{d,\lambda ,x}=(F_t^{d,\lambda ,x,1},\ldots ,F_t^{d,\lambda ,x,d}) \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) be given by \(F_t^{d,\lambda ,x,j}=(X_t^{d,\lambda ,x,j}-x_j)/\lambda \), \(j=1,\ldots ,d\). We note that \(\{ F_t^{d,\lambda ,x} \}_{\lambda }\) is a family of uniformly non-degenerate Wiener functionals (see Theorem 3.4 of [40]). Then, for \({{\mathcal {T}}} \in \mathcal{S}'({\mathbb {R}}^d)\), the composition \({{\mathcal {T}}}(F_t^{d,\lambda ,x})\) is well-defined as an element of \({\mathbb {D}}^{-\infty }(\Omega ^d)\), and the density of \(F_t^{d,\lambda ,x}\), namely \(p^{F_t^{d,\lambda ,x}} \in {{\mathcal {S}}}({\mathbb {R}}^d)\) has the representation \(p^{F_t^{d,\lambda ,x}}(y)={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }}\) for \(y \in {\mathbb {R}}^d\). Then, for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\), it holds that

$$\begin{aligned} E[f_d(X_t^{d,\lambda ,x})]=\int _{{\mathbb {R}}^d} f_d(x+\lambda y) {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} dy. \end{aligned}$$
(4.1)

For \(x\in {\mathbb {R}}^d\), \(t \in (0,T]\), let \(F_t^{d,0,x}=\textstyle {\sum _{i=0}^d}V_{d,i}(x)B_t^{d,i}\). Thus, for \(S \in {{\mathcal {S}}}'({\mathbb {R}}^d)\), the composition \(S(F_t^{d,\lambda ,x})\) is well-defined as an element of \({\mathbb {D}}^{-\infty }(\Omega ^d)\) and has an expansion:

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }}&={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,0,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} \nonumber \\&\quad +\sum _{j=1}^m \frac{\lambda ^j}{j!} \frac{\partial ^{j}}{\partial \lambda ^{j}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} |_{\lambda =0} +\lambda ^{m+1} {{\mathcal {E}}}_{m,t}^{d,\lambda ,x,y}, \end{aligned}$$
(4.2)

for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\), where

$$\begin{aligned} {{\mathcal {E}}}_{m,t}^{d,\lambda ,x,y}={\int _0^1 \frac{(1-u)^{m}}{m!} \frac{\partial ^{m+1}}{\partial \eta ^{m+1}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\eta ,x}),1\rangle {}_{{\mathbb {D}}^{\infty }} |_{\eta =\lambda u} du}. \end{aligned}$$
(4.3)

The integration by parts formula (2.7) and Theorem 2.6 of [35] yield that

$$\begin{aligned}&\frac{1}{j!} \frac{\partial ^{j}}{\partial \lambda ^{j}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} |_{\lambda =0} \nonumber \\&\quad = \sum _{i^{(k)},\gamma ^{(k)}}^{j} {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y(F_t^{d,0,x}),H_{\gamma ^{(k)}} \Bigg (F_t^{d,0,x},\prod _{\ell =1}^k \frac{1}{i_\ell !} \frac{\partial ^{i_\ell }}{\partial \lambda ^{i_\ell }} F_t^{d,\lambda ,x,\gamma _\ell }|_{\lambda =0} \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^{\infty }}. \end{aligned}$$
(4.4)

where \(\textstyle {\sum _{i^{(k)},\gamma ^{(k)}}^{j}=\sum _{k=1}^j \sum _{i^{(k)}=(i_1,\ldots ,i_k) \ s.t. \ i_1+\cdots +i_k=j,i_e\ge 1}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\cdots ,d \}^k}\frac{1}{k!}}\). With the calculation

$$\begin{aligned} {\frac{1}{i!}\frac{\partial ^{i}}{\partial \lambda ^{i}} F_t^{d,\lambda ,x,j}|_{\lambda =0}=\sum _{ | \alpha |=i+1} L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{j}(x) {\mathbb {B}}_t^{d,\alpha }} \end{aligned}$$
(4.5)

for \(j=1,\ldots ,d\) and \(i\in {\mathbb {N}}\), it holds that

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} = {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,0,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} \nonumber \\&\quad +\sum _{j=1}^m \lambda ^j \sum _{i^{(k)},\gamma ^{(k)}}^{j} {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y(F_t^{d,0,x}),H_{\gamma ^{(k)}} \Bigg (F_t^{d,0,x},\prod _{\ell =1}^k \sum _{ | \alpha |=i_\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^{\infty }} \nonumber \\&\quad +\lambda ^{m+1} {{\mathcal {E}}}_{m,t}^{d,\lambda ,x,y}. \end{aligned}$$
(4.6)

Again by the integration by parts (2.7), \(\textstyle {\frac{\partial ^{m+1}}{\partial \eta ^{m+1}}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}),1\rangle {}_{{\mathbb {D}}^{\infty }} |_{\eta =\lambda u}\) (with \(\lambda u \in (0,1]\)) in \(\mathcal{E}_{m,t}^{d,\lambda ,x,y}\) in (4.3) is given by a linear combination of the expectations of the form

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y(F_t^{d,\lambda u,x}), \textstyle {H_{\gamma }\Bigg (F_t^{d,\lambda u,x}, \prod _{\ell =1}^{k}\frac{1}{\beta _\ell !}\partial _{\eta }^{\beta _\ell } F_t^{d,\eta ,x,\gamma _\ell }}|_{\eta =\lambda u}\Bigg )\Bigg \rangle {}_{{\mathbb {D}}^{\infty }} \end{aligned}$$

with \(k \le m+1\), \(\gamma \in \{1,\ldots ,d \}^k\) and \(\beta _1,\ldots ,\beta _k\ge 1\) such that \(\textstyle {\sum _{\ell =1}^k} \beta _\ell =m+1\). By the inequality of Lemma 5 (applied with \(k=0\)) in Appendix A, we have that for all \(p\ge 1\) and any multi-index \(\gamma \), there are \(c>0\), \(p_1,p_2,p_3>1\) and \(r \in {\mathbb {N}}\) satisfying

$$\begin{aligned} \Vert H_{\gamma }(F_t^{d,\lambda ,x}, G) \Vert _p \le cd^c \Vert \det (\sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _{p_1}^{r} \Vert DF_t^{d,\lambda ,x} \Vert ^{2dr-|\gamma |}_{|\gamma |,p_2,H^d} \Vert G \Vert _{|\gamma |,p_3}, \end{aligned}$$
(4.7)

for all \(G \in {\mathbb {D}}^\infty \), \(t \in (0,T]\), \(\lambda \in (0,1]\) and \(x \in [a,b]^d\). In order to show the upper bound of the weight appearing in the residual term of the expansion, we list the following results:

Lemma 1

  1.

    For all \(p>1\), there exists \(\kappa _1>0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),

    $$\begin{aligned} \Vert \det (\sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _p \le \kappa _1 d^{\kappa _1} t^{-d}. \end{aligned}$$
    (4.8)
  2.

    For all \(p>1\), \(r\in {\mathbb {N}}\), there exists \(\kappa _2>0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),

    $$\begin{aligned} \Vert DF_t^{d,\lambda ,x} \Vert _{r,p,H^d}\le \kappa _2 d^{\kappa _2} t^{1/2}. \end{aligned}$$
    (4.9)
  3.

    For all \(\ell \in {\mathbb {N}}\), \(p>1\) and \(r\in {\mathbb {N}}\), there exists \(\eta >0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),

    $$\begin{aligned} \Vert \partial _{\lambda }^\ell F_t^{d,\lambda ,x} \Vert _{r,p} \le \eta d^\eta t^{(\ell +1)/2}. \end{aligned}$$
    (4.10)

Proof

For \(d\in {\mathbb {N}}\), let \(V_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) be such that \(V_d=(V_{d,1},\ldots ,V_{d,d})\) and for \(\lambda \in (0,1]\), let \(V^{\lambda }_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) be such that \(V^{\lambda }_d=(V^{\lambda }_{d,1},\ldots ,V^{\lambda }_{d,d})\). Moreover, for \(d\in {\mathbb {N}}\), we use the notation \(J_{0\rightarrow t}=\textstyle {\frac{\partial }{\partial x}X_t^{d,\lambda ,x}}=(\textstyle {\frac{\partial }{\partial x_i}X_t^{d,\lambda ,x,j})_{1\le i,j \le d}}\) for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\).

  1.

    Note that for \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\) and \(\lambda \in (0,1]\), we have

    $$\begin{aligned} \sigma ^{F_t^{d,\lambda ,x}}&= \int _0^t [D_{s} (X_t^{d,\lambda ,x}-x)/\lambda ] [D_{s} (X_t^{d,\lambda ,x}-x)/\lambda ]^{\top } ds \end{aligned}$$
    (4.11)
    $$\begin{aligned}&=\int _0^t J_{0 \rightarrow t} J_{0 \rightarrow s}^{-1} V_d (X_s^{d,\lambda ,x})V_d(X_s^{d,\lambda ,x})^{\top } {J_{0 \rightarrow s}^{-1}}^{\top } J_{0 \rightarrow t}^{\top } ds. \end{aligned}$$
    (4.12)

    Under the condition \(\sigma _{d}^{\lambda }(\cdot )\sigma _{d}^{\lambda }(\cdot )^{\top } \ge \lambda ^2 I_{d}\) (i.e. \(V_{d}(\cdot )V_{d}(\cdot )^{\top } \ge I_{d}\)) in Assumption 1.2, we have that there is \(c>0\) such that

    $$\begin{aligned} \sup _{x\in [a,b]^d} \Vert (\det \sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _p \le cd^c t^{-d}, \end{aligned}$$
    (4.13)

    for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), by Theorem 3.5 of Kusuoka and Stroock [22].

  2.

    We recall that for \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\) and \(0\le s<t\), \(D_{s} (X_t^{d,\lambda ,x}-x)/\lambda =J_{0 \rightarrow t} J_{0 \rightarrow s}^{-1} V_d(X_s^{d,\lambda ,x})\). Then, there is \(c>0\) such that

    $$\begin{aligned} \sup _{x\in [a,b]^d} \Vert DF_t^{d,\lambda ,x} \Vert _{k,p,H^d} \le c d^c t^{1/2}, \end{aligned}$$
    (4.14)

    for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), by Theorem 2.19 of Kusuoka and Stroock [22].

  3.

    Note that

    $$\begin{aligned} \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }X_t^{d,\lambda ,x,r}&= \sum _{i^{(k)},\gamma ^{(k)}}^{\ell -1}\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j^r(X_s^{d,\lambda ,x})dB_s^{d,j} \end{aligned}$$
    (4.15)
    $$\begin{aligned}&\quad +\lambda \sum _{i^{(k)},\gamma ^{(k)}}^{\ell }\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j^r(X_s^{d,\lambda ,x})dB_s^{d,j}. \end{aligned}$$
    (4.16)

    Since the above is a linear SDE, its solution has an explicit form, and we have

    $$\begin{aligned} \sup _{x \in [a,b]^d}\Big \Vert \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }X_t^{d,\lambda ,x} \Big \Vert _{k,p} \le c d^c t^{\ell /2}, \end{aligned}$$
    (4.17)

    for some \(c>0\) independent of t and d, due to the following estimate:

    $$\begin{aligned} \sup _{x \in [a,b]^d} \Big \Vert&\sum _{i^{(k)},\gamma ^{(k)}}^{\ell -1}\int _0^t J_{0\rightarrow t}J_{0\rightarrow s}^{-1} \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j(X_s^{d,\lambda ,x})dB_s^{d,j} \Big \Vert _{k,p}\nonumber \\&\le c d^c t^{\ell /2}, \end{aligned}$$
    (4.18)

    which is obtained by using Lemmas 6 and 7 in Appendix A. Then, the process

    $$\begin{aligned} \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }F_t^{d,\lambda ,x}&= \sum _{i^{(k)},\gamma ^{(k)}}^{\ell }\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j(X_s^{d,\lambda ,x})dB_s^{d,j}, \ \ t\ge 0, \ x \in {\mathbb {R}}^d, \end{aligned}$$
    (4.19)

    satisfies

    $$\begin{aligned} \sup _{x \in [a,b]^d} \Big \Vert \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }F_t^{d,\lambda ,x} \Big \Vert _{k,p} \le c d^c t^{(\ell +1)/2}, \end{aligned}$$
    (4.20)

    for some \(c>0\) independent of t and d.

\(\square \)

Using the above, we have that for all \(k \le m+1\), multi-indices \(\gamma \in \{1,\ldots ,d \}^k\), \(\beta _1,\ldots ,\beta _k\ge 1\) such that \(\textstyle {\sum _{\ell =1}^k} \beta _\ell =m+1\), and \(p>1\), there exists \(\nu >0\) such that

$$\begin{aligned} \Vert H_{\gamma }(F_t^{d,\lambda ,x}, \textstyle {\prod _{\ell =1}^{k}\frac{1}{\beta _\ell !}\partial _{\lambda }^{\beta _\ell } F_t^{d,\lambda ,x,\gamma _\ell }}) \Vert _p \le \nu d^{\nu } t^{-k/2} t^{(\beta _1+\cdots +\beta _k+k)/2}=\nu d^{\nu } t^{(m+1)/2}, \end{aligned}$$
(4.21)

for all \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\). Let us define \(r_{m,t}^{d,\lambda ,x}\) for \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\) from (4.1) and (4.6) as

$$\begin{aligned} r_{m,t}^{d,\lambda ,x}&= E[f_d(X_t^{d,\lambda ,x})] \nonumber \\&\quad -E \Big [f_{d}(\bar{X}_t^{d,\lambda ,x}) \Big \{ 1+ \sum _{j=1}^m \lambda ^j \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)} H_{\gamma ^{(k)}} \nonumber \\&\quad \times \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big \} \Big ]\nonumber \\&= \lambda ^{m+1}\int _0^1 \frac{(1-u)^{m}}{m!} E[f_d( \tilde{X}_t^{d,\lambda ,u,x} ) {{\mathcal {W}}}_{m+1,t}^{d,\lambda ,u,x} ] du, \end{aligned}$$
(4.22)

where \(\tilde{X}_t^{d,\lambda ,u,x}=x+\lambda F_t^{d,\lambda u,x}\), \(u \in [0,1]\) and

$$\begin{aligned} \mathcal{W}_{m+1,t}^{d,\lambda ,u,x}=\sum _{\beta ^{(k)},\gamma ^{(k)}}^{[m+1]} H_{\gamma ^{(k)}}\Bigg (F_t^{d,\lambda u,x}, \prod _{\ell =1}^{k}\frac{1}{\beta _\ell !}\partial _{\eta }^{\beta _\ell } F_t^{d,\eta ,x,\gamma _\ell }\Big |_{\eta =\lambda u}\Bigg ), \ \ u \in [0,1], \end{aligned}$$
(4.23)

with \(\textstyle {\sum _{\beta ^{(k)},\gamma ^{(k)}}^{[m+1]}:=(m+1)! \sum _{k=1}^{m+1} \sum _{\beta ^{(k)}=(\beta _1,\ldots ,\beta _k)\ \mathrm {s.t.}\ \sum _{\ell =1}^k \beta _\ell =m+1,\beta _i\ge 1}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\ldots ,d \}^k}}{\frac{1}{k!}}\).
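
For the reader's convenience, we note that the last equality in (4.22) is an instance of the Taylor formula with integral remainder: for a smooth function \(g\) on \([0,1]\),

$$\begin{aligned} g(1)=\sum _{j=0}^m \frac{1}{j!}g^{(j)}(0)+\int _0^1 \frac{(1-u)^m}{m!}g^{(m+1)}(u)\,du, \end{aligned}$$

applied (formally) to \(g(u)=E[f_d(x+\lambda F_t^{d,\lambda u,x})]\), whose derivatives in \(u\) are expressed through the weights \(H_{\gamma }\) by the integration by parts formula on the Wiener space.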

Here, \(\tilde{X}_t^{d,\lambda ,u,x}\), \(u \in [0,1]\) and \(\mathcal{W}_{m+1,t}^{d,\lambda ,u,x}\), \(u \in [0,1]\) satisfy that for \(p \ge 1\), there exists \(\eta >0\) such that

$$\begin{aligned} \textstyle {\sup _{x \in [a,b]^d, u \in [0,1]}}\Vert \tilde{X}_t^{d,\lambda ,u,x} \Vert _p \le \eta d^\eta \ \hbox {and} \ \textstyle {\sup _{x \in [a,b]^d, u \in [0,1]}}\Vert \mathcal{W}_{m+1,t}^{d,\lambda ,u,x} \Vert _p \le \eta d^\eta t^{(m+1)/2} \end{aligned}$$

for all \(\lambda \in (0,1]\) and \(t>0\). Therefore, there exists \(c>0\) such that

$$\begin{aligned} \sup _{x\in [a,b]^d}|r_{m,t}^{d,\lambda ,x}| \le c d^c \lambda ^{m+1} t^{(m+1)/2}, \end{aligned}$$
(4.24)

for all \(\lambda \in (0,1]\) and \(t \in (0,T]\), and then the assertion of Proposition 1 holds.

4.2 Proof of Proposition 2

For \(d \in {\mathbb {N}}\) and for \(m \in {\mathbb {N}}\), first note that the following representation holds:

$$\begin{aligned}&E \Big [f_{d}(\bar{X}_t^{d,\lambda ,x}) H_{\gamma } \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big ] \end{aligned}$$
(4.25)
$$\begin{aligned}&\quad =\int _{{\mathbb {R}}^d} f_d(x+\lambda y) {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i}\Big ) \end{aligned}$$
(4.26)
$$\begin{aligned}&\quad H_{\gamma } \Bigg (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^{\infty }} dy, \end{aligned}$$
(4.27)

for \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\), \(\lambda \in (0,1]\), \(k=1,\ldots ,j \le m\), \(\beta _1,\ldots ,\beta _k \ge 2\) such that \(\beta _1+\cdots +\beta _k=j+k\), and \(\gamma \in \{1,\ldots ,d \}^k\). Using the Itô formula for products of iterated integrals (see, e.g., Proposition 5.2.3 of [21]) and iterating the formula (3.14), namely, for a multi-index \(\gamma \in \{1,\ldots ,d \}^p\) and a multi-index \(\alpha \in \{0,1,\ldots ,d \}^q\),

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }}\Bigg \langle \delta _y \Bigg (\sum _{i=0}^d V_{d,i}(x) B_t^{d,i} \Bigg ), H_{\gamma } \Bigg (\sum _{i=0}^d V_{d,i}(x) B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \rangle {}_{{\mathbb {D}}^\infty } \nonumber \\&\quad = {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y \Bigg (\sum _{i=0}^d V_{d,i}(x) B_t^{d,i} \Bigg ), \sum _{j_1,\ldots ,j_{|\gamma |},\beta _{1},\ldots ,\beta _{|\gamma |}=1}^d \frac{1}{t^{|\gamma |}} \prod _{q=1}^{|\gamma |} [A_d^{-1}]_{\gamma _q,j_q}(x) V_{d,\beta _{q}}^{j_q}(x)\\&\qquad \quad \frac{1}{k!} \textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k},\beta _1,\ldots ,\beta _{|\gamma |})} \Bigg \rangle {}_{{\mathbb {D}}^\infty } \end{aligned}$$

we obtain (3.15) and the representation (3.16).

We can see that for \(p\ge 1\) and \(e=1,\ldots ,n(m)\), \(\Vert g_e(t) \textrm{Poly}_e(B_t^d)\Vert _p=O(t^{\nu _e/2})\) for some \(\nu _e \ge 1\), and by Assumptions 1 and 2 and the expression of \(h_e\), there is \(\eta >0\) independent of d such that \(|h_e(x)| \le \eta d^\eta \) for all \(e=1,\ldots ,n(m)\) and \(x \in [a,b]^d\). Then, for \(p\ge 1\), there exists \(c>0\) independent of d such that

$$\begin{aligned} \Vert {{\mathcal {M}}}^{m}_{d,\lambda }(t,x,B_t^d) \Vert _p \le cd^c, \end{aligned}$$
(4.28)

uniformly in \((t,x)\in (0,T] \times [a,b]^d\) and \(\lambda \in (0,1]\).

4.3 Proof of Theorem 1

The first statement is immediately obtained by combining Propositions 1 and 2:

$$\begin{aligned} \sup _{x \in [a,b]^d}|u_{\lambda }^d(t,x)-E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]|=O(\lambda ^{m+1} t^{(m+1)/2}). \end{aligned}$$
(4.29)

Hereafter, we fix \(t \in (0,T]\) and \(\lambda \in (0,1]\). For \(d \in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d\), \(\delta \in (0,1)\), let

$$\begin{aligned} \bar{X}_t^{d,\lambda ,x,\delta }=x+\lambda \textstyle {\sum _{i=0}^d} V_{d,i}^{\delta }(x)B_t^{d,i} \end{aligned}$$
(4.30)

and \({{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d) \in {\mathbb {D}}^\infty (\Omega ^d)\) be a functional which has the form:

$$\begin{aligned} {{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d) = \textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h^{\delta }_{e}(x)\textrm{Poly}_e({B}_t^d), \end{aligned}$$
(4.31)

where \(h_{e}^{\delta }: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), are the functions obtained from \(h_e\) in Proposition 2 by replacing \(A^{-1}_{d}\), \(\{V_{d,i}\}_{0\le i \le d}\) and \(\{V_{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) with \(A^{-1}_{d,\delta }\), \(\{V^\delta _{d,i}\}_{0\le i \le d}\) and \(\{V^\delta _{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) in Assumption 2, satisfying

$$\begin{aligned}&E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta }) {{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]\nonumber \\&\quad =E\Bigg [ f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\Bigg \{1+\sum _{j=1}^m \lambda ^j \sum _{k=1}^j \sum _{\beta _1+\cdots +\beta _k=j+k,\beta _i\ge 2}\sum _{(\gamma _1,\ldots ,\gamma _k)\in \{1,\ldots ,d \}^k}\frac{1}{k!} \nonumber \\&\quad H_{(\gamma _1,\ldots ,\gamma _k)} \Bigg (\sum _{i=1}^dV^{\delta }_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L^{\delta }_{d,\alpha _1}\cdots L^{\delta }_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\delta ,\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \}\Bigg ]. \end{aligned}$$
(4.32)

Next, we prepare the following lemmas (Lemmas 2, 3 and 4) to prove the second assertion (3.20) in Theorem 1.

Lemma 2

There exists \(c_1>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), \(\delta =O(\varepsilon ^{c_1} d^{-c_1})\),

$$\begin{aligned} \sup _{x \in [a,b]^d}|E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]|\le \varepsilon ,\qquad \end{aligned}$$
(4.33)

where \(f^{\delta }_{d}={{\mathcal {R}}}(\psi _{\delta }^{f_d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\) is defined in Assumption 2.4.

Proof

In the proof, we use a generic constant \(c>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \). Note that for \(x \in [a,b]^d\),

$$\begin{aligned}{} & {} |E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]|\nonumber \\{} & {} \quad \le | E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)] |\nonumber \\{} & {} \qquad +| E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)] |\nonumber \\{} & {} \qquad +| E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)] |. \end{aligned}$$
(4.34)

By 2 of Assumption 2 (with Assumption 1), it holds that

$$\begin{aligned}{} & {} | E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m}_{d,\lambda }(t,x,B_t^d)] |\nonumber \\{} & {} \quad \le C \Vert \bar{X}_t^{d,\lambda ,x}-\bar{X}_t^{d,\lambda ,x,\delta }\Vert _2 \Vert \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) \Vert _2 \le \delta c d^c, \end{aligned}$$
(4.35)

for all \(x \in [a,b]^d\). By 4 of Assumption 2 (with Assumption 1), it holds that

$$\begin{aligned} | E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)] | \le \delta c d^c, \end{aligned}$$
(4.36)

for all \(x \in [a,b]^d\). Here, the estimate \( \Vert \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) \Vert _2 \le cd^c\) in (3.18) is used in (4.35) and (4.36). By 2, 3, 4 of Assumption 2 (with Assumption 1), (3.16) and (4.31), we have that for \(p\ge 1\),

$$\begin{aligned} \Vert {{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)-\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d) \Vert _p \le \delta c d^c \end{aligned}$$
(4.37)

and

$$\begin{aligned} | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)] | \le \delta c d^c, \end{aligned}$$
(4.38)

for all \(x \in [a,b]^d\). Then, by taking \(\delta =(1/3) c_1^{-1}\varepsilon ^{c_1}d^{-c_1}\) with \(c_1=\max \{1,c \}\), where \(c\) is the maximum of the constants appearing in (4.35), (4.36) and (4.38), we have

$$\begin{aligned} \sup _{x \in [a,b]^d}|E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]|\le \varepsilon . \ \ \end{aligned}$$
(4.39)

\(\square \)

Lemma 3

For \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(M\in {\mathbb {N}}\), let \(B_t^{d,(\ell )}\), \(\ell =1,\ldots ,M\) be independent identically distributed random variables such that \(B_t^{d,(\ell )} \overset{\textrm{law}}{=} B_t^{d}\). There exists \(c_2>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\) and \(M=O(\varepsilon ^{-c_2} d^{c_2})\), there is \(\omega _{\varepsilon ,d} \in \Omega ^d\) satisfying

$$\begin{aligned} \sup _{x\in [a,b]^d} \Bigg | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]-\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d})) \Bigg | \le \varepsilon , \end{aligned}$$
(4.40)

where \(\delta =O(\varepsilon ^{c_1}d^{-c_1})\) with the constant \(c_1\) in Lemma 2.

Proof

There exists a constant \(c >0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for all \(x \in [a,b]^d\) and \(M \in {\mathbb {N}}\),

$$\begin{aligned}{} & {} E\Big [\Big | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d)] -\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )})\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}) \Big |^2 \Big ]\nonumber \\ \end{aligned}$$
(4.41)
$$\begin{aligned}{} & {} \quad \le \frac{1}{M} E[|f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)|^2] \le \frac{cd^{c}}{M}.\nonumber \\ \end{aligned}$$
(4.42)

Then, by choosing \(c_2=\max \{1,c \}\), we have that for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\) and \(M=c_2 \varepsilon ^{-c_2}d^{c_2}\),

$$\begin{aligned} E\Big [\Big | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]-\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )})\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}) \Big |^2 \Big ]^{1/2} \le \varepsilon , \end{aligned}$$
(4.43)

for all \(x \in [a,b]^d\), and therefore, there is \(\omega _{\varepsilon ,d} \in \Omega ^d\) satisfying

$$\begin{aligned}&\sup _{x\in [a,b]^d} \Big | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]\nonumber \\&\quad -\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d})) \Big | \le \varepsilon . \ \ \end{aligned}$$
(4.44)

\(\square \)
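
The \(O(M^{-1/2})\) decay of the \(L^2\)-error used in (4.41)–(4.43) can also be observed numerically; the following NumPy sketch (with an illustrative one-dimensional payoff, not the functional of the scheme) is included only to visualize the rate.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 1.0
exact = np.sqrt(t / (2.0 * np.pi))          # E[max(B_t, 0)] for B_t ~ N(0, t)

def l2_error(M, trials=1000):
    """Root mean squared error of the sample mean of max(B_t, 0) over M i.i.d. copies."""
    B = rng.normal(0.0, np.sqrt(t), size=(trials, M))
    errors = np.maximum(B, 0.0).mean(axis=1) - exact
    return np.sqrt(np.mean(errors**2))

for M in [100, 400, 1600, 6400]:
    print(M, l2_error(M))                   # the error roughly halves as M quadruples
```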

Lemma 4

For \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(M\in {\mathbb {N}}\), let \(B_t^{d,(\ell )}\), \(\ell =1,\ldots ,M\) be independent identically distributed random variables such that \(B_t^{d,(\ell )} \overset{\textrm{law}}{=} B_t^{d}\). There exist \(\{ \phi _{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) (which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \)) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), we have \(\mathcal{C}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\), and for a realization \(\omega _{\varepsilon ,d} \in \Omega ^d\) given in Lemma 3, it holds that

$$\begin{aligned} \sup _{x \in [a,b]^d} \Big |\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))-\mathcal{R}(\phi _{\varepsilon ,d})(x) \Big | \le \varepsilon , \end{aligned}$$
(4.45)

where \(\delta =O(\varepsilon ^{c_1}d^{-c_1})\) and \(M=O(\varepsilon ^{-c_2}d^{c_2})\) with the constants \(c_1\) and \(c_2\) in Lemmas 2 and 3.

Proof

In the proof, we use a generic constant \(c>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \). Let \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\) and \(\ell =1,\ldots ,M\), let \(\delta =O(\varepsilon ^{c_1} d^{-c_1})\) and \(M=O(\varepsilon ^{-c_2} d^{c_2})\), where \(c_1\) and \(c_2\) are the constants appearing in Lemmas 2 and 3, let \(\omega _{\varepsilon ,d}\) be a realization given in Lemma 3, and let \(b^{d,(\ell )}=B_t^{d,(\ell )}(\omega _{\varepsilon ,d})\). Since there exists \(\eta _{\delta ,d}^{(\ell )} \in {{\mathcal {N}}}\) such that \(\mathcal{R}(\eta ^{(\ell )}_{\delta ,d})(x)=x+\lambda \mathcal{R}(\psi _{\delta ,d}^{V_0})(x)t+\lambda \textstyle {\sum _{i=1}^d} \mathcal{R}(\psi _{\delta ,d}^{V_i})(x) b^{d,(\ell ),i}\) for \(x \in {\mathbb {R}}^d\) and \(\mathcal{C}(\eta ^{(\ell )}_{\delta ,d})=O(\delta ^{-c}d^c)\) (by Lemma 9 in Appendix B), there exists \(\psi _{1,(\ell )}^{\delta ,d} \in {{\mathcal {N}}}\) such that \(\mathcal{R}(\psi _{1,(\ell )}^{\delta ,d})(x)=\mathcal{R}(\psi _{\delta ,d}^{f})(\mathcal{R}(\eta ^{(\ell )}_{\delta ,d})(x))=f_{d}^\delta (\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\) for \(x \in {\mathbb {R}}^d\) and \(\mathcal{C}(\psi _{1,(\ell )}^{\delta ,d})=O(\delta ^{-c}d^c)\) (by Lemma 10 in Appendix B). Next, we recall that by (4.31), the weight \(\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})\), \(x \in {\mathbb {R}}^d\) has the form \({{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})= \textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h^{\delta }_{e}(x)\textrm{Poly}_{e}(b^{d,(\ell )})\), which is constructed by some products of \(A^{-1}_{d,\delta }\), \(\{V^{\delta }_{d,i}\}_{0\le i \le d}\) and \(\{V^{\delta }_{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) in Assumption 2. Using Lemmas 9 and 12 in Appendix B and Assumption 2, there is a neural network \(\psi ^{\varepsilon ,d}_{2,(\ell )} \in {{\mathcal {N}}}\) such that \(\textstyle {\sup _{x\in [a,b]^d}}|\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})-\mathcal{R}(\psi ^{\varepsilon ,d}_{2,(\ell )})(x)|\le \varepsilon /2\) and \({{\mathcal {C}}}(\psi ^{\varepsilon ,d}_{2,(\ell )})=O(\varepsilon ^{-c}d^c)\). Hence, we have

$$\begin{aligned} \sup _{x\in [a,b]^d}|f_{d}^\delta (\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})-\mathcal{R}(\psi _{1,(\ell )}^{\delta ,d})(x)\mathcal{R}(\psi _{2,(\ell )}^{\varepsilon ,d})(x)|\le \varepsilon /2. \end{aligned}$$
(4.46)

We again use Lemma 12 in Appendix B to see that there exists \(\Psi _{(\ell )}^{\varepsilon ,d} \in {{\mathcal {N}}}\) such that

$$\begin{aligned}{} & {} |{{\mathcal {R}}}(\psi _{1,(\ell )}^{\delta ,d})(x)\mathcal{R}(\psi _{2,(\ell )}^{\varepsilon ,d})(x)-\mathcal{R}(\Psi _{(\ell )}^{\varepsilon ,d})(x)| \le \varepsilon /2, \end{aligned}$$
(4.47)

for all \(x \in [a,b]^d\), and \(\mathcal{C}(\Psi _{(\ell )}^{\varepsilon ,d})=O(\varepsilon ^{-c}d^{c})\). Finally, applying Lemma 9 gives the desired result, i.e. there exist \(\{ \phi _{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), we have \(\mathcal{C}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\), and for a realization \(\omega _{\varepsilon ,d} \in \Omega ^d\) given in Lemma 3, it holds that

$$\begin{aligned} \sup _{x \in [a,b]^d} \Big |\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))-\mathcal{R}(\phi _{\varepsilon ,d})(x) \Big | \le \varepsilon . \nonumber \\ \end{aligned}$$
(4.48)

\(\square \)
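
For illustration, the composition of realizations used repeatedly above (in the spirit of Lemma 10 in Appendix B) can be sketched as follows: a ReLU network is stored as a list of affine layers \((W,B)\), and the two affine maps at the junction are merged into a single affine map, so that the composed network has depth \(L_1+L_2-1\). This is only a toy sketch and not the construction behind the complexity estimates.

```python
import numpy as np

def realize(net, x):
    """Evaluate a ReLU network given as a list of (W, B) pairs: affine, ReLU, ..., affine."""
    for W, B in net[:-1]:
        x = np.maximum(W @ x + B, 0.0)
    W, B = net[-1]
    return W @ x + B

def compose(net2, net1):
    """Return a network realizing R(net2) o R(net1) by merging the junction affine maps."""
    W1, B1 = net1[-1]                 # last affine map of net1
    W2, B2 = net2[0]                  # first affine map of net2
    merged = (W2 @ W1, W2 @ B1 + B2)  # single affine map replacing the junction
    return net1[:-1] + [merged] + net2[1:]

# tiny check on random networks R^3 -> R^2 and R^2 -> R
rng = np.random.default_rng(1)
net1 = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
        (rng.standard_normal((2, 4)), rng.standard_normal(2))]
net2 = [(rng.standard_normal((5, 2)), rng.standard_normal(5)),
        (rng.standard_normal((1, 5)), rng.standard_normal(1))]
x = rng.standard_normal(3)
assert np.allclose(realize(compose(net2, net1), x), realize(net2, realize(net1, x)))
```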

Proof of Theorem 1

The first assertion (in (3.19)) follows from (4.29). The second assertion (in (3.20)) is obtained by combining Lemmas 2, 3 and 4. \(\square \)

5 Deep learning implementation

We briefly provide the implementation scheme for the approximation in Theorem 1. Let \(\xi \) be a uniformly distributed random variable on \([a,b]^d\), i.e. \(\xi \sim U([a,b]^d)\), and define \(\textstyle {{\mathbb {X}}_t^{\xi }=\xi +\lambda \sum _{i=0}^d V_{d,i}(\xi )B_t^{d,i}}\), \(t \ge 0\). For \(t>0\), the m-th order asymptotic expansion of Theorem 1 can be represented by

$$\begin{aligned} u^{m}(t,\cdot )=\textrm{argmin}_{\psi \in C([a,b]^d)} E[ | \psi (\xi )- f({\mathbb {X}}_t^{\xi }) {{\mathcal {M}}}^{m}_{d,\lambda }(t,\xi ,B_t^d) |^2 ], \end{aligned}$$
(5.1)

which is obtained by combining Theorem 1 of this paper with Proposition 2.2 of Beck et al. [2]. We construct a deep neural network \(u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta ^*}(t,\cdot )\) to approximate the function \(u^{m}(t,\cdot )\); for a depth \(L \in {\mathbb {N}}\) and layer dimensions \(N_0,N_1,\ldots ,N_L \in {\mathbb {N}}\), the network is given by

$$\begin{aligned} u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta }(t,x)={{\mathcal {A}}}_{W^\theta _L,B^\theta _L} \circ \varrho _{N_{L-1}} \circ \mathcal{A}_{W^\theta _{L-1},B^\theta _{L-1}} \circ \cdots \circ \varrho _{N_{1}} \circ {{\mathcal {A}}}_{W^\theta _{1},B^\theta _{1}} (x), \ x \in {\mathbb {R}}^d, \end{aligned}$$
(5.2)

where \({{\mathcal {A}}}_{W^\theta _k,B^\theta _k}(x)=W^\theta _kx+B^\theta _k\), \(x \in {\mathbb {R}}^{N_{k-1}}\), \(k=1,\ldots ,L\) with \(((W^\theta _1,B^\theta _1),\ldots ,(W^\theta _L,B^\theta _L)) \in \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}\) given by

$$\begin{aligned}&{{\mathcal {A}}}_{W^\theta _k,B^\theta _k}(x) =\left( \begin{array}{ccc} \theta ^{q+1} &{} \cdots &{} \theta ^{q+N_{k-1}} \\ \vdots &{} \ddots &{} \vdots \\ \theta ^{q+(N_{k}-1)N_{k-1}+1} &{} \cdots &{} \theta ^{q+N_{k} N_{k-1}} \\ \end{array} \right) \left( \begin{array}{c} x_1 \\ \vdots \\ x_{N_{k-1}} \\ \end{array} \right) +\left( \begin{array}{c} \theta ^{q+N_{k} N_{k-1}+1} \\ \vdots \\ \theta ^{q+N_{k} N_{k-1}+N_{k}} \\ \end{array} \right) , \end{aligned}$$
(5.3)

where \(q=\textstyle {\sum _{\ell =1}^{k-1}} N_{\ell }(N_{\ell -1}+1)\) is the number of parameters used in the first \(k-1\) layers, and the optimized parameter \(\theta ^*\) is obtained by solving the following minimization problem:

$$\begin{aligned} \theta ^*=\textrm{argmin}_{\theta \in {\mathbb {R}}^{\sum _{\ell =1}^L N_{\ell }(N_{\ell -1}+1)}} E[ | u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta }(t,\xi )- f({\mathbb {X}}_t^{\xi }) {{\mathcal {M}}}^{m}_{d,\lambda }(t,\xi ,B_t^d) |^2 ]. \end{aligned}$$
(5.4)

In the implementation of the deep neural network approximation, we use the stochastic gradient descent method with the Adam optimizer [20] as in Sects. 3 and 4 of Beck et al. [2]. In Appendix C, we list the sample code of the scheme for the high-dimensional PDE with a nonlinear coefficient in Sect. 6.2 (which includes the linear coefficient case).
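
For concreteness, a minimal TensorFlow sketch of the minimization (5.4), written for the uncorrelated Black–Scholes setting of Sect. 6.1.1, is given below. The helper functions drift, diffusion, payoff and malliavin_weight are illustrative and not part of the scheme; in particular, malliavin_weight is a placeholder returning the trivial weight 1 (i.e. the zeroth-order expansion) and should be replaced by the explicit weight \(\mathcal{M}^{m}_{d,\lambda }(t,\xi ,B_t^d)\) (cf. the sample code in Appendix C), and the piecewise-constant learning rate of Sect. 6 is replaced here by a constant one.

```python
import numpy as np
import tensorflow as tf

d, lam, t, a, b, K, mu = 100, 0.3, 1.0, 99.0, 101.0, 100.0, 1.0 / 30.0
batch_size, n_steps = 1024, 5000

def drift(xi):                     # V_{d,0}(x) for the Black-Scholes example: mu * x
    return mu * xi

def diffusion(xi):                 # diagonal diffusion V_{d,i}(x) = c_i x_i e_i with c_i = 1
    return xi

def payoff(x):                     # f_d(x) = max_i (x_i - K)
    return tf.reduce_max(x - K, axis=1, keepdims=True)

def malliavin_weight(t, xi, B):    # placeholder: trivial weight 1 (zeroth-order expansion);
    return tf.ones_like(B[:, :1])  # replace by the explicit first-order weight

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2 * d, activation="relu"),
    tf.keras.layers.Dense(2 * d, activation="relu"),
    tf.keras.layers.Dense(1),
])
model(tf.zeros((1, d)))            # build the network
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

for _ in range(n_steps):
    xi = tf.random.uniform((batch_size, d), a, b)                 # xi ~ U([a,b]^d)
    B = tf.random.normal((batch_size, d), stddev=float(np.sqrt(t)))
    X = xi + lam * drift(xi) * t + lam * diffusion(xi) * B        # Gaussian proxy of X_t
    target = payoff(X) * malliavin_weight(t, xi, B)
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((model(xi) - target) ** 2)          # objective in (5.4)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

In the actual experiments, the target \(f({\mathbb {X}}_t^{\xi }){{\mathcal {M}}}^{m}_{d,\lambda }(t,\xi ,B_t^d)\), the network sizes and the learning-rate schedule are as described in Sect. 6.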

6 Numerical examples

In this section, we perform numerical experiments in order to demonstrate the accuracy of our scheme. We compare our scheme with the deep learning method of Beck et al. [2], in which the Euler–Maruyama scheme is combined with the stochastic gradient descent method and the Adam optimizer. All experiments are performed in Google Colaboratory using TensorFlow.

6.1 High-dimensional Black–Scholes model

6.1.1 Uncorrelated case

First, we examine our scheme for a high-dimensional Black–Scholes model (geometric Brownian motion) whose corresponding PDE is given by

$$\begin{aligned} \partial _t u_\lambda ^d(t,x)=\lambda \sum _{i=1}^d \mu x_i \frac{\partial }{\partial x_i}u_\lambda ^d(t,x) + \frac{\lambda ^2}{2} \sum _{i=1}^d c_i^2 x_i^2 \frac{\partial ^2}{\partial x_i^2} u_\lambda ^d(t,x), \ \ u_\lambda ^d(0,x)=f_{d}(x), \end{aligned}$$
(6.1)

where \(f_d(x)=\max \{ x_1-K ,\ldots , x_d-K \}\). Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=100.0\), \(\lambda =0.3\), \(\mu =1/30\) (or \(r:=\lambda \times \mu =0.01\)), \(c_i=1.0\) (or \(\sigma _i:=\lambda \times c_i=0.3\)), \(i=1,\ldots ,100\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (or the maximum option price \(e^{-rt}u_\lambda ^d(t,\cdot )\) in financial mathematics) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\) and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=10^{-1}\textbf{1}_{[0,0.3J]}(j)+10^{-2}\textbf{1}_{(0.3J,0.6J]}(j)+10^{-3}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Itô formula with Monte-Carlo method with \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 1 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=16\), 32 (Beck et al. \(n=16\), Beck et al. \(n=32\) in the table).
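
For reference, the benchmark value \(u_\lambda ^{ref,d}(t,x_0)\) above can be reproduced (up to Monte Carlo error) by simulating the exact solution of the geometric Brownian motion; the following NumPy sketch uses a reduced number of paths for illustration.

```python
import numpy as np

d, t, K, r, sigma = 100, 1.0, 100.0, 0.01, 0.3
x0, n_paths = 100.0, 10**5                               # the paper uses 10**7 paths
rng = np.random.default_rng(0)

W = rng.normal(0.0, np.sqrt(t), size=(n_paths, d))
X = x0 * np.exp((r - 0.5 * sigma**2) * t + sigma * W)    # exact solution via the Ito formula
payoff = np.max(X - K, axis=1)                           # f_d(x) = max_i (x_i - K)
print(payoff.mean(), payoff.std() / np.sqrt(n_paths))    # estimate of u_lambda^d(t, x0) and its std error
```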

Table 1 Comparison of deep learning methods for \(d=100\)

6.1.2 Correlated case

We next provide a numerical example for a high-dimensional Black–Scholes model with correlated noise. Let us consider the following PDE:

$$\begin{aligned} \partial _t u_\lambda ^d(t,x){=} \lambda \sum _{i=1}^d \mu x_i \frac{\partial }{\partial x_i}u_\lambda ^d(t,x){+}\frac{\lambda ^2}{2} \sum _{i,j,k{=}1}^d \sigma _k^i \sigma _k^j x_i x_j \frac{\partial ^2}{\partial x_i \partial x_j} u_\lambda ^d(t,x), \ \ u_\lambda ^d(0,x){=}f_{d}(x),\nonumber \\ \end{aligned}$$
(6.2)

where \(f_d(x)=\max \{ K-\textstyle {\frac{1}{d}\sum _{i=1}^d x_i},0 \}\) and \(\sigma =[\sigma _k^j]_{k,j} \in {\mathbb {R}}^{d \times d}\) satisfies \(\sigma _{ij}=0\) for \(i<j\), \(\sigma _{ii}>0\) for \(i=1,\ldots ,d\) and

$$\begin{aligned} \sigma \sigma ^\top =\left( \begin{array}{cccc} 1&{}\rho &{}\cdots &{}\rho \\ \rho &{}1&{}\rho &{}\rho \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ \rho &{}\rho &{}\rho &{}1 \end{array}\right) \in {\mathbb {R}}^{d \times d}. \end{aligned}$$
(6.3)
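
A lower-triangular \(\sigma \) with positive diagonal entries satisfying (6.3) can be obtained as the Cholesky factor of the correlation matrix, e.g. in NumPy:

```python
import numpy as np

d, rho = 100, 0.5
corr = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)  # the matrix in (6.3)
sigma = np.linalg.cholesky(corr)                       # lower triangular, sigma @ sigma.T == corr
assert np.allclose(sigma @ sigma.T, corr) and np.all(np.diag(sigma) > 0)
```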

Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=90.0\), \(\lambda =0.3\), \(\mu =0.0\), \(\rho =0.5\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (the basket option price in financial mathematics) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 (\(m=1\)) with the expansion technique of the basket option price given in Section 3.1 of Takahashi [32] and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=5.0\times 10^{-2}{} \textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-4}{} \textbf{1}_{(0.6J,J]}(j)\), \(j\le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Itô formula with Monte-Carlo method with \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 2 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=32\), 64 (Beck et al. \(n=32\), Beck et al. \(n=64\) in the table).

Table 2 Comparison of deep learning methods for \(d=100\)

6.2 High-dimensional CEV model (nonlinear volatility case)

We consider a Kolmogorov PDE with nonlinear diffusion coefficients whose corresponding stochastic process is called the CEV model:

$$\begin{aligned} \partial _t u_\lambda ^d(t,x)=\lambda \sum _{i=1}^d \mu x_i \frac{\partial }{\partial x_i}u_\lambda ^d(t,x) + \frac{\lambda ^2 }{2} \sum _{i=1}^d \gamma _i^2 c_i^2 x_i^{2\beta _i} \frac{\partial ^2}{\partial x_{i}^2} u_\lambda ^d(t,x), \ \ u_\lambda ^d(0,x)=f_{d}(x),\nonumber \\ \end{aligned}$$
(6.4)

where \(f_d(x)=\max \{ x_1-K ,\ldots , x_d-K \}\). Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=100.0\), \(\lambda =0.3\), \(\mu =1/30\) (or \(r:=\lambda \times \mu =0.01\)), \(\beta _i=0.5\), \(\gamma _i=K^{1-\beta _i}\), \(c_i=1.0\) (or \(\sigma _i:=\lambda \times c_i=0.3\)), \(i=1,\ldots ,d\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (or the maximum option price \(e^{-rt}u_\lambda ^d(t,\cdot )\)) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\). For the experiment, we use the batch size \(M=1024\), the number of iteration steps \(J=5000\) and the learning rate \(\gamma (j)=5.0\times 10^{-1}\textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-2}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by Monte-Carlo method with the Euler–Maruyama scheme with time-steps \(2^{10}\) and \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 3 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=32\), 64 (Beck et al. \(n=32\), Beck et al. \(n=64\) in the table).
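
Since the CEV diffusion has no convenient closed-form solution, the reference value above is obtained by the Euler–Maruyama scheme; the following NumPy sketch (with reduced numbers of time steps and paths for illustration) indicates the computation.

```python
import numpy as np

d, t, K = 100, 1.0, 100.0
r, sigma, beta = 0.01, 0.3, 0.5                 # r = lambda*mu, sigma = lambda*c_i
gamma = K ** (1.0 - beta)                       # gamma_i = K^(1 - beta_i)
n_paths, n_steps = 10**5, 2**8                  # the paper uses 10**7 paths and 2**10 steps
dt = t / n_steps
rng = np.random.default_rng(0)

X = np.full((n_paths, d), 100.0)                # x_0 = (100, ..., 100)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, d))
    # Euler-Maruyama step for dX^i = r X^i dt + sigma * gamma * (X^i)^beta dW^i;
    # np.maximum(., 0) avoids fractional powers of negative values
    X = X + r * X * dt + sigma * gamma * np.maximum(X, 0.0) ** beta * dW
print(np.max(X - K, axis=1).mean())             # estimate of E[f_d(X_t)]
```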

Table 3 Comparison of deep learning methods for \(d=100\)

6.3 High-dimensional Heston model

We finally show an example of a small-time asymptotic expansion for a high-dimensional Heston model:

$$\begin{aligned} \partial _t u_\lambda ^{2d}(t,x)={{\mathcal {L}}}^{2d,\lambda } u_\lambda ^{2d}(t,x), \ \ u_\lambda ^{2d}(0,x)=f_{2d}(x), \end{aligned}$$
(6.5)

where \(f_{2d}(x)=\max \{ x_1-K,x_3-K,\ldots ,x_{2d-1}-K \}\) and \({{\mathcal {L}}}^{2d,\lambda }\) is the generator given by

$$\begin{aligned} {{\mathcal {L}}}^{2d,\lambda }&= \lambda \sum _{i=1}^d \left[ \kappa _{i} (\theta _{i}-x_{2i}) \frac{\partial }{\partial x_{2i}}\right] \nonumber \\&\quad +\lambda ^2 \sum _{i=1}^d \left[ \frac{1}{2} x_{2i} x_{2i-1}^2 \frac{\partial ^2}{\partial x_{2i-1}^2} + \rho _i \nu _i x_{2i-1} x_{2i} \frac{\partial ^2}{\partial x_{2i-1} \partial x_{2i}}+\frac{1}{2} \nu _i^2 x_{2i} \frac{\partial ^2}{\partial x_{2i}^2}\right] . \end{aligned}$$
(6.6)

Let \(d=25\) (\(2d=50\)), \(t=0.5\), \(a=99.0\), \(b=101.0\), \(a'=0.035\), \(b'=0.045\), \(K=100.0\), \(\lambda =1.0\), \(\kappa _i=1.0\), \(\theta _i=0.04\), \(\nu _i=0.1\), \(\rho _i=-0.5\), \(i=1,\ldots ,d\). We approximate the function \(u_\lambda ^{2d}(t,\cdot )\) on \(([a,b] \times [a',b'])^d\) by constructing a deep neural network (1 input layer with 2d-neurons, 2 hidden layers with 4d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\) and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=5.0\times 10^{-2}\textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-4}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^{2d}(t,\cdot )\), we input \(x_0=(100.0,0.04,\ldots ,100.0,0.04) \in ([a,b] \times [a',b'])^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,2d}(t,x_0)-u_\lambda ^{ref,2d}(t,x_0))/u_\lambda ^{ref,2d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,2d}(t,x_0)\) is computed by Monte-Carlo method with the Euler–Maruyama scheme with time-steps \(2^{10}\) and \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 4 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=16\), 32 (Beck et al. \(n=16\), Beck et al. \(n=32\) in the table).
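
Similarly, the reference value for the Heston example can be computed by an Euler–Maruyama discretization of the price–variance system corresponding to (6.6); the sketch below uses reduced numbers of paths and time steps, and a full-truncation treatment of the variance (an implementation choice, not prescribed in the paper).

```python
import numpy as np

d, t, K = 25, 0.5, 100.0
kappa, theta, nu, rho = 1.0, 0.04, 0.1, -0.5
n_paths, n_steps = 10**5, 2**8                  # the paper uses 10**7 paths and 2**10 steps
dt = t / n_steps
rng = np.random.default_rng(0)

S = np.full((n_paths, d), 100.0)                # price coordinates x_{2i-1}
v = np.full((n_paths, d), 0.04)                 # variance coordinates x_{2i}
for _ in range(n_steps):
    Z1 = rng.normal(size=(n_paths, d))
    Z2 = rho * Z1 + np.sqrt(1.0 - rho**2) * rng.normal(size=(n_paths, d))
    vp = np.maximum(v, 0.0)                     # full truncation keeps the variance nonnegative
    S = S + S * np.sqrt(vp * dt) * Z1           # dS^i = S^i sqrt(v^i) dW^{1,i}
    v = v + kappa * (theta - v) * dt + nu * np.sqrt(vp * dt) * Z2
print(np.max(S - K, axis=1).mean())             # estimate of E[f_{2d}(X_t)]
```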

Table 4 Comparison of deep learning methods for \(2d=50\)

7 Conclusion

In the paper, we introduced a new spatial approximation for solving high-dimensional PDEs without the curse of dimensionality, where an asymptotic expansion method with a deep learning-based algorithm is effectively applied. The mathematical justification for the spatial approximation was provided using Malliavin calculus and ReLU calculus. We checked the effectiveness of our method through numerical examples for high-dimensional Kolmogorov PDEs.

More accurate deep learning-based implementations of the method in this paper should be studied as a next research topic. We believe that higher order asymptotic expansions or higher order weak approximations (discretizations) will give robust computation schemes without the curse of dimensionality, which should be proved mathematically in future work. Also, applying our method to nonlinear problems as in [14, 15] will be a challenging and important task.