1 Introduction

Recently, deep learning-based algorithms for solving high-dimensional partial differential equations (PDEs) have been actively proposed (see [2, 3] for instance). Moreover, a number of papers have appeared that provide mathematical justification for deep learning-based spatial approximations, demonstrating that deep neural networks overcome the curse of dimensionality in approximating high-dimensional PDEs. For the related literature, see [4,5,6, 11, 19] for example. In particular, these works treat specific forms of PDEs such as high-dimensional heat equations or Kolmogorov PDEs with constant diffusion and nonlinear drift coefficients. Also, the integral kernels are assumed to have explicit forms in order to justify the spatial approximations of solutions to high-dimensional PDEs.

However, most high-dimensional PDEs may not admit explicit integral forms in practice. In other words, the integral forms of the solutions themselves must be approximated by some method.

In the current paper, we give a new spatial approximation that combines an asymptotic expansion method with a deep learning-based algorithm for solving high-dimensional PDEs without the curse of dimensionality. More precisely, we follow the approach given in [40] and the related literature such as [8, 17, 18, 23, 24, 26, 27, 30, 32, 33, 35, 38, 39, 41, 43]. In particular, we provide a uniform error estimate for the asymptotic expansion of solutions of Kolmogorov PDEs with nonlinear coefficients, motivated by the works [2, 11, 31]. For a solution to a d-dimensional Kolmogorov PDE with a small parameter \(\lambda \), namely \(u_{\lambda }:[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) given by \(u_\lambda (t,x)=E[f(X_t^{\lambda ,x})]\) for \((t,x) \in [0,T] \times {\mathbb {R}}^d\), where \(\{ X_t^{\lambda ,x}\}_{t\ge 0}\) is a d-dimensional diffusion process starting from x, we justify the following spatial approximation on the cube \([a,b]^d\):

$$\begin{aligned} u_\lambda (t,\cdot )&\approx E[f(\bar{X}_t^{\lambda , \cdot }) {{\mathcal {M}}}_t^{\lambda , \cdot }] \qquad \hbox {(``high-dimensional asymptotic expansion'')} \end{aligned}$$
(1.1)
$$\begin{aligned}&\approx {{\mathcal {R}}}(\phi )(\cdot ), \qquad \hbox {(``deep neural network approximation'')} \end{aligned}$$
(1.2)

by applying an appropriate neural network \(\phi \). Here, for \(t>0\) and \(x \in {\mathbb {R}}^d\), \(\bar{X}_t^{\lambda , x}\) is a certain Gaussian random variable and \({{\mathcal {M}}}_t^{\lambda ,x}\) is a stochastic weight for the expansion derived via Malliavin calculus. In order to choose the network \(\phi \), an analysis of “products of neural networks" and a dimension analysis of the asymptotic expansion with Malliavin calculus are crucial in our approach. We show a precise error estimate for the approximation (1.1) and prove that the complexity of the neural network grows at most polynomially in the dimension d and the reciprocal of the precision \(\varepsilon \) of the approximation (1.2). Moreover, we give an explicit form of the asymptotic expansion in (1.1) and show numerical examples to demonstrate the effectiveness of the proposed scheme for high-dimensional Kolmogorov PDEs.

The organization of the paper is as follows. Section 2 is dedicated to notation, definitions and preliminary results on deep learning and Malliavin calculus. Section 3 provides the main result, namely, the deep learning-based asymptotic expansion for solving Kolmogorov PDEs. The proof is given in Sect. 4. Section 5 introduces the deep learning implementation. Various numerical examples are shown in Sect. 6. Useful lemmas on Malliavin calculus and ReLU calculus, as well as the sample code, are collected in the Appendix.

2 Preliminaries

We first prepare notation. For \(d \in {\mathbb {N}}\) and for a vector \(x \in {\mathbb {R}}^d\), we denote by \(\Vert x \Vert \) the Euclidean norm. Also, for \(k,\ell \in {\mathbb {N}}\) and for a matrix \(A \in {\mathbb {R}}^{k \times \ell }\), we denote by \(\Vert A \Vert \) the Frobenius norm. For \(d \in {\mathbb {N}}\), let \(I_d\) be the identity matrix. For \(m,k,\ell \in {\mathbb {N}}\), let \(C({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) (resp., \(C([0,T] \times {\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\)) be the set of continuous functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\) (resp., \(f: [0,T] \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\)) and \(C_{Lip}({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) be the set of Lipschitz continuous functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\). Also, we define \(C^\infty _b({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) as the set of smooth functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\) with bounded derivatives of all orders. For a multi-index \(\alpha \), let \(|\alpha |\) be the length of \(\alpha \). For a bounded function \(f:{\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\), we define \(\Vert f \Vert _{\infty }=\textstyle {\sup _{x \in {\mathbb {R}}^{m}}} \Vert f(x) \Vert \). For \(m,k,\ell \in {\mathbb {N}}\) and a function \(f \in C_{Lip}({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\), we denote by \(C_{Lip}[f]\) its Lipschitz constant. For \(d \in {\mathbb {N}}\) and a smooth function \(f:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), we define \(\partial _i f=\textstyle {\frac{\partial }{\partial x_i}f}\) for \(i=1,\ldots ,d\); moreover, we define \(\partial ^\alpha f=\partial _{\alpha _1}\cdots \partial _{\alpha _k}f\) for \(\alpha =(\alpha _1,\ldots ,\alpha _k) \in \{1,\ldots ,d \}^k\), \(k \in {\mathbb {N}}\). For \(a,b \in {\mathbb {R}}\), we may write \(a \vee b=\max \{ a,b \}\).

2.1 Deep neural networks

Let us prepare notation and definitions for deep neural networks. Let \({{\mathcal {N}}}\) be the set of deep neural networks (DNNs):

$$\begin{aligned} {{\mathcal {N}}}=\cup _{L \in {\mathbb {N}} \cap [2,\infty )} \cup _{(N_0,N_1,\ldots ,N_L)\in {\mathbb {N}}^{L+1}} \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}, \end{aligned}$$
(2.1)

where \({{\mathcal {N}}}_L^{N_0,N_1,\ldots ,N_L}={\times }_{\ell =1}^{L} ({\mathbb {R}}^{N_\ell \times N_{\ell -1}} \times {\mathbb {R}}^{N_\ell })\).

Let \(\varrho \in C({\mathbb {R}},{\mathbb {R}})\) be an activation function, and for \(k\in {\mathbb {N}}\), define \(\varrho _{k}(x)=(\varrho (x_1),\ldots ,\varrho (x_k))\), \(x \in {\mathbb {R}}^k\).

We define \({{\mathcal {R}}}:{{\mathcal {N}}} \rightarrow \cup _{m,n\in {\mathbb {N}}} C({\mathbb {R}}^m,{\mathbb {R}}^n)\), \({{\mathcal {C}}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\), \({{\mathcal {L}}}: {{\mathcal {N}}} \rightarrow {\mathbb {N}}\), \(\textrm{dim}_{\textrm{in}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\) and \(\textrm{dim}_{\textrm{out}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\) as follows:

For \(L \in {\mathbb {N}} \cap [2,\infty )\), \(N_0,\ldots ,N_L \in {\mathbb {N}}\), \(\psi =((W_1,B_1),\ldots ,(W_L,B_L)) \in \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}\), let \({{\mathcal {L}}}(\psi )=L\), \(\textrm{dim}_{\textrm{in}}(\psi )=N_0\), \(\textrm{dim}_{\textrm{out}}(\psi )=N_L\), \(\mathcal{C}(\psi )=\textstyle {\sum _{\ell =1}^L} N_{\ell }(N_{\ell -1}+1)\), and

$$\begin{aligned} {{\mathcal {R}}}(\psi )(\cdot )={{\mathcal {A}}}_{W_L,B_L} \circ \varrho _{N_{L-1}} \circ {{\mathcal {A}}}_{W_{L-1},B_{L-1}} \circ \cdots \circ \varrho _{N_{1}} \circ {{\mathcal {A}}}_{W_{1},B_{1}} (\cdot ) \in C({\mathbb {R}}^{N_0},{\mathbb {R}}^{N_L}), \end{aligned}$$
(2.2)

where \({{\mathcal {A}}}_{W_k,B_k}(x)=W_kx+B_k\), \(x \in {\mathbb {R}}^{N_{k-1}}\), \(k=1,\ldots ,L\).
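For concreteness, the following is a minimal NumPy sketch (an illustration only, not one of the networks constructed in Sect. 3) of the realization map \({{\mathcal {R}}}\) in (2.2) and the complexity \({{\mathcal {C}}}\); the layer sizes and random parameters below are arbitrary assumptions.

```python
import numpy as np

def realize(psi, x, rho=lambda z: np.maximum(z, 0.0)):
    """R(psi)(x) for psi = ((W_1,B_1),...,(W_L,B_L)) as in (2.2); rho is the activation (ReLU here)."""
    for W, B in psi[:-1]:
        x = rho(W @ x + B)          # hidden layers: activation after each affine map
    W_L, B_L = psi[-1]
    return W_L @ x + B_L            # no activation after the last affine map

def complexity(psi):
    """C(psi) = sum_l N_l (N_{l-1} + 1): total number of weights and biases."""
    return sum(W.size + B.size for W, B in psi)

# an arbitrary network in N_3^{4,8,8,1} with random parameters
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
psi = [(rng.standard_normal((dims[l], dims[l - 1])), rng.standard_normal(dims[l]))
       for l in range(1, len(dims))]
print(complexity(psi), realize(psi, np.ones(4)))
```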

2.2 Malliavin calculus

We prepare basic notation and definitions on Malliavin calculus following Bally [1], Ikeda and Watanabe [16], Malliavin [25], Malliavin and Thalmaier [26] and Nualart [29].

Let \(\Omega ^d=\{ \omega : [0,T] \rightarrow {\mathbb {R}}^d; \ \omega \ \hbox {is continuous}, \ \omega (0)=0 \}\), \(H^d=L^2([0,T],{\mathbb {R}}^d)\) and let \(\mu ^d\) be the Wiener measure on \((\Omega ^d,\mathcal {B}(\Omega ^d))\), where \(\mathcal {B}(\Omega ^d)\) is the Borel \(\sigma \)-field induced by the topology of the uniform convergence on [0, T]. We call \((\Omega ^d,H^d,\mu ^d)\) the d-dimensional Wiener space. For a Hilbert space V with the norm \(\Vert \cdot \Vert _{V}\) and \(p \in [1,\infty )\), the \(L^p\)-space of V-valued Wiener functionals is denoted by \(L^p(\Omega ^d,V)\), that is, \(L^p(\Omega ^d,V)\) is a real Banach space of all \(\mu ^d\)-measurable functionals \(F: \Omega ^d \rightarrow V\) such that \(\Vert F \Vert _p =E [\Vert F \Vert _V^p]^{1/p}< \infty \) with the identification \(F = G\) if and only if \(F(\omega )=G(\omega )\), a.s. When \(V={\mathbb {R}}\), we write \(L^p(\Omega ^d)\). For a real separable Hilbert space V and \(F: \Omega ^d \rightarrow V\), we write \(\Vert F \Vert _{p,V}=E [\Vert F\Vert _V^p]^{1/p}\), in particular, \(\Vert F \Vert _{p}\) when \(V={\mathbb {R}}\). Let \(B^d=\{B^d_t\}_t\) be a coordinate process defined by \(B^d_t(\omega )=\omega (t)\), \(\omega \in \Omega ^d\), i.e. \(B^d\) is a d-dimensional Brownian motion, and \(B^d(h)\) be the Wiener integral \(\textstyle {B^d(h)=\sum _{j=1}^d \int _{0}^{T} {h}^{j}(s) dB_s^{d,j}}\) for \(h\in H^d\).

Let \({\mathscr {S}}(\Omega ^d)\) denote the class of smooth random variables of the form \(F=f( B^d(h_{1}),\ldots ,B^d(h_{n}) )\) where \(f\in C_{b}^{\infty } ( {\mathbb {R}}^{n},{\mathbb {R}}) \), \(h_{1},\ldots ,h_{n}\in H^d\), \(n\ge 1\). For \(F\in {\mathscr {S}}(\Omega ^d)\), we define the derivative DF as the \(H^d\)-valued random variable \(\textstyle {DF=\sum _{j=1}^{n}\partial _{j}f( B^d(h_{1}),\ldots ,B^d(h_{n}) ) h_{j}}\), which is regarded as the stochastic process:

$$\begin{aligned} D_{i,t}F=\textstyle {\sum \limits _{j=1}^{n}}\partial _{j}f( B^d(h_{1}),\ldots ,B^d(h_{n}) ) {h}^i_{j}(t), \ \ i=1,\ldots ,d, \ \ t \in [0,T]. \end{aligned}$$
(2.3)

For \(F \in {\mathscr {S}}(\Omega ^d)\) and \(j \in {\mathbb {N}}\), we set \(D^j F\) as the \((H^d)^{\otimes j}\)-valued random variable obtained by the j-times iteration of the operator D. For a real separable Hilbert space V, consider \({\mathscr {S}}_V\) of V-valued smooth Wiener functionals of the form \(\textstyle {F = \sum _{i=1}^\ell F_i v_i}\), \(v_i \in V\), \(F_i \in {\mathscr {S}}(\Omega ^d)\), \(i \le \ell \), \(\ell \in {\mathbb {N}}\). Define \(\textstyle {D^j F = \sum _{i=1}^\ell D^j F_i \otimes v_i}\), \(j \in {\mathbb {N}}\). Then for \(j \in {\mathbb {N}}\), \(D^j\) is a closable operator from \({\mathscr {S}}_V\) into \(L^p(\Omega ^d,(H^d)^{\otimes j} \otimes V)\) for any \(p \in [1,\infty )\) (see p. 31 of Nualart [29]). For \(k \in {\mathbb {N}}\), \(p \in [1,\infty )\), we define \(\textstyle {\Vert F \Vert ^p_{k,p,V}=E [\Vert F \Vert _V^p] + \sum _{j=1}^k E [ \Vert D^j F \Vert _{(H^d)^{\otimes j} \otimes V}^p ]}\), \(F \in {\mathscr {S}}_V\). Then, the space \({\mathbb {D}}^{k,p}(\Omega ^d,V)\) is defined as the completion of \({\mathscr {S}}_V\) with respect to the norm \(\Vert \cdot \Vert _{k,p,V}\). Moreover, let \({\mathbb {D}}^\infty (\Omega ^d,V)\) be the space of smooth Wiener functionals in the sense of Malliavin \({\mathbb {D}}^\infty (\Omega ^d,V) = \cap _{p\ge 1} \cap _{k\in {\mathbb {N}}} {\mathbb {D}}^{k,p}(\Omega ^d,V)\). We write \({\mathbb {D}}^{k,p}(\Omega ^d)\), \(k \in {\mathbb {N}}\), \(p \in [1,\infty )\) and \({\mathbb {D}}^\infty (\Omega ^d)\), when \(V={\mathbb {R}}\). Let \(\delta \) be an unbounded operator from \(L^2(\Omega ^d,H^d)\) into \(L^2(\Omega ^d)\) such that the domain of \(\delta \), denoted by \(\textrm{Dom}(\delta )\), is the set of \(H^d\)-valued square integrable random variables u such that \(|E [\langle DF,u \rangle _{H^d}]| \le c\Vert F \Vert _{1,2}\) for all \(F \in {\mathbb {D}}^{1,2}(\Omega ^d)\) where c is some constant depending on u, and if \(u \in \textrm{Dom}(\delta )\), there exists \(\delta (u) \in L^2(\Omega ^d)\) satisfying

$$\begin{aligned} E [\langle DF,u \rangle _{H^d}]=E[ F \delta (u) ] \end{aligned}$$
(2.4)

for any \(F \in {\mathbb {D}}^{1,2}(\Omega ^d)\). For \(u=(u^1,\ldots ,u^d) \in \textrm{Dom}(\delta )\), \(\delta (u)=\textstyle {\sum _{i=1}^d} \delta ^{i}(u^i)\) is called the Skorohod integral of u, and it holds that \(E[\textstyle {\int _0^T} D_{i,s}Fu^i_s ds]=E[F \delta ^i(u^i) ]\), \(i=1,\ldots ,d\) for all \(F \in {\mathbb {D}}^{1,2}\) (see Proposition 6 of Bally [1]). For all \(k \in {\mathbb {N}} \cup \{ 0 \}\) and \(p>1\), the operator \(\delta \) is continuous from \({\mathbb {D}}^{k+1,p}(\Omega ^d,H^d)\) into \({\mathbb {D}}^{k,p}(\Omega ^d)\) (see Proposition 1.5.7 of Nualart [29]). For \(G \in {\mathbb {D}}^{1,2}(\Omega ^d)\) and \(h \in \textrm{Dom}(\delta )\) such that \(Gh \in L^{2}(\Omega ^d,H^d)\), it holds that

$$\begin{aligned} \delta ^i(Gh^i)=G\delta ^i(h^i)-{\int _0^T} D_{i,s}Gh^i_sds, \quad i=1,\ldots ,d, \end{aligned}$$
(2.5)

and in particular, if \(h \in \textrm{Dom}(\delta )\) is an adapted process, \(\delta ^i(h^i)\) is given by the Itô integral, i.e. \(\delta ^i(h^i)=\textstyle {\int _0^T} h^i_s dB_s^{d,i}\) for \(i=1,\ldots ,d\) (e.g. see Section 3.1.1 of Bally [1], Proposition 1.3.3 and Proposition 1.3.11 of Nualart [29]).
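For example, taking \(G=B_t^{d,1}\), \(h^1=\textbf{1}_{[0,t]}\) and \(h^i=0\) for \(i\ne 1\) in (2.5) gives

$$\begin{aligned} \delta ^1(B_t^{d,1}\textbf{1}_{[0,t]})=B_t^{d,1}\int _0^t dB_s^{d,1}-\int _0^T D_{1,s}B_t^{d,1}\textbf{1}_{[0,t]}(s)ds=(B_t^{d,1})^2-t, \end{aligned}$$

so iterated Skorohod integrals of deterministic integrands reduce to polynomials of Brownian motion; this elementary computation is exactly the mechanism behind the weights constructed in Sect. 3.1.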

For \(F=(F^1,\ldots ,F^d) \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\), define the Malliavin covariance matrix of F, \(\sigma ^F=(\sigma ^F_{ij})_{1 \le i,j \le d}\), by \(\textstyle {\sigma ^F_{ij}=\langle DF^i,DF^j \rangle _{H^d}=\sum _{k=1}^d \int _0^T D_{k,s}F^i D_{k,s}F^j ds}\), \(1\le i,j \le d\). We say that \(F\in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) is nondegenerate if the matrix \(\sigma ^F\) is invertible a.s. and satisfies \(\Vert ( \det \sigma ^F)^{-1}\Vert _p < \infty \) for all \(p>1\). Malliavin’s theorem claims that if \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\) is nondegenerate, then F has a smooth density \(p^{F}(\cdot )\). Malliavin calculus is further refined by Watanabe’s theory. Let \(\mathcal {S}({\mathbb {R}}^d)\) be the Schwartz space of rapidly decreasing functions and \(\mathcal {S}'({\mathbb {R}}^d)\) be the dual of \(\mathcal {S}({\mathbb {R}}^d)\), i.e. \(\mathcal {S}'({\mathbb {R}}^d)\) is the space of Schwartz tempered distributions. For a tempered distribution \({{\mathcal {T}}} \in \mathcal {S}'({\mathbb {R}}^d)\) and a nondegenerate Wiener functional in the sense of Malliavin \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\), \({{\mathcal {T}}}(F)={{\mathcal {T}}} \circ F\) is well-defined as an element of the space of Watanabe distributions \({\mathbb {D}}^{-\infty }(\Omega ^d)\), that is the dual space of \({\mathbb {D}}^{\infty }(\Omega ^d)\) (e.g. see p. 379, Corollary of Ikeda and Watanabe [16], Theorem of Chapter III 6.2 of Malliavin [25], Theorem 7.3 of Malliavin and Thalmaier [26]). Also, for \(G \in {\mathbb {D}}^{\infty }(\Omega ^d)\), a (generalized) expectation \(E[\mathcal{T}(F)G]\) is understood as a pairing of \({{\mathcal {T}}}(F)\in {\mathbb {D}}^{-\infty }(\Omega ^d)\) and \(G\in {\mathbb {D}}^{\infty }(\Omega ^d)\), namely \({}_{{\mathbb {D}}^{-\infty }}\langle {{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^{\infty }}\), and it holds that

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }}\langle {{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^\infty }={}_{{{\mathcal {S}}}'}\langle \mathcal{T},E[G|F=\cdot ] p^F(\cdot ) \rangle {}_{{{\mathcal {S}}}} \end{aligned}$$
(2.6)

where \({}_{{{\mathcal {S}}}'}\langle \cdot , \cdot \rangle _{{{\mathcal {S}}}} \) is the bilinear form between \({{\mathcal {S}}}'({\mathbb {R}}^d)\) and \(\mathcal{S}({\mathbb {R}}^d)\), and \(E[ G | F= \xi ]\) is the conditional expectation of G conditioned on the set \(\{ \omega ; F(\omega )= \xi \}\) (e.g. see Chapter III 6.2.2 of Malliavin [25], (7.5) of Theorem 7.3 of Malliavin and Thalmaier [26]). In particular, we have \({}_{{\mathbb {D}}^{-\infty }}\langle \delta _y (F),1 \rangle {}_{{\mathbb {D}}^\infty }={}_{{{\mathcal {S}}}'}\langle \delta _y, p^F(\cdot ) \rangle {}_{{{\mathcal {S}}}}=p^F(y)\) for \(y \in {\mathbb {R}}^d\), and thus \(p^F\) is not only smooth but also in \(\mathcal {S}({\mathbb {R}}^d)\), i.e. a rapidly decreasing function (see Theorem 9.2 of Ikeda and Watanabe [16], Proposition 2.1.5 of Nualart [29]). For a nondegenerate \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\), \(G \in {\mathbb {D}}^\infty (\Omega ^d)\) and a multi-index \(\gamma =(\gamma _1,\ldots ,\gamma _k)\), there exists \(H_{\gamma }(F,G) \in {\mathbb {D}}^\infty (\Omega ^d)\) such that

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \langle \partial ^{\gamma }{{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^\infty }={}_{{\mathbb {D}}^{-\infty }}\langle \mathcal{T}(F),H_{\gamma }(F,G) \rangle {}_{{\mathbb {D}}^\infty } \end{aligned}$$
(2.7)

for all \({{\mathcal {T}}} \in {{\mathcal {S}}}'({\mathbb {R}}^d)\) (e.g. see Chapter 4.4 and Theorem 7.3 of Malliavin and Thalmaier [26]), where \(H_{\gamma }(F,G)\) is given by \(H_{\gamma }(F,G)=H_{(\gamma _k)}(F,H_{(\gamma _1,\ldots ,\gamma _{k-1})}(F,G))\) with

$$\begin{aligned}&H_{(i)}(F,G)=\delta (\textstyle {\sum _{j=1}^d} (\sigma ^{F})^{-1}_{ij} DF^j G). \end{aligned}$$
(2.8)
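For example, in the one-dimensional case \(d=1\), for \(F=\sigma B_t^{1,1}\) with a constant \(\sigma \ne 0\) and \(G=1\), one has \(DF=\sigma \textbf{1}_{[0,t]}\) and \(\sigma ^F=\sigma ^2 t\), so that (2.8) gives

$$\begin{aligned} H_{(1)}(F,1)=\delta \Big ( \frac{\sigma \textbf{1}_{[0,t]}}{\sigma ^2 t} \Big )=\frac{B_t^{1,1}}{\sigma t}, \end{aligned}$$

and (2.7) reduces to the classical Gaussian integration by parts \(E[{{\mathcal {T}}}'(\sigma B_t^{1,1})]=E[{{\mathcal {T}}}(\sigma B_t^{1,1}) B_t^{1,1}/(\sigma t)]\) for smooth \({{\mathcal {T}}}\).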

3 Main result

Let \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\) and \(T>0\). For \(d \in {\mathbb {N}}\), consider the solution to the following stochastic differential equation (SDE) driven by a d-dimensional Brownian motion \(B^d=(B^{d,1},\ldots ,B^{d,d})\) on the d-dimensional Wiener space \((\Omega ^d,H^d,\mu ^d)\):

$$\begin{aligned} dX_t^{d,\lambda ,x}= \mu ^{\lambda }_{d}(X_t^{d,\lambda ,x})dt+ \sigma ^{\lambda }_{d}(X_t^{d,\lambda ,x})dB_t^{d}, \quad X_0^{d,\lambda ,x}=x \in {\mathbb {R}}^d, \end{aligned}$$
(3.1)

where \(\mu ^{\lambda }_{d}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) and \(\sigma ^{\lambda }_{d}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) are Lipschitz continuous functions depending on a parameter \(\lambda \in (0,1]\). The solution \(X_t^{d,\lambda ,x}=(X_t^{d,\lambda ,x,1},\ldots ,X_t^{d,\lambda ,x,d})\) is equivalently written in the integral form as:

$$\begin{aligned} X_t^{d,\lambda ,x,j}=x_j + \int _0^t \mu ^{\lambda ,j}_{d}(X_s^{d,\lambda ,x})ds+ \sum _{i=1}^d \int _0^t \sigma ^{\lambda ,j}_{d,i}(X_s^{d,\lambda ,x})dB_s^{d,i}, \quad X_0^{d,\lambda ,x,j}=x_j \in {\mathbb {R}},\nonumber \\ \end{aligned}$$
(3.2)

for \(j=1,\ldots ,d\). Furthermore, for a given appropriate continuous function \(f_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and for \(\lambda \in (0,1]\), we consider \(u_\lambda ^d \in C([0,T] \times {\mathbb {R}}^d,{\mathbb {R}})\) given by

$$\begin{aligned} u_\lambda ^d(t,x)=E[f_d(X_t^{d,\lambda ,x})] \end{aligned}$$
(3.3)

for \(t \in [0,T]\) and \(x \in {\mathbb {R}}^d \), which is a solution of the Kolmogorov PDE:

$$\begin{aligned} \partial _t u_\lambda ^d(t,x)={{\mathcal {L}}}^{d,\lambda } u_\lambda ^d(t,x), \end{aligned}$$
(3.4)

for all \((t,x) \in (0,T) \times {\mathbb {R}}^d\) and \(u_\lambda ^d(0,\cdot )=f_{d}(\cdot )\), where \({{\mathcal {L}}}^{d,\lambda }\) is the following second order differential operator:

$$\begin{aligned} {{\mathcal {L}}}^{d,\lambda }=\sum _{j=1}^d \mu _{d}^{\lambda ,j}(\cdot )\frac{\partial }{\partial x_j}+\frac{1}{2}\sum _{i,j_1,j_2=1}^d \sigma _{d,i}^{\lambda ,j_1}(\cdot ) \sigma _{d,i}^{\lambda ,j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1} \partial x_{j_2}}. \end{aligned}$$
(3.5)

Our purpose is to show a new spatial approximation scheme of \(u_\lambda ^d(t,\cdot )\) for \(t>0\) by using asymptotic expansion and deep neural network approximation. The main theorem (Theorem 1) is stated at the end of this section.
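Before introducing the scheme, we note that \(u_\lambda ^d(t,x)\) in (3.3) can always be estimated by a time-discretized Monte-Carlo simulation of (3.1). The following NumPy sketch is such a reference computation and is not part of the proposed scheme; the coefficients \(\mu ^{\lambda }_d(x)=\lambda \sin (x)\) (componentwise), \(\sigma ^{\lambda }_d(x)=\lambda I_d\), the function \(f_d\) and all parameter values are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def u_reference(x, t, lam, mu, sigma_mat, f, n_steps=200, n_paths=100_000):
    """Euler-Maruyama Monte-Carlo estimate of u_lambda^d(t,x) = E[f_d(X_t^{d,lam,x})] in (3.3)."""
    d = x.shape[0]
    dt = t / n_steps
    X = np.tile(x, (n_paths, 1))
    for _ in range(n_steps):
        dB = rng.standard_normal((n_paths, d)) * np.sqrt(dt)
        X = X + mu(X) * dt + dB @ sigma_mat(X).T   # sigma is state-independent in this illustration
    return f(X).mean()

# illustrative example: mu^lam(x) = lam*sin(x) (componentwise), sigma^lam(x) = lam*I_d
d, lam, t = 10, 0.5, 1.0
mu = lambda X: lam * np.sin(X)
sigma_mat = lambda X: lam * np.eye(d)
f = lambda X: np.maximum(X.mean(axis=-1), 0.0)
print(u_reference(np.full(d, 0.3), t, lam, mu, sigma_mat, f))
```

Such a full simulation only serves as a reference value for the spatial approximation constructed below.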

3.1 Asymptotic expansion

We first put the following assumptions on \(\{ \mu ^{\lambda }_{d} \}_{\lambda \in (0,1]}\), \(\{ \sigma ^{\lambda }_{d} \}_{\lambda \in (0,1]}\) and \(f_d\).

Assumption 1

(Assumptions for the family of SDEs and asymptotic expansion) Let \(C>0\). For \(d \in {\mathbb {N}}\), let \(\{ \mu ^{\lambda }_{d} \}_{\lambda \in (0,1]} \subset C_{Lip}({\mathbb {R}}^d,{\mathbb {R}}^{d})\) and \(\{ \sigma ^{\lambda }_{d} \}_{\lambda \in (0,1]} \subset C_{Lip}({\mathbb {R}}^d,{\mathbb {R}}^{d\times d})\) be families of functions, and \(f_d \in C_{Lip}({\mathbb {R}}^d,{\mathbb {R}})\) be a function satisfying

  1.

    there are \(V_{d,0} \in C_b^\infty ({\mathbb {R}}^d,{\mathbb {R}}^d)\) and \(V_{d}=(V_{d,1},\ldots ,V_{d,d}) \in C_b^\infty ({\mathbb {R}}^d,{\mathbb {R}}^{d\times d})\) such that (i) \(\mu ^\lambda _{d}=\lambda V_{d,0}\) and \(\sigma ^\lambda _{d}=\lambda V_{d}\) for all \(\lambda \in (0,1]\), (ii) \(C_{Lip}[V_{d,0}] \vee C_{Lip}[V_{d}]=C\) and \(\Vert V_{d,0}(0) \Vert \vee \Vert V_{d}(0) \Vert \le C\), (iii) \(\Vert \partial ^{\alpha } V_{d,i} \Vert _{\infty } \le C\) for any multi-index \(\alpha \) and \(i=0,1,\ldots ,d\);

  2.

    \(\textstyle {\sum _{i=1}^d} \sigma ^\lambda _{d,i}(x) \otimes \sigma ^\lambda _{d,i}(x) \ge \lambda ^2 I_{d}\) for all \(x \in {\mathbb {R}}^d\) and \(\lambda \in (0,1]\);

  3.

    \(C_{Lip}[f_d]= C\) and \(\Vert f_d(0) \Vert \le C\).

Remark 1

Assumption 1 justifies an asymptotic expansion under the uniform ellipticity condition for the solutions of the perturbed systems of PDEs. Assumption 1.3 is also useful for constructing deep neural network approximations for the family of PDE solutions.

From Assumption 1.1, we may write each SDE (3.1) for \(d \in {\mathbb {N}}\) as

$$\begin{aligned} dX_t^{d,\lambda ,x}= \lambda \sum _{i=0}^d V_{d,i}(X_t^{d,\lambda ,x})dB_t^{d,i}, \end{aligned}$$
(3.6)

with \(X_0^{d,\lambda ,x}=x \in {\mathbb {R}}^d\), where the notation \(dB_t^{d,0}=dt\) is used. We define

$$\begin{aligned} {\mathbb {B}}_t^{d,\alpha }=\int _{0<t_1<\cdots<t_k<t} dB_{t_1}^{d,\alpha _1}\cdots dB_{t_k}^{d,\alpha _k}, \ \ t\ge 0, \ \alpha \in \{0,1,\ldots ,d \}^k, \ k \in {\mathbb {N}}, \end{aligned}$$
(3.7)

and \(\textstyle {L_{d,0}=\sum _{j=1}^d V_{d,0}^{j}(\cdot )\frac{\partial }{\partial x_j}+\frac{1}{2}\sum _{i,j_1,j_2=1}^d V_{d,i}^{j_1}(\cdot ) V_{d,i}^{j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1}\partial x_{j_2}}}\), \(\textstyle {L_{d,i}=\sum _{j=1}^d V_{d,i}^{j}(\cdot )\frac{\partial }{\partial x_j}}\), \(i=1,\ldots ,d\). We define

$$\begin{aligned} \bar{X}_t^{d,\lambda ,x}=x+\lambda \sum _{i=0}^d V_{d,i}(x)B_t^{d,i}. \end{aligned}$$
(3.8)
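Note that, under Assumption 1.1, \(\bar{X}_t^{d,\lambda ,x}\) is a d-dimensional Gaussian random variable: writing \(V_d=(V_{d,1},\ldots ,V_{d,d})\) and recalling \(B_t^{d,0}=t\), we have \(\bar{X}_t^{d,\lambda ,x} \sim N(x+\lambda t V_{d,0}(x), \lambda ^2 t V_d(x)V_d(x)^{\top })\), so that expectations of the form \(E[f_d(\bar{X}_t^{d,\lambda ,x})\, \cdot \,]\) only require sampling a single d-dimensional Gaussian vector.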

Proposition 1

(Asymptotic expansion and the error bound) For \(m \in {\mathbb {N}} \cup \{ 0 \} \), there exists \(c >0\) such that for all \(d\in {\mathbb {N}}\), \(t>0\), \(\lambda \in (0,1]\),

$$\begin{aligned}&\sup _{x \in [a,b]^d}\Big |E [f_d(X_t^{d,\lambda ,x})]-\Big \{ E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})] \nonumber \\&\qquad + \sum _{j=1}^m \lambda ^j E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x}) \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)} H_{\gamma ^{(k)}} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } \hat{V}_{d,\alpha }^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big ]\Big \}\Big | \nonumber \\&\quad \le \ c d^c \lambda ^{m+1} t^{(m+1)/2}, \end{aligned}$$
(3.9)

where \(\hat{V}_{d,\alpha }^{e}(x)=L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{e}(x)\), \(e\in \{1,\ldots ,d \}\), \(\alpha =(\alpha _1,\ldots ,\alpha _r)\in \{0,1,\ldots ,d \}^r\), \(r \in {\mathbb {N}}\), and

$$\begin{aligned} \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)}=\sum _{k=1}^j \sum _{\beta ^{(k)}=(\beta _1,\ldots ,\beta _k) \ s.t. \ \beta _1+\cdots +\beta _k=j+k,\beta _i\ge 2}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\ldots ,d \}^k} \frac{1}{k!}, \quad j\ge 1. \end{aligned}$$
(3.10)

Proof

See Sect. 4. \(\square \)

The weights in the expansion terms in Proposition 1 can be represented by polynomials of Brownian motion. We show this through distribution theory on Wiener space. Let \(d \in {\mathbb {N}}\). For \(t \in (0,T]\) and \(\alpha =(\alpha _1,\ldots ,\alpha _k)\in \{0,1,\ldots ,d \}^k\), \(k \in {\mathbb {N}} \cap [2,\infty )\), let

$$\begin{aligned} \textbf{B}_t^{d,\alpha } =\delta ^{\alpha _k}(\textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k-1})})=B_t^{d,\alpha _k}\textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k-1})}-\int _0^t D_{\alpha _k,s}\textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k-1})} ds, \end{aligned}$$
(3.11)

with \(\textbf{B}_t^{d,(\alpha _1)}=B_t^{d,\alpha _1}\), which can be obtained by (2.5). For example, we have \(\textbf{B}_t^{d,(\alpha _1,\alpha _2)}=B_t^{d,\alpha _1}B_t^{d,\alpha _2}-t \textbf{1}_{\alpha _1=\alpha _2\ne 0}\) for \(\alpha =(\alpha _1,\alpha _2) \in \{0,1,\ldots ,d \}^2\). Let \(\sigma _\ell \in {\mathbb {R}}^d\), \(\ell =0,1,\ldots ,d\) and \(\Sigma \) be a matrix given by \(\Sigma _{i,j}=\textstyle {\sum _{\ell =1}^d} \sigma _\ell ^i \sigma _\ell ^j\), \(1\le i,j \le d\) and satisfying \(\det \Sigma >0\). Let \({{\mathcal {T}}} \in {{\mathcal {S}}}'({\mathbb {R}}^d)\). We show an efficient computation of \(\textstyle {{}_{{\mathbb {D}}^{-\infty }} \langle \mathcal{T} (\sum _{i=0}^d \sigma _i B_t^{d,i} ), H_{\gamma } (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } )\rangle {}_{{\mathbb {D}}^\infty }}\) in order to give a polynomial representation of the Malliavin weights in the expansion terms of the asymptotic expansion in Proposition 1. Note that we have

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }} \Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), H_{\gamma } \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \rangle {}_{{\mathbb {D}}^\infty } ={}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \partial ^\gamma {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), {\mathbb {B}}_t^{d,\alpha } \Bigg \rangle {}_{{\mathbb {D}}^\infty } \nonumber \\&\quad = {}_{{{\mathcal {S}}}'}\langle \partial ^\gamma {{\mathcal {T}}}(\sigma _0 B_t^{d,0}+\sigma \ \cdot ), E[{\mathbb {B}}_t^{d,\alpha }|B_t^d= \ \cdot \ ]p^{B_t^d}(\cdot ) \rangle _{{{\mathcal {S}}}}, \end{aligned}$$
(3.12)

by (2.7) and (2.6), where \(\sigma \) is the matrix \(\sigma =(\sigma _1,\ldots ,\sigma _d)\), and for \(y \in {\mathbb {R}}^d\), it holds that

$$\begin{aligned} E[{\mathbb {B}}_t^{d,\alpha }|B_t^d=y]p^{B_t^d}(y)={}_{{{\mathcal {S}}}'}\langle \delta _y, E[{\mathbb {B}}_t^{d,\alpha }|B_t^d= \ \cdot \ ]p^{B_t^d}(\cdot ) \rangle _{{{\mathcal {S}}}}={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y (B_t^{d} ), {\mathbb {B}}_t^{d,\alpha } \rangle {}_{{\mathbb {D}}^{\infty }}, \end{aligned}$$

by (2.6). Also, one has

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y (B_t^{d} ), {\mathbb {B}}_t^{d,\alpha } \rangle {}_{{\mathbb {D}}^\infty }&= {}_{{\mathbb {D}}^{-\infty }} \langle \partial ^{\alpha ^\star } \delta _y(B_t^{d} ),1 \rangle {}_{{\mathbb {D}}^\infty } \frac{1}{k!}t^{k} \nonumber \\&= {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(B_t^{d}), H_{\alpha ^\star }(B_t^{d},1) \rangle {}_{{\mathbb {D}}^{\infty }}\frac{1}{k!}t^{k} ={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(B_t^{d} ), \frac{1}{k!} \textbf{B}_t^{d,\alpha } \rangle {}_{{\mathbb {D}}^\infty }, \end{aligned}$$
(3.13)

by (2.5), (2.7) and (2.8), where \(\alpha ^\star \) is a multi-index such that \(\alpha ^{\star }=(\alpha ^{\star }_1,\ldots ,\alpha ^{\star }_{\ell (\alpha )})=(\alpha _{j_1}, \ldots ,\alpha _{j_{\ell (\alpha )}})\) satisfying \(\ell (\alpha )=\# \{ i; \alpha _i\ne 0 \}\) and \(\alpha _{j_i} \ne 0\), \(i=1,\ldots ,\ell (\alpha )\). Then, we have

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }}\Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), H_{\gamma } \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \rangle {}_{{\mathbb {D}}^\infty }\nonumber \\&\quad ={}_{{{\mathcal {S}}}'}\Bigg \langle \partial ^\gamma {{\mathcal {T}}}\Bigg (\sigma _0 B_t^{d,0}+\sigma \ \cdot \Bigg ),\frac{1}{k!} E[\textbf{B}_t^{d,\alpha } |B_t^d= \ \cdot \ ]p^{B_t^d}(\cdot ) \Bigg \rangle _{{{\mathcal {S}}}} \nonumber \\&\quad = {}_{{\mathbb {D}}^{-\infty }}\Bigg \langle \partial ^\gamma {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ), \frac{1}{k!} \textbf{B}_t^{d,\alpha } \Bigg \rangle {}_{{\mathbb {D}}^{\infty }} = {}_{{\mathbb {D}}^{-\infty }}\Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i} \Bigg ),\nonumber \\&\quad H_{\gamma } \Bigg (\sum _{i=0}^d \sigma _i B_t^{d,i}, \frac{1}{k!} \textbf{B}_t^{d,\alpha } \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^\infty } \nonumber \\&\quad {=} {}_{{\mathbb {D}}^{{-}\infty }} \Bigg \langle {{\mathcal {T}}} \Bigg (\sum _{i{=}0}^d \sigma _i B_t^{d,i} \Bigg ), \sum _{j_1,\ldots ,j_{|\gamma |},\beta _{1},\ldots ,\beta _{|\gamma |}=1}^d \frac{1}{t^{|\gamma |}} \prod _{q=1}^{|\gamma |} \Sigma _{\gamma _q,j_q}^{-1} \sigma _{\beta _{q}}^{j_q} \frac{1}{k!} \textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k},\beta _1,\ldots ,\beta _{|\gamma |})} \Bigg \rangle {}_{{\mathbb {D}}^\infty }, \end{aligned}$$
(3.14)

where we iteratively used (2.5), (2.6), (2.7) and (2.8). An explicit polynomial representation of the asymptotic expansion is derived through (3.14). For instance, the first order expansion (\(m=1\)) is given as follows:

(First order asymptotic expansion with Malliavin weight)

$$\begin{aligned}&E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big \{1 + \lambda \sum _{\ell =1}^d H_{(\ell )} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i}, \sum _{\alpha _1,\alpha _2=0}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) {\mathbb {B}}_t^{d,(\alpha _1,\alpha _2)} \Big ) \Big \} \Big ]\\&\quad = E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big ] + \lambda \sum _{\ell =1}^d \int _{{\mathbb {R}}^d} f_d(x+\lambda y) \sum _{\alpha _1,\alpha _2=0}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) \\&{}_{{\mathbb {D}}^{-\infty }} \Big \langle \delta _y\left( \sum _{i=0}^dV_{d,i}(x)B_t^{d,i}\right) , H_{(\ell )} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i}, {\mathbb {B}}_t^{d,(\alpha _1,\alpha _2)} \Big ) \Big \rangle {}_{{\mathbb {D}}^{\infty }} dy\\&\quad = E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big ] + \lambda \sum _{\ell =1}^d \int _{{\mathbb {R}}^d} f_d(x+\lambda y) \sum _{\alpha _1,\alpha _2=0}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) \\&{}_{{\mathbb {D}}^{-\infty }} \Big \langle \delta _y\left( \sum _{i=0}^dV_{d,i}(x)B_t^{d,i}\right) , \sum _{\alpha _3=1}^d \sum _{j=1}^d \frac{1}{2t} [A_d^{-1}]_{\ell j}(x) V^j_{d,\alpha _3}(x) \textbf{B}_t^{d,(\alpha _1,\alpha _2,\alpha _3)} \Big \rangle {}_{{\mathbb {D}}^{\infty }} dy\\&\quad = E\left[ f_{d}(\bar{X}_t^{d,\lambda ,x}) \left\{ 1 + \lambda \sum _{\ell ,j=1}^d \sum _{\alpha _1,\alpha _2=0}^d \sum _{\alpha _3=1}^d L_{d,\alpha _1}{V}_{d,\alpha _2}^{\ell }(x) \frac{1}{2t} [A_d^{-1}]_{\ell j}(x) V^j_{d,\alpha _3}(x) \textbf{B}_t^{d,(\alpha _1,\alpha _2,\alpha _3)}\right\} \right] . \end{aligned}$$

Thus, the first order expansion is expressed with a Malliavin weight given by third order polynomials of Brownian motion, where \(A_d(x)=\textstyle {\sum _{i=1}^d} V_{d,i}(x) \otimes V_{d,i}(x)\) (see also Assumption 2.3 below). In general, we have the following representation.

Proposition 2

For \(m \in {\mathbb {N}}\), \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\), \(t \in (0,T]\) and \(x \in {\mathbb {R}}^d\), there exists a Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) such that

$$\begin{aligned}&E[f_{d}(\bar{X}_t^{d,\lambda ,x}) {{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d) ] \nonumber \\&\quad = E\Big [f_{d}(\bar{X}_t^{d,\lambda ,x})\Big \{1 + \sum _{j=1}^m \lambda ^j \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)} H_{\gamma ^{(k)}} \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } \hat{V}_{d,\alpha }^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big \} \Big ], \end{aligned}$$
(3.15)

and

$$\begin{aligned} {{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)=\textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h_{e}(x)\textrm{Poly}_e({B}_t^{d}) \end{aligned}$$
(3.16)

for some integers \(n(m)\in {\mathbb {N}}\) and \(p(e) \in {\mathbb {N}}\), \(e=1,\ldots ,n(m)\), polynomials \(\textrm{Poly}_e:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), continuous functions \(g_e: (0,T] \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), and continuous functions \(h_{e}:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\) constructed by some products of \(A^{-1}_{d}\), \(\{V_{d,i}\}_{0\le i \le d}\) and \(\{ \partial ^\alpha V_{d,i}\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) given in Assumption 1 of the form:

$$\begin{aligned} x \mapsto h_{e}(x)=c_e \textstyle {\prod \limits _{\ell =1}^{q_e}} L_{d,\alpha ^e_{\ell ,1}} \cdots L_{d,\alpha ^e_{\ell ,p^e_\ell -1}} {V}_{d,\alpha ^e_{\ell ,p^e_\ell }}^{\gamma ^e_{\ell }}(x) \textstyle {\sum \limits _{\xi ,\iota =1}^d} [A^{-1}_{d}]_{\gamma ^e_{\ell },\xi }(x)V_{d,\iota }^{\xi }(x) \end{aligned}$$
(3.17)

with some constants \(c_e \in (0,\infty )\), \(q_e \in {\mathbb {N}}\) and some multi-indices \((\gamma ^e_{1},\ldots ,\gamma ^e_{q_e}) \in \{1,\ldots ,d \}^{q_e}\) and \((\alpha ^e_{\ell ,1},\ldots ,\alpha ^e_{\ell ,p^e_\ell }) \in \{0,1,\ldots ,d \}^{p^e_\ell }\) with \(p^e_\ell \in {\mathbb {N}}\), \(\ell =1,\ldots ,q_e\), which satisfies that for \(p\ge 1\),

$$\begin{aligned} \sup _{(t,x)\in (0,T] \times [a,b]^d, \lambda \in (0,1]}\Vert \mathcal{M}^{m}_{d,\lambda }(t,x,B_t^d) \Vert _p \le cd^c \ \ \end{aligned}$$
(3.18)

for some constant \(c>0\) independent of d.

Proof

See Sect. 4. \(\square \)

Remark 2

(Remark on computation of Malliavin weights) Malliavin weights were first used in Fournie et al. [7] for sensitivity analysis in financial mathematics, especially for Monte-Carlo computation of “Greeks". A discretization scheme for probabilistic automatic differentiation using Malliavin weights is then analyzed in Gobet and Munos [10]. The computation of asymptotic expansions with Malliavin weights is developed in Takahashi and Yamada [35, 37], and is further extended to weak approximation of SDEs in Takahashi and Yamada [38]. Note that a PDE expansion is shown in Takahashi and Yamada [36] to partially connect it with the stochastic calculus approach. The computation method of the expansion with Malliavin weights is improved in Yamada [41], Yamada and Yamamoto [42], Naito and Yamada [27, 28], Iguchi and Yamada [17, 18], and Takahashi et al. [34], where the techniques of stochastic calculus are refined. The main advantages of the stochastic calculus approach are that (i) it provides an efficient computation scheme using Watanabe distributions on Wiener space as in (3.13) and (3.14), and (ii) it enables us to give precise bounds for approximations of expectations or the corresponding solutions of PDEs. Indeed, the computational effort of the expansions is much reduced in the sense that Itô’s iterated integrals are transformed into simple polynomials of Brownian motion, and the desired deep neural network approximation will be obtained in the next subsection through this approach.

3.2 Deep neural network approximation

In order to construct a deep neural network approximation of the asymptotic expansion as a function of the space variable, i.e. \(x \mapsto E[f_{d}(\bar{X}_t^{d,\lambda ,x}) \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) ]\), we impose the following additional assumptions.

Assumption 2

(Assumptions for deep neural network approximation) Suppose that Assumption 1 holds. There exist a constant \(\kappa >0\) and sets of networks \(\{ \psi _{\varepsilon ,d}^{V_{d,i}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}},i \in \{0,1,\ldots ,d\}} \subset {{\mathcal {N}}}\), \(\{ \psi _{\varepsilon ,d}^{\partial ^\alpha V_{d,i}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}},i \in \{0,1,\ldots ,d\},\alpha \in \{1,\ldots ,d\}^{{\mathbb {N}}}} \subset {{\mathcal {N}}}\), \(\{ \psi _{\varepsilon }^{A_d^{-1}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(\{ \psi _{\varepsilon }^{f_d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) such that

  1.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \({{\mathcal {C}}}(\psi _{\varepsilon ,d}^{V_{d,i}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(i=0,1,\ldots ,d\), \({{\mathcal {C}}}(\psi _{\varepsilon ,d}^{\partial ^\alpha {V}_{d,i}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \), \(\ell \in {\mathbb {N}}\), \({{\mathcal {C}}}(\psi _{\varepsilon }^{A_d^{-1}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), and \({{\mathcal {C}}}(\psi _{\varepsilon }^{f_d}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\);

  2.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\Vert V_{d,i}(x)-V_{d,i}^{\varepsilon }(x)\Vert \le \varepsilon \kappa d^\kappa \), \(i=0,1,\ldots ,d\), and \(\Vert \partial ^\alpha V_{d,i}(x)-V_{d,i,\alpha }^{\varepsilon }(x)\Vert \le \varepsilon \kappa d^\kappa \), \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \), \(\ell \in {\mathbb {N}}\), where \(V_{d,i}^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{V_{d,i}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d})\) and \(V_{d,i,\alpha }^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{\partial ^\alpha V_{d,i}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d})\);

  3.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\Vert A_d^{-1}(x)-A_{d,\varepsilon }^{-1}(x)\Vert \le \varepsilon \kappa d^\kappa \), where \(A_d^{-1}(\cdot )\) is the inverse matrix of \(A_d(\cdot ):=\textstyle {\sum _{i=1}^d} V_{d,i}(\cdot ) \otimes V_{d,i}(\cdot )\) and \(A_{d,\varepsilon }^{-1}={{\mathcal {R}}}(\psi _{\varepsilon }^{A_{d}^{-1}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d \times d})\), and for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(\textstyle {\sup _{x\in [a,b]^d}}\Vert A_{d,\varepsilon }^{-1}(x)\Vert \le \kappa d^\kappa \);

  4.

    for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(|f_d(x)-f_d^{\varepsilon }(x)|\le \varepsilon \kappa d^\kappa \), where \(f_d^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{f_d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\).

Remark 3

Assumption 2 provides the deep neural network approximation of the asymptotic expansion with an appropriate complexity. Note that Assumptions 1.1, 1.3, 2.2 and 2.4 give that there exists \(\eta >0\) such that \(\textstyle {|f_d^{\varepsilon }(x)| \le \eta d^\eta (1+\Vert x \Vert )}\) for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), and \(\textstyle {\sup _{x\in [a,b]^d}}\Vert V_{d,i}^{\varepsilon }(x)\Vert \le \eta d^\eta \) for all \(i=0,1,\ldots ,d\), \(\textstyle {\sup _{x\in [a,b]^d}}\Vert V_{d,i,\alpha }^{\varepsilon }(x)\Vert \le \eta d^\eta \) for all \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \) with \(\ell \in {\mathbb {N}}\). In the following, Assumptions 2.2, 2.3 and 2.4 play an important role in the analysis of “products of neural networks" in the construction of the approximation with the asymptotic expansion.

Remark 4

In particular, Assumption 2.3 is satisfied for the cases \(A_d(x)=I_d\) and \(A_d(x)=s(d)I_d\) with a function \(s:{\mathbb {N}} \rightarrow {\mathbb {R}}\). For instance, the case \(A_d(x)=I_d\) corresponds to the d-dimensional heat equation when \(V_{d,0}\equiv 0\). Also, the SDEs with the diffusion matrix \(V_d=(1/\sqrt{d})I_d\) discussed in Section 5.1 and Section 5.2 of [9] and Section 5.2 of [13] are examples of (3.1) (or (3.6)). For those cases, the neural network approximations in Assumption 2 are not necessary, since \(V_{d,i}\), \(i=1,\ldots ,d\) and hence \(A_d\) do not depend on the state variable x, whence \(\textstyle {V_{d,i,\varepsilon }}\) and \(\textstyle {A^{-1}_{d,\varepsilon }}\) are \(V_{d,i}\) and \(A^{-1}_{d}\) themselves. Furthermore, in such cases (e.g. the high-dimensional heat equations) the asymptotic expansion will be simply obtained (usually as the Gaussian approximation), which are exactly reduced to the methods in Beck et al. [2] and Gonon et al. [11].

The main result of the paper is summarized as follows.

Theorem 1

(Deep learning-based asymptotic expansion overcomes the curse of dimensionality) Suppose that Assumptions 1 and 2 hold. Let \(m \in {\mathbb {N}}\). For \(d \in {\mathbb {N}}\), consider the SDE (3.1) on the d-dimensional Wiener space and let \(u_\lambda ^d \in C ([0,T] \times {\mathbb {R}}^d, {\mathbb {R}})\) given by (3.3) be a solution to the Kolmogorov PDE (3.4). Then we have

$$\begin{aligned} \sup _{x \in [a,b]^d}|u_{\lambda }^d(t,x)-E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]|=O(\lambda ^{m+1} t^{(m+1)/2}). \end{aligned}$$
(3.19)

Furthermore, for \(t \in (0,T]\) and \(\lambda \in (0,1]\), there exist \(\{ \phi ^{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) which depend only on \(a,b,C,m,\kappa ,t\) and \(\lambda \), such that for all \(\varepsilon \in (0,1)\) and \(d\in {\mathbb {N}}\), we have \({{\mathcal {R}}}(\phi ^{\varepsilon ,d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\), \({{\mathcal {C}}}(\phi ^{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\) and

$$\begin{aligned} \sup _{x \in [a,b]^d}|E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-{{\mathcal {R}}}(\phi ^{\varepsilon ,d})(x)|\le \varepsilon . \end{aligned}$$
(3.20)

Proof

See Sect. 4. \(\square \)

We provide the weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) with \(m=0,1\) in Theorem 1 for our scheme (the expression for general m will be given in Sect. 4 below). That is, for \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\), \(t>0\) and \(x \in {\mathbb {R}}^d\),

$$\begin{aligned} {{\mathcal {M}}}^0_{d,\lambda }(t,x,B_t^d)&=1, \end{aligned}$$
(3.21)
$$\begin{aligned} {{\mathcal {M}}}^1_{d,\lambda }(t,x,B_t^d)&=1+\lambda \sum _{\alpha _1,\alpha _2=0}^d \sum _{\alpha _3=1}^d \sum _{\ell ,j=1}^d \frac{1}{2t} L_{d,\alpha _1}V_{d,\alpha _2}^{\ell }(x) [A_d^{-1}]_{\ell j}(x) V^j_{d,\alpha _3}(x) \nonumber \\&\quad \times \{ B_t^{d,\alpha _1}B_t^{d,\alpha _2}B_t^{d,\alpha _3}-t B_t^{d,\alpha _1} \textbf{1}_{\alpha _2=\alpha _3\ne 0}-t B_t^{d,\alpha _2} \textbf{1}_{\alpha _1=\alpha _3\ne 0}-t B_t^{d,\alpha _3} \textbf{1}_{\alpha _1=\alpha _2\ne 0} \}, \end{aligned}$$
(3.22)

where

$$\begin{aligned} L_{d,0}&=\sum _{j=1}^d V_{d,0}^{j}(\cdot )\frac{\partial }{\partial x_j}+\frac{1}{2}\sum _{i,j_1,j_2=1}^d V_{d,i}^{j_1}(\cdot ) V_{d,i}^{j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1}\partial x_{j_2}}, \end{aligned}$$
(3.23)
$$\begin{aligned} L_{d,i}&=\sum _{j=1}^d V_{d,i}^{j}(\cdot )\frac{\partial }{\partial x_j}, \ \ i=1,\ldots ,d. \end{aligned}$$
(3.24)

Hence, the weight for \(m=0\), i.e. \(\mathcal{M}^0_{d,\lambda }(t,x,B_t^d)=1\), provides a simple (but coarse) Gaussian approximation, and the Malliavin weight for \(m=1\) works as a correction term for the Gaussian approximation. The derivation is provided in the next section.
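To illustrate how (3.21)–(3.22) are used in practice, the following NumPy sketch (an illustration under simplifying assumptions, not the authors' implementation) evaluates the \(m=1\) approximation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^1_{d,\lambda }(t,x,B_t^d)]\) by plain Monte-Carlo for the illustrative model \(V_{d,0}(x)=\sin (x)\) (componentwise) and constant diffusion fields \(V_{d,i}=e_i\) already used in the reference sketch before Sect. 3.1. Since the diffusion fields are state-independent, only the \(\alpha _2=0\) terms of (3.22) survive, and the required derivatives \(L_{d,\alpha _1}V_{d,0}^{\ell }\) are available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

d, lam, t = 10, 0.5, 1.0
V = np.eye(d)                                  # constant diffusion fields V_{d,1},...,V_{d,d} (columns)
A_inv = np.linalg.inv(V @ V.T)                 # A_d^{-1} (here the identity)

V0 = lambda x: np.sin(x)                       # drift field V_{d,0} (illustrative)
JV0 = lambda x: np.diag(np.cos(x))             # Jacobian (d V0^l / d x_j)
lap_V0 = lambda x: (V @ V.T).diagonal() * (-np.sin(x))  # sum_{j1,j2} (VV^T)_{j1 j2} d^2 V0^l/dx_{j1}dx_{j2}
f = lambda X: np.maximum(X.mean(axis=-1), 0.0)          # illustrative f_d

def expansion_m1(x, n_paths=200_000):
    """Monte-Carlo estimate of E[f_d(Xbar_t) M^1_{d,lam}(t,x,B_t)], cf. (3.8) and (3.22)."""
    B = rng.standard_normal((n_paths, d)) * np.sqrt(t)   # B_t^{d,1},...,B_t^{d,d}; recall B_t^{d,0} = t
    Xbar = x + lam * V0(x) * t + lam * (B @ V.T)         # Gaussian approximation (3.8)
    # L_{d,alpha1} V0^l(x) for alpha1 = 0 and alpha1 = 1,...,d (only alpha2 = 0 contributes here)
    L0V0 = JV0(x) @ V0(x) + 0.5 * lap_V0(x)              # shape (d,):  l -> L_{d,0} V0^l
    LiV0 = JV0(x) @ V                                     # shape (d,d): [l, alpha1]
    g = A_inv @ V                                          # shape (d,d): [l, alpha3] = sum_j A^{-1}_{lj} V^j_{alpha3}
    # Malliavin weight (3.22); with alpha2 = 0 and B^0 = t the Brownian polynomial reduces to
    #   t*B^{alpha1}*B^{alpha3} - t^2 * 1{alpha1 = alpha3 != 0}
    M = np.ones(n_paths)
    coeff0 = L0V0 @ g                                      # alpha1 = 0 contribution
    M += lam / (2 * t) * t**2 * (B @ coeff0)
    C = LiV0.T @ g                                         # C[alpha1, alpha3] for alpha1, alpha3 >= 1
    M += lam / (2 * t) * (t * np.einsum('ni,ij,nj->n', B, C, B) - t**2 * np.trace(C))
    return np.mean(f(Xbar) * M)

x0 = np.full(d, 0.3)
print("m=1 asymptotic expansion:", expansion_m1(x0))   # compare with the Euler-Maruyama reference above
```

In the scheme of Theorem 1, this expectation is in turn replaced by a deep neural network realization \({{\mathcal {R}}}(\phi ^{\varepsilon ,d})(x)\), cf. (1.2) and step (3) of the proof outline in Sect. 4.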

4 Proofs of Propositions 1, 2 and Theorem 1

We give the proofs of Propositions 1, 2 and Theorem 1. Before providing full proofs, we show their brief outlines below.

  • Proposition 1 (Asymptotic expansion)

    • take a family of uniformly non-degenerate functionals \(F_t^{d,\lambda ,x}=(X_t^{d,\lambda ,x}-x)/\lambda \), \(\lambda \in (0,1]\), since the family \(X_t^{d,\lambda ,x}\), \(\lambda \in (0,1]\), itself degenerates when \(\lambda \downarrow 0\), and consider the expansion \(F_t^{d,\lambda ,x}=F_t^{d,0,x}+\cdots \) in \({\mathbb {D}}^\infty \).

    • expand \(\delta _y(F_t^{d,\lambda ,x}) \sim \delta _y(F_t^{d,0,x})+\cdots \) in \({\mathbb {D}}^{-\infty }\) and take expectation to obtain the expansion of the density \(p^{F_t^{d,\lambda ,x}}(y)=E[\delta _y(F_t^{d,\lambda ,x})] \sim E[\delta _y(F_t^{d,0,x})]+\cdots \) in \({\mathbb {R}}\).

    • derive precise expression of the right-hand side of \(E[f_d(X_t^{d,\lambda ,x})]=c_0^{d,\lambda ,t}+ c_1^{d,\lambda ,t}+\cdots +c_m^{d,\lambda ,t} +\textrm{Residual}^{d,\lambda ,t}_m\) by using Malliavin’s integration by parts.

    • give a precise estimate for \(\textrm{Residual}^{d,\lambda ,t}_m(x)\) (w.r.t \(\lambda \), t and the dimension d) uniformly in x by using the key inequality on Malliavin weight (Lemma 5 in Appendix A) which yields a sharp upper bound of \(\textrm{Residual}^{d,\lambda ,t}_m(x)\).

  • Proposition 2 (Representation and property of Malliavin weight)

    • use the formula (3.14) to prove that \(c_0^{d,\lambda ,t}+ c_1^{d,\lambda ,t}+\cdots +c_m^{d,\lambda ,t}\) above can be represented by an expectation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)]\) with a Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) constructed by polynomials of Brownian motion.

    • check that the moment of the Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) grows polynomially in d from the representation.

  • Theorem 1 (Deep learning-based asymptotic expansion overcomes the curse of dimensionality)

    • (0) for \(d \in {\mathbb {N}}\), first check that the expansion \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)]\) obtained in Propositions 1 and 2 gives an approximation for \(u_\lambda ^d(t,x)\) on the cube \([a,b]^d\) with a sharp asymptotic error bound.

    • (1) for an error precision \(\varepsilon \), construct an approximation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)] \approx E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)]\) on the cube \([a,b]^d\) by using stochastic calculus, where \(f^{\delta }_{d}\), \(\bar{X}_t^{d,\lambda ,x,\delta }\) and \({{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)\) are given by replacing \(\{V_{d,i}\}_i\), \(A_d^{-1}\), \(\{V_{d,i,\alpha }\}_{i,\alpha }\) with their neural network approximations \(\{V^\delta _{d,i}\}_i\), \(A_{d,\delta }^{-1}\), \(\{V_{d,i,\alpha ,\delta }\}_{i,\alpha }\) with \(\delta =(\varepsilon ^c d^{-c})\) for some \(c>0\) independent of \(\varepsilon \) and d.

    • (2) for an error precision \(\varepsilon \), construct a realization of the Monte-Carlo approximation \(E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)] \approx \textstyle {\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d})){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))}\) on the cube \([a,b]^d\) with a choice \(M=O(\varepsilon ^{-c} d^{c})\) for some \(c>0\) independent of \(\varepsilon \) and d, by using stochastic calculus.

    • (3) for an error precision \(\varepsilon \), construct a realization of the deep neural network approximation \(\textstyle {\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d})){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))} \approx {{\mathcal {R}}}(\phi _{\varepsilon ,d})(x)\) on the cube \([a,b]^d\) whose complexity is bounded by \({{\mathcal {C}}}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\) for some \(c>0\) independent of \(\varepsilon \) and d, where ReLU calculus (Lemma 9, 10, 12 in Appendix B) is essentially used.

    • apply (0), (1), (2) and (3) to obtain the main result.

In the proof, we frequently use an elementary result: \(\textstyle {\sup _{x \in [a,b]^d}} \Vert x \Vert \le d^{1/2} \max \{ |a|,|b| \}\), which is obtained in the proof of Corollary 4.2 of [11].

4.1 Proof of Proposition 1

For \(x\in {\mathbb {R}}^d\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), let \(F_t^{d,\lambda ,x}=(F_t^{d,\lambda ,x,1},\ldots ,F_t^{d,\lambda ,x,d}) \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) be given by \(F_t^{d,\lambda ,x,j}=(X_t^{d,\lambda ,x,j}-x_j)/\lambda \), \(j=1,\ldots ,d\). We note that \(\{ F_t^{d,\lambda ,x} \}_{\lambda }\) is a family of uniformly non-degenerate Wiener functionals (see Theorem 3.4 of [40]). Then, for \({{\mathcal {T}}} \in \mathcal{S}'({\mathbb {R}}^d)\), the composition \({{\mathcal {T}}}(F_t^{d,\lambda ,x})\) is well-defined as an element of \({\mathbb {D}}^{-\infty }(\Omega ^d)\), and the density of \(F_t^{d,\lambda ,x}\), namely \(p^{F_t^{d,\lambda ,x}} \in {{\mathcal {S}}}({\mathbb {R}}^d)\) has the representation \(p^{F_t^{d,\lambda ,x}}(y)={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }}\) for \(y \in {\mathbb {R}}^d\). Then, for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\), it holds that

$$\begin{aligned} E[f_d(X_t^{d,\lambda ,x})]=\int _{{\mathbb {R}}^d} f_d(x+\lambda y) {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} dy. \end{aligned}$$
(4.1)

For \(x\in {\mathbb {R}}^d\), \(t \in (0,T]\), let \(F_t^{d,0,x}=\textstyle {\sum _{i=0}^d}V_{d,i}(x)B_t^{d,i}\). Thus, for \(S \in {{\mathcal {S}}}'({\mathbb {R}}^d)\), the composition \(S(F_t^{d,\lambda ,x})\) is well-defined as an element of \({\mathbb {D}}^{-\infty }(\Omega ^d)\) and has an expansion:

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }}&={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,0,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} \nonumber \\&\quad +\sum _{j=1}^m \frac{\lambda ^j}{j!} \frac{\partial ^{j}}{\partial \lambda ^{j}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} |_{\lambda =0} +\lambda ^{m+1} {{\mathcal {E}}}_{m,t}^{d,\lambda ,x,y}, \end{aligned}$$
(4.2)

for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\), where

$$\begin{aligned} {{\mathcal {E}}}_{m,t}^{d,\lambda ,x,y}={\int _0^1 \frac{(1-u)^{m}}{m!} \frac{\partial ^{m+1}}{\partial \eta ^{m+1}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\eta ,x}),1\rangle {}_{{\mathbb {D}}^{\infty }} |_{\eta =\lambda u} du}. \end{aligned}$$
(4.3)

The integration by parts formula (2.7) and Theorem 2.6 of [35] yield that

$$\begin{aligned}&\frac{1}{j!} \frac{\partial ^{j}}{\partial \lambda ^{j}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} |_{\lambda =0} \nonumber \\&\quad = \sum _{i^{(k)},\gamma ^{(k)}}^{j} {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y(F_t^{d,0,x}),H_{\gamma ^{(k)}} \Bigg (F_t^{d,0,x},\prod _{\ell =1}^k \frac{1}{i_\ell !} \frac{\partial ^{i_\ell }}{\partial \lambda ^{i_\ell }} F_t^{d,\lambda ,x,\gamma _\ell }|_{\lambda =0} \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^{\infty }}. \end{aligned}$$
(4.4)

where \(\textstyle {\sum _{i^{(k)},\gamma ^{(k)}}^{j}=\sum _{k=1}^j \sum _{i^{(k)}=(i_1,\ldots ,i_k) \ s.t. \ i_1+\cdots +i_k=j,i_e\ge 1}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\cdots ,d \}^k}\frac{1}{k!}}\). With the calculation

$$\begin{aligned} {\frac{1}{i!}\frac{\partial ^{i}}{\partial \lambda ^{i}} F_t^{d,\lambda ,x,j}|_{\lambda =0}=\sum _{ | \alpha |=i+1} L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{j}(x) {\mathbb {B}}_t^{d,\alpha }} \end{aligned}$$
(4.5)

for \(j=1,\ldots ,d\) and \(i\in {\mathbb {N}}\), it holds that

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} = {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,0,x}), 1 \rangle {}_{{\mathbb {D}}^{\infty }} \nonumber \\&\quad +\sum _{j=1}^m \lambda ^j \sum _{i^{(k)},\gamma ^{(k)}}^{j} {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y(F_t^{d,0,x}),H_{\gamma ^{(k)}} \Bigg (F_t^{d,0,x},\prod _{\ell =1}^k \sum _{ | \alpha |=i_\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^{\infty }} \nonumber \\&\quad +\lambda ^{m+1} {{\mathcal {E}}}_{m,t}^{d,\lambda ,x,y}. \end{aligned}$$
(4.6)

Again by the integration by parts (2.7), \(\textstyle {\frac{\partial ^{m+1}}{\partial \eta ^{m+1}}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}),1\rangle {}_{{\mathbb {D}}^{\infty }} |_{\eta =\lambda u}\) (with \(\lambda u \in (0,1]\)) in \(\mathcal{E}_{m,t}^{d,\lambda ,x,y}\) in (4.3) is given by a linear combination of the expectations of the form

$$\begin{aligned} {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y(F_t^{d,\lambda u,x}), \textstyle {H_{\gamma }\Bigg (F_t^{d,\lambda u,x}, \prod _{\ell =1}^{k}\frac{1}{\beta _\ell !}\partial _{\eta }^{\beta _\ell } F_t^{d,\eta ,x,\gamma _\ell }}|_{\eta =\lambda u}\Bigg )\Bigg \rangle {}_{{\mathbb {D}}^{\infty }} \end{aligned}$$

with \(k \le m+1\), \(\gamma \in \{1,\ldots ,d \}^k\) and \(\beta _1,\ldots ,\beta _k\ge 1\) such that \(\textstyle {\sum _{\ell =1}^k} \beta _\ell =m+1\). By the inequality of Lemma 5 (applied with \(k=0\)) in Appendix A, we have that for all \(p\ge 1\) and any multi-index \(\gamma \), there are \(c>0\), \(p_1,p_2,p_3>1\) and \(r \in {\mathbb {N}}\) satisfying

$$\begin{aligned} \Vert H_{\gamma }(F_t^{d,\lambda ,x}, G) \Vert _p \le cd^c \Vert \det (\sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _{p_1}^{r} \Vert DF_t^{d,\lambda ,x} \Vert ^{2dr-|\gamma |}_{|\gamma |,p_2,H^d} \Vert G \Vert _{|\gamma |,p_3}, \end{aligned}$$
(4.7)

for all \(G \in {\mathbb {D}}^\infty \), \(t \in (0,T]\), \(\lambda \in (0,1]\) and \(x \in [a,b]^d\). In order to show the upper bound of the weight appearing in the residual term of the expansion, we list the following results:

Lemma 1

  1.

    For all \(p>1\), there exists \(\kappa _1>0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),

    $$\begin{aligned} \Vert \det (\sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _p \le \kappa _1 d^{\kappa _1} t^{-d}. \end{aligned}$$
    (4.8)
  2.

    For all \(p>1\), \(r\in {\mathbb {N}}\), there exists \(\kappa _2>0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),

    $$\begin{aligned} \Vert DF_t^{d,\lambda ,x} \Vert _{r,p,H^d}\le \kappa _2 d^{\kappa _2} t^{1/2}. \end{aligned}$$
    (4.9)
  3.

    For all \(\ell \in {\mathbb {N}}\), \(p>1\) and \(r\in {\mathbb {N}}\), there exists \(\eta >0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),

    $$\begin{aligned} \Vert \partial _{\lambda }^\ell F_t^{d,\lambda ,x} \Vert _{r,p} \le \eta d^\eta t^{(\ell +1)/2}. \end{aligned}$$
    (4.10)

Proof

For \(d\in {\mathbb {N}}\), let \(V_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) be such that \(V_d=(V_{d,1},\ldots ,V_{d,d})\) and for \(\lambda \in (0,1]\), let \(V^{\lambda }_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) be such that \(V^{\lambda }_d=(V^{\lambda }_{d,1},\ldots ,V^{\lambda }_{d,d})\). Moreover, for \(d\in {\mathbb {N}}\), we use the notation \(J_{0\rightarrow t}=\textstyle {\frac{\partial }{\partial x}X_t^{d,\lambda ,x}}=(\textstyle {\frac{\partial }{\partial x_i}X_t^{d,\lambda ,x,j})_{1\le i,j \le d}}\) for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\).

  1.

    Note that for \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\) and \(\lambda \in (0,1]\), we have

    $$\begin{aligned} \sigma ^{F_t^{d,\lambda ,x}}&= \int _0^t [D_{s} (X_t^{d,\lambda ,x}-x)/\lambda ] [D_{s} (X_t^{d,\lambda ,x}-x)/\lambda ]^{\top } ds \end{aligned}$$
    (4.11)
    $$\begin{aligned}&=\int _0^t J_{0 \rightarrow t} J_{0 \rightarrow s}^{-1} V_d (X_s^{d,\lambda ,x})V_d(X_s^{d,\lambda ,x})^{\top } {J_{0 \rightarrow s}^{-1}}^{\top } J_{0 \rightarrow t}^{\top } ds. \end{aligned}$$
    (4.12)

    Under the condition \(\sigma _{d}^{\lambda }(\cdot )\sigma _{d}^{\lambda }(\cdot )^{\top } \ge \lambda ^2 I_{d}\) (i.e. \(V_{d}(\cdot )V_{d}(\cdot )^{\top } \ge I_{d}\)) in Assumption 1.2, we have that there is \(c>0\) such that

    $$\begin{aligned} \sup _{x\in [a,b]^d} \Vert (\det \sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _p \le cd^c t^{-d}, \end{aligned}$$
    (4.13)

    for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), by Theorem 3.5 of Kusuoka and Stroock [22].

  2.

    We recall that for \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\) and \(0\le s<t\), \(D_{s} (X_t^{d,\lambda ,x}-x)/\lambda =J_{0 \rightarrow t} J_{0 \rightarrow s}^{-1} V_d(X_s^{d,\lambda ,x})\). Then, there is \(c>0\) such that

    $$\begin{aligned} \sup _{x\in [a,b]^d} \Vert DF_t^{d,\lambda ,x} \Vert _{k,p,H^d} \le c d^c t^{1/2}, \end{aligned}$$
    (4.14)

    for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), by Theorem 2.19 of Kusuoka and Stroock [22].

  3.

    Note that

    $$\begin{aligned} \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }X_t^{d,\lambda ,x,r}&= \sum _{i^{(k)},\gamma ^{(k)}}^{\ell -1}\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j^r(X_s^{d,\lambda ,x})dB_s^{d,j} \end{aligned}$$
    (4.15)
    $$\begin{aligned}&\quad +\lambda \sum _{i^{(k)},\gamma ^{(k)}}^{\ell }\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j^r(X_s^{d,\lambda ,x})dB_s^{d,j}. \end{aligned}$$
    (4.16)

    Since the above is a linear SDE, its solution has an explicit form, and we have

    $$\begin{aligned} \sup _{x \in [a,b]^d}\Big \Vert \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }X_t^{d,\lambda ,x} \Big \Vert _{k,p} \le c d^c t^{\ell /2}, \end{aligned}$$
    (4.17)

    for some \(c>0\) independent of t and d, due to the following estimate:

    $$\begin{aligned} \sup _{x \in [a,b]^d} \Big \Vert&\sum _{i^{(k)},\gamma ^{(k)}}^{\ell -1}\int _0^t J_{0\rightarrow t}J_{0\rightarrow s}^{-1} \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j(X_s^{d,\lambda ,x})dB_s^{d,j} \Big \Vert _{k,p}\nonumber \\&\le c d^c t^{\ell /2}, \end{aligned}$$
    (4.18)

    which is obtained by using Lemmas 6 and 7 in Appendix A. Then, the process

    $$\begin{aligned} \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }F_t^{d,\lambda ,x}&= \sum _{i^{(k)},\gamma ^{(k)}}^{\ell }\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j(X_s^{d,\lambda ,x})dB_s^{d,j}, \ \ t\ge 0, \ x \in {\mathbb {R}}^d, \end{aligned}$$
    (4.19)

    satisfies

    $$\begin{aligned} \sup _{x \in [a,b]^d} \Big \Vert \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }F_t^{d,\lambda ,x} \Big \Vert _{k,p} \le c d^c t^{(\ell +1)/2}, \end{aligned}$$
    (4.20)

    for some \(c>0\) independent of t and d.

\(\square \)

Using the above, we have that for all \(k \le m+1\), multi-indices \(\gamma \in \{1,\ldots ,d \}^k\), \(\beta _1,\ldots ,\beta _k\ge 1\) such that \(\textstyle {\sum _{\ell =1}^k} \beta _\ell =m+1\), and \(p>1\), there exists \(\nu >0\) such that

$$\begin{aligned} \Vert H_{\gamma }(F_t^{d,\lambda ,x}, \textstyle {\prod _{\ell =1}^{k}\frac{1}{\beta _\ell !}\partial _{\lambda }^{\beta _\ell } F_t^{d,\lambda ,x,\gamma _\ell }}) \Vert _p \le \nu d^{\nu } t^{-k/2} t^{(\beta _1+\cdots +\beta _k+k)/2}=\nu d^{\nu } t^{(m+1)/2}, \end{aligned}$$
(4.21)

for all \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\). Let us define \(r_{m,t}^{d,\lambda ,x}\) for \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\) from (4.1) and (4.6) as

$$\begin{aligned} r_{m,t}^{d,\lambda ,x}&= E[f_d(X_t^{d,\lambda ,x})] \nonumber \\&\quad -E \Big [f_{d}(\bar{X}_t^{d,\lambda ,x}) \Big \{ 1+ \sum _{j=1}^m \lambda ^j \sum _{\beta ^{(k)},\gamma ^{(k)}}^{(j)} H_{\gamma ^{(k)}} \nonumber \\&\quad \times \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big \} \Big ]\nonumber \\&= \lambda ^{m+1}\int _0^1 \frac{(1-u)^{m}}{m!} E[f_d( \tilde{X}_t^{d,\lambda ,u,x} ) {{\mathcal {W}}}_{m+1,t}^{d,\lambda ,u,x} ] du, \end{aligned}$$
(4.22)

where \(\tilde{X}_t^{d,\lambda ,u,x}=x+\lambda F_t^{d,\lambda u,x}\), \(u \in [0,1]\) and

$$\begin{aligned} \mathcal{W}_{m+1,t}^{d,\lambda ,u,x}=\sum _{\beta ^{(k)},\gamma ^{(k)}}^{[m+1]} H_{\gamma ^{(k)}}\Bigg (F_t^{d,\lambda u,x}, \prod _{\ell =1}^{k}\frac{1}{\beta _\ell !}\partial _{\eta }^{\beta _\ell } F_t^{d,\eta ,x,\gamma _\ell }\Big |_{\eta =\lambda u}\Bigg ), \ \ u \in [0,1], \end{aligned}$$
(4.23)

with \(\textstyle {\sum _{\beta ^{(k)},\gamma ^{(k)}}^{[m+1]}:=(m+1)! \sum _{k=1}^{m+1} \sum _{\beta ^{(k)}=(\beta _1,\ldots ,\beta _k)\ \mathrm {s.t.}\ \sum _{\ell =1}^k \beta _\ell =m+1,\beta _i\ge 1}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\ldots ,d \}^k}}{\frac{1}{k!}}\).
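
For the reader's convenience, we note that the last equality in (4.22) is an instance of the Taylor formula with integral remainder: for a smooth function \(g\) on \([0,1]\),

$$\begin{aligned} g(1)=\sum _{j=0}^m \frac{1}{j!}g^{(j)}(0)+\int _0^1 \frac{(1-u)^m}{m!}g^{(m+1)}(u)\,du, \end{aligned}$$

applied (formally) to \(g(u)=E[f_d(x+\lambda F_t^{d,\lambda u,x})]\), whose derivatives in \(u\) are expressed through the weights \(H_{\gamma }\) by the integration by parts formula on the Wiener space.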

Here, \(\tilde{X}_t^{d,\lambda ,u,x}\), \(u \in [0,1]\) and \(\mathcal{W}_{m+1,t}^{d,\lambda ,u,x}\), \(u \in [0,1]\) satisfy that for \(p \ge 1\), there exists \(\eta >0\) such that

$$\begin{aligned} \textstyle {\sup _{x \in [a,b]^d, u \in [0,1]}}\Vert \tilde{X}_t^{d,\lambda ,u,x} \Vert _p \le \eta d^\eta \ \hbox {and} \ \textstyle {\sup _{x \in [a,b]^d, u \in [0,1]}}\Vert \mathcal{W}_{m+1,t}^{d,\lambda ,u,x} \Vert _p \le \eta d^\eta t^{(m+1)/2} \end{aligned}$$

for all \(\lambda \in (0,1]\) and \(t>0\). Therefore, there exists \(c>0\) such that

$$\begin{aligned} \sup _{x\in [a,b]^d}|r_{m,t}^{d,\lambda ,x}| \le c d^c \lambda ^{m+1} t^{(m+1)/2}, \end{aligned}$$
(4.24)

for all \(\lambda \in (0,1]\) and \(t \in (0,T]\), and then the assertion of Proposition 1 holds.

4.2 Proof of Proposition 2

For \(d \in {\mathbb {N}}\) and for \(m \in {\mathbb {N}}\), first note that the following representation holds:

$$\begin{aligned}&E \Big [f_{d}(\bar{X}_t^{d,\lambda ,x}) H_{\gamma } \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Big ) \Big ] \end{aligned}$$
(4.25)
$$\begin{aligned}&\quad =\int _{{\mathbb {R}}^d} f_d(x+\lambda y) {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y \Big (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i}\Big ) \end{aligned}$$
(4.26)
$$\begin{aligned}&\quad H_{\gamma } \Bigg (\sum _{i=0}^dV_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Bigg ) \Bigg \rangle {}_{{\mathbb {D}}^{\infty }} dy, \end{aligned}$$
(4.27)

for \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\), \(\lambda \in (0,1]\), \(k=1,\ldots ,j \le m\), \(\beta _1,\ldots ,\beta _k \ge 2\) such that \(\beta _1+\cdots +\beta _k=j+k\), and \(\gamma \in \{1,\ldots ,d \}^k\). Using the Itô formula for products of iterated integrals (see, e.g., Proposition 5.2.3 of [21]) and iterating the formula (3.14), namely, for a multi-index \(\gamma \in \{1,\ldots ,d \}^p\) and a multi-index \(\alpha \in \{0,1,\ldots ,d \}^q\),

$$\begin{aligned}&{}_{{\mathbb {D}}^{-\infty }}\Bigg \langle \delta _y \Bigg (\sum _{i=0}^d V_{d,i}(x) B_t^{d,i} \Bigg ), H_{\gamma } \Bigg (\sum _{i=0}^d V_{d,i}(x) B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \rangle {}_{{\mathbb {D}}^\infty } \nonumber \\&\quad = {}_{{\mathbb {D}}^{-\infty }} \Bigg \langle \delta _y \Bigg (\sum _{i=0}^d V_{d,i}(x) B_t^{d,i} \Bigg ), \sum _{j_1,\ldots ,j_{|\gamma |},\beta _{1},\ldots ,\beta _{|\gamma |}=1}^d \frac{1}{t^{|\gamma |}} \prod _{q=1}^{|\gamma |} [A_d^{-1}]_{\gamma _q,j_q}(x) V_{d,\beta _{q}}^{j_q}(x)\\&\qquad \quad \frac{1}{k!} \textbf{B}_t^{d,(\alpha _1,\ldots ,\alpha _{k},\beta _1,\ldots ,\beta _{|\gamma |})} \Bigg \rangle {}_{{\mathbb {D}}^\infty } \end{aligned}$$

we obtain (3.15) and the representation (3.16).

We can see that for \(p\ge 1\) and \(e=1,\ldots ,n(m)\), \(\Vert g_e(t) \textrm{Poly}_e(B_t^d)\Vert _p=O(t^{\nu _e/2})\) for some \(\nu _e \ge 1\), and by Assumptions 1 and 2 and the expression of \(h_e\), there is \(\eta >0\) independent of d such that \(|h_e(x)| \le \eta d^\eta \) for all \(e=1,\ldots ,n(m)\) and \(x \in [a,b]^d\). Then, for \(p\ge 1\), there exists \(c>0\) independent of d such that

$$\begin{aligned} \Vert {{\mathcal {M}}}^{m}_{d,\lambda }(t,x,B_t^d) \Vert _p \le cd^c, \end{aligned}$$
(4.28)

uniformly in \((t,x)\in (0,T] \times [a,b]^d\) and \(\lambda \in (0,1]\).

4.3 Proof of Theorem 1

The first statement is immediately obtained by combining Propositions 1 and 2:

$$\begin{aligned} \sup _{x \in [a,b]^d}|u_{\lambda }^d(t,x)-E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]|=O(\lambda ^{m+1} t^{(m+1)/2}). \end{aligned}$$
(4.29)

Hereafter, we fix \(t \in (0,T]\) and \(\lambda \in (0,1]\). For \(d \in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d\), \(\delta \in (0,1)\), let

$$\begin{aligned} \bar{X}_t^{d,\lambda ,x,\delta }=x+\lambda \textstyle {\sum _{i=0}^d} V_{d,i}^{\delta }(x)B_t^{d,i} \end{aligned}$$
(4.30)

and \({{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d) \in {\mathbb {D}}^\infty (\Omega ^d)\) be a functional which has the form:

$$\begin{aligned} {{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d) = \textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h^{\delta }_{e}(x)\textrm{Poly}_e({B}_t^d), \end{aligned}$$
(4.31)

where \(h_{e}^{\delta }: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), are the functions obtained from \(h_e\) in Proposition 2 by replacing \(A^{-1}_{d}\), \(\{V_{d,i}\}_{0\le i \le d}\) and \(\{V_{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) with \(A^{-1}_{d,\delta }\), \(\{V^\delta _{d,i}\}_{0\le i \le d}\) and \(\{V^\delta _{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) in Assumption 2, satisfying

$$\begin{aligned}&E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta }) {{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]\nonumber \\&\quad =E\Bigg [ f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\Bigg \{1+\sum _{j=1}^m \lambda ^j \sum _{k=1}^j \sum _{\beta _1+\cdots +\beta _k=j+k,\beta _i\ge 2}\sum _{(\gamma _1,\ldots ,\gamma _k)\in \{1,\ldots ,d \}^k}\frac{1}{k!} \nonumber \\&\quad H_{(\gamma _1,\ldots ,\gamma _k)} \Bigg (\sum _{i=1}^dV^{\delta }_{d,i}(x)B_t^{d,i},\prod _{\ell =1}^k \sum _{ | \alpha |=\beta _\ell } L^{\delta }_{d,\alpha _1}\cdots L^{\delta }_{d,\alpha _{r-1}}V_{d,\alpha _r}^{\delta ,\gamma _\ell }(x) {\mathbb {B}}_t^{d,\alpha } \Bigg )\Bigg \}\Bigg ]. \end{aligned}$$
(4.32)

Next, we prepare the following lemmas (Lemmas 2, 3 and 4) to prove the second assertion (3.20) in Theorem 1.

Lemma 2

There exists \(c_1>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), \(\delta =O(\varepsilon ^{c_1} d^{-c_1})\),

$$\begin{aligned} \sup _{x \in [a,b]^d}|E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]|\le \varepsilon ,\qquad \end{aligned}$$
(4.33)

where \(f^{\delta }_{d}={{\mathcal {R}}}(\psi _{\delta }^{f_d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\) is defined in Assumption 2.4.

Proof

In the proof, we use a generic constant \(c>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \). Note that for \(x \in [a,b]^d\),

$$\begin{aligned}{} & {} |E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]|\nonumber \\{} & {} \quad \le | E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)] |\nonumber \\{} & {} \qquad +| E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)] |\nonumber \\{} & {} \qquad +| E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)] |. \end{aligned}$$
(4.34)

By 2 of Assumption 2 (with Assumption 1), it holds that

$$\begin{aligned}{} & {} | E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m}_{d,\lambda }(t,x,B_t^d)] |\nonumber \\{} & {} \quad \le C \Vert \bar{X}_t^{d,\lambda ,x}-\bar{X}_t^{d,\lambda ,x,\delta }\Vert _2 \Vert \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) \Vert _2 \le \delta c d^c, \end{aligned}$$
(4.35)

for all \(x \in [a,b]^d\). By 4 of Assumption 2 (with Assumption 1), it holds that

$$\begin{aligned} | E[f_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)] | \le \delta c d^c, \end{aligned}$$
(4.36)

for all \(x \in [a,b]^d\). Here, the estimate \( \Vert \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) \Vert _2 \le cd^c\) in (3.18) is used in (4.35) and (4.36). By 2, 3, 4 of Assumption 2 (with Assumption 1), (3.16) and (4.31), we have that for \(p\ge 1\),

$$\begin{aligned} \Vert {{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)-\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d) \Vert _p \le \delta c d^c \end{aligned}$$
(4.37)

and

$$\begin{aligned} | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)] | \le \delta c d^c, \end{aligned}$$
(4.38)

for all \(x \in [a,b]^d\). Then, by taking \(\delta =(1/3) c_1^{-1}\varepsilon ^{c_1}d^{-c_1}\) with \(c_1=\max \{1,c \}\), where \(c\) is the maximum of the constants appearing in (4.35), (4.36) and (4.38), we have

$$\begin{aligned} \sup _{x \in [a,b]^d}|E[f_{d}(\bar{X}_t^{d,\lambda ,x})\mathcal{M}^m_{d,\lambda }(t,x,B_t^d)]-E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]|\le \varepsilon . \ \ \end{aligned}$$
(4.39)

\(\square \)

Lemma 3

For \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(M\in {\mathbb {N}}\), let \(B_t^{d,(\ell )}\), \(\ell =1,\ldots ,M\) be independent identically distributed random variables such that \(B_t^{d,(\ell )} \overset{\textrm{law}}{=} B_t^{d}\). There exists \(c_2>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\) and \(M=O(\varepsilon ^{-c_2} d^{c_2})\), there is \(\omega _{\varepsilon ,d} \in \Omega ^d\) satisfying

$$\begin{aligned} \sup _{x\in [a,b]^d} \Bigg | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]-\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d})) \Bigg | \le \varepsilon , \end{aligned}$$
(4.40)

where \(\delta =O(\varepsilon ^{c_1}d^{-c_1})\) with the constant \(c_1\) in Lemma 2.

Proof

There exists a constant \(c >0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for all \(x \in [a,b]^d\) and \(M \in {\mathbb {N}}\),

$$\begin{aligned}{} & {} E\Big [\Big | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d)] -\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )})\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}) \Big |^2 \Big ]\nonumber \\ \end{aligned}$$
(4.41)
$$\begin{aligned}{} & {} \quad \le \frac{1}{M} E[|f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)|^2] \le \frac{cd^{c}}{M}.\nonumber \\ \end{aligned}$$
(4.42)

Then, by choosing \(c_2=\max \{1,c \}\), we have that for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\) and \(M=c_2 \varepsilon ^{-c_2}d^{c_2}\),

$$\begin{aligned} E\Big [\Big | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]-\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )})\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}) \Big |^2 \Big ]^{1/2} \le \varepsilon , \end{aligned}$$
(4.43)

for all \(x \in [a,b]^d\), and therefore, there is \(\omega _{\varepsilon ,d} \in \Omega ^d\) satisfying

$$\begin{aligned}&\sup _{x\in [a,b]^d} \Big | E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta })\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^d)]\nonumber \\&\quad -\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d})) \Big | \le \varepsilon . \ \ \end{aligned}$$
(4.44)

\(\square \)
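
The \(O(M^{-1/2})\) decay of the \(L^2\)-error used in (4.41)–(4.43) can also be observed numerically; the following NumPy sketch (with an illustrative one-dimensional payoff, not the functional of the scheme) is included only to visualize the rate.

```python
import numpy as np

rng = np.random.default_rng(0)
t = 1.0
exact = np.sqrt(t / (2.0 * np.pi))          # E[max(B_t, 0)] for B_t ~ N(0, t)

def l2_error(M, trials=1000):
    """Root mean squared error of the sample mean of max(B_t, 0) over M i.i.d. copies."""
    B = rng.normal(0.0, np.sqrt(t), size=(trials, M))
    errors = np.maximum(B, 0.0).mean(axis=1) - exact
    return np.sqrt(np.mean(errors**2))

for M in [100, 400, 1600, 6400]:
    print(M, l2_error(M))                   # the error roughly halves as M quadruples
```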

Lemma 4

For \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(M\in {\mathbb {N}}\), let \(B_t^{d,(\ell )}\), \(\ell =1,\ldots ,M\) be independent identically distributed random variables such that \(B_t^{d,(\ell )} \overset{\textrm{law}}{=} B_t^{d}\). There exist \(\{ \phi _{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) (which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \)) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), we have \(\mathcal{C}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\), and for a realization \(\omega _{\varepsilon ,d} \in \Omega ^d\) given in Lemma 3, it holds that

$$\begin{aligned} \sup _{x \in [a,b]^d} \Big |\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))-\mathcal{R}(\phi _{\varepsilon ,d})(x) \Big | \le \varepsilon , \end{aligned}$$
(4.45)

where \(\delta =O(\varepsilon ^{c_1}d^{-c_1})\) and \(M=O(\varepsilon ^{-c_2}d^{c_2})\) with the constants \(c_1\) and \(c_2\) in Lemmas 2 and 3.

Proof

In the proof, we use a generic constant \(c>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \). Let \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\) and \(\ell =1,\ldots ,M\), let \(\delta =O(\varepsilon ^{c_1} d^{-c_1})\) and \(M=O(\varepsilon ^{-c_2} d^{c_2})\), where \(c_1\) and \(c_2\) are the constants appearing in Lemmas 2 and 3, let \(\omega _{\varepsilon ,d}\) be a realization given in Lemma 3, and let \(b^{d,(\ell )}=B_t^{d,(\ell )}(\omega _{\varepsilon ,d})\). Since there exists \(\eta _{\delta ,d}^{(\ell )} \in {{\mathcal {N}}}\) such that \(\mathcal{R}(\eta ^{(\ell )}_{\delta ,d})(x)=x+\lambda \mathcal{R}(\psi _{\delta ,d}^{V_0})(x)t+\lambda \textstyle {\sum _{i=1}^d} \mathcal{R}(\psi _{\delta ,d}^{V_i})(x) b^{d,(\ell ),i}\) for \(x \in {\mathbb {R}}^d\) and \(\mathcal{C}(\eta ^{(\ell )}_{\delta ,d})=O(\delta ^{-c}d^c)\) (by Lemma 9 in Appendix B), there exists \(\psi _{1,(\ell )}^{\delta ,d} \in {{\mathcal {N}}}\) such that \(\mathcal{R}(\psi _{1,(\ell )}^{\delta ,d})(x)=\mathcal{R}(\psi _{\delta ,d}^{f})(\mathcal{R}(\eta ^{(\ell )}_{\delta ,d})(x))=f_{d}^\delta (\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\) for \(x \in {\mathbb {R}}^d\) and \(\mathcal{C}(\psi _{1,(\ell )}^{\delta ,d})=O(\delta ^{-c}d^c)\) (by Lemma 10 in Appendix B). Next, we recall that by (4.31), the weight \(\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})\), \(x \in {\mathbb {R}}^d\) has the form \({{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})= \textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h^{\delta }_{e}(x)\textrm{Poly}_{e}(b^{d,(\ell )})\), which is constructed by some products of \(A^{-1}_{d,\delta }\), \(\{V^{\delta }_{d,i}\}_{0\le i \le d}\) and \(\{V^{\delta }_{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) in Assumption 2. Using Lemmas 9 and 12 in Appendix B and Assumption 2, there is a neural network \(\psi ^{\varepsilon ,d}_{2,(\ell )} \in {{\mathcal {N}}}\) such that \(\textstyle {\sup _{x\in [a,b]^d}}|\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})-\mathcal{R}(\psi ^{\varepsilon ,d}_{2,(\ell )})(x)|\le \varepsilon /2\) and \({{\mathcal {C}}}(\psi ^{\varepsilon ,d}_{2,(\ell )})=O(\varepsilon ^{-c}d^c)\). Hence, we have

$$\begin{aligned} \sup _{x\in [a,b]^d}|f_{d}^\delta (\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})-\mathcal{R}(\psi _{1,(\ell )}^{\delta ,d})(x)\mathcal{R}(\psi _{2,(\ell )}^{\varepsilon ,d})(x)|\le \varepsilon /2. \end{aligned}$$
(4.46)

We again use Lemma 12 in Appendix B to see that there exists \(\Psi _{(\ell )}^{\varepsilon ,d} \in {{\mathcal {N}}}\) such that

$$\begin{aligned}{} & {} |{{\mathcal {R}}}(\psi _{1,(\ell )}^{\delta ,d})(x)\mathcal{R}(\psi _{2,(\ell )}^{\varepsilon ,d})(x)-\mathcal{R}(\Psi _{(\ell )}^{\varepsilon ,d})(x)| \le \varepsilon /2, \end{aligned}$$
(4.47)

for all \(x \in [a,b]^d\), and \(\mathcal{C}(\Psi _{(\ell )}^{\varepsilon ,d})=O(\varepsilon ^{-c}d^{c})\). Finally, applying Lemma 9 gives the desired result, i.e. there exist \(\{ \phi _{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), we have \(\mathcal{C}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\), and for a realization \(\omega _{\varepsilon ,d} \in \Omega ^d\) given in Lemma 3, it holds that

$$\begin{aligned} \sup _{x \in [a,b]^d} \Big |\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d}))\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))-\mathcal{R}(\phi _{\varepsilon ,d})(x) \Big | \le \varepsilon . \nonumber \\ \end{aligned}$$
(4.48)

\(\square \)
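
For illustration, the composition of realizations used repeatedly above (in the spirit of Lemma 10 in Appendix B) can be sketched as follows: a ReLU network is stored as a list of affine layers \((W,B)\), and the two affine maps at the junction are merged into a single affine map, so that the composed network has depth \(L_1+L_2-1\). This is only a toy sketch and not the construction behind the complexity estimates.

```python
import numpy as np

def realize(net, x):
    """Evaluate a ReLU network given as a list of (W, B) pairs: affine, ReLU, ..., affine."""
    for W, B in net[:-1]:
        x = np.maximum(W @ x + B, 0.0)
    W, B = net[-1]
    return W @ x + B

def compose(net2, net1):
    """Return a network realizing R(net2) o R(net1) by merging the junction affine maps."""
    W1, B1 = net1[-1]                 # last affine map of net1
    W2, B2 = net2[0]                  # first affine map of net2
    merged = (W2 @ W1, W2 @ B1 + B2)  # single affine map replacing the junction
    return net1[:-1] + [merged] + net2[1:]

# tiny check on random networks R^3 -> R^2 and R^2 -> R
rng = np.random.default_rng(1)
net1 = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
        (rng.standard_normal((2, 4)), rng.standard_normal(2))]
net2 = [(rng.standard_normal((5, 2)), rng.standard_normal(5)),
        (rng.standard_normal((1, 5)), rng.standard_normal(1))]
x = rng.standard_normal(3)
assert np.allclose(realize(compose(net2, net1), x), realize(net2, realize(net1, x)))
```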

Proof of Theorem 1

The first assertion (in (3.19)) follows from (4.29). The second assertion (in (3.20)) is obtained by combining Lemmas 2, 3 and 4. \(\square \)

5 Deep learning implementation

We briefly provide the implementation scheme for the approximation in Theorem 1. Let \(\xi \) be a uniformly distributed random variable on \([a,b]^d\), i.e. \(\xi \sim U([a,b]^d)\), and define \(\textstyle {{\mathbb {X}}_t^{\xi }=\xi +\lambda \sum _{i=0}^d V_{d,i}(\xi )B_t^{d,i}}\), \(t \ge 0\). For \(t>0\), the m-th order asymptotic expansion of Theorem 1 can be represented by

$$\begin{aligned} u^{m}(t,\cdot )=\textrm{argmin}_{\psi \in C([a,b]^d)} E[ | \psi (\xi )- f({\mathbb {X}}_t^{\xi }) {{\mathcal {M}}}^{m}_{d,\lambda }(t,\xi ,B_t^d) |^2 ], \end{aligned}$$
(5.1)

which is obtained by combining Theorem 1 of this paper with Proposition 2.2 of Beck et al. [2]. We construct a deep neural network \(u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta ^*}(t,\cdot )\) to approximate the function \(u^{m}(t,\cdot )\); for a depth \(L \in {\mathbb {N}}\) and layer dimensions \(N_0,N_1,\ldots ,N_L \in {\mathbb {N}}\), the network is given by

$$\begin{aligned} u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta }(t,x)={{\mathcal {A}}}_{W^\theta _L,B^\theta _L} \circ \varrho _{N_{L-1}} \circ \mathcal{A}_{W^\theta _{L-1},B^\theta _{L-1}} \circ \cdots \circ \varrho _{N_{1}} \circ {{\mathcal {A}}}_{W^\theta _{1},B^\theta _{1}} (x), \ x \in {\mathbb {R}}^d, \end{aligned}$$
(5.2)

where \({{\mathcal {A}}}_{W^\theta _k,B^\theta _k}(x)=W^\theta _kx+B^\theta _k\), \(x \in {\mathbb {R}}^{N_{k-1}}\), \(k=1,\ldots ,L\) with \(((W^\theta _1,B^\theta _1),\ldots ,(W^\theta _L,B^\theta _L)) \in \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}\) given by

$$\begin{aligned}&{{\mathcal {A}}}_{W^\theta _k,B^\theta _k}(x) =\left( \begin{array}{ccc} \theta ^{q+1} &{} \cdots &{} \theta ^{q+N_{k-1}} \\ \vdots &{} \ddots &{} \vdots \\ \theta ^{q+(N_{k}-1)N_{k-1}+1} &{} \cdots &{} \theta ^{q+N_{k} N_{k-1}} \\ \end{array} \right) \left( \begin{array}{c} x_1 \\ \vdots \\ x_{N_{k-1}} \\ \end{array} \right) +\left( \begin{array}{c} \theta ^{q+N_{k} N_{k-1}+1} \\ \vdots \\ \theta ^{q+N_{k} N_{k-1}+N_{k}} \\ \end{array} \right) , \end{aligned}$$
(5.3)

where \(q=\textstyle {\sum _{\ell =1}^{k-1}} N_{\ell }(N_{\ell -1}+1)\) is the number of parameters used in the first \(k-1\) layers, and the optimized parameter \(\theta ^*\) is obtained by solving the following minimization problem:

$$\begin{aligned} \theta ^*=\textrm{argmin}_{\theta \in {\mathbb {R}}^{\sum _{\ell =1}^L N_{\ell }(N_{\ell -1}+1)}} E[ | u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta }(t,\xi )- f({\mathbb {X}}_t^{\xi }) {{\mathcal {M}}}^{m}_{d,\lambda }(t,\xi ,B_t^d) |^2 ]. \end{aligned}$$
(5.4)

In the implementation of the deep neural network approximation, we use the stochastic gradient descent method with the Adam optimizer [20] as in Sects. 3 and 4 of Beck et al. [2]. In Appendix C, we list the sample code of the scheme for the high-dimensional PDE with a nonlinear coefficient in Sect. 6.2 (which includes the linear coefficient case).
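
For concreteness, a minimal TensorFlow sketch of the minimization (5.4), written for the uncorrelated Black–Scholes setting of Sect. 6.1.1, is given below. The helper functions drift, diffusion, payoff and malliavin_weight are illustrative and not part of the scheme; in particular, malliavin_weight is a placeholder returning the trivial weight 1 (i.e. the zeroth-order expansion) and should be replaced by the explicit weight \(\mathcal{M}^{m}_{d,\lambda }(t,\xi ,B_t^d)\) (cf. the sample code in Appendix C), and the piecewise-constant learning rate of Sect. 6 is replaced here by a constant one.

```python
import numpy as np
import tensorflow as tf

d, lam, t, a, b, K, mu = 100, 0.3, 1.0, 99.0, 101.0, 100.0, 1.0 / 30.0
batch_size, n_steps = 1024, 5000

def drift(xi):                     # V_{d,0}(x) for the Black-Scholes example: mu * x
    return mu * xi

def diffusion(xi):                 # diagonal diffusion V_{d,i}(x) = c_i x_i e_i with c_i = 1
    return xi

def payoff(x):                     # f_d(x) = max_i (x_i - K)
    return tf.reduce_max(x - K, axis=1, keepdims=True)

def malliavin_weight(t, xi, B):    # placeholder: trivial weight 1 (zeroth-order expansion);
    return tf.ones_like(B[:, :1])  # replace by the explicit first-order weight

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2 * d, activation="relu"),
    tf.keras.layers.Dense(2 * d, activation="relu"),
    tf.keras.layers.Dense(1),
])
model(tf.zeros((1, d)))            # build the network
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

for _ in range(n_steps):
    xi = tf.random.uniform((batch_size, d), a, b)                 # xi ~ U([a,b]^d)
    B = tf.random.normal((batch_size, d), stddev=float(np.sqrt(t)))
    X = xi + lam * drift(xi) * t + lam * diffusion(xi) * B        # Gaussian proxy of X_t
    target = payoff(X) * malliavin_weight(t, xi, B)
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((model(xi) - target) ** 2)          # objective in (5.4)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

In the actual experiments, the target \(f({\mathbb {X}}_t^{\xi }){{\mathcal {M}}}^{m}_{d,\lambda }(t,\xi ,B_t^d)\), the network sizes and the learning-rate schedule are as described in Sect. 6.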

6 Numerical examples

In this section, we perform numerical experiments in order to demonstrate the accuracy of our scheme. We compare our scheme with the deep learning method of Beck et al. [2], in which the Euler–Maruyama scheme is combined with the stochastic gradient descent method and the Adam optimizer. All experiments are performed in Google Colaboratory using TensorFlow.

6.1 High-dimensional Black–Scholes model

6.1.1 Uncorrelated case

First, we examine our scheme for a high-dimensional Black–Scholes model (geometric Brownian motion) whose corresponding PDE is given by

$$\begin{aligned} \partial _t u_\lambda ^d(t,x)=\lambda \sum _{i=1}^d \mu x_i \frac{\partial }{\partial x_i}u_\lambda ^d(t,x) + \frac{\lambda ^2}{2} \sum _{i=1}^d c_i^2 x_i^2 \frac{\partial ^2}{\partial x_i^2} u_\lambda ^d(t,x), \ \ u_\lambda ^d(0,x)=f_{d}(x), \end{aligned}$$
(6.1)

where \(f_d(x)=\max \{ x_1-K ,\ldots , x_d-K \}\). Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=100.0\), \(\lambda =0.3\), \(\mu =1/30\) (or \(r:=\lambda \times \mu =0.01\)), \(c_i=1.0\) (or \(\sigma _i:=\lambda \times c_i=0.3\)), \(i=1,\ldots ,100\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (or the maximum option price \(e^{-rt}u_\lambda ^d(t,\cdot )\) in financial mathematics) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\) and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=10^{-1}\textbf{1}_{[0,0.3J]}(j)+10^{-2}\textbf{1}_{(0.3J,0.6J]}(j)+10^{-3}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Itô formula with Monte-Carlo method with \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 1 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=16\), 32 (Beck et al. \(n=16\), Beck et al. \(n=32\) in the table).
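
For reference, the benchmark value \(u_\lambda ^{ref,d}(t,x_0)\) above can be reproduced (up to Monte Carlo error) by simulating the exact solution of the geometric Brownian motion; the following NumPy sketch uses a reduced number of paths for illustration.

```python
import numpy as np

d, t, K, r, sigma = 100, 1.0, 100.0, 0.01, 0.3
x0, n_paths = 100.0, 10**5                               # the paper uses 10**7 paths
rng = np.random.default_rng(0)

W = rng.normal(0.0, np.sqrt(t), size=(n_paths, d))
X = x0 * np.exp((r - 0.5 * sigma**2) * t + sigma * W)    # exact solution via the Ito formula
payoff = np.max(X - K, axis=1)                           # f_d(x) = max_i (x_i - K)
print(payoff.mean(), payoff.std() / np.sqrt(n_paths))    # estimate of u_lambda^d(t, x0) and its std error
```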

Table 1 Comparison of deep learning methods for \(d=100\)

6.1.2 Correlated case

We next provide a numerical example for a high-dimensional Black–Scholes model with correlated noise. Let us consider the following PDE:

$$\begin{aligned} \partial _t u_\lambda ^d(t,x){=} \lambda \sum _{i=1}^d \mu x_i \frac{\partial }{\partial x_i}u_\lambda ^d(t,x){+}\frac{\lambda ^2}{2} \sum _{i,j,k{=}1}^d \sigma _k^i \sigma _k^j x_i x_j \frac{\partial ^2}{\partial x_i \partial x_j} u_\lambda ^d(t,x), \ \ u_\lambda ^d(0,x){=}f_{d}(x),\nonumber \\ \end{aligned}$$
(6.2)

where \(f_d(x)=\max \{ K-\textstyle {\frac{1}{d}\sum _{i=1}^d x_i},0 \}\) and \(\sigma =[\sigma _k^j]_{k,j} \in {\mathbb {R}}^{d \times d}\) satisfies \(\sigma _{ij}=0\) for \(i<j\), \(\sigma _{ii}>0\) for \(i=1,\ldots ,d\) and

$$\begin{aligned} \sigma \sigma ^\top =\left( \begin{array}{cccc} 1&{}\rho &{}\cdots &{}\rho \\ \rho &{}1&{}\rho &{}\rho \\ \vdots &{}\vdots &{}\ddots &{}\vdots \\ \rho &{}\rho &{}\rho &{}1 \end{array}\right) \in {\mathbb {R}}^{d \times d}. \end{aligned}$$
(6.3)
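
A lower-triangular \(\sigma \) with positive diagonal entries satisfying (6.3) can be obtained as the Cholesky factor of the correlation matrix, e.g. in NumPy:

```python
import numpy as np

d, rho = 100, 0.5
corr = np.full((d, d), rho) + (1.0 - rho) * np.eye(d)  # the matrix in (6.3)
sigma = np.linalg.cholesky(corr)                       # lower triangular, sigma @ sigma.T == corr
assert np.allclose(sigma @ sigma.T, corr) and np.all(np.diag(sigma) > 0)
```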

Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=90.0\), \(\lambda =0.3\), \(\mu =0.0\), \(\rho =0.5\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (the basket option price in financial mathematics) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 (\(m=1\)) with the expansion technique of the basket option price given in Section 3.1 of Takahashi [32] and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=5.0\times 10^{-2}{} \textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-4}{} \textbf{1}_{(0.6J,J]}(j)\), \(j\le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Itô formula with Monte-Carlo method with \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 2 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=32\), 64 (Beck et al. \(n=32\), Beck et al. \(n=64\) in the table).

Table 2 Comparison of deep learning methods for \(d=100\)

6.2 High-dimensional CEV model (nonlinear volatility case)

We consider a Kolmogorov PDE with nonlinear diffusion coefficients whose corresponding stochastic process is called the CEV model:

$$\begin{aligned} \partial _t u_\lambda ^d(t,x)=\lambda \sum _{i=1}^d \mu x_i \frac{\partial }{\partial x_i}u_\lambda ^d(t,x) + \frac{\lambda ^2 }{2} \sum _{i=1}^d \gamma _i^2 c_i^2 x_i^{2\beta _i} \frac{\partial ^2}{\partial x_{i}^2} u_\lambda ^d(t,x), \ \ u_\lambda ^d(0,x)=f_{d}(x),\nonumber \\ \end{aligned}$$
(6.4)

where \(f_d(x)=\max \{ x_1-K ,\ldots , x_d-K \}\). Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=100.0\), \(\lambda =0.3\), \(\mu =1/30\) (or \(r:=\lambda \times \mu =0.01\)), \(\beta _i=0.5\), \(\gamma _i=K^{1-\beta _i}\), \(c_i=1.0\) (or \(\sigma _i:=\lambda \times c_i=0.3\)), \(i=1,\ldots ,d\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (or the maximum option price \(e^{-rt}u_\lambda ^d(t,\cdot )\)) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\). For the experiment, we use the batch size \(M=1024\), the number of iteration steps \(J=5000\) and the learning rate \(\gamma (j)=5.0\times 10^{-1}\textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-2}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by Monte-Carlo method with the Euler–Maruyama scheme with time-steps \(2^{10}\) and \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 3 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=32\), 64 (Beck et al. \(n=32\), Beck et al. \(n=64\) in the table).
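
Since the CEV diffusion has no convenient closed-form solution, the reference value above is obtained by the Euler–Maruyama scheme; the following NumPy sketch (with reduced numbers of time steps and paths for illustration) indicates the computation.

```python
import numpy as np

d, t, K = 100, 1.0, 100.0
r, sigma, beta = 0.01, 0.3, 0.5                 # r = lambda*mu, sigma = lambda*c_i
gamma = K ** (1.0 - beta)                       # gamma_i = K^(1 - beta_i)
n_paths, n_steps = 10**5, 2**8                  # the paper uses 10**7 paths and 2**10 steps
dt = t / n_steps
rng = np.random.default_rng(0)

X = np.full((n_paths, d), 100.0)                # x_0 = (100, ..., 100)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, d))
    # Euler-Maruyama step for dX^i = r X^i dt + sigma * gamma * (X^i)^beta dW^i;
    # np.maximum(., 0) avoids fractional powers of negative values
    X = X + r * X * dt + sigma * gamma * np.maximum(X, 0.0) ** beta * dW
print(np.max(X - K, axis=1).mean())             # estimate of E[f_d(X_t)]
```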

Table 3 Comparison of deep learning methods for \(d=100\)

6.3 High-dimensional Heston model

We finally show an example of a small-time asymptotic expansion for a high-dimensional Heston model:

$$\begin{aligned} \partial _t u_\lambda ^{2d}(t,x)={{\mathcal {L}}}^{2d,\lambda } u_\lambda ^{2d}(t,x), \ \ u_\lambda ^{2d}(0,x)=f_{2d}(x), \end{aligned}$$
(6.5)

where \(f_{2d}(x)=\max \{ x_1-K,x_3-K,\ldots ,x_{2d-1}-K \}\) and \({{\mathcal {L}}}^{2d,\lambda }\) is the generator given by

$$\begin{aligned} {{\mathcal {L}}}^{2d,\lambda }&= \lambda \sum _{i=1}^d \left[ \kappa _{i} (\theta _{i}-x_{2i}) \frac{\partial }{\partial x_{2i}}\right] \nonumber \\&\quad +\lambda ^2 \sum _{i=1}^d \left[ \frac{1}{2} x_{2i} x_{2i-1}^2 \frac{\partial ^2}{\partial x_{2i-1}^2} + \rho _i \nu _i x_{2i-1} x_{2i} \frac{\partial ^2}{\partial x_{2i-1} \partial x_{2i}}+\frac{1}{2} \nu _i^2 x_{2i} \frac{\partial ^2}{\partial x_{2i}^2}\right] . \end{aligned}$$
(6.6)

Let \(d=25\) (\(2d=50\)), \(t=0.5\), \(a=99.0\), \(b=101.0\), \(a'=0.035\), \(b'=0.045\), \(K=100.0\), \(\lambda =1.0\), \(\kappa _i=1.0\), \(\theta _i=0.04\), \(\nu _i=0.1\), \(\rho _i=-0.5\), \(i=1,\ldots ,d\). We approximate the function \(u_\lambda ^{2d}(t,\cdot )\) on \(([a,b] \times [a',b'])^d\) by constructing a deep neural network (1 input layer with 2d-neurons, 2 hidden layers with 4d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\) and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=5.0\times 10^{-2}\textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-4}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^{2d}(t,\cdot )\), we input \(x_0=(100.0,0.04,\ldots ,100.0,0.04) \in ([a,b] \times [a',b'])^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,2d}(t,x_0)-u_\lambda ^{ref,2d}(t,x_0))/u_\lambda ^{ref,2d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,2d}(t,x_0)\) is computed by Monte-Carlo method with the Euler–Maruyama scheme with time-steps \(2^{10}\) and \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 4 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=16\), 32 (Beck et al. \(n=16\), Beck et al. \(n=32\) in the table).
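
Similarly, the reference value for the Heston example can be computed by an Euler–Maruyama discretization of the price–variance system corresponding to (6.6); the sketch below uses reduced numbers of paths and time steps, and a full-truncation treatment of the variance (an implementation choice, not prescribed in the paper).

```python
import numpy as np

d, t, K = 25, 0.5, 100.0
kappa, theta, nu, rho = 1.0, 0.04, 0.1, -0.5
n_paths, n_steps = 10**5, 2**8                  # the paper uses 10**7 paths and 2**10 steps
dt = t / n_steps
rng = np.random.default_rng(0)

S = np.full((n_paths, d), 100.0)                # price coordinates x_{2i-1}
v = np.full((n_paths, d), 0.04)                 # variance coordinates x_{2i}
for _ in range(n_steps):
    Z1 = rng.normal(size=(n_paths, d))
    Z2 = rho * Z1 + np.sqrt(1.0 - rho**2) * rng.normal(size=(n_paths, d))
    vp = np.maximum(v, 0.0)                     # full truncation keeps the variance nonnegative
    S = S + S * np.sqrt(vp * dt) * Z1           # dS^i = S^i sqrt(v^i) dW^{1,i}
    v = v + kappa * (theta - v) * dt + nu * np.sqrt(vp * dt) * Z2
print(np.max(S - K, axis=1).mean())             # estimate of E[f_{2d}(X_t)]
```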

Table 4 Comparison of deep learning methods for \(2d=50\)

7 Conclusion

In the paper, we introduced a new spatial approximation for solving high-dimensional PDEs without the curse of dimensionality, where an asymptotic expansion method with a deep learning-based algorithm is effectively applied. The mathematical justification for the spatial approximation was provided using Malliavin calculus and ReLU calculus. We checked the effectiveness of our method through numerical examples for high-dimensional Kolmogorov PDEs.

More accurate deep learning-based implementations of the method in this paper should be studied as a next research topic. We believe that higher order asymptotic expansions or higher order weak approximations (discretizations) will give robust computation schemes without the curse of dimensionality, which should be proved mathematically in future work. Also, applying our method to nonlinear problems as in [14, 15] will be a challenging and important task.