Abstract
This paper proposes a new spatial approximation method, free of the curse of dimensionality, for solving high-dimensional partial differential equations (PDEs) by combining an asymptotic expansion method with a deep learning-based algorithm. In particular, a mathematical justification of the spatial approximation is provided. Numerical examples for high-dimensional Kolmogorov PDEs show the effectiveness of our method.
1 Introduction
Recently, deep learning-based algorithms for solving high-dimensional partial differential equations (PDEs) have been actively proposed (see [2, 3] for instance). Moreover, a number of papers on the mathematical justification of deep learning-based spatial approximations have appeared, in which the authors demonstrate that deep neural networks overcome the curse of dimensionality in approximating high-dimensional PDEs. For the related literature, see [4,5,6, 11, 19] for example. In particular, these works treat specific forms of PDEs, such as high-dimensional heat equations or Kolmogorov PDEs with constant diffusion and nonlinear drift coefficients. Moreover, the integral kernels are assumed to have explicit forms in order to justify the spatial approximations of solutions to high-dimensional PDEs.
In practice, however, most high-dimensional PDEs do not admit explicit integral forms. In other words, the integral forms of the solutions must themselves be approximated by some method.
In the current paper, we give a new spatial approximation using an asymptotic expansion method with a deep learning-based algorithm for solving high-dimensional PDEs without the curse of dimensionality. More precisely, we follow the approaches of [40] and related literature such as [8, 17, 18, 23, 24, 26, 27, 30, 32, 33, 35, 38, 39, 41, 43]. In particular, we provide a uniform error estimate for the asymptotic expansion of solutions of Kolmogorov PDEs with nonlinear coefficients, motivated by the works [2, 11, 31]. For a solution to a d-dimensional Kolmogorov PDE with a small parameter \(\lambda \), namely \(u_{\lambda }:[0,T] \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) given by \(u_\lambda (t,x)=E[f(X_t^{\lambda ,x})]\) for \((t,x) \in [0,T] \times {\mathbb {R}}^d\), where \(\{ X_t^{\lambda ,x}\}_{t\ge 0}\) is a d-dimensional diffusion process starting from x, we justify the following spatial approximation on a range \([a,b]^d\):
by applying an appropriate neural network \(\phi \). Here, for \(t>0\) and \(x \in {\mathbb {R}}^d\), \(\bar{X}_t^{\lambda , x}\) is a certain Gaussian random variable and \({{\mathcal {M}}}_t^{\lambda ,x}\) is a stochastic weight for the expansion, constructed via Malliavin calculus. In order to choose the network \(\phi \), the analysis of “products of neural networks” and a dimension analysis of the asymptotic expansion with Malliavin calculus are crucial in our approach. We show a precise error estimate for the approximation (1.1) and prove that the complexity of the neural network grows at most polynomially in the dimension d and the reciprocal of the precision \(\varepsilon \) of the approximation (1.1). Moreover, we give an explicit form of the asymptotic expansion in (1.1) and show numerical examples to demonstrate the effectiveness of the proposed scheme for high-dimensional Kolmogorov PDEs.
The organization of the paper is as follows. Section 2 is dedicated to notation, definitions and preliminary results on deep learning and Malliavin calculus. Section 3 provides the main result, namely, the deep learning-based asymptotic expansion for solving Kolmogorov PDEs. The proof is given in Sect. 4. Section 5 introduces the deep learning implementation. Various numerical examples are shown in Sect. 6. Useful lemmas on Malliavin calculus and ReLU calculus, as well as the sample code, are collected in the Appendix.
2 Preliminaries
We first prepare notation. For \(d \in {\mathbb {N}}\) and for a vector \(x \in {\mathbb {R}}^d\), we denote by \(\Vert x \Vert \) the Euclidean norm. Also, for \(k,\ell \in {\mathbb {N}}\) and for a matrix \(A \in {\mathbb {R}}^{k \times \ell }\), we denote by \(\Vert A \Vert \) the Frobenius norm. For \(d \in {\mathbb {N}}\), let \(I_d\) be the identity matrix. For \(m,k,\ell \in {\mathbb {N}}\), let \(C({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) (resp., \(C([0,T] \times {\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\)) be the set of continuous functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\) (resp., \(f: [0,T] \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\)) and \(C_{Lip}({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) be the set of Lipschitz continuous functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\). Also, we define \(C^\infty _b({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\) as the set of smooth functions \(f: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\) with bounded derivatives of all orders. For a multi-index \(\alpha \), let \(|\alpha |\) be the length of \(\alpha \). For a bounded function \(f:{\mathbb {R}}^m \rightarrow {\mathbb {R}}^{k \times \ell }\), we define \(\Vert f \Vert _{\infty }=\textstyle {\sup _{x \in {\mathbb {R}}^{m}}} \Vert f(x) \Vert \). For \(m,k,\ell \in {\mathbb {N}}\) and a function \(f \in C_{Lip}({\mathbb {R}}^m, {\mathbb {R}}^{k \times \ell })\), we denote by \(C_{Lip}[f]\) the Lipschitz constant of f. For \(d \in {\mathbb {N}}\) and for a smooth function \(f:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), we define \(\partial _i f=\textstyle {\frac{\partial }{\partial x_i}f}\) for \(i=1,\ldots ,d\); moreover, we define \(\partial ^\alpha f=\partial _{\alpha _1}\cdots \partial _{\alpha _k}f\) for \(\alpha =(\alpha _1,\ldots ,\alpha _k) \in \{1,\ldots ,d \}^k\), \(k \in {\mathbb {N}}\).
For \(a,b \in {\mathbb {R}}\), we may write \(a \vee b=\max \{ a,b \}\).
2.1 Deep neural networks
Let us prepare notation and definitions for deep neural networks. Let \({{\mathcal {N}}}\) be the set of deep neural networks (DNNs):
where \({{\mathcal {N}}}_L^{N_0,N_1,\ldots ,N_L}={\times }_{\ell =1}^{L} ({\mathbb {R}}^{N_\ell \times N_{\ell -1}} \times {\mathbb {R}}^{N_\ell })\).
Let \(\varrho \in C({\mathbb {R}},{\mathbb {R}})\) be an activation function, and for \(k\in {\mathbb {N}}\), define \(\varrho _{k}(x)=(\varrho (x_1),\ldots ,\varrho (x_k))\), \(x \in {\mathbb {R}}^k\).
We define \({{\mathcal {R}}}:{{\mathcal {N}}} \rightarrow \cup _{m,n\in {\mathbb {N}}} C({\mathbb {R}}^m,{\mathbb {R}}^n)\), \({{\mathcal {C}}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\), \({{\mathcal {L}}}: {{\mathcal {N}}} \rightarrow {\mathbb {N}}\), \(\textrm{dim}_{\textrm{in}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\) and \(\textrm{dim}_{\textrm{out}}:{{\mathcal {N}}} \rightarrow {\mathbb {N}}\) as follows:
For \(L \in {\mathbb {N}} \cap [2,\infty )\), \(N_0,\ldots ,N_L \in {\mathbb {N}}\), \(\psi =((W_1,B_1),\ldots ,(W_L,B_L)) \in \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}\), let \({{\mathcal {L}}}(\psi )=L\), \(\textrm{dim}_{\textrm{in}}(\psi )=N_0\), \(\textrm{dim}_{\textrm{out}}(\psi )=N_L\), \(\mathcal{C}(\psi )=\textstyle {\sum _{\ell =1}^L} N_{\ell }(N_{\ell -1}+1)\), and
where \({{\mathcal {A}}}_{W_k,B_k}(x)=W_kx+B_k\), \(x \in {\mathbb {R}}^{N_{k-1}}\), \(k=1,\ldots ,L\).
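As an illustration of the notation above, the following Python sketch (our own illustration, not the Appendix code) represents a network \(\psi \in {{\mathcal {N}}}_L^{N_0,N_1,\ldots ,N_L}\) as a list of pairs \((W_\ell ,B_\ell )\), and computes the ReLU realization \({{\mathcal {R}}}(\psi )\) and the complexity \({{\mathcal {C}}}(\psi )=\sum _{\ell =1}^L N_{\ell }(N_{\ell -1}+1)\):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(psi, x):
    """Realization R(psi): affine maps A_{W,B}(x) = W x + B with
    componentwise ReLU activation between layers (none after the last)."""
    for (W, B) in psi[:-1]:
        x = relu(W @ x + B)
    W, B = psi[-1]
    return W @ x + B

def complexity(psi):
    """C(psi) = sum_l N_l (N_{l-1} + 1): the number of weights and biases."""
    return sum(W.shape[0] * (W.shape[1] + 1) for (W, B) in psi)

rng = np.random.default_rng(0)
# A depth L = 3 network with layer widths N0=4, N1=8, N2=8, N3=1.
widths = [4, 8, 8, 1]
psi = [(rng.standard_normal((n1, n0)), rng.standard_normal(n1))
       for n0, n1 in zip(widths[:-1], widths[1:])]
y = realize(psi, np.zeros(4))
assert y.shape == (1,)
assert complexity(psi) == 8 * 5 + 8 * 9 + 1 * 9  # = 121
```

Here \({{\mathcal {L}}}(\psi )=3\), \(\textrm{dim}_{\textrm{in}}(\psi )=4\), \(\textrm{dim}_{\textrm{out}}(\psi )=1\), and \({{\mathcal {C}}}(\psi )=121\), consistent with the definitions above.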
2.2 Malliavin calculus
We prepare basic notation and definitions on Malliavin calculus following Bally [1], Ikeda and Watanabe [16], Malliavin [25], Malliavin and Thalmaier [26] and Nualart [29].
Let \(\Omega ^d=\{ \omega : [0,T] \rightarrow {\mathbb {R}}^d; \ \omega \ \hbox {is continuous}, \ \omega (0)=0 \}\), \(H^d=L^2([0,T],{\mathbb {R}}^d)\) and let \(\mu ^d\) be the Wiener measure on \((\Omega ^d,\mathcal {B}(\Omega ^d))\), where \(\mathcal {B}(\Omega ^d)\) is the Borel \(\sigma \)-field induced by the topology of the uniform convergence on [0, T]. We call \((\Omega ^d,H^d,\mu ^d)\) the d-dimensional Wiener space. For a Hilbert space V with the norm \(\Vert \cdot \Vert _{V}\) and \(p \in [1,\infty )\), the \(L^p\)-space of V-valued Wiener functionals is denoted by \(L^p(\Omega ^d,V)\), that is, \(L^p(\Omega ^d,V)\) is a real Banach space of all \(\mu ^d\)-measurable functionals \(F: \Omega ^d \rightarrow V\) such that \(\Vert F \Vert _p =E [\Vert F \Vert _V^p]^{1/p}< \infty \) with the identification \(F = G\) if and only if \(F(\omega )=G(\omega )\), a.s. When \(V={\mathbb {R}}\), we write \(L^p(\Omega ^d)\). For a real separable Hilbert space V and \(F: \Omega ^d \rightarrow V\), we write \(\Vert F \Vert _{p,V}=E [\Vert F\Vert _V^p]^{1/p}\), in particular, \(\Vert F \Vert _{p}\) when \(V={\mathbb {R}}\). Let \(B^d=\{B^d_t\}_t\) be a coordinate process defined by \(B^d_t(\omega )=\omega (t)\), \(\omega \in \Omega ^d\), i.e. \(B^d\) is a d-dimensional Brownian motion, and \(B^d(h)\) be the Wiener integral \(\textstyle {B^d(h)=\sum _{j=1}^d \int _{0}^{T} {h}^{j}(s) dB_s^{d,j}}\) for \(h\in H^d\).
Let \({\mathscr {S}}(\Omega ^d)\) denote the class of smooth random variables of the form \(F=f( B^d(h_{1}),\ldots ,B^d(h_{n}) )\) where \(f\in C_{b}^{\infty } ( {\mathbb {R}}^{n},{\mathbb {R}}) \), \(h_{1},\ldots ,h_{n}\in H^d\), \(n\ge 1\). For \(F\in {\mathscr {S}}(\Omega ^d)\), we define the derivative DF as the \(H^d\)-valued random variable \(\textstyle {DF=\sum _{j=1}^{n}\partial _{j}f( B^d(h_{1}),\ldots ,B^d(h_{n}) ) h_{j}}\), which is regarded as the stochastic process:
For \(F \in {\mathscr {S}}(\Omega ^d)\) and \(j \in {\mathbb {N}}\), we set \(D^j F\) as the \((H^d)^{\otimes j}\)-valued random variable obtained by the j-times iteration of the operator D. For a real separable Hilbert space V, consider the class \({\mathscr {S}}_V\) of V-valued smooth Wiener functionals of the form \(\textstyle {F = \sum _{i=1}^\ell F_i v_i}\), \(v_i \in V\), \(F_i \in {\mathscr {S}}(\Omega ^d)\), \(i \le \ell \), \(\ell \in {\mathbb {N}}\). Define \(\textstyle {D^j F = \sum _{i=1}^\ell D^j F_i \otimes v_i}\), \(j \in {\mathbb {N}}\). Then for \(j \in {\mathbb {N}}\), \(D^j\) is a closable operator from \({\mathscr {S}}_V\) into \(L^p(\Omega ^d,(H^d)^{\otimes j} \otimes V)\) for any \(p \in [1,\infty )\) (see p. 31 of Nualart [29]). For \(k \in {\mathbb {N}}\), \(p \in [1,\infty )\), we define \(\textstyle {\Vert F \Vert ^p_{k,p,V}=E [\Vert F \Vert _V^p] + \sum _{j=1}^k E [ \Vert D^j F \Vert _{(H^d)^{\otimes j} \otimes V}^p ]}\), \(F \in {\mathscr {S}}_V\). Then, the space \({\mathbb {D}}^{k,p}(\Omega ^d,V)\) is defined as the completion of \({\mathscr {S}}_V\) with respect to the norm \(\Vert \cdot \Vert _{k,p,V}\). Moreover, let \({\mathbb {D}}^\infty (\Omega ^d,V)\) be the space of smooth Wiener functionals in the sense of Malliavin, \({\mathbb {D}}^\infty (\Omega ^d,V) = \cap _{p\ge 1} \cap _{k\in {\mathbb {N}}} {\mathbb {D}}^{k,p}(\Omega ^d,V)\). We write \({\mathbb {D}}^{k,p}(\Omega ^d)\), \(k \in {\mathbb {N}}\), \(p \in [1,\infty )\) and \({\mathbb {D}}^\infty (\Omega ^d)\), when \(V={\mathbb {R}}\).
Let \(\delta \) be an unbounded operator from \(L^2(\Omega ^d,H^d)\) into \(L^2(\Omega ^d)\) such that the domain of \(\delta \), denoted by \(\textrm{Dom}(\delta )\), is the set of \(H^d\)-valued square integrable random variables u such that \(|E [\langle DF,u \rangle _{H^d}]| \le c\Vert F \Vert _{1,2}\) for all \(F \in {\mathbb {D}}^{1,2}(\Omega ^d)\) where c is some constant depending on u, and if \(u \in \textrm{Dom}(\delta )\), there exists \(\delta (u) \in L^2(\Omega ^d)\) satisfying
for any \(F \in {\mathbb {D}}^{1,2}(\Omega ^d)\). For \(u=(u^1,\ldots ,u^d) \in \textrm{Dom}(\delta )\), \(\delta (u)=\textstyle {\sum _{i=1}^d} \delta ^{i}(u^i)\) is called the Skorohod integral of u, and it holds that \(E[\textstyle {\int _0^T} D_{i,s}Fu^i_s ds]=E[F \delta ^i(u^i) ]\), \(i=1,\ldots ,d\) for all \(F \in {\mathbb {D}}^{1,2}\) (see Proposition 6 of Bally [1]). For all \(k \in {\mathbb {N}} \cup \{ 0 \}\) and \(p>1\), the operator \(\delta \) is continuous from \({\mathbb {D}}^{k+1,p}(\Omega ^d,H^d)\) into \({\mathbb {D}}^{k,p}(\Omega ^d)\) (see Proposition 1.5.7 of Nualart [29]). For \(G \in {\mathbb {D}}^{1,2}(\Omega ^d)\) and \(h \in \textrm{Dom}(\delta )\) such that \(Gh \in L^{2}(\Omega ^d,H^d)\), it holds that
and in particular, if \(h \in \textrm{Dom}(\delta )\) is an adapted process, \(\delta ^i(h^i)\) is given by the Itô integral, i.e. \(\delta ^i(h^i)=\textstyle {\int _0^T} h^i_s dB_s^{d,i}\) for \(i=1,\ldots ,d\) (e.g. see Section 3.1.1 of Bally [1], Proposition 1.3.3 and Proposition 1.3.11 of Nualart [29]).
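As a concrete sanity check of the duality between the derivative D and the Skorohod integral \(\delta \), consider \(d=1\), \(F=\exp (B_T)\) and the deterministic process \(u \equiv 1\), so that \(\delta (u)=B_T\) and \(\langle DF,u \rangle _{H^1}=T\exp (B_T)\); both sides of the duality equal \(Te^{T/2}\). The following Monte Carlo sketch (an illustration of ours, assuming Python with NumPy) verifies this numerically:

```python
import numpy as np

# Monte Carlo check of the duality E[<DF, u>_H] = E[F delta(u)] in d = 1
# with F = exp(B_T) and the deterministic process u ≡ 1, for which
# delta(u) is the Ito integral int_0^T 1 dB_s = B_T, and
# D_s F = exp(B_T) for s in [0, T], so <DF, u>_H = T exp(B_T).
rng = np.random.default_rng(1)
T, n = 1.0, 10**6
BT = np.sqrt(T) * rng.standard_normal(n)   # samples of B_T ~ N(0, T)

lhs = np.mean(T * np.exp(BT))     # E[<DF, u>_H]
rhs = np.mean(np.exp(BT) * BT)    # E[F delta(u)]
exact = T * np.exp(T / 2)         # both sides equal T e^{T/2}

assert abs(lhs - exact) < 0.05
assert abs(rhs - exact) < 0.05
```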
For \(F=(F^1,\ldots ,F^d) \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\), define the Malliavin covariance matrix of F, \(\sigma ^F=(\sigma ^F_{ij})_{1 \le i,j \le d}\), by \(\textstyle {\sigma ^F_{ij}=\langle DF^i,DF^j \rangle _{H^d}=\sum _{k=1}^d \int _0^T D_{k,s}F^i D_{k,s}F^j ds}\), \(1\le i,j \le d\). We say that \(F\in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) is nondegenerate if the matrix \(\sigma ^F\) is invertible a.s. and satisfies \(\Vert ( \det \sigma ^F)^{-1}\Vert _p < \infty \) for all \(p>1\). Malliavin’s theorem claims that if \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\) is nondegenerate, then F has a smooth density \(p^{F}(\cdot )\). Malliavin calculus is further refined by Watanabe’s theory. Let \(\mathcal {S}({\mathbb {R}}^d)\) be the Schwartz space, i.e. the space of rapidly decreasing functions, and \(\mathcal {S}'({\mathbb {R}}^d)\) be the dual of \(\mathcal {S}({\mathbb {R}}^d)\), i.e. the space of Schwartz tempered distributions. For a tempered distribution \({{\mathcal {T}}} \in \mathcal {S}'({\mathbb {R}}^d)\) and a nondegenerate Wiener functional in the sense of Malliavin \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\), \({{\mathcal {T}}}(F)={{\mathcal {T}}} \circ F\) is well-defined as an element of the space of Watanabe distributions \({\mathbb {D}}^{-\infty }(\Omega ^d)\), that is the dual space of \({\mathbb {D}}^{\infty }(\Omega ^d)\) (e.g. see p. 379, Corollary of Ikeda and Watanabe [16], Theorem of Chapter III 6.2 of Malliavin [25], Theorem 7.3 of Malliavin and Thalmaier [26]). Also, for \(G \in {\mathbb {D}}^{\infty }(\Omega ^d)\), a (generalized) expectation \(E[\mathcal{T}(F)G]\) is understood as a pairing of \({{\mathcal {T}}}(F)\in {\mathbb {D}}^{-\infty }(\Omega ^d)\) and \(G\in {\mathbb {D}}^{\infty }(\Omega ^d)\), namely \({}_{{\mathbb {D}}^\infty }\langle {{\mathcal {T}}}(F),G \rangle {}_{{\mathbb {D}}^{-\infty }}\), and it holds that
where \({}_{{{\mathcal {S}}}'}\langle \cdot , \cdot \rangle _{{{\mathcal {S}}}} \) is the bilinear form on \({{\mathcal {S}}}'({\mathbb {R}}^d)\) and \(\mathcal{S}({\mathbb {R}}^d)\), and \(E[ G | F= \xi ]\) is the conditional expectation of G given the set \(\{ \omega ; F(\omega )= \xi \}\) (e.g. see Chapter III 6.2.2 of Malliavin [25], (7.5) of Theorem 7.3 of Malliavin and Thalmaier [26]). In particular, we have \({}_{{\mathbb {D}}^{-\infty }}\langle \delta _y (F),1 \rangle {}_{{\mathbb {D}}^\infty }={}_{{{\mathcal {S}}}'}\langle \delta _y, p^F(\cdot ) \rangle {}_{{{\mathcal {S}}}}=p^F(y)\) for \(y \in {\mathbb {R}}^d\), and thus \(p^F\) is not only smooth but also in \(\mathcal {S}({\mathbb {R}}^d)\), i.e. a rapidly decreasing function (see Theorem 9.2 of Ikeda and Watanabe [16], Proposition 2.1.5 of Nualart [29]). For a nondegenerate \(F \in ({\mathbb {D}}^\infty (\Omega ^d))^d\), \(G \in {\mathbb {D}}^\infty (\Omega ^d)\) and a multi-index \(\gamma =(\gamma _1,\ldots ,\gamma _k)\), there exists \(H_{\gamma }(F,G) \in {\mathbb {D}}^\infty (\Omega ^d)\) such that
for all \({{\mathcal {T}}} \in {{\mathcal {S}}}'({\mathbb {R}}^d)\) (e.g. see Chapter 4.4 and Theorem 7.3 of Malliavin and Thalmaier [26]), where \(H_{\gamma }(F,G)\) is given by \(H_{\gamma }(F,G)=H_{(\gamma _k)}(F,H_{(\gamma _1,\ldots ,\gamma _{k-1})}(F,G))\) with
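The simplest instance of the integration by parts with the weight \(H_{\gamma }\) can be checked numerically. For \(d=1\), \(F=B_t\) (nondegenerate, with \(\sigma ^F=t\)) and \(G=1\), the weight is \(H_{(1)}(B_t,1)=B_t/t\), so that \(E[f'(B_t)]=E[f(B_t)B_t/t]\). The following Monte Carlo sketch (our own illustration, with the test function \(f=\sin \) chosen for its known closed form) verifies this:

```python
import numpy as np

# Simplest instance of the Malliavin weight H_gamma: in d = 1 with the
# nondegenerate functional F = B_t and G = 1, integration by parts gives
# E[f'(B_t)] = E[f(B_t) * B_t / t], i.e. H_{(1)}(B_t, 1) = B_t / t.
rng = np.random.default_rng(2)
t, n = 0.5, 10**6
Bt = np.sqrt(t) * rng.standard_normal(n)   # samples of B_t ~ N(0, t)

f, df = np.sin, np.cos
lhs = np.mean(df(Bt))            # direct estimate of E[f'(B_t)]
rhs = np.mean(f(Bt) * Bt / t)    # Malliavin-weighted estimate
exact = np.exp(-t / 2)           # E[cos(B_t)] = e^{-t/2}

assert abs(lhs - exact) < 0.01
assert abs(rhs - exact) < 0.01
```

The weighted form on the right needs no derivative of f, which is exactly the mechanism exploited in the expansion terms below.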
3 Main result
Let \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\) and \(T>0\). For \(d \in {\mathbb {N}}\), consider the solution to the following stochastic differential equation (SDE) driven by a d-dimensional Brownian motion \(B^d=(B^{d,1},\ldots ,B^{d,d})\) on the d-dimensional Wiener space \((\Omega ^d,H^d,\mu ^d)\):
where \(\mu ^{\lambda }_{d}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) and \(\sigma ^{\lambda }_{d}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) are Lipschitz continuous functions depending on a parameter \(\lambda \in (0,1]\). The solution \(X_t^{d,\lambda ,x}=(X_t^{d,\lambda ,x,1},\ldots ,X_t^{d,\lambda ,x,d})\) is equivalently written in the integral form as:
for \(j=1,\ldots ,d\). Furthermore, for a given appropriate continuous function \(f_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) and for \(\lambda \in (0,1]\), we consider \(u_\lambda ^d \in C([0,T] \times {\mathbb {R}}^d,{\mathbb {R}})\) given by
for \(t \in [0,T]\) and \(x \in {\mathbb {R}}^d \), which is a solution of the Kolmogorov PDE:
for all \((t,x) \in (0,T) \times {\mathbb {R}}^d\) and \(u_\lambda ^d(0,\cdot )=f_{d}(\cdot )\), where \({{\mathcal {L}}}^{d,\lambda }\) is the following second order differential operator:
Our purpose is to show a new spatial approximation scheme of \(u_\lambda ^d(t,\cdot )\) for \(t>0\) by using asymptotic expansion and deep neural network approximation. The main theorem (Theorem 1) is stated at the end of this section.
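For reference, the target quantity \(u_\lambda ^d(t,x)=E[f_d(X_t^{d,\lambda ,x})]\) can always be estimated pointwise by a standard Euler-Maruyama Monte Carlo scheme, which serves as a benchmark for the scheme developed below. The sketch is ours, with illustrative stand-in coefficients (an Ornstein-Uhlenbeck-type drift \(\mu (x)=-\lambda x\), diffusion \(\sigma =\lambda I_d\) and payoff \(f_d(x)=x_1\)) chosen so that the exact value \(E[X_t^{1}]=x_1 e^{-\lambda t}\) is known in closed form:

```python
import numpy as np

# Pointwise Euler-Maruyama Monte Carlo baseline for
# u_lambda^d(t, x) = E[f_d(X_t^{d,lambda,x})].  Coefficients are
# illustrative: mu(x) = -lam * x, sigma = lam * I_d, f_d(x) = x_1,
# for which E[X_t^1] = x_1 * exp(-lam * t) exactly.
rng = np.random.default_rng(3)
d, lam, t = 10, 0.5, 1.0
x0 = np.ones(d)
n_paths, n_steps = 10**5, 50
h = t / n_steps

X = np.tile(x0, (n_paths, 1))
for _ in range(n_steps):
    dB = np.sqrt(h) * rng.standard_normal((n_paths, d))
    X = X + (-lam * X) * h + lam * dB   # Euler step for the SDE

estimate = np.mean(X[:, 0])             # Monte Carlo estimate of u(t, x0)
exact = x0[0] * np.exp(-lam * t)
assert abs(estimate - exact) < 0.02
```

Such a scheme approximates \(u_\lambda ^d(t,x)\) at a single point x only; the purpose of the asymptotic expansion with a deep neural network is to approximate the whole map \(x \mapsto u_\lambda ^d(t,x)\) on \([a,b]^d\) without the curse of dimensionality.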
3.1 Asymptotic expansion
We first put the following assumptions on \(\{ \mu ^{\lambda }_{d} \}_{\lambda \in (0,1]}\), \(\{ \sigma ^{\lambda }_{d} \}_{\lambda \in (0,1]}\) and \(f_d\).
Assumption 1
(Assumptions for the family of SDEs and asymptotic expansion) Let \(C>0\). For \(d \in {\mathbb {N}}\), let \(\{ \mu ^{\lambda }_{d} \}_{\lambda \in (0,1]} \subset C_{Lip}({\mathbb {R}}^d,{\mathbb {R}}^{d})\) and \(\{ \sigma ^{\lambda }_{d} \}_{\lambda \in (0,1]} \subset C_{Lip}({\mathbb {R}}^d,{\mathbb {R}}^{d\times d})\) be families of functions, and \(f_d \in C_{Lip}({\mathbb {R}}^d,{\mathbb {R}})\) be a function satisfying
-
1.
there are \(V_{d,0} \in C_b^\infty ({\mathbb {R}}^d,{\mathbb {R}}^d)\) and \(V_{d}=(V_{d,1},\ldots ,V_{d,d}) \in C_b^\infty ({\mathbb {R}}^d,{\mathbb {R}}^{d\times d})\) such that (i) \(\mu ^\lambda _{d}=\lambda V_{d,0}\) and \(\sigma ^\lambda _{d}=\lambda V_{d}\) for all \(\lambda \in (0,1]\), (ii) \(C_{Lip}[V_{d,0}] \vee C_{Lip}[V_{d}]=C\) and \(\Vert V_{d,0}(0) \Vert \vee \Vert V_{d}(0) \Vert \le C\), (iii) \(\Vert \partial ^{\alpha } V_{d,i} \Vert _{\infty } \le C\) for any multi-index \(\alpha \) and \(i=0,1,\ldots ,d\);
-
2.
\(\textstyle {\sum _{i=1}^d} \sigma ^\lambda _{d,i}(x) \otimes \sigma ^\lambda _{d,i}(x) \ge \lambda ^2 I_{d}\) for all \(x \in {\mathbb {R}}^d\) and \(\lambda \in (0,1]\);
-
3.
\(C_{Lip}[f_d]= C\) and \(|f_d(0)| \le C\).
Remark 1
Assumption 1 justifies an asymptotic expansion under the uniform ellipticity condition for the solutions of the perturbed system of PDEs. Assumption 1.3 is also useful for constructing deep neural network approximations for the family of PDE solutions.
From Assumption 1.1, we may write each SDE (3.1) for \(d \in {\mathbb {N}}\) as
with \(X_0^{d,\lambda ,x}=x \in {\mathbb {R}}^d\), where the notation \(dB_t^{d,0}=dt\) is used. We define
and \(\textstyle {L_{d,0}{=}\sum _{j=1}^d V_{d,0}^{j}(\cdot )\frac{\partial }{\partial x_j}{+}\frac{1}{2}\sum _{i,j_1,j_2=1}^d V_{d,i}^{j_1}(\cdot ) V_{d,i}^{j_2}(\cdot ) \frac{\partial ^2}{\partial x_{j_1}\partial x_{j_2}}}\), \(\textstyle {L_{d,i}{=}\sum _{j=1}^d V_{d,i}^{j}(\cdot )\frac{\partial }{\partial x_j}}\), \(i=1,\ldots ,d\). We define
Proposition 1
(Asymptotic expansion and the error bound) For \(m \in {\mathbb {N}} \cup \{ 0 \} \), there exists \(c >0\) such that for all \(d\in {\mathbb {N}}\), \(t>0\), \(\lambda \in (0,1]\),
where \(\hat{V}_{d,\alpha }^{e}(x)=L_{d,\alpha _1}\cdots L_{d,\alpha _{r-1}}V_{d,\alpha _r}^{e}(x)\), \(e\in \{1,\ldots ,d \}\), \(\alpha \in \{1,\ldots ,d \}^p\), and
Proof
See Sect. 4. \(\square \)
The weights in the expansion terms in Proposition 1 can be represented by polynomials of Brownian motion. We show this through distribution theory on Wiener space. Let \(d \in {\mathbb {N}}\). For \(t \in (0,T]\) and \(\alpha =(\alpha _1,\ldots ,\alpha _k)\in \{0,1,\ldots ,d \}^k\), \(k \in {\mathbb {N}} \cap [2,\infty )\), let
with \(\textbf{B}_t^{d,(\alpha _1)}=B_t^{d,\alpha _1}\), which can be obtained by (2.5). For example, we have \(\textbf{B}_t^{d,(\alpha _1,\alpha _2)}=B_t^{d,\alpha _1}B_t^{d,\alpha _2}-t \textbf{1}_{\alpha _1=\alpha _2\ne 0}\) for \(\alpha =(\alpha _1,\alpha _2) \in \{0,1,\ldots ,d \}^2\). Let \(\sigma _\ell \in {\mathbb {R}}^d\), \(\ell =0,1,\ldots ,d\) and \(\Sigma \) be a matrix given by \(\Sigma _{i,j}=\textstyle {\sum _{\ell =1}^d} \sigma _\ell ^i \sigma _\ell ^j\), \(1\le i,j \le d\) and satisfying \(\det \Sigma >0\). Let \({{\mathcal {T}}} \in {{\mathcal {S}}}'({\mathbb {R}}^d)\). We show an efficient computation of \(\textstyle {{}_{{\mathbb {D}}^{-\infty }} \langle \mathcal{T} (\sum _{i=0}^d \sigma _i B_t^{d,i} ), H_{\gamma } (\sum _{i=0}^d \sigma _i B_t^{d,i},{\mathbb {B}}_t^{d,\alpha } )\rangle {}_{{\mathbb {D}}^\infty }}\) in order to give a polynomial representation of the Malliavin weights in the expansion terms of the asymptotic expansion in Proposition 1. Note that we have
by (2.7) and (2.6), where \(\sigma \) is the matrix \(\sigma =(\sigma _1,\ldots ,\sigma _d)\), and for \(y \in {\mathbb {R}}^d\), it holds that
by (2.6). Also, one has
by (2.5), (2.7) and (2.8), where \(\alpha ^\star \) is a multi-index such that \(\alpha ^{\star }=(\alpha ^{\star }_1,\ldots ,\alpha ^{\star }_{\ell (\alpha )})=(\alpha _{j_1}, \ldots ,\alpha _{j_{\ell (\alpha )}})\) satisfying \(\ell (\alpha )=\# \{ i; \alpha _i\ne 0 \}\) and \(\alpha _{j_i} \ne 0\), \(i=1,\ldots ,\ell (\alpha )\). Then, we have
where we iteratively used (2.5), (2.6), (2.7) and (2.8). An explicit polynomial representation of the asymptotic expansion is derived through (3.14). For instance, the first order expansion (\(m=1\)) reads as follows:
(First order asymptotic expansion with Malliavin weight)
Thus, the first order expansion is expressed with a Malliavin weight given by third order polynomials of Brownian motion. In general, we have the following representation.
Proposition 2
For \(m \in {\mathbb {N}}\), \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\), \(t \in (0,T]\) and \(x \in {\mathbb {R}}^d\), there exists a Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) such that
and
for some integers \(n(m)\in {\mathbb {N}}\) and \(p(e) \in {\mathbb {N}}\), \(e=1,\ldots ,n(m)\), polynomials \(\textrm{Poly}_e:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), continuous functions \(g_e: (0,T] \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), and continuous functions \(h_{e}:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\) constructed by some products of \(A^{-1}_{d}\), \(\{V_{d,i}\}_{0\le i \le d}\) and \(\{ \partial ^\alpha V_{d,i}\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) given in Assumption 1 of the form:
with some constants \(c_e \in (0,\infty )\), \(q_e \in {\mathbb {N}}\) and some multi-indices \((\gamma ^e_{1},\ldots ,\gamma ^e_{\ell }) \in \{1,\ldots ,d \}^{\ell }\) and \((\alpha ^e_{\ell ,1},\ldots ,\alpha ^e_{\ell ,p^e_\ell }) \in \{0,1,\ldots ,d \}^{p^e_\ell }\) with \(p^e_\ell \in {\mathbb {N}}\), \(\ell =1,\ldots ,e\), which satisfies that for \(p\ge 1\),
for some constant \(c>0\) independent of d.
Proof
See Sect. 4. \(\square \)
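The polynomial structure of the Malliavin weight in Proposition 2 can be checked numerically in the simplest case. For \(\alpha =(\alpha _1,\alpha _2)\) with \(\alpha _1=\alpha _2=1\), the iterated polynomial given above is \(\textbf{B}_t^{d,(1,1)}=(B_t^{d,1})^2-t\), which has mean zero and second moment \(2t^2\). A Monte Carlo sketch (our own illustration, assuming Python with NumPy):

```python
import numpy as np

# The second-order iterated polynomial from the text,
# B_t^{(a1, a2)} = B_t^{a1} B_t^{a2} - t * 1{a1 = a2 != 0}:
# for a1 = a2 = 1 it is B_t^2 - t, which has mean zero
# (it equals twice the iterated Ito integral) and variance 2 t^2.
rng = np.random.default_rng(4)
t, n = 1.0, 10**6
Bt = np.sqrt(t) * rng.standard_normal(n)   # samples of B_t^{d,1}

poly = Bt**2 - t
assert abs(np.mean(poly)) < 0.01                 # E[B_t^2 - t] = 0
assert abs(np.mean(poly**2) - 2 * t**2) < 0.05   # E[(B_t^2 - t)^2] = 2 t^2
```

This mean-zero property is what makes the expansion terms act as corrections rather than shifts of the leading Gaussian term.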
Remark 2
(Remark on the computation of Malliavin weights) Malliavin weights were first used in Fournié et al. [7] for sensitivity analysis in financial mathematics, especially in the Monte Carlo computation of “Greeks”. A discretization scheme for probabilistic automatic differentiation using Malliavin weights is then analyzed in Gobet and Munos [10]. The computation of asymptotic expansions with Malliavin weights is developed in Takahashi and Yamada [35, 37], and is further extended to the weak approximation of SDEs in Takahashi and Yamada [38]. Note that a PDE expansion is shown in Takahashi and Yamada [36], which partially connects it with the stochastic calculus approach. The computational method of the expansion with Malliavin weights is improved in Yamada [41], Yamada and Yamamoto [42], Naito and Yamada [27, 28], Iguchi and Yamada [17, 18] and Takahashi et al. [34], where the techniques of stochastic calculus are refined. The main advantages of the stochastic calculus approach are that (i) it provides an efficient computation scheme using Watanabe distributions on Wiener space as in (3.13) and (3.14), and (ii) it enables us to give precise bounds for approximations of expectations or the corresponding solutions of PDEs. Indeed, the computational effort of the expansions is much reduced in the sense that Itô’s iterated integrals are transformed into simple polynomials of Brownian motion, and the desired deep neural network approximation is obtained through this approach in the next subsection.
3.2 Deep neural network approximation
In order to construct a deep neural network approximation for the spatial function given by the asymptotic expansion, i.e. \(x \mapsto E[f_{d}(\bar{X}_t^{d,\lambda ,x}) \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) ]\), we impose the following additional assumptions.
Assumption 2
(Assumptions for deep neural network approximation) Suppose that Assumption 1 holds. There exist a constant \(\kappa >0\) and sets of networks \(\{ \psi _{\varepsilon ,d}^{V_{d,i}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}},i \in \{0,1,\ldots ,d\}} \subset {{\mathcal {N}}}\), \(\{ \psi _{\varepsilon ,d}^{\partial ^\alpha V_{d,i}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}},i \in \{0,1,\ldots ,d\},\alpha \in \{1,\ldots ,d\}^{{\mathbb {N}}}} \subset {{\mathcal {N}}}\), \(\{ \psi _{\varepsilon }^{A_d^{-1}} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(\{ \psi _{\varepsilon }^{f_d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) such that
-
1.
for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \({{\mathcal {C}}}(\psi _{\varepsilon ,d}^{V_{d,i}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(i=0,1,\ldots ,d\), \({{\mathcal {C}}}(\psi _{\varepsilon ,d}^{\partial ^\alpha {V}_{d,i}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \), \(\ell \in {\mathbb {N}}\), \({{\mathcal {C}}}(\psi _{\varepsilon }^{A_d^{-1}}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\), and \({{\mathcal {C}}}(\psi _{\varepsilon }^{f_d}) \le \kappa d^\kappa \varepsilon ^{-\kappa }\);
-
2.
for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\Vert V_{d,i}(x)-V_{d,i}^{\varepsilon }(x)\Vert \le \varepsilon \kappa d^\kappa \), \(i=0,1,\ldots ,d\), and \(\Vert \partial ^\alpha V_{d,i}(x)-V_{d,i,\alpha }^{\varepsilon }(x)\Vert \le \varepsilon \kappa d^\kappa \), \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \), \(\ell \in {\mathbb {N}}\), where \(V_{d,i}^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{V_{d,i}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d})\) and \(V_{d,i,\alpha }^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{\partial ^\alpha V_{d,i}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d})\);
-
3.
for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(\Vert A_d^{-1}(x)-A_{d,\varepsilon }^{-1}(x)\Vert \le \varepsilon \kappa d^\kappa \), where \(A_d^{-1}(\cdot )\) is the inverse matrix of \(A_d(\cdot ):=\textstyle {\sum _{i=1}^d} V_{d,i}(\cdot ) \otimes V_{d,i}(\cdot )\) and \(A_{d,\varepsilon }^{-1}={{\mathcal {R}}}(\psi _{\varepsilon }^{A_{d}^{-1}}) \in C({\mathbb {R}}^d,{\mathbb {R}}^{d \times d})\), and for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(\textstyle {\sup _{x\in [a,b]^d}}\Vert A_{d,\varepsilon }^{-1}(x)\Vert \le \kappa d^\kappa \);
-
4.
for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), \(x \in {\mathbb {R}}^d\), \(|f_d(x)-f_d^{\varepsilon }(x)|\le \varepsilon \kappa d^\kappa \), where \(f_d^{\varepsilon }={{\mathcal {R}}}(\psi _{\varepsilon }^{f_d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\).
Remark 3
Assumption 2 provides the deep neural network approximation of the asymptotic expansion with an appropriate complexity. Note that Assumptions 1.1, 1.3, 2.2 and 2.4 imply that there exists \(\eta >0\) such that \(\textstyle {|f_d^{\varepsilon }(x)| \le \eta d^\eta (1+\Vert x \Vert )}\) for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\), and \(\textstyle {\sup _{x\in [a,b]^d}}\Vert V_{d,i}^{\varepsilon }(x)\Vert \le \eta d^\eta \) for all \(i=0,1,\ldots ,d\), \(\textstyle {\sup _{x\in [a,b]^d}}\Vert V_{d,i,\alpha }^{\varepsilon }(x)\Vert \le \eta d^\eta \) for all \(i=0,1,\ldots ,d\), \(\alpha \in \{1,\ldots ,d \}^\ell \) with \(\ell \in {\mathbb {N}}\). In the following, Assumptions 2.2, 2.3 and 2.4 play an important role in the analysis of “products of neural networks” in the construction of the approximation with the asymptotic expansion.
Remark 4
In particular, Assumption 2.3 is satisfied in the cases \(A_d(x)=I_d\) and \(A_d(x)=s(d)I_d\) with a function \(s:{\mathbb {N}} \rightarrow {\mathbb {R}}\). For instance, the case \(A_d(x)=I_d\) corresponds to the d-dimensional heat equation when \(V_{d,0}\equiv 0\). Also, the SDEs with the diffusion matrix \(V_d=(1/\sqrt{d})I_d\) discussed in Section 5.1 and Section 5.2 of [9] and Section 5.2 of [13] are examples of (3.1) (or (3.6)). In those cases, the neural network approximations in Assumption 2 are not necessary, since \(V_{d,i}\), \(i=1,\ldots ,d\), and hence \(A_d\), do not depend on the state variable x, so that \(V_{d,i}^{\varepsilon }\) and \(A^{-1}_{d,\varepsilon }\) are \(V_{d,i}\) and \(A^{-1}_{d}\) themselves. Furthermore, in such cases (e.g. the high-dimensional heat equations) the asymptotic expansion is obtained simply (usually as a Gaussian approximation), and the scheme reduces exactly to the methods of Beck et al. [2] and Gonon et al. [11].
The main result of the paper is summarized as follows.
Theorem 1
(Deep learning-based asymptotic expansion overcomes the curse of dimensionality) Suppose that Assumptions 1 and 2 hold. Let \(m \in {\mathbb {N}}\). For \(d \in {\mathbb {N}}\), consider the SDE (3.1) on the d-dimensional Wiener space and let \(u_\lambda ^d \in C ([0,T] \times {\mathbb {R}}^d, {\mathbb {R}})\) given by (3.3) be a solution to the Kolmogorov PDE (3.4). Then we have
Furthermore, for \(t \in (0,T]\) and \(\lambda \in (0,1]\), there exist \(\{ \phi ^{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) which depend only on \(a,b,C,m,\kappa ,t\) and \(\lambda \), such that for all \(\varepsilon \in (0,1)\) and \(d\in {\mathbb {N}}\), we have \({{\mathcal {R}}}(\phi ^{\varepsilon ,d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\), \({{\mathcal {C}}}(\phi ^{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\) and
Proof
See Sect. 4. \(\square \)
We provide the weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) with \(m=0,1\) in Theorem 1 for our scheme (the expression for general m will be given in Sect. 4 below). That is, for \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\), \(t>0\) and \(x \in {\mathbb {R}}^d\),
where
Hence, the weight for \(m=0\), i.e. \(\mathcal{M}^0_{d,\lambda }(t,x,B_t^d)=1\), provides a simple (but coarse) Gaussian approximation, and the Malliavin weight for \(m=1\) serves as a correction term for the Gaussian approximation. The derivation is provided in the next section.
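The \(m=0\) Gaussian approximation can be sketched directly, under the assumption (ours, for illustration) that \(\bar{X}_t^{d,\lambda ,x}\) is the frozen-coefficient Gaussian \(x+\lambda V_{d,0}(x)t+\lambda V_d(x)B_t^d\); the constant coefficients \(V_{d,0}\equiv (1,\ldots ,1)\), \(V_d\equiv I_d\) and the payoff \(f_d(x)=e^{x_1}\) below are illustrative stand-ins with a closed-form benchmark:

```python
import numpy as np

# m = 0 term of the expansion: the plain Gaussian approximation
# E[f_d(Xbar_t^{d,lambda,x})] with weight M^0 = 1, where the
# frozen-coefficient Gaussian is
#   Xbar = x + lam * V_{d,0}(x) * t + lam * V_d(x) * B_t.
# Illustrative choices: V_{d,0} = (1,...,1), V_d = I_d, f_d(x) = exp(x_1),
# so that coordinate 1 of Xbar is N(lam*t, lam^2 * t) and the exact
# value is the lognormal mean exp(lam*t + lam^2 * t / 2).
rng = np.random.default_rng(5)
d, lam, t, n = 5, 0.5, 1.0, 10**6
x = np.zeros(d)

Bt = np.sqrt(t) * rng.standard_normal((n, d))
Xbar = x + lam * np.ones(d) * t + lam * Bt     # frozen-coefficient Gaussian

estimate = np.mean(np.exp(Xbar[:, 0]))         # m = 0 approximation of u
exact = np.exp(lam * t + lam**2 * t / 2)
assert abs(estimate - exact) < 0.02
```

For state-dependent coefficients this Gaussian term alone is coarse; the \(m=1\) Malliavin weight above multiplies the same samples to correct it.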
4 Proofs of Propositions 1, 2 and Theorem 1
We give the proofs of Propositions 1, 2 and Theorem 1. Before providing the full proofs, we first give brief outlines.
-
Proposition 1 (Asymptotic expansion)
-
take a family of uniformly non-degenerate functionals \(F_t^{d,\lambda ,x}=(X_t^{d,\lambda ,x}-x)/\lambda \), \(\lambda \in (0,1]\), since the family \(X_t^{d,\lambda ,x}\), \(\lambda \in (0,1]\), itself degenerates as \(\lambda \downarrow 0\), and consider the expansion \(F_t^{d,\lambda ,x}=F_t^{d,0,x}+\cdots \) in \({\mathbb {D}}^\infty \).
-
expand \(\delta _y(F_t^{d,\lambda ,x}) \sim \delta _y(F_t^{d,0,x})+\cdots \) in \({\mathbb {D}}^{-\infty }\) and take expectation to obtain the expansion of the density \(p^{F_t^{d,\lambda ,x}}(y)=E[\delta _y(F_t^{d,\lambda ,x})] \sim E[\delta _y(F_t^{d,0,x})]+\cdots \) in \({\mathbb {R}}\).
-
derive precise expression of the right-hand side of \(E[f_d(X_t^{d,\lambda ,x})]=c_0^{d,\lambda ,t}+ c_1^{d,\lambda ,t}+\cdots +c_m^{d,\lambda ,t} +\textrm{Residual}^{d,\lambda ,t}_m\) by using Malliavin’s integration by parts.
-
give a precise estimate for \(\textrm{Residual}^{d,\lambda ,t}_m(x)\) (w.r.t \(\lambda \), t and the dimension d) uniformly in x by using the key inequality on Malliavin weight (Lemma 5 in Appendix A) which yields a sharp upper bound of \(\textrm{Residual}^{d,\lambda ,t}_m(x)\).
-
-
Proposition 2 (Representation and property of Malliavin weight)
-
use the formula (3.14) to prove that \(c_0^{d,\lambda ,t}+ c_1^{d,\lambda ,t}+\cdots +c_m^{d,\lambda ,t}\) above can be represented by an expectation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)]\) with a Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) constructed by polynomials of Brownian motion.
-
check that the moment of the Malliavin weight \({{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)\) grows polynomially in d from the representation.
-
-
Theorem 1 (Deep learning-based asymptotic expansion overcomes the curse of dimensionality)
-
(0) for \(d \in {\mathbb {N}}\), first check the expansion \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)]\) obtained in Proposition 1 and 2 gives an approximation for \(u_d^\lambda (t,x)\) on the cube \([a,b]^d\) with a sharp asymptotic error bound.
-
(1) for an error precision \(\varepsilon \), construct an approximation \(E[f_{d}(\bar{X}_t^{d,\lambda ,x}){{\mathcal {M}}}^m_{d,\lambda }(t,x,B_t^d)] \approx E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)]\) on the cube \([a,b]^d\) by using stochastic calculus, where \(f^{\delta }_{d}\), \(\bar{X}_t^{d,\lambda ,x,\delta }\) and \({{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)\) are given by replacing \(\{V_{d,i}\}_i\), \(A_d^{-1}\), \(\{V_{d,i,\alpha }\}_{i,\alpha }\) with their neural network approximations \(\{V^\delta _{d,i}\}_i\), \(A_{d,\delta }^{-1}\), \(\{V_{d,i,\alpha ,\delta }\}_{i,\alpha }\) with \(\delta =(\varepsilon ^c d^{-c})\) for some \(c>0\) independent of \(\varepsilon \) and d.
-
(2) for an error precision \(\varepsilon \), construct a realization of the Monte-Carlo approximation \(E[f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta }){{\mathcal {M}}}^m_{d,\lambda ,\delta }(t,x,B_t^d)] \approx \textstyle {\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d})){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))}\) on the cube \([a,b]^d\) with a choice \(M=O(\varepsilon ^{-c} d^{c})\) for some \(c>0\) independent of \(\varepsilon \) and d, by using stochastic calculus.
-
(3) for an error precision \(\varepsilon \), construct a realization of the deep neural network approximation \(\textstyle {\frac{1}{M} \sum _{\ell =1}^M f^{\delta }_{d}(\bar{X}_t^{d,\lambda ,x,\delta ,(\ell )}(\omega _{\varepsilon ,d})){{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^{d,(\ell )}(\omega _{\varepsilon ,d}))} \approx {{\mathcal {R}}}(\phi _{\varepsilon ,d})(x)\) on the cube \([a,b]^d\) whose complexity is bounded by \({{\mathcal {C}}}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\) for some \(c>0\) independent of \(\varepsilon \) and d, where ReLU calculus (Lemma 9, 10, 12 in Appendix B) is essentially used.
-
apply (0), (1), (2) and (3) to obtain the main result.
-
In the proof, we frequently use an elementary result: \(\textstyle {\sup _{x \in [a,b]^d}} \Vert x \Vert \le d^{1/2} \max \{ |a|,|b| \}\), which is obtained in the proof of Corollary 4.2 of [11].
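This inequality can be checked numerically in a few lines (a sanity check only; the corner point \((b,\ldots ,b)\) attains the bound when \(|b| \ge |a|\)):

```python
import numpy as np

# Check sup_{x in [a,b]^d} ||x|| <= d^{1/2} * max(|a|, |b|)
rng = np.random.default_rng(1)
a, b, d = -2.0, 3.0, 50
bound = np.sqrt(d) * max(abs(a), abs(b))

xs = rng.uniform(a, b, size=(10_000, d))
assert np.linalg.norm(xs, axis=1).max() <= bound

corner = np.full(d, b)                           # here |b| >= |a|, so this
assert np.isclose(np.linalg.norm(corner), bound)  # corner attains the bound
```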
4.1 Proof of Proposition 1
For \(x\in {\mathbb {R}}^d\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), let \(F_t^{d,\lambda ,x}=(F_t^{d,\lambda ,x,1},\ldots ,F_t^{d,\lambda ,x,d}) \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) be given by \(F_t^{d,\lambda ,x,j}=(X_t^{d,\lambda ,x,j}-x_j)/\lambda \), \(j=1,\ldots ,d\). We note that \(\{ F_t^{d,\lambda ,x} \}_{\lambda }\) is a family of uniformly non-degenerate Wiener functionals (see Theorem 3.4 of [40]). Then, for \({{\mathcal {T}}} \in \mathcal{S}'({\mathbb {R}}^d)\), the composition \({{\mathcal {T}}}(F_t^{d,\lambda ,x})\) is well-defined as an element of \({\mathbb {D}}^{-\infty }(\Omega ^d)\), and the density of \(F_t^{d,\lambda ,x}\), namely \(p^{F_t^{d,\lambda ,x}} \in {{\mathcal {S}}}({\mathbb {R}}^d)\) has the representation \(p^{F_t^{d,\lambda ,x}}(y)={}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}), 1 \rangle {}_{{\mathbb {D}}^{-\infty }}\) for \(y \in {\mathbb {R}}^d\). Then, for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\), it holds that
For \(x\in {\mathbb {R}}^d\), \(t \in (0,T]\), let \(F_t^{d,0,x}=\textstyle {\sum _{i=0}^d}V_{d,i}(x)B_t^{d,i}\). Thus, for \(S \in {{\mathcal {S}}}'({\mathbb {R}}^d)\), the composition \(S(F_t^{d,\lambda ,x})\) is well-defined as an element of \({\mathbb {D}}^{-\infty }(\Omega ^d)\) and has an expansion:
for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\), where
The integration by parts (2.7) and Theorem 2.6 of [35] yield that
where \(\textstyle {\sum _{i^{(k)},\gamma ^{(k)}}^{j}=\sum _{k=1}^j \sum _{i^{(k)}=(i_1,\ldots ,i_k) \ s.t. \ i_1+\cdots +i_k=j,i_e\ge 1}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\cdots ,d \}^k}\frac{1}{k!}}\). With the calculation
for \(j=1,\ldots ,d\) and \(i\in {\mathbb {N}}\), it holds that
Again by the integration by parts (2.7), \(\textstyle {\frac{\partial ^{m+1}}{\partial \eta ^{m+1}}} {}_{{\mathbb {D}}^{-\infty }} \langle \delta _y(F_t^{d,\lambda ,x}),1\rangle {}_{{\mathbb {D}}^{\infty }} |_{\eta =\lambda u}\) (with \(\lambda u \in (0,1]\)) in \(\mathcal{E}_{m,t}^{d,\lambda ,x,y}\) in (4.3) is given by a linear combination of the expectations of the form
with \(k \le m+1\), \(\gamma \in \{1,\ldots ,d \}^k\) and \(\beta _1,\ldots ,\beta _k\ge 1\) such that \(\textstyle {\sum _{\ell =1}^k} \beta _\ell =m+1\). By the inequality of Lemma 5 with \(k=0\) in Appendix A, we have for all \(p\ge 1\) and multi-index \(\gamma \), there are \(c>0\), \(p_1,p_2,p_3>1\) and \(r \in {\mathbb {N}}\) satisfying
for all \(G \in {\mathbb {D}}^\infty \), \(t \in (0,T]\), \(\lambda \in (0,1]\) and \(x \in [a,b]^d\). In order to show the upper bound of the weight appearing in the residual term of the expansion, we list the following results:
Lemma 1
1. For all \(p>1\), there exists \(\kappa _1>0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),
$$\begin{aligned} \Vert \det (\sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _p \le \kappa _1 d^{\kappa _1} t^{-d}. \end{aligned}$$(4.8)
2. For all \(p>1\) and \(r\in {\mathbb {N}}\), there exists \(\kappa _2>0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),
$$\begin{aligned} \Vert DF_t^{d,\lambda ,x} \Vert _{r,p,H}\le \kappa _2 d^{\kappa _2} t^{1/2}. \end{aligned}$$(4.9)
3. For all \(\ell \in {\mathbb {N}}\), \(p>1\) and \(r\in {\mathbb {N}}\), there exists \(\eta >0\) such that for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\),
$$\begin{aligned} \Vert \partial _{\lambda }^\ell F_t^{d,\lambda ,x} \Vert _{r,p} \le \eta d^\eta t^{(\ell +1)/2}. \end{aligned}$$(4.10)
Proof
For \(d\in {\mathbb {N}}\), let \(V_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) be such that \(V_d=(V_{d,1},\ldots ,V_{d,d})\) and for \(\lambda \in (0,1]\), let \(V^{\lambda }_d: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times d}\) be such that \(V^{\lambda }_d=(V^{\lambda }_{d,1},\ldots ,V^{\lambda }_{d,d})\). Moreover, for \(d\in {\mathbb {N}}\), we use the notation \(J_{0\rightarrow t}=\textstyle {\frac{\partial }{\partial x}X_t^{d,\lambda ,x}}=(\textstyle {\frac{\partial }{\partial x_i}X_t^{d,\lambda ,x,j})_{1\le i,j \le d}}\) for \(x\in {\mathbb {R}}^d\), \(t>0\) and \(\lambda \in (0,1]\).
1. Note that for \(d\in {\mathbb {N}}\), \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\) and \(\lambda \in (0,1]\), we have
$$\begin{aligned} \sigma ^{F_t^{d,\lambda ,x}}&= \int _0^t [D_{s} (X_t^{d,\lambda ,x}-x)/\lambda ] [D_{s} (X_t^{d,\lambda ,x}-x)/\lambda ]^{\top } ds \end{aligned}$$(4.11)
$$\begin{aligned}&=\int _0^t J_{0 \rightarrow t} J_{0 \rightarrow s}^{-1} V_d (X_s^{d,\lambda ,x})V_d(X_s^{d,\lambda ,x})^{\top } {J_{0 \rightarrow s}^{-1}}^{\top } J_{0 \rightarrow t}^{\top } ds. \end{aligned}$$(4.12)
Under the condition \(\sigma _{d}^{\lambda }(\cdot )\sigma _{d}^{\lambda }(\cdot )^{\top } \ge \lambda ^2 I_{d}\) (i.e. \(V_{d}(\cdot )V_{d}(\cdot )^{\top } \ge I_{d}\)) in Assumption 1.3, there is \(c>0\) such that
$$\begin{aligned} \sup _{x\in [a,b]^d} \Vert (\det \sigma ^{F_t^{d,\lambda ,x}})^{-1} \Vert _p \le cd^c t^{-d}, \end{aligned}$$(4.13)
for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), by Theorem 3.5 of Kusuoka and Stroock [22].
2. We recall that for \(d \in {\mathbb {N}}\), \(\lambda \in (0,1]\) and \(0\le s<t\), \(D_{s} (X_t^{d,\lambda ,x}-x)/\lambda =J_{0 \rightarrow t} J_{0 \rightarrow s}^{-1} V_d(X_s^{d,\lambda ,x})\). Then, there is \(c>0\) such that
$$\begin{aligned} \sup _{x\in [a,b]^d} \Vert DF_t^{d,\lambda ,x} \Vert _{k,p,H^d} \le c d^c t^{1/2}, \end{aligned}$$(4.14)
for all \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(\lambda \in (0,1]\), by Theorem 2.19 of Kusuoka and Stroock [22].
3. Note that
$$\begin{aligned} \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }X_t^{d,\lambda ,x,r}&= \sum _{i^{(k)},\gamma ^{(k)}}^{\ell -1}\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j^r(X_s^{d,\lambda ,x})dB_s^{d,j} \end{aligned}$$(4.15)
$$\begin{aligned}&\quad +\lambda \sum _{i^{(k)},\gamma ^{(k)}}^{\ell }\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j^r(X_s^{d,\lambda ,x})dB_s^{d,j}. \end{aligned}$$(4.16)
Since the above is a linear SDE, it has an explicit solution and we have
$$\begin{aligned} \sup _{x \in [a,b]^d}\Big \Vert \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }X_t^{d,\lambda ,x} \Big \Vert _{k,p} \le c d^c t^{\ell /2}, \end{aligned}$$(4.17)
for some \(c>0\) independent of t and d, due to the estimate
$$\begin{aligned} \sup _{x \in [a,b]^d} \Big \Vert&\sum _{i^{(k)},\gamma ^{(k)}}^{\ell -1}\int _0^t J_{0\rightarrow t}J_{0\rightarrow s}^{-1} \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j(X_s^{d,\lambda ,x})dB_s^{d,j} \Big \Vert _{k,p}\nonumber \\&\le c d^c t^{\ell /2}, \end{aligned}$$(4.18)
which is obtained by using Lemmas 6 and 7 in Appendix A. Then, the process
$$\begin{aligned} \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }F_t^{d,\lambda ,x}&= \sum _{i^{(k)},\gamma ^{(k)}}^{\ell }\int _0^t \prod _{e=1}^k \frac{1}{i_e !}\frac{\partial ^{i_e}}{\partial \lambda ^{i_e} }X_t^{d,\lambda ,x,\gamma _e}\sum _{j=0}^d \partial ^{\gamma ^{(k)}} V_j(X_s^{d,\lambda ,x})dB_s^{d,j}, \quad t \ge 0, \ x \in {\mathbb {R}}^d, \end{aligned}$$(4.19)
satisfies
$$\begin{aligned} \sup _{x \in [a,b]^d} \Big \Vert \frac{1}{\ell !}\frac{\partial ^\ell }{\partial \lambda ^\ell }F_t^{d,\lambda ,x} \Big \Vert _{k,p} \le c d^c t^{(\ell +1)/2}, \end{aligned}$$(4.20)
for some \(c>0\) independent of t and d.
\(\square \)
Using the above, we have that for all \(k \le m+1\), \(\gamma \in \{1,\ldots ,d \}^k\), \(\beta _1,\ldots ,\beta _k\ge 1\) such that \(\textstyle {\sum _{\ell =1}^k} \beta _\ell =m+1\), and \(p>1\), there exists \(\nu >0\) such that
for all \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\). Let us define \(r_{m,t}^{d,\lambda ,x}\) for \(t \in (0,T]\), \(x\in [a,b]^d\) and \(\lambda \in (0,1]\) from (4.1) and (4.6) as
where \(\tilde{X}_t^{d,\lambda ,u,x}=x+\lambda F_t^{d,\lambda u,x}\), \(u \in [0,1]\) and
with \(\textstyle {\sum _{\beta ^{(k)},\gamma ^{(k)}}^{[m+1]}:=(m+1)! \sum _{k=1}^j \sum _{\beta ^{(k)}=(\beta _1,\ldots ,\beta _k) s.t. \sum _{\ell =1}^k \beta _\ell =j,\beta _i\ge 1}\sum _{\gamma ^{(k)}=(\gamma _1,\ldots ,\gamma _k)\in \{1,\cdots ,d \}^k}}{\frac{1}{k!}}\).
Here, \(\tilde{X}_t^{d,\lambda ,u,x}\), \(u \in [0,1]\), and \(\mathcal{W}_{m+1,t}^{d,\lambda ,u,x}\), \(u \in [0,1]\), satisfy that for \(p \ge 1\), there exists \(\eta >0\) such that
for all \(\lambda \in (0,1]\) and \(t>0\). Therefore, there exists \(c>0\) such that
for all \(\lambda \in (0,1]\) and \(t \in (0,T]\), and then the assertion of Proposition 1 holds.
4.2 Proof of Proposition 2
For \(d \in {\mathbb {N}}\) and \(m \in {\mathbb {N}}\), first note that the following representation holds:
for \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\), \(\lambda \in (0,1]\), \(k=1,\ldots ,j \le m\), \(\beta _1,\ldots ,\beta _k \ge 2\) such that \(\beta _1+\cdots +\beta _k=j+k\), and \(\gamma \in \{1,\ldots ,d \}^k\). Using the Itô formula for the products of iterated integrals (Proposition 5.2.3 of [21] for example) and the formula from (3.14): for a multi-index \(\gamma \in \{1,\ldots ,d \}^p\) and a multi-index \(\alpha \in \{0,1,\ldots ,d \}^q\),
iteratively, we have (3.15) and the representation (3.16).
We can see that for \(p\ge 1\) and \(e=1,\ldots ,n(m)\), \(\Vert g_e(t) \textrm{Poly}_e(B_t^d)\Vert _p=O(t^{\nu _e/2})\) for some \(\nu _e \ge 1\), and by Assumptions 1 and 2 and the expression of \(h_e\), there is \(\eta >0\) independent of d such that \(|h_e(x)| \le \eta d^\eta \) for all \(e=1,\ldots ,n(m)\) and \(x \in [a,b]^d\). Then, for \(p\ge 1\), there exists \(c>0\) independent of d such that
uniformly in \((t,x)\in (0,T] \times [a,b]^d\) and \(\lambda \in (0,1]\).
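The mechanism behind the representation, i.e. rewriting iterated stochastic integrals as polynomials of \(B_t\), can be seen numerically in its simplest instance \(\int _0^t B_s\,dB_s=(B_t^2-t)/2\). The discretization below is only an illustration of why the weight ends up as a polynomial in \(B_t\); it is not part of the proof.

```python
import numpy as np

# Pathwise check of int_0^t B dB = (B_t^2 - t)/2, the simplest instance of
# rewriting iterated Ito integrals as polynomials of B_t.
rng = np.random.default_rng(2)
t, n, n_paths = 1.0, 50_000, 50
dt = t / n

dB = rng.standard_normal((n_paths, n)) * np.sqrt(dt)
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])  # B at left endpoints

ito_sum = np.sum(B_left * dB, axis=1)        # Riemann-Ito sums of int B dB
poly = (B[:, -1] ** 2 - t) / 2.0             # closed-form polynomial in B_t

print("max pathwise discrepancy:", np.max(np.abs(ito_sum - poly)))
```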
4.3 Proof of Theorem 1
The first statement is immediately obtained by combining Propositions 1 and 2:
Hereafter, we fix \(t \in (0,T]\) and \(\lambda \in (0,1]\). For \(d \in {\mathbb {N}}\), \(x\in {\mathbb {R}}^d\), \(\delta \in (0,1)\), let
and \({{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,B_t^d) \in {\mathbb {D}}^\infty (\Omega ^d)\) be a functional which has the form:
where \(h_{e}^{\delta }: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), \(e=1,\ldots ,n(m)\), are the functions obtained from those in Proposition 2 by replacing \(A^{-1}_{d}\), \(\{V_{d,i}\}_{0\le i \le d}\) and \(\{V_{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) with their approximations \(A^{-1}_{d,\delta }\), \(\{V^\delta _{d,i}\}_{0\le i \le d}\) and \(\{V^\delta _{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) in Assumption 2, satisfying
Next, we prepare the following lemmas (Lemmas 2, 3 and 4) to prove the second assertion ((3.20)) in Theorem 1.
Lemma 2
There exists \(c_1>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), \(\delta =O(\varepsilon ^{c_1} d^{-c_1})\),
where \(f^{\delta }_{d}={{\mathcal {R}}}(\psi _{\delta }^{f_d}) \in C({\mathbb {R}}^d,{\mathbb {R}})\) is defined in Assumption 2.4.
Proof
In the proof, we use a generic constant \(c>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \). Note that for \(x \in [a,b]^d\),
By 2 of Assumption 2 (with Assumption 1), it holds that
for all \(x \in [a,b]^d\). By 4 of Assumption 2 (with Assumption 1), it holds that
for all \(x \in [a,b]^d\). Here, the estimate \( \Vert \mathcal{M}^m_{d,\lambda }(t,x,B_t^d) \Vert _2 \le cd^c\) in (3.18) is used in (4.35) and (4.36). By 2, 3, 4 of Assumption 2 (with Assumption 1), (3.16) and (4.31), we have that for \(p\ge 1\),
and
for all \(x \in [a,b]^d\). Then, by taking \(\delta =(1/3) c_1^{-1}\varepsilon ^{c_1}d^{-c_1}\) with \(c_1=\max \{1,c \}\), where c is the maximum constant appearing in (4.35), (4.36) and (4.38), we have
\(\square \)
Lemma 3
For \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(M\in {\mathbb {N}}\), let \(B_t^{d,(\ell )}\), \(\ell =1,\ldots ,M\) be independent identically distributed random variables such that \(B_t^{d,(\ell )} \overset{\textrm{law}}{=} B_t^{d}\). There exists \(c_2>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\) and \(M=O(\varepsilon ^{-c_2} d^{c_2})\), there is \(\omega _{\varepsilon ,d} \in \Omega ^d\) satisfying
where \(\delta =O(\varepsilon ^{c_1}d^{-c_1})\) with the constant \(c_1\) in Lemma 2.
Proof
There exists a constant \(c >0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \) such that for all \(x \in [a,b]^d\) and \(M \in {\mathbb {N}}\),
Then, by choosing \(c_2=\max \{1,c \}\), we have that for all \(\varepsilon \in (0,1)\), \(d \in {\mathbb {N}}\) and \(M=c_2 \varepsilon ^{-c_2}d^{c_2}\),
for all \(x \in [a,b]^d\), and therefore, there is \(\omega _{\varepsilon ,d} \in \Omega ^d\) satisfying
\(\square \)
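The mechanism of Lemma 3 can be illustrated numerically: the Monte-Carlo error decays like \(M^{-1/2}\) with a constant controlled polynomially in d, which is why a polynomial budget \(M=O(\varepsilon ^{-c_2}d^{c_2})\) suffices. The sketch below uses an illustrative basket-call payoff on the Gaussian proxy \(\bar{X}_t\) (whose basket average is exactly Gaussian, giving a closed-form reference); all parameters are assumptions, not the paper's.

```python
import math
import numpy as np

# Sketch of Lemma 3's mechanism: the MC error of (1/M) sum f(X_bar^{(l)})
# decays like M^{-1/2}.  Basket-call payoff on the Gaussian proxy X_bar,
# whose basket average is exactly Gaussian, so a closed-form reference exists.
rng = np.random.default_rng(3)
d, x, K, t = 20, 100.0, 100.0, 1.0
lam, mu, c = 0.3, 1.0 / 30.0, 1.0            # illustrative parameters

A = x * (1.0 + lam * mu * t)                 # mean of the basket average
S = lam * c * x * math.sqrt(t / d)           # its standard deviation
z = (A - K) / S
ref = (A - K) * 0.5 * (1 + math.erf(z / math.sqrt(2))) \
      + S * math.exp(-z * z / 2) / math.sqrt(2 * math.pi)  # Bachelier value

def mc_estimate(M):
    B = rng.standard_normal((M, d)) * math.sqrt(t)
    X_bar = x + lam * mu * x * t + lam * c * x * B
    return np.maximum(X_bar.mean(axis=1) - K, 0.0).mean()

def rmse(M, reps=100):
    errs = np.array([mc_estimate(M) - ref for _ in range(reps)])
    return math.sqrt(np.mean(errs ** 2))

r_small, r_large = rmse(100), rmse(10_000)
print(f"RMSE(M=100)={r_small:.3f}, RMSE(M=10000)={r_large:.3f}")
```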
Lemma 4
For \(d\in {\mathbb {N}}\), \(t \in (0,T]\) and \(M\in {\mathbb {N}}\), let \(B_t^{d,(\ell )}\), \(\ell =1,\ldots ,M\) be independent identically distributed random variables such that \(B_t^{d,(\ell )} \overset{\textrm{law}}{=} B_t^{d}\). There exist \(\{ \phi _{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) (which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \)) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), we have \(\mathcal{C}(\phi _{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\), and for a realization \(\omega _{\varepsilon ,d} \in \Omega ^d\) given in Lemma 3, it holds that
where \(\delta =O(\varepsilon ^{c_1}d^{-c_1})\) and \(M=O(\varepsilon ^{-c_2}d^{c_2})\) with the constants \(c_1\) and \(c_2\) in Lemmas 2 and 3.
Proof
In the proof, we use a generic constant \(c>0\) which depends only on \(a,b,C,m,\kappa ,t\) and \(\lambda \). Let \(\varepsilon \in (0,1)\) and \(d\in {\mathbb {N}}\), let \(\delta =O(\varepsilon ^{c_1} d^{-c_1})\) and \(M=O(\varepsilon ^{-c_2} d^{c_2})\), where \(c_1\) and \(c_2\) are the constants appearing in Lemmas 2 and 3, let \(\omega _{\varepsilon ,d}\) be a realization given in Lemma 3, and let \(b^{d,(\ell )}=B_t^{d,(\ell )}(\omega _{\varepsilon ,d})\) for \(\ell =1,\ldots ,M\). Since there exists \(\eta _{\delta ,d}^{(\ell )} \in {{\mathcal {N}}}\) such that \(\mathcal{R}(\eta ^{(\ell )}_{\delta ,d})(x)=x+\lambda \mathcal{R}(\psi _{\delta ,d}^{V_0})(x)t+\lambda \textstyle {\sum _{i=1}^d} \mathcal{R}(\psi _{\delta ,d}^{V_i})(x) b^{d,(\ell ),i}\) for \(x \in {\mathbb {R}}^d\) and \(\mathcal{C}(\eta ^{(\ell )}_{\delta ,d})=O(\delta ^{-c}d^c)\) (by Lemma 9 in Appendix B), there exists \(\psi _{1,(\ell )}^{\delta ,d} \in {{\mathcal {N}}}\) such that \(\mathcal{R}(\psi _{1,(\ell )}^{\delta ,d})(x)=\mathcal{R}(\psi _{\delta ,d}^{f})(\mathcal{R}(\eta ^{(\ell )}_{\delta ,d})(x))=f_{d}^\delta (\bar{X}_t^{d,\lambda ,x,\delta }(\omega _{\varepsilon ,d}))\) for \(x \in {\mathbb {R}}^d\) and \(\mathcal{C}(\psi _{1,(\ell )}^{\delta ,d})=O(\delta ^{-c}d^c)\) (by Lemma 10 in Appendix B). Next, we recall that by (4.31), the weight \(\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})\), \(x \in {\mathbb {R}}^d\), has the form \({{\mathcal {M}}}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})= \textstyle {1+\sum _{e\le n(m)}} \lambda ^{p(e)} g_e(t) h^{\delta }_{e}(x)\textrm{Poly}_{e}(b^{d,(\ell )})\), constructed by some products of \(A^{-1}_{d,\delta }\), \(\{V^{\delta }_{d,i}\}_{0\le i \le d}\) and \(\{V^{\delta }_{d,i,\alpha }\}_{0\le i \le d,\alpha \in \{1,\ldots ,d \}^{\ell },\ell \le 2m}\) in Assumption 2.
Using Lemmas 12, 9 in Appendix B and Assumption 2, there is a neural network \(\psi ^{\varepsilon ,d}_{2,(\ell )} \in {{\mathcal {N}}}\) such that \(\textstyle {\sup _{x\in [a,b]^d}}|\mathcal{M}^{m,\delta }_{d,\lambda }(t,x,b^{d,(\ell )})-\mathcal{R}(\psi ^{\varepsilon ,d}_{2,(\ell )})(x)|\le \varepsilon /2\) and \({{\mathcal {C}}}(\psi ^{\varepsilon ,d}_{2,(\ell )})=O(\varepsilon ^{-c}d^c)\). Hence, we have
We again use Lemma 12 in Appendix B to see that there exists \(\Psi _{(\ell )}^{\varepsilon ,d} \in {{\mathcal {N}}}\) such that
for all \(x \in [a,b]^d\), and \(\mathcal{C}(\Psi _{(\ell )}^{\varepsilon ,d})=O(\varepsilon ^{-c}d^{c})\). Finally, applying Lemma 9 gives the desired result, i.e. there exist \(\{ \phi _{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(c>0\) such that for all \(\varepsilon \in (0,1)\), \(d\in {\mathbb {N}}\), we have \(\mathcal{C}(\phi ^{\varepsilon ,d})\le c \varepsilon ^{-c}d^c\), and for a realization \(\omega _{\varepsilon ,d} \in \Omega ^d\) given in Lemma 3, it holds that
\(\square \)
Proof
The first assertion (in (3.19)) follows from (4.29). The second assertion (in (3.20)) is obtained by combining Lemmas 2, 3 and 4. \(\square \)
5 Deep learning implementation
We briefly provide the implementation scheme for the approximation in Theorem 1. Let \(\xi \) be a uniformly distributed random variable on \([a,b]^d\), i.e. \(\xi \sim U([a,b]^d)\), and define \(\textstyle {{\mathbb {X}}_t^{\xi }=\xi +\lambda \sum _{i=0}^d V_{d,i}(\xi )B_t^{d,i}}\), \(t \ge 0\). For \(t>0\), the m-th order asymptotic expansion of Theorem 1 can be represented by
which is obtained by Theorem 1 of this paper combined with Proposition 2.2 of Beck et al. [2]. We construct a deep neural network \(u^{{{\mathcal {N}}}{{\mathcal {N}}},\theta ^*}(t,\cdot )\) to approximate the function \(u^{m}(t,\cdot )\), given, for a depth \(L \in {\mathbb {N}}\) and layer widths \(N_0,N_1,\ldots ,N_L \in {\mathbb {N}}\), by
where \({{\mathcal {A}}}_{W^\theta _k,B^\theta _k}(x)=W^\theta _kx+B^\theta _k\), \(x \in {\mathbb {R}}^{N_{k-1}}\), \(k=1,\ldots ,L\) with \(((W^\theta _1,B^\theta _1),\ldots ,(W^\theta _L,B^\theta _L)) \in \mathcal{N}_L^{N_0,N_1,\ldots ,N_L}\) given by
and the optimized parameter \(\theta ^*\) obtained by the following minimization problem:
In the implementation of the deep neural network approximation, we use the stochastic gradient descent method with the Adam optimizer [20] as in Sects. 3 and 4 of Beck et al. [2]. In Appendix C, we provide the sample code of the scheme for a high-dimensional PDE with a nonlinear coefficient in Sect. 6.2 (which includes the linear coefficient case).
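A minimal numpy sketch of the minimization above, assuming \(d=1\) and \(m=0\) (Malliavin weight \(\equiv 1\)): a one-hidden-layer ReLU network is trained with a hand-rolled Adam update to regress the noisy target \(f(\mathbb {X}_t^{\xi })\) on \(\xi \sim U([a,b])\). The network size, learning rate and all coefficients are illustrative; the actual implementation in Appendix C uses TensorFlow.

```python
import numpy as np

# Minimal sketch of the Sect. 5 minimization for d = 1 and m = 0 (weight = 1):
# fit u(t, .) on [a, b] by regressing the noisy target f(X_bar_t^xi) on
# xi ~ U([a, b]) with a one-hidden-layer ReLU net trained by Adam.
rng = np.random.default_rng(4)
a, b, K, t = 90.0, 110.0, 100.0, 1.0
lam, mu, c = 0.3, 1.0 / 30.0, 1.0

H = 32                                           # hidden width (illustrative)
params = {
    "W1": rng.standard_normal((1, H)) * 0.1, "b1": np.zeros(H),
    "W2": rng.standard_normal((H, 1)) * 0.1, "b2": np.zeros(1),
}
adam = {k: [np.zeros_like(v), np.zeros_like(v)] for k, v in params.items()}

def forward(x_in):
    z = x_in @ params["W1"] + params["b1"]       # pre-activation
    h = np.maximum(z, 0.0)                       # ReLU
    return z, h, h @ params["W2"] + params["b2"]

losses = []
for step in range(1, 2001):
    xi = rng.uniform(a, b, size=(256, 1))
    B = rng.standard_normal((256, 1)) * np.sqrt(t)
    X_bar = xi + lam * mu * xi * t + lam * c * xi * B
    y = np.maximum(X_bar - K, 0.0)               # target f(X_bar) * weight(=1)

    x_in = (xi - a) / (b - a)                    # normalize inputs to [0, 1]
    z, h, out = forward(x_in)
    err = out - y
    losses.append(float(np.mean(err ** 2)))

    # backprop of the mean-squared loss
    g_out = 2 * err / len(err)
    grads = {"W2": h.T @ g_out, "b2": g_out.sum(0)}
    g_h = (g_out @ params["W2"].T) * (z > 0)
    grads["W1"] = x_in.T @ g_h
    grads["b1"] = g_h.sum(0)

    for k in params:                             # Adam update
        m, v = adam[k]
        m[:] = 0.9 * m + 0.1 * grads[k]
        v[:] = 0.999 * v + 0.001 * grads[k] ** 2
        mh, vh = m / (1 - 0.9 ** step), v / (1 - 0.999 ** step)
        params[k] -= 1e-2 * mh / (np.sqrt(vh) + 1e-8)

print(f"loss: {losses[0]:.2f} -> {np.mean(losses[-100:]):.2f}")
```

The conditional mean of the regression target is the expansion value itself (Proposition 2.2 of Beck et al. [2]), which is why minimizing the \(L^2\) loss recovers \(u^{m}(t,\cdot )\) on the whole cube at once.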
6 Numerical examples
In this section, we perform numerical experiments to demonstrate the accuracy of our scheme. We compare our scheme with the deep learning method of Beck et al. [2], in which the Euler–Maruyama scheme is combined with the stochastic gradient descent method and the Adam optimizer. All experiments are performed in Google Colaboratory using TensorFlow.
6.1 High-dimensional Black–Scholes model
6.1.1 Uncorrelated case
First, we examine our scheme for a high-dimensional Black–Scholes model (geometric Brownian motion) whose corresponding PDE is given by
where \(f_d(x)=\max \{ \max \{x_1-K,\ldots ,x_d-K \},0 \}\). Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=100.0\), \(\lambda =0.3\), \(\mu =1/30\) (or \(r:=\lambda \times \mu =0.01\)), \(c_i=1.0\) (or \(\sigma _i:=\lambda \times c_i=0.3\)), \(i=1,\ldots ,100\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (or the maximum option price \(e^{-rt}u_\lambda ^d(t,\cdot )\) in financial mathematics) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\) and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=10^{-1}\textbf{1}_{[0,0.3J]}(j)+10^{-2}\textbf{1}_{(0.3J,0.6J]}(j)+10^{-3}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\), where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Itô formula with the Monte-Carlo method with \(10^7\) paths. The same experiment is applied to the method of Beck et al. [2]. Table 1 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=16\), 32 (Beck et al. \(n=16\), Beck et al. \(n=32\) in the table).
6.1.2 Correlated case
We next provide a numerical example for a Black–Scholes model with correlated noise in high dimension. Let us consider the following PDE:
where \(f_d(x)=\max \{ K-\textstyle {\frac{1}{d}\sum _{i=1}^d x_i},0 \}\) and \(\sigma =[\sigma _{ij}]_{i,j} \in {\mathbb {R}}^{d \times d}\) satisfies \(\sigma _{ij}=0\) for \(i<j\), \(\sigma _{ii}>0\) for \(i=1,\ldots ,d\) and
Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=90.0\), \(\lambda =0.3\), \(\mu =0.0\), \(\rho =0.5\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (the basket option price in financial mathematics) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 (\(m=1\)) with the expansion technique of the basket option price given in Section 3.1 of Takahashi [32] and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=5.0\times 10^{-2}{} \textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-4}{} \textbf{1}_{(0.6J,J]}(j)\), \(j\le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\) where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Itô formula with Monte-Carlo method with \(10^7\)-paths. The same experiment is applied to the method of Beck et al. [2]. Table 2 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=32\), 64 (Beck et al. \(n=32\), Beck et al. \(n=64\) in the table).
6.2 High-dimensional CEV model (nonlinear volatility case)
We consider a Kolmogorov PDE with nonlinear diffusion coefficients whose corresponding stochastic process is called the CEV model:
where \(f_d(x)=\max \{ \max \{x_1-K,\ldots ,x_d-K \},0 \}\). Let \(d=100\), \(t=1.0\), \(a=99.0\), \(b=101.0\), \(K=100.0\), \(\lambda =0.3\), \(\mu =1/30\) (or \(r:=\lambda \times \mu =0.01\)), \(\beta _i=0.5\), \(\gamma _i=K^{1-\beta _i}\), \(c_i=1.0\) (or \(\sigma _i:=\lambda \times c_i=0.3\)), \(i=1,\ldots ,d\). We approximate the function \(u_\lambda ^d(t,\cdot )\) (or the maximum option price \(e^{-rt}u_\lambda ^d(t,\cdot )\)) on \([a,b]^d\) by constructing a deep neural network (1 input layer with d-neurons, 2 hidden layers with 2d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\). For the experiment, we use the batch size \(M=1024\), the number of iteration steps \(J=5000\) and the learning rate \(\gamma (j)=5.0\times 10^{-1}\textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-2}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,\ldots ,100.0) \in [a,b]^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\), where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Monte-Carlo method with the Euler–Maruyama scheme with time-steps \(2^{10}\) and \(10^7\) paths. The same experiment is applied to the method of Beck et al. [2]. Table 3 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=32\), 64 (Beck et al. \(n=32\), Beck et al. \(n=64\) in the table).
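One way to read the normalization \(\gamma _i=K^{1-\beta _i}\): assuming the CEV diffusion coefficient has the form \(c_i \gamma _i x^{\beta _i}\), it agrees with the Black–Scholes coefficient \(c_i x\) exactly at the strike \(x=K\), which makes the CEV and Black–Scholes experiments comparable near \(x_0\):

```python
# Check that gamma = K**(1 - beta) matches the (assumed) CEV diffusion
# c * gamma * x**beta to the Black-Scholes diffusion c * x at x = K.
K, beta, c = 100.0, 0.5, 1.0
gamma = K ** (1.0 - beta)          # = 10.0 here

cev_at_K = c * gamma * K ** beta   # CEV coefficient at the strike
bs_at_K = c * K                    # Black-Scholes coefficient at the strike
assert abs(cev_at_K - bs_at_K) < 1e-9
print(cev_at_K)                    # both equal c * K = 100.0
```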
6.3 High-dimensional Heston model
We finally show an example for a small time asymptotic expansion for a high-dimensional Heston model:
where \(f_{2d}(x)=\max \{ \max \{x_1-K,x_3-K,\ldots ,x_{2d-1}-K \},0 \}\) and \({{\mathcal {L}}}^{2d,\lambda }\) is the generator given by
Let \(d=25\) (\(2d=50\)), \(t=0.5\), \(a=99.0\), \(b=101.0\), \(a'=0.035\), \(b'=0.045\), \(K=100.0\), \(\lambda =1.0\), \(\kappa _i=1.0\), \(\theta _i=0.04\), \(\nu _i=0.1\), \(\rho _i=-0.5\), \(i=1,\ldots ,d\). We approximate the function \(u_\lambda ^d(t,\cdot )\) on \(([a,b] \times [a',b'])^d\) by constructing a deep neural network (1 input layer with 2d-neurons, 2 hidden layers with 4d-neurons each and 1 output layer with 1-neuron) based on Theorem 1 with \(m=1\) and Sect. 5. For the experiment, we use the batch size \(M=1,024\), the number of iteration steps \(J=5,000\) and the learning rate \(\gamma (j)=5.0\times 10^{-2}\textbf{1}_{[0,0.3J]}(j)+5.0\times 10^{-3}\textbf{1}_{(0.3J,0.6J]}(j)+5.0\times 10^{-4}\textbf{1}_{(0.6J,J]}(j)\), \(j \le J\) for the stochastic gradient descent method. After we estimate the function \(u_\lambda ^d(t,\cdot )\), we input \(x_0=(100.0,0.04,\ldots ,100.0,0.04) \in ([a,b] \times [a',b'])^d\) to check the accuracy. We compute the mean of 10 independent trials and estimate the relative error, i.e. \(|(u_\lambda ^{deep,d}(t,x_0)-u_\lambda ^{ref,d}(t,x_0))/u_\lambda ^{ref,d}(t,x_0)|\), where the reference value \(u_\lambda ^{ref,d}(t,x_0)\) is computed by the Monte-Carlo method with the Euler–Maruyama scheme with time-steps \(2^{10}\) and \(10^7\) paths. The same experiment is applied to the method of Beck et al. [2]. Table 4 provides the numerical results (the relative errors and the runtimes) for AE \(m=1\) and the method in Beck et al. [2] with the Euler–Maruyama discretization \(n=16\), 32 (Beck et al. \(n=16\), Beck et al. \(n=32\) in the table).
7 Conclusion
In this paper, we introduced a new spatial approximation for solving high-dimensional PDEs without the curse of dimensionality, in which an asymptotic expansion method is effectively combined with a deep learning-based algorithm. The mathematical justification for the spatial approximation was provided using Malliavin calculus and ReLU calculus. We demonstrated the effectiveness of our method through numerical examples for high-dimensional Kolmogorov PDEs.
More accurate deep learning-based implementations built on the method of this paper should be studied as a next research topic. We believe that higher order asymptotic expansions or higher order weak approximations (discretizations) will give robust computation schemes without the curse of dimensionality, which should be proved mathematically in future work. Also, applying our method to nonlinear problems as in [14, 15] will be a challenging and important task.
Data Availability Statement
The manuscript has no associated real data.
References
Bally, V.: An elementary introduction to Malliavin calculus. INRIA (2003)
Beck, C., Becker, S., Grohs, P., Jaafari, N., Jentzen, A.: Solving the Kolmogorov PDE by means of deep learning. J. Sci. Comput. 88, 73 (2021)
Beck, C., Hutzenthaler, M., Jentzen, A., Kuckuck, B.: An overview on deep learning-based approximation methods for partial differential equations. Discrete Contin. Dyn. Syst. B 28(6), 3697–3746 (2023)
Berner, J., Grohs, P., Jentzen, A.: Analysis of the generalization error: empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. SIAM J. Math. Data Sci. 2(3), 631–657 (2020)
Elbrächter, D., Perekrestenko, D., Grohs, P., Bölcskei, H.: Deep neural network approximation theory. IEEE Trans. Inf. Theory 67(5) (2021)
Elbrächter, D., Grohs, P., Jentzen, A., Schwab, C.: DNN expression rate analysis of high-dimensional PDEs: application to option pricing. Constr. Approx. (2021)
Fournié, E., Lasry, J.M., Lebuchoux, J., Lions, P.L., Touzi, N.: Applications of Malliavin calculus to Monte Carlo methods in finance. Finance Stoch. 3(4), 391–412 (1999)
Fujii, M., Takahashi, A., Takahashi, M.: Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pac. Financ. Mark. (2019)
Germain, M., Pham, H., Warin, X.: Approximation error analysis of some deep backward schemes for nonlinear PDEs. SIAM J. Sci. Comput. 44(1) (2022)
Gobet, E., Munos, R.: Sensitivity analysis using Itô–Malliavin calculus and martingales, and application to stochastic optimal control. SIAM J. Control. Optim. 43(5), 1676–1713 (2005)
Gonon, L., Grohs, P., Jentzen, A., Kofler, D., Siska, D.: Uniform error estimates for artificial neural network approximations for heat equations. IMA J. Numer. Anal. (2021)
Grohs, P., Hornung, F., Jentzen, A., Wurstemberger, P.: A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations. Mem. Amer. Math. Soc. (2021)
Huré, C., Pham, H., Warin, X.: Deep backward schemes for high-dimensional nonlinear PDEs. Math. Comput. 89, 1547–1579 (2020)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A.: A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differ. Equ. Appl. 1, 1–34 (2020)
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T.A., von Wurstemberger, P.: Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations. Proc. R. Soc. A (2020)
Ikeda, N., Watanabe, S.: Stochastic Differential Equations and Diffusion Processes, 2nd edn. North-Holland, Amsterdam (1989)
Iguchi, Y., Yamada, T.: A second order discretization for degenerate systems of stochastic differential equations. IMA J. Numer. Anal. 41(4), 2782–2829 (2021)
Iguchi, Y., Yamada, T.: Operator splitting around Euler–Maruyama scheme and high order discretization of heat kernels. ESAIM Math. Model. Numer. Anal. 55, 323–367 (2021)
Jentzen, A., Salimova, D., Welti, T.: A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. Commun. Math. Sci. (2021)
Kingma, D., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)
Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1992)
Kusuoka, S., Stroock, D.: Applications of the Malliavin calculus Part I. In: Stochastic Analysis (Katata/Kyoto 1982), pp. 271–306 (1984)
Kunitomo, N., Takahashi, A.: The asymptotic expansion approach to the valuation of interest rate contingent claims. Math. Financ. 11, 117–151 (2001)
Kunitomo, N., Takahashi, A.: On validity of the asymptotic expansion approach in contingent claim analysis. Ann. Appl. Probab. 13(3), 914–952 (2003)
Malliavin, P.: Stochastic Analysis. Springer, Berlin (1997)
Malliavin, P., Thalmaier, A.: Stochastic Calculus of Variations in Mathematical Finance. Springer, Berlin (2006)
Naito, R., Yamada, T.: A third-order weak approximation of multidimensional Itô stochastic differential equations. Monte Carlo Methods Appl. 25(2), 97–120 (2019)
Naito, R., Yamada, T.: A higher order weak approximation of McKean–Vlasov type SDEs. BIT Numer. Math. 62, 521–559 (2021)
Nualart, D.: The Malliavin Calculus and Related Topics. Springer, Berlin (2006)
Okano, Y., Yamada, T.: A control variate method for weak approximation of SDEs via discretization of numerical error of asymptotic expansion. Monte Carlo Methods Appl. 25(3) (2019)
Reisinger, C., Zhang, Y.: Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. Anal. Appl. 18(6), 951–999 (2020)
Takahashi, A.: An asymptotic expansion approach to pricing financial contingent claims. Asia-Pac. Financ. Mark. 6(2), 115–151 (1999)
Takahashi, A.: Asymptotic expansion approach in finance. In: Friz, P., Gatheral, J., Gulisashvili, A., Jacquier, A., Teichmann, J. (eds.) Large Deviations and Asymptotic Methods in Finance. Springer Proceedings in Mathematics & Statistics (2015)
Takahashi, A., Tsuchida, Y., Yamada, T.: A new efficient approximation scheme for solving high-dimensional semilinear PDEs: control variate method for Deep BSDE solver. J. Comput. Phys. 454, 110956 (2022)
Takahashi, A., Yamada, T.: An asymptotic expansion with push-down of Malliavin weights. SIAM J. Financ. Math. 3, 95–136 (2012)
Takahashi, A., Yamada, T.: A remark on approximation of the solutions to partial differential equations in finance. Recent Adv. Financ. Eng. 2011, 133–181 (2012)
Takahashi, A., Yamada, T.: On error estimates for asymptotic expansions with Malliavin weights: application to stochastic volatility model. Math. Oper. Res. 40(3), 513–551 (2015)
Takahashi, A., Yamada, T.: A weak approximation with asymptotic expansion and multidimensional Malliavin weights. Ann. Appl. Probab. 26(2), 818–856 (2016)
Takahashi, A., Yoshida, N.: Monte Carlo simulation with asymptotic method. J. Japan Stat. Soc. 35(2), 171–203 (2005)
Watanabe, S.: Analysis of Wiener functionals (Malliavin calculus) and its applications to heat kernels. Ann. Probab. 15, 1–39 (1987)
Yamada, T.: An arbitrary high order weak approximation of SDE and Malliavin Monte Carlo: application to probability distribution functions. SIAM J. Numer. Anal. 57(2), 563–591 (2019)
Yamada, T., Yamamoto, K.: Second order discretization of Bismut–Elworthy–Li formula: application to sensitivity analysis. SIAM/ASA J. Uncertain. Quantif. 7(1), 143–173 (2019)
Yoshida, N.: Asymptotic expansions of maximum likelihood estimators for small diffusions via the theory of Malliavin–Watanabe. Probab. Theory Relat. Fields 92, 275–311 (1992)
Acknowledgements
This work is supported by JST PRESTO (Grant Number JPMJPR2029), Japan. The authors thank Riu Naito for his great help in numerical computation of the proposed method. We also thank two anonymous reviewers for valuable comments and suggestions.
Additional information
The title of the first version of the paper is “Asymptotic expansion and deep neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with nonlinear coefficients”.
This article is part of the section “Computational Approaches” edited by Siddhartha Mishra.
Appendices
Appendix A: Malliavin calculus
In the following, we provide precise estimates of Wiener functionals, which are useful for the proof and the computation of the deep learning-based approximation with our asymptotic expansion.
Lemma 5
Let \(d \in {\mathbb {N}}\), \(F \in ({\mathbb {D}}^{\infty }(\Omega ^d))^d\) be a non-degenerate Wiener functional, \(G \in {\mathbb {D}}^{\infty }(\Omega ^d)\), \(\alpha =(\alpha _1,\ldots ,\alpha _{\ell }) \in \{1,\ldots ,d \}^\ell \) with length \(\ell \in {\mathbb {N}}\). For \(k \in {\mathbb {N}} \cup \{ 0 \}\) and \(p\ge 1\), there exist \(c=c(k,p)>0\), \(q_1=q_1(k,p)>1\), \(q_2=q_2(k,p,d)>1\), \(q_3=q_3(k,p)>1\) and \(r=r(k) \in {\mathbb {N}}\) such that
Proof
For \(i \in \{1,\ldots ,d \}\), we have
for some universal constant \(c_{k,p}>0\). Let \(p_1\) and \(p_2\) be real numbers such that \(p_1^{-1}+p_2^{-1}=p^{-1}\). Hereafter, we use a generic constant \(C>0\) such that \(C=cd^c\) for some \(c>0\) depending on k and p, whose value varies from line to line. Since it holds that
for some \(e \in {\mathbb {N}}\) depending on k, we have
For \(\alpha =(\alpha _1,\ldots ,\alpha _{\ell }) \in \{1,\ldots ,d \}^\ell \), we have
Then, iterating this procedure, we have that for \(k \in {\mathbb {N}} \cup \{ 0 \}\) and \(p\ge 1\), there exist \(q_1,q_2,q_3>1\) and \(r \in {\mathbb {N}}\) such that
\(\square \)
Lemma 6
For \(d \in {\mathbb {N}}\), \(i=1,2\), let \(\{ G_t^{d,x,i} \}_{t \in (0,T],x\in {\mathbb {R}}^d} \subset {\mathbb {D}}^\infty (\Omega ^d)\) satisfy that for \(k \ge 1\) and \(p \in [1,\infty )\), there exist \(c_i,s_i>0\) independent of d such that \(\textstyle {\sup _{x \in [a,b]^d}}\Vert G_t^{d,x,i} \Vert _{k,p} \le c_i d^{c_i} t^{s_i/2}\) for all \(t \in (0,T]\). Then, we have that for \(k \ge 1\) and \(p \in [1,\infty )\), there exist \(c,r>0\) independent of d such that for all \(t \in (0,T]\), \(\textstyle {\sup _{x \in [a,b]^d}}\Vert \prod _{i=1}^2 G_t^{d,x,i} \Vert _{k,p} \le r d^r t^{(s_1+s_2)/2}\) and \(\textstyle {\sup _{x \in [a,b]^d}}\Vert \sum _{i=1}^2 G_t^{d,x,i} \Vert _{k,p} \le c d^c t^{ \min \{ s_1,s_2\}/2}\).
Proof
We only prove the former case. By Proposition 1.5.6 of Nualart [29], for \(k \ge 1\) and \(p \in [1,\infty )\), \(\Vert \textstyle {\prod _{i=1}^2} G_t^{d,x,i} \Vert _{k,p} \le c_{k,p} \Vert G_t^{d,x,1} \Vert _{k,p_1} \Vert G_t^{d,x,2} \Vert _{k,p_2}\) for some constant \(c_{k,p}>0\) depending only on k and p, where \(p_1,p_2>1\) satisfy \(1/p_1+1/p_2=1/p\). Then, by the assumptions, \(\textstyle {\sup _{x \in [a,b]^d}}\Vert \prod _{i=1}^2 G_t^{d,x,i} \Vert _{k,p} \le r d^r t^{(s_1+s_2)/2}\). \(\square \)
Lemma 7
For \(d \in {\mathbb {N}}\), let \(\{ u_t^{d,x} \}_{t \in (0,T],x\in {\mathbb {R}}^d} \subset {\mathbb {D}}^\infty (\Omega ^d)\) satisfy that for \(t \in (0,T]\), \(x \in {\mathbb {R}}^d\), \(j=1,\ldots ,d\), \(\textstyle {\int _0^t u_s^{d,x} dB_s^{d,j}} \in {\mathbb {D}}^\infty (\Omega ^d)\) and that for \(k \ge 1\) and \(p \in [1,\infty )\), there exist \(q,\nu >0\) independent of d such that \(\textstyle {\sup _{x \in [a,b]^d}}\Vert u_t^{d,x} \Vert _{k,p} \le q d^{q} t^{\nu /2}\) for all \(t \in (0,T]\). Then, for \(k \ge 1\) and \(p \in [1,\infty )\), there exists \(c>0\) independent of d such that for all \(t \in (0,T]\), \(\textstyle {\sup _{x \in [a,b]^d}}\Vert \int _0^t u_s^{d,x} dB_s^{d,0} \Vert _{k,p} \le c d^{c} t^{(\nu +2)/2}\) and for \(j=1,\ldots ,d\), \(\textstyle {\sup _{x \in [a,b]^d}}\Vert \int _0^t u_s^{d,x} dB_s^{d,j} \Vert _{k,p} \le c d^{c} t^{(\nu +1)/2}\).
Proof
We only prove the latter case. Note that for \(r=1,\ldots ,k\), \(D^r \textstyle {\int _0^t u_s^{d,x} dB_s^{d,j}}=D^{r-1}u_\cdot ^{d,x}+\textstyle {\int _0^t D^r u_s^{d,x} dB_s^{d,j}}\). Then, it holds that \(\textstyle { E[\Vert D^r \int _0^t u_s^{d,x} dB_s^{d,j} \Vert ^p_{(H^d)^{\otimes r}}]}\le 2^{p-1}\big(E[\Vert D^{r-1} u_\cdot ^{d,x} \Vert ^p_{(H^d)^{\otimes r}}]\) \(+\textstyle { E[\Vert \int _0^t D^r u_s^{d,x} dB_s^{d,j} \Vert ^p_{(H^d)^{\otimes r}}]\big)}\). Here, \(E[\Vert D^{r-1} u_\cdot ^{d,x} \Vert ^p_{(H^d)^{\otimes r}}] \le \eta d^\eta t^{p-1} \textstyle {\int _0^t} E[ \Vert D^{r-1} u_s^{d,x} \Vert ^p_{(H^d)^{\otimes (r-1)}}] ds\) for some \(\eta >0\) (independent of d) and \(E[\Vert \textstyle {\int _0^t} D^r u_s^{d,x} dB_s^{d,j} \Vert ^p_{(H^d)^{\otimes r}}] \le c_p t^{p/2-1} \textstyle {\int _0^t} E[ \Vert D^{r} u_s^{d,x} \Vert ^p_{(H^d)^{\otimes r}}] ds\) for some \(c_p>0\) (independent of d) by Hölder's inequality and the Burkholder–Davis–Gundy inequality. By the assumptions, \(\textstyle {\sup _{x \in [a,b]^d}E[\Vert D^{r-1} u_\cdot ^{d,x} \Vert ^p_{(H^d)^{\otimes r}}]} \le \eta d^\eta t^{p-1} \textstyle {\int _0^t} q^pd^{pq} s^{p\nu /2} ds \le cd^c t^{p(\nu /2+1)}\) and \(\textstyle {\sup _{x \in [a,b]^d}} E[\Vert \textstyle {\int _0^t} D^r u_s^{d,x} dB_s^{d,j} \Vert ^p_{(H^d)^{\otimes r}}] \le c_p t^{p/2-1} \textstyle {\int _0^t} q^pd^{pq} s^{p\nu /2} ds \le cd^c t^{p(\nu +1)/2}\). Then, we have \(\textstyle {\sup _{x \in [a,b]^d}}\Vert \int _0^t u_s^{d,x} dB_s^{d,j} \Vert _{k,p} \le c d^{c} t^{(\nu +1)/2}\). \(\square \)
Appendix B: ReLU calculus
Appendix B collects some results on ReLU calculus which are basic to the analysis in this paper. We first recall the following result from Lemma A.7 of [5].
Lemma 8
Let \(n,d,L \in {\mathbb {N}}\) and for \(i =1,\ldots ,n\), let \(d_i \in {\mathbb {N}}\) and \(\phi _i \in {{\mathcal {N}}}\) with \({{\mathcal {L}}}(\phi _i)=L\), \(\textrm{dim}_{\textrm{in}}(\phi _i)=d\) and \(\textrm{dim}_{\textrm{out}}(\phi _i)=d_i\). Then, there exists \(\psi \in {{\mathcal {N}}}\) such that \({{\mathcal {L}}}(\psi )=L\), \({{\mathcal {C}}}(\psi )\le \textstyle {\sum _{i=1}^n} {{\mathcal {C}}}(\phi _i)\), \(\textrm{dim}_{\textrm{in}}(\psi )=d\) and \(\textrm{dim}_{\textrm{out}}(\psi )=\textstyle {\sum _{i=1}^n} d_i\) and
Next, we recall Lemma 5.1 of [12] and Lemma 5.3 of [6].
Lemma 9
Let \(L,n,N_0,N_L \in {\mathbb {N}}\), \(\{ a_\ell \}_{\ell =1}^n \subset {\mathbb {R}}\) and \(\{ \phi _\ell \}_{\ell =1}^n \subset {{\mathcal {N}}}\) be DNNs such that \({{\mathcal {L}}}(\phi _\ell )=L\), \(\textrm{dim}_{\textrm{in}}(\phi _\ell )=N_0\) and \(\textrm{dim}_{\textrm{out}}(\phi _\ell )=N_L\) for \(\ell =1,\ldots ,n\). Then, there exists \(\psi \in {{\mathcal {N}}}\) such that \({{\mathcal {L}}}(\psi )=L\), \({{\mathcal {C}}}(\psi )\le n^2 {{\mathcal {C}}}(\phi _1)\) and
Lemma 10
Let \(L_1,L_2,N_0^1,N_0^2,N_{L_1}^1,N_{L_2}^2 \in {\mathbb {N}}\) and \(\phi _1,\phi _2 \in {{\mathcal {N}}}\) be DNNs such that \(\mathcal{L}(\phi _1)=L_1\), \({{\mathcal {L}}}(\phi _2)=L_2\), \(\textrm{dim}_{\textrm{in}}(\phi _1)=N_0^1\), \(\textrm{dim}_{\textrm{out}}(\phi _1)=N_{L_1}^1\), \(\textrm{dim}_{\textrm{in}}(\phi _2)=N_0^2\), \(\textrm{dim}_{\textrm{out}}(\phi _2)=N_{L_2}^2\) and \(N_{L_2}^2=N_0^1\). Then, there exists \(\psi \in {{\mathcal {N}}}\) such that \({{\mathcal {L}}}(\psi )=L_1+L_2\), \({{\mathcal {C}}}(\psi )\le 2(\mathcal{C}(\phi _1)+{{\mathcal {C}}}(\phi _2))\) and
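The parallelization and composition constructions in Lemmas 8 and 10 can be made concrete with a small numpy sketch (our own illustration, not code from the paper). A ReLU network is represented as a list of \((W,b)\) layers; composition glues the two networks via the identity \(x=\textrm{ReLU}(x)-\textrm{ReLU}(-x)\), which doubles the width of the joining layer and is the source of the factor 2 in the size bound of Lemma 10:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def realize(net, x):
    """Evaluate a ReLU network given as a list of (W, b) layers:
    ReLU after every layer except the last, which stays affine."""
    for W, b in net[:-1]:
        x = relu(W @ x + b)
    W, b = net[-1]
    return W @ x + b

def parallelize(nets):
    """Lemma 8: psi with R(psi)(x) = (R(phi_1)(x), ..., R(phi_n)(x)).
    All nets share the same depth and input dimension; the first layers
    are stacked vertically, later layers block-diagonally."""
    def block_diag(Ws):
        rows, cols = sum(W.shape[0] for W in Ws), sum(W.shape[1] for W in Ws)
        M, i, j = np.zeros((rows, cols)), 0, 0
        for W in Ws:
            M[i:i + W.shape[0], j:j + W.shape[1]] = W
            i, j = i + W.shape[0], j + W.shape[1]
        return M
    psi = []
    for k in range(len(nets[0])):
        Ws = [net[k][0] for net in nets]
        bs = [net[k][1] for net in nets]
        W = np.vstack(Ws) if k == 0 else block_diag(Ws)
        psi.append((W, np.concatenate(bs)))
    return psi

def compose(phi1, phi2):
    """Lemma 10: psi with R(psi) = R(phi1) o R(phi2) and depth L1 + L2.
    The affine output of phi2 is re-expressed via x = relu(x) - relu(-x),
    doubling the width of the glue layer (the factor 2 in the bound)."""
    W2, b2 = phi2[-1]   # last (affine) layer of phi2
    W1, b1 = phi1[0]    # first layer of phi1
    glue_in = (np.vstack([W2, -W2]), np.concatenate([b2, -b2]))
    glue_out = (np.hstack([W1, -W1]), b1)
    return phi2[:-1] + [glue_in, glue_out] + phi1[1:]
```

Both constructions preserve the realization exactly, so they can be checked numerically on random networks.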
The following result, Theorem 6.3 of [6], is useful.
Lemma 11
Let \(M \in {\mathbb {N}} \cap [2,\infty )\) and \(D \in [1,\infty )\). There exist DNNs \(\{\psi _{\varepsilon }\}_{\varepsilon \in (0,1)} \subset {{\mathcal {N}}}\) and a constant \(c>0\) (independent of M and D) such that for all \(\varepsilon \in (0,1)\), \(\mathcal{C}(\psi _{\varepsilon })\le c M (|\log (\varepsilon )|+M\log (D)+\log (M))\) and
In our analysis, the next result will be applied.
Lemma 12
Let \(a\in {\mathbb {R}}\), \(b\in (a,\infty )\), \(c>0\), \(m \in {\mathbb {N}} \cap [2,\infty )\), \(d,L \in {\mathbb {N}}\) and \(\{ \phi _\ell \}_{\ell =1}^m \subset {{\mathcal {N}}}\) be DNNs such that for \(i \in \{ 1,\ldots ,m \}\), \({{\mathcal {L}}}(\phi _i)=L\), \(\textrm{dim}_{\textrm{in}}(\phi _i)=d\), \(\textrm{dim}_{\textrm{out}}(\phi _i)=1\), \({{\mathcal {C}}}(\phi _i)\le cd^c\) and \(\textstyle {\sup _{x\in [a,b]^d}}|{{\mathcal {R}}}(\phi _i)(x)|\le c d^c\). Then, there exist \(\{ \psi ^{\varepsilon ,d} \}_{\varepsilon \in (0,1),d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(\kappa >0\) (independent of d) such that for all \(\varepsilon \in (0,1)\) and \(d \in {\mathbb {N}}\), we have \({{\mathcal {C}}}(\psi ^{\varepsilon ,d})\le \kappa \varepsilon ^{-1} d^{\kappa }\) and
Proof
First we use Lemma 11. Let \(\varphi (d):=cd^c\). Then, there exists a set of DNNs \(\{\Psi _{\varphi (d),\varepsilon } \}_{\varepsilon \in (0,1)} \subset {{\mathcal {N}}}\) and a constant \(c'>0\) (independent of m and \(\varphi (d)\)) such that for all \(\varepsilon \in (0,1)\), \({{\mathcal {C}}}(\Psi _{\varphi (d),\varepsilon })\le c' m^2 \varepsilon ^{-1} d^c\) and
for any \(x \in [a,b]^d\). By Lemma 8, there exists \(\Phi \in {{\mathcal {N}}}\) such that \({{\mathcal {C}}}(\Phi ) \le m cd^c\) and
By Lemma 10, there exist \(\{ \psi ^{\varepsilon ,d} \}_{\varepsilon \in (0,1), d \in {\mathbb {N}}} \subset {{\mathcal {N}}}\) and \(\kappa >0\) such that for all \(\varepsilon \in (0,1)\) and \(d \in {\mathbb {N}}\), we have \({{\mathcal {C}}}(\psi ^{\varepsilon ,d})\le \kappa \varepsilon ^{-1} d^{\kappa }\),
and
\(\square \)
Appendix C: Sample code
We show the sample Python code used in the numerical computation in Sect. 6.2.
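The article's code listing is not reproduced in this extraction. As a stand-in, the following is a minimal numpy sketch of the kind of training loop described in Sect. 6.2: a ReLU network fitted to a target function on a box by stochastic gradient descent on the squared loss. All names, sizes and the toy target are ours; the paper's experiment uses two hidden layers of width \(4d\), batch size \(M=1{,}024\), \(J=5{,}000\) iterations and a piecewise learning-rate schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaled-down illustration: one hidden ReLU layer, inputs on [-1, 1]^d,
# toy call-like target in place of the expansion-based regression target.
d, width, M, J, lr = 4, 32, 128, 500, 1e-2
target = lambda z: np.maximum(z.mean(axis=1), 0.0)

W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (width, d))
b1 = np.zeros(width)
w2 = rng.normal(0.0, 1.0 / np.sqrt(width), width)
b2 = 0.0

losses = []
for _ in range(J):
    z = rng.uniform(-1.0, 1.0, (M, d))       # sample the spatial batch
    y = target(z)
    h = np.maximum(z @ W1.T + b1, 0.0)       # hidden layer (ReLU)
    err = h @ w2 + b2 - y                    # residual of the squared loss
    losses.append(np.mean(err ** 2))
    dh = np.outer(err, w2) * (h > 0.0)       # backprop through the ReLU
    w2 -= lr * (err @ h) / M                 # gradient steps on all params
    b2 -= lr * err.mean()
    W1 -= lr * (dh.T @ z) / M
    b1 -= lr * dh.mean(axis=0)
```

The loop mirrors the structure of the experiment (uniform spatial sampling, squared loss, SGD) but is not the paper's implementation.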
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Takahashi, A., Yamada, T. Solving Kolmogorov PDEs without the curse of dimensionality via deep learning and asymptotic expansion with Malliavin calculus. Partial Differ. Equ. Appl. 4, 27 (2023). https://doi.org/10.1007/s42985-023-00240-4