Solving Kolmogorov PDEs without the curse of dimensionality via deep learning and asymptotic expansion with Malliavin calculus

This paper proposes a new spatial approximation method without the curse of dimensionality for solving high-dimensional partial differential equations (PDEs) by using an asymptotic expansion method with a deep learning-based algorithm. In particular, a mathematical justification for the spatial approximation is provided. Numerical examples for high-dimensional Kolmogorov PDEs show the effectiveness of our method.


Introduction
Recently, for solving high-dimensional partial differential equations (PDEs), deep learning-based algorithms have been actively proposed (see [2,3] for instance). Moreover, a number of papers on the mathematical justification of deep learning-based spatial approximations have appeared, where the authors demonstrate that deep neural networks overcome the curse of dimensionality in approximations of high-dimensional PDEs. For the related literature, see [4-6, 11, 19] for example. In particular, these works treat some specific forms of PDEs such as high-dimensional heat equations or Kolmogorov PDEs with constant diffusion and nonlinear drift coefficients. Also, integral kernels are assumed to have explicit forms for the justification of the spatial approximations of solutions to high-dimensional PDEs.
However, most high-dimensional PDEs may not have explicit integral forms in practice. In other words, the integral representations of the solutions themselves must be approximated by a certain method.
In the current paper, we give a new spatial approximation using an asymptotic expansion method with a deep learning-based algorithm for solving high-dimensional PDEs without the curse of dimensionality. More precisely, we follow the approaches given in [40] and the related literature such as [8,17,18,23,24,26,27,30,32,33,35,38,39,41,43]. In particular, we provide a uniform error estimate for the asymptotic expansion of solutions of Kolmogorov PDEs with nonlinear coefficients, motivated by the works [2,11,31]. For a solution $u^{\lambda}$ to a $d$-dimensional Kolmogorov PDE with a small parameter $\lambda$, namely $u^{\lambda}(t, x) = \mathbb{E}[f(X^{\lambda,x}_t)]$ where $X^{\lambda,x}$ is a $d$-dimensional diffusion process starting from $x$, we justify the following spatial approximation on a range $[a, b]^d$:
\[
u^{\lambda}(t, \cdot) \approx \mathbb{E}[f(\bar{X}^{\lambda,\cdot}_t)\,\mathcal{M}^{\lambda,\cdot}_t] \tag{1.1}
\]
\[
\approx \text{``deep neural network approximation''} \ \mathcal{R}(\phi)(\cdot), \tag{1.2}
\]
by applying an appropriate neural network $\phi$. Here, for $t > 0$ and $x \in \mathbb{R}^d$, $\bar{X}^{\lambda,x}_t$ is a certain Gaussian random variable and $\mathcal{M}^{\lambda,x}_t$ is a stochastic weight for the expansion given based on Malliavin calculus. In order to choose the network $\phi$, the analysis of "products of neural networks" and a dimension analysis of the asymptotic expansion with Malliavin calculus are crucial in our approach. We show a precise error estimate for the approximation (1.1) and prove that the complexity of the neural network grows at most polynomially in the dimension $d$ and the reciprocal of the precision $\varepsilon$ of the approximation (1.1). Moreover, we give an explicit form of the asymptotic expansion in (1.1) and show numerical examples to demonstrate the effectiveness of the proposed scheme for high-dimensional Kolmogorov PDEs.
The organization of the paper is as follows. Section 2 is dedicated to notation, definitions and preliminary results on deep learning and Malliavin calculus. Section 3 provides the main result, namely, the deep learning-based asymptotic expansion for solving Kolmogorov PDEs. The proof is given in Sect. 4. Section 5 introduces the deep learning implementation. Various numerical examples are shown in Sect. 6. Useful lemmas on Malliavin calculus and ReLU calculus are summarized, and the sample code is listed, in the Appendix.

Preliminaries
We first prepare notation. For $d \in \mathbb{N}$ and a vector $x \in \mathbb{R}^d$, we denote by $\|x\|$ the Euclidean norm. Also, for $k, \ell \in \mathbb{N}$ and a matrix $A \in \mathbb{R}^{k \times \ell}$, we denote by $\|A\|$ the Frobenius norm. For $d \in \mathbb{N}$, let $I_d$ be the identity matrix. For $m, k, \ell \in \mathbb{N}$, let $C(\mathbb{R}^m, \mathbb{R}^{k \times \ell})$ (resp. $C([0, T] \times \mathbb{R}^m, \mathbb{R}^{k \times \ell})$) be the set of continuous functions $f : \mathbb{R}^m \to \mathbb{R}^{k \times \ell}$ (resp. $f : [0, T] \times \mathbb{R}^m \to \mathbb{R}^{k \times \ell}$), and let $C_{Lip}(\mathbb{R}^m, \mathbb{R}^{k \times \ell})$ be the set of Lipschitz continuous functions $f : \mathbb{R}^m \to \mathbb{R}^{k \times \ell}$. Also, we define $C^{\infty}_b(\mathbb{R}^m, \mathbb{R}^{k \times \ell})$ as the set of smooth functions $f : \mathbb{R}^m \to \mathbb{R}^{k \times \ell}$ with bounded derivatives of all orders. For a multi-index $\alpha$, let $|\alpha|$ be the length of $\alpha$. For a bounded function $f : \mathbb{R}^m \to \mathbb{R}^{k \times \ell}$, we define $\|f\|_{\infty} = \sup_{x \in \mathbb{R}^m} \|f(x)\|$. For $m, k, \ell \in \mathbb{N}$ and a function $f \in C_{Lip}(\mathbb{R}^m, \mathbb{R}^{k \times \ell})$, we denote by $C_{Lip}[f]$ the Lipschitz constant of $f$. For $d \in \mathbb{N}$ and a smooth function $f : \mathbb{R}^d \to \mathbb{R}$, we define $\partial_i f = \frac{\partial}{\partial x_i} f$ for $i = 1, \ldots, d$; moreover, we define $\partial_{\alpha} f = \partial_{\alpha_1} \cdots \partial_{\alpha_k} f$ for $\alpha = (\alpha_1, \ldots, \alpha_k) \in \{1, \ldots, d\}^k$, $k \in \mathbb{N}$. For $a, b \in \mathbb{R}$, we may write $a \vee b = \max\{a, b\}$.

Deep neural networks
Let us prepare notation and definitions for deep neural networks. Let $\mathcal{N}$ be the set of deep neural networks (DNNs), described by their weight matrices and bias vectors together with an activation function in $C(\mathbb{R}, \mathbb{R})$; for $\phi \in \mathcal{N}$, $\mathcal{R}(\phi)$ denotes its realization as a function and $\mathcal{C}(\phi)$ its complexity.
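To fix ideas, the following is a minimal NumPy sketch (not the paper's formal definition) of a ReLU network represented by weight matrices and bias vectors, of its realization $\mathcal{R}(\phi)$ as a function, and of a parameter count that can serve as the complexity $\mathcal{C}(\phi)$; the function names `realization` and `complexity` are ours.

```python
import numpy as np

# A DNN phi is represented here as a list of (weight matrix, bias vector) pairs.
def realization(phi, x):
    """Evaluate R(phi)(x) with ReLU activations on all hidden layers."""
    for i, (W, b) in enumerate(phi):
        x = W @ x + b
        if i < len(phi) - 1:          # no activation on the output layer
            x = np.maximum(x, 0.0)    # ReLU
    return x

def complexity(phi):
    """Total number of parameters (weights and biases) of the network."""
    return sum(W.size + b.size for W, b in phi)

# Example: a network from R^d to R with one hidden layer of width 2d.
d = 10
rng = np.random.default_rng(0)
phi = [(rng.normal(size=(2 * d, d)), np.zeros(2 * d)),
       (rng.normal(size=(1, 2 * d)), np.zeros(1))]
print(realization(phi, np.ones(d)), complexity(phi))
```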
We next recall elements of Malliavin calculus on the $d$-dimensional Wiener space. For a smooth Wiener functional $F$ and $j \in \mathbb{N}$, we set $D^j F$ as the $(H^d)^{\otimes j}$-valued random variable obtained by the $j$-times iteration of the Malliavin derivative operator $D$. For a real separable Hilbert space $V$, we consider the space $\mathcal{S}_V$ of $V$-valued smooth Wiener functionals (see [29]). If $F \in (\mathbb{D}^{\infty})^d$ is nondegenerate, then $F$ has a smooth density $p_F(\cdot)$. Malliavin calculus is further refined by Watanabe's theory. Let $\mathcal{S}(\mathbb{R}^d)$ be the Schwartz space of rapidly decreasing functions and $\mathcal{S}'(\mathbb{R}^d)$ be the dual of $\mathcal{S}(\mathbb{R}^d)$, i.e. $\mathcal{S}'(\mathbb{R}^d)$ is the space of Schwartz tempered distributions. For a tempered distribution $T \in \mathcal{S}'(\mathbb{R}^d)$ and a nondegenerate Wiener functional in the sense of Malliavin $F \in (\mathbb{D}^{\infty})^d$, the pullback $T(F) = T \circ F$ is well-defined as an element of the space of Watanabe distributions $\mathbb{D}^{-\infty}$, that is, the dual space of $\mathbb{D}^{\infty}$ (see, e.g., p. 379, Corollary of Ikeda and Watanabe [16], the Theorem of Chapter III 6.2 of Malliavin [25], and Theorem 7.3 of Malliavin and Thalmaier [26]). In particular, the Dirac delta $\delta_y$ may be composed with a nondegenerate $F$, and we have $p_F(y) = \mathbb{E}[\delta_y(F)]$ for $y \in \mathbb{R}^d$; thus $p_F$ is not only smooth but also belongs to $\mathcal{S}(\mathbb{R}^d)$, i.e. it is a rapidly decreasing function (see Theorem 9.2 of Ikeda and Watanabe [16] and Proposition 2.1.5 of Nualart [29]). For a nondegenerate $F \in (\mathbb{D}^{\infty})^d$, an integration by parts formula on the Wiener space is available; it is used repeatedly in the sequel through (2.5)-(2.8).

Main result
Let $a \in \mathbb{R}$, $b \in (a, \infty)$ and $T > 0$. For $d \in \mathbb{N}$, consider the solution $X^{d,\lambda,x} = \{X^{d,\lambda,x}_t\}_{t \in [0,T]}$ to the following stochastic differential equation (SDE) driven by a $d$-dimensional Brownian motion $B^d$, with a small parameter $\lambda \in (0,1]$ and starting point $x \in \mathbb{R}^d$, whose coefficients are specified componentwise for $j = 1, \ldots, d$. Furthermore, for a given appropriate continuous function $f_d : \mathbb{R}^d \to \mathbb{R}$, define $u^d_\lambda(t, x) = \mathbb{E}[f_d(X^{d,\lambda,x}_t)]$ for $t \in [0, T]$ and $x \in \mathbb{R}^d$, which is a solution of the Kolmogorov PDE $\partial_t u^d_\lambda(t, x) = \mathcal{L}^{d,\lambda} u^d_\lambda(t, x)$ for all $(t, x) \in (0, T) \times \mathbb{R}^d$ with $u^d_\lambda(0, \cdot) = f_d(\cdot)$, where $\mathcal{L}^{d,\lambda}$ is the second order differential operator associated with the SDE. Our purpose is to show a new spatial approximation scheme of $u^d_\lambda(t, \cdot)$ for $t > 0$ by using asymptotic expansion and deep neural network approximation. The main theorem (Theorem 1) is stated at the end of this section.
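For orientation, the generic relation between such an SDE and its Kolmogorov operator takes the following standard form; the coefficient symbols $b_d$, $\sigma_d$ and the precise placement of the parameter $\lambda$ below are illustrative assumptions on our part (the exact forms are fixed by (3.1) and Assumption 1):
\[
dX^{d,\lambda,x}_t = b_d(\lambda, X^{d,\lambda,x}_t)\,dt + \sigma_d(\lambda, X^{d,\lambda,x}_t)\,dB^d_t, \qquad X^{d,\lambda,x}_0 = x,
\]
\[
\mathcal{L}^{d,\lambda} = \sum_{i=1}^{d} b^i_d(\lambda, \cdot)\,\partial_i + \frac{1}{2}\sum_{i,j=1}^{d} \big(\sigma_d(\lambda, \cdot)\,\sigma_d(\lambda, \cdot)^{\top}\big)_{ij}\,\partial_i \partial_j,
\]
so that $u^d_\lambda(t, x) = \mathbb{E}[f_d(X^{d,\lambda,x}_t)]$ solves $\partial_t u^d_\lambda = \mathcal{L}^{d,\lambda} u^d_\lambda$ with $u^d_\lambda(0, \cdot) = f_d$.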

Asymptotic expansion
Remark 1 Assumption 1 justifies the asymptotic expansion under the uniform ellipticity condition for the solutions of the perturbed systems of PDEs. Assumption 1.3 is also useful for constructing deep neural network approximations for the family of PDE solutions.
From Assumption 1.2, we may write each SDE (3.1) for $d \in \mathbb{N}$ in the perturbed form, with the corresponding decomposition of the operator $\mathcal{L}^{d,\lambda}$.

Proposition 1 (Asymptotic expansion and the error bound) For m
The weights in the expansion terms in Proposition 1 can be represented by polynomials of Brownian motion. We show this through distribution theory on Wiener space, starting from identities obtained by (2.5).

. , d and be a matrix given by
We now show an efficient computation and give a polynomial representation of the Malliavin weights appearing in the expansion terms of the asymptotic expansion in Proposition 1, where (2.5), (2.6), (2.7) and (2.8) are used iteratively. An explicit polynomial representation of the asymptotic expansion is derived through (3.14). For instance, the first order expansion ($m = 1$) reads as follows (first order asymptotic expansion with Malliavin weight). Thus, the first order expansion is expressed with a Malliavin weight given by third order polynomials of Brownian motion. In general, we have the following representation.
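To illustrate how such a polynomial representation is used numerically, the following is a minimal Monte-Carlo sketch of the first order approximation $\mathbb{E}[f_d(\bar{X}^{d,\lambda,x}_t)\,\mathcal{M}^1_{d,\lambda}(t, x, B^d_t)]$. The Gaussian proxy and the weight are passed in as callables, since their explicit forms depend on the model through (3.13) and (3.14); `gaussian_proxy` and `malliavin_weight_1` are hypothetical placeholders, not the paper's definitions.

```python
import numpy as np

def first_order_expansion(f, gaussian_proxy, malliavin_weight_1,
                          t, x, lam, n_paths=10**5, seed=0):
    """Monte-Carlo estimate of E[ f(Xbar_t^x) * M_1(t, x, B_t) ].

    gaussian_proxy(t, x, lam, B_t)      -> Xbar_t^x, a Gaussian functional of B_t,
    malliavin_weight_1(t, x, lam, B_t)  -> first order Malliavin weight,
                                           a (third order) polynomial of B_t.
    Both callables are model-specific placeholders.
    """
    rng = np.random.default_rng(seed)
    d = len(x)
    B_t = rng.normal(scale=np.sqrt(t), size=(n_paths, d))   # B_t ~ N(0, t I_d)
    Xbar = gaussian_proxy(t, x, lam, B_t)                    # shape (n_paths, d)
    weights = malliavin_weight_1(t, x, lam, B_t)             # shape (n_paths,)
    return float(np.mean(f(Xbar) * weights))
```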

Proposition 2 (Polynomial representation of the Malliavin weights) The weights are expressed through polynomials of Brownian motion, indexed by $1, \ldots, n(m)$, constructed by some products of the coefficients and their derivatives of order $\le 2m$ (with indices in $\{1, \ldots, d\}$) given in Assumption 1, of the form stated, with some constants $c_e \in (0, \infty)$, $q_e \in \mathbb{N}$ and some multi-indices, and with bounds holding for some constant $c > 0$ independent of $d$.

Remark 2 (Remark on computation of Malliavin weights) Malliavin weights were initially used in Fournié et al. [7] for sensitivity analysis in financial mathematics, especially in the Monte-Carlo computation of "Greeks". A discretization scheme for probabilistic automatic differentiation using Malliavin weights was then analyzed in Gobet and Munos [10]. The computation of asymptotic expansions with Malliavin weights is developed in Takahashi and Yamada [35,37], and is further extended to the weak approximation of SDEs in Takahashi and Yamada [38]. Note that a PDE expansion is shown in Takahashi and Yamada [36] to partially connect it with the stochastic calculus approach. The computation method of the expansion with Malliavin weights is improved in Yamada [41], Yamada and Yamamoto [42], Naito and Yamada [27,28], Iguchi and Yamada [17,18], and Takahashi et al. [34], where the technique of stochastic calculus is refined. The main advantages of the stochastic calculus approach are that (i) it provides an efficient computation scheme using Watanabe distributions on Wiener space as in (3.13) and (3.14), and (ii) it enables us to give precise bounds for approximations of expectations or of the corresponding solutions of PDEs. Indeed, the computational effort of the expansions is much reduced in the sense that Itô iterated integrals are transformed into simple polynomials of Brownian motion, and the desired deep neural network approximation is obtained in the next subsection through this approach.

Deep neural network approximation
In order to construct a deep neural network approximation of the asymptotic expansion as a function of the space variable $x$, we consider the following further assumptions.
Assumption 2 (Assumptions for deep neural network approximation) Suppose that Assumption 1 holds. There exist a constant $\kappa > 0$ and sets of networks $\{\psi \ldots\}$ with the properties listed below.

Remark 3 Assumption 2 provides the deep neural network approximation of the asymptotic expansion with an appropriate complexity. Note that Assumptions 1.1, 1.3, 2.2 and 2.4 give that there exists $\eta > 0$ such that the corresponding bound holds. In the following, Assumptions 2.2, 2.3 and 2.4 play an important role in the analysis of the "product of neural networks" in the construction of the approximation with the asymptotic expansion.

Remark 4 In particular, Assumption 2.3 is satisfied for the cases in Section 5.1 and Section 5.2 of [9] and Section 5.2 of [13], which are examples of (3.1) (or (3.6)). For those cases, the neural network approximations in Assumption 2 are not necessary. Furthermore, in such cases (e.g. the high-dimensional heat equations) the asymptotic expansion is simply obtained (usually as the Gaussian approximation), and the scheme exactly reduces to the methods in Beck et al. [2] and Gonon et al. [11].
The main result of the paper is summarized as follows.
Furthermore, for $t \in (0, T]$ and $\lambda \in (0, 1]$, there exist $\{\phi_{\varepsilon,d}\}_{\varepsilon \in (0,1), d \in \mathbb{N}} \subset \mathcal{N}$ and $c > 0$, depending only on $a$, $b$, $C$, $m$, $\kappa$, $t$ and $\lambda$, such that for all $\varepsilon \in (0, 1)$ and $d \in \mathbb{N}$ the stated approximation error bound holds, with the complexity of $\phi_{\varepsilon,d}$ growing at most polynomially in $d$ and $\varepsilon^{-1}$. We provide the weight $\mathcal{M}^m_{d,\lambda}(t, x, B^d_t)$ with $m = 0, 1$ in Theorem 1 for our scheme (the expression for general $m$ is given in Sect. 4 below); that is, $\mathcal{M}^0_{d,\lambda}$ and $\mathcal{M}^1_{d,\lambda}$ are given explicitly for $d \in \mathbb{N}$, $\lambda \in (0, 1]$, $t > 0$ and $x \in \mathbb{R}^d$. Hence, the weight for $m = 0$, i.e. $\mathcal{M}^0_{d,\lambda}(t, x, B^d_t) = 1$, provides a simple (but coarse) Gaussian approximation, and the Malliavin weight for $m = 1$ serves as the correction term to the Gaussian approximation. The derivation is provided in the next section.

Proofs of Propositions 1, 2 and Theorem 1
We give the proofs of Propositions 1 and 2 and Theorem 1. Before providing the full proofs, we give brief outlines below.
• Proposition 1 (Asymptotic expansion): take a family of uniformly non-degenerate Wiener functionals $F^{d,\lambda,x}$, expand $\delta_y(F^{d,\lambda,x}) = \cdots$ in $\mathbb{D}^{-\infty}$, and take expectations to obtain the expansion of the density, with a bound holding for some $c > 0$ independent of $\varepsilon$ and $d$.
- (2) For an error precision $\varepsilon$, construct a realization of the Monte-Carlo approximation, with a bound holding for some $c > 0$ independent of $\varepsilon$ and $d$, by using stochastic calculus.
- (3) For an error precision $\varepsilon$, construct a realization of the deep neural network approximation on the cube $[a, b]^d$ whose complexity is bounded by $\mathcal{C}(\phi_{\varepsilon,d}) \le c\,\varepsilon^{-c} d^{c}$ for some $c > 0$ independent of $\varepsilon$ and $d$, where ReLU calculus (Lemmas 9, 10 and 12 in Appendix B) is essentially used.
- Apply (0), (1), (2) and (3) to obtain the main result.
In the proofs, we frequently use the elementary bound $\sup_{x \in [a,b]^d} \|x\| \le d^{1/2} \max\{|a|, |b|\}$, which is obtained in the proof of Corollary 4.2 of [11].
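For completeness, this bound follows from the elementary coordinatewise estimate (a one-line check, written out here for convenience):
\[
\sup_{x \in [a,b]^d} \|x\|^2 = \sup_{x \in [a,b]^d} \sum_{i=1}^{d} x_i^2 \le \sum_{i=1}^{d} \max\{|a|, |b|\}^2 = d\,\max\{|a|, |b|\}^2,
\]
and taking square roots gives $\sup_{x \in [a,b]^d} \|x\| \le d^{1/2} \max\{|a|, |b|\}$.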

Proof of Proposition 1
The family $\{F^{d,\lambda,x}\}_{\lambda}$ is a family of uniformly non-degenerate Wiener functionals (see Theorem 3.4 of [40]). Then $\delta_y(F^{d,\lambda,x})$ is well-defined as an element of $\mathbb{D}^{-\infty}$ and has an expansion in the parameter $\lambda$. The integration by parts formula (2.7) and Theorem 2.6 of [35] yield the corresponding representation for $j = 1, \ldots, d$ and $i \in \mathbb{N}$. Again by the integration by parts formula (2.7), the residual term involving $\partial^{m+1}$-derivatives in (4.3) is given by a linear combination of expectations of the form indicated there, with indices $\gamma_1, \ldots, \gamma_k \in \{1, \ldots, d\}$ and $\beta_1, \ldots, \beta_k \ge 1$ such that $\sum_{\ell=1}^{k} \beta_\ell = m + 1$. By the inequality of Lemma 5 with $k = 0$ in Appendix A, for all $p \ge 1$ and every multi-index $\gamma$ there are $c > 0$, $p_1, p_2, p_3 > 1$ and $r \in \mathbb{N}$ satisfying the stated estimate for all $G \in \mathbb{D}^{\infty}$, $t \in (0, T]$, $\lambda \in (0, 1]$ and $x \in [a, b]^d$. In order to show the upper bound of the weight appearing in the residual term of the expansion, we list the following results: 1. Note that for $d \in \mathbb{N}$, $t \in (0, T]$, $x \in \mathbb{R}^d$ and $\lambda \in (0, 1]$, the relevant process satisfies a linear SDE; since it is linear, it has an explicit form, and its moments are bounded with some $c > 0$ independent of $t$ and $d$, which is obtained by using Lemmas 6 and 7 in Appendix A. Then the process appearing in the weight is likewise bounded with some $c > 0$ independent of $t$ and $d$.

Deep learning implementation
We briefly provide the implementation scheme for the approximation in Theorem 1. Let $\xi$ be a uniformly distributed random variable on $[a, b]^d$, i.e. $\xi \sim U([a, b]^d)$, and define $\bar{X}^{\xi}$ as the corresponding functional started at the random point $\xi$. For $t > 0$, the $m$-th order asymptotic expansion of Theorem 1 can be represented as the solution of an $L^2$-minimization problem over functions of $\xi$, which is obtained by Theorem 1 of this paper combined with Proposition 2.2 of Beck et al. [2]. We construct a deep neural network $u^{NN,\theta^*}(t, \cdot)$ to approximate the function $u_m(t, \cdot)$, given, for a depth $L \in \mathbb{N}$ and layer dimensions $N_0, N_1, \ldots, N_L \in \mathbb{N}$, by (5.3), with the optimized parameter $\theta^*$ obtained by the following minimization problem. In the implementation of the deep neural network approximation, we use the stochastic gradient descent method with the Adam optimizer [20], as in Sects. 3 and 4 of Beck et al. [2]. In Appendix C, we list the sample code of the scheme for the high-dimensional PDE with a nonlinear coefficient of Sect. 6.2 (which also covers the linear coefficient case).
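The following is a minimal TensorFlow sketch of such a training loop (it is not the Appendix C code): a ReLU network is regressed on samples of $f_d(\bar{X}^{\xi}_t)\,\mathcal{M}^m_{d,\lambda}(t, \xi, B^d_t)$ against $\xi \sim U([a, b]^d)$ with the Adam optimizer. The callables `f_d` and `draw_xbar_and_weight` are model-specific placeholders, and the architecture below is only an illustrative choice.

```python
import tensorflow as tf

def train_expansion_regressor(f_d, draw_xbar_and_weight, a, b, d, t,
                              batch_size=1024, n_steps=2000, width=None):
    """Fit u_NN(t, .) to E[ f_d(Xbar_t^xi) * M(t, xi, B_t) | xi ] by L2 regression."""
    width = width or 2 * d
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(d,)),
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(width, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    opt = tf.keras.optimizers.Adam(1e-3)
    for _ in range(n_steps):
        xi = tf.random.uniform((batch_size, d), a, b)     # xi ~ U([a,b]^d)
        xbar, weight = draw_xbar_and_weight(xi, t)        # model-specific sampler
        target = f_d(xbar) * weight                       # f_d(Xbar) * Malliavin weight
        with tf.GradientTape() as tape:
            loss = tf.reduce_mean((model(xi)[:, 0] - target) ** 2)
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
    return model
```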

Numerical examples
In this section, we perform numerical experiments in order to demonstrate the accuracy of our scheme. We compare it with the deep learning method of Beck et al. [2], in which the Euler-Maruyama scheme is combined with the stochastic gradient descent method and the Adam optimizer. All experiments are performed in Google Colaboratory using TensorFlow.

Uncorrelated case
First, we examine our scheme for a high-dimensional Black-Scholes model (geometric Brownian motion) and its corresponding Kolmogorov PDE. The reference value $u^d_\lambda(t, x_0)$ is computed by the Itô formula combined with the Monte-Carlo method with $10^7$ paths. The same experiment is applied to the method of Beck et al. [2]. Table 1 provides the numerical results (the relative errors and the runtimes) for AE $m = 1$ and for the method in Beck et al. [2] with the Euler-Maruyama discretization $n = 16, 32$ (with $n$ the number of time steps).
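For reference, a minimal NumPy sketch of such an Itô-formula-based Monte-Carlo estimator is given below for a drift-free geometric Brownian motion $X^i_t = x_0 \exp(-\tfrac{1}{2}\sigma^2 t + \sigma B^i_t)$; the payoff (a max-call) and the parameter values are illustrative assumptions, not the exact specification used here.

```python
import numpy as np

def bs_reference(d, x0, sigma, K, t, n_paths=10**6, seed=0):
    """Monte-Carlo estimate of E[ max_i (X_t^i - K)^+ ] using the exact
    (Ito-formula) solution of a drift-free geometric Brownian motion."""
    rng = np.random.default_rng(seed)
    B_t = rng.normal(scale=np.sqrt(t), size=(n_paths, d))
    X_t = x0 * np.exp(-0.5 * sigma**2 * t + sigma * B_t)   # exact GBM terminal values
    payoff = np.maximum(X_t.max(axis=1) - K, 0.0)
    return payoff.mean()

# Example: d = 100, spot 1.0, sigma = 0.25, strike 1.0, maturity 1.0.
# print(bs_reference(100, 1.0, 0.25, 1.0, 1.0))
```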

Correlated case
We next provide a numerical example for a Black-Scholes model with correlated noise in high dimension. Let us consider the following PDE:

High-dimensional CEV model (nonlinear volatility case)
We consider a Kolmogorov PDE with nonlinear diffusion coefficients whose corresponding stochastic process is called the CEV model:

The reference value is computed by the Monte-Carlo method with the Euler-Maruyama scheme with $2^{10}$ time steps and $10^7$ paths. The same experiment is applied to the method of Beck et al. [2]. Table 3 provides the numerical results.
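As an illustration of how such a reference value can be produced, the following is a minimal NumPy sketch of an Euler-Maruyama Monte-Carlo estimator for a CEV-type diffusion $dX^i_t = \sigma (X^i_t)^{\gamma}\,dB^i_t$; the drift-free dynamics, the parameter names and the max-call payoff are illustrative assumptions, not the exact specification of this subsection.

```python
import numpy as np

def cev_reference(d, x0, sigma, gamma, K, t, n_steps=2**10, n_paths=10**5, seed=0):
    """Euler-Maruyama Monte-Carlo estimate of E[ max_i (X_t^i - K)^+ ] for a
    drift-free CEV model dX^i = sigma * (X^i)^gamma dB^i (illustrative payoff)."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    x = np.full((n_paths, d), x0, dtype=float)
    for _ in range(n_steps):
        dB = rng.normal(scale=np.sqrt(dt), size=(n_paths, d))
        x = x + sigma * np.maximum(x, 0.0) ** gamma * dB   # keep the power well-defined
        x = np.maximum(x, 0.0)                             # absorb at zero (illustrative)
    payoff = np.maximum(x.max(axis=1) - K, 0.0)
    return payoff.mean()

# Example: d = 100, spot 1.0, sigma = 0.4, gamma = 0.7, strike 1.0, maturity 1.0.
# print(cev_reference(100, 1.0, 0.4, 0.7, 1.0, 1.0))
```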

High-dimensional Heston model
We finally show an example of a small-time asymptotic expansion for a high-dimensional Heston model, where $f_{2d}(x) = \max\{\max\{x_1 - K, 0\}, \ldots, \max\{x_{2d-1} - K, 0\}\}$ and $\mathcal{L}^{2d,\lambda}$ is the generator of the $2d$-dimensional Heston dynamics.
The reference value is computed by the Monte-Carlo method with the Euler-Maruyama scheme with $2^{10}$ time steps and $10^7$ paths. The same experiment is applied to the method of Beck et al. [2]. Table 4 provides the numerical results (the relative errors and the runtimes) for AE $m = 1$ and for the method in Beck et al. [2] with the Euler-Maruyama discretization $n = 16, 32$ (with $n$ the number of time steps).

Conclusion
In this paper, we introduced a new spatial approximation for solving high-dimensional PDEs without the curse of dimensionality, in which an asymptotic expansion method is effectively combined with a deep learning-based algorithm. The mathematical justification of the spatial approximation was provided using Malliavin calculus and ReLU calculus. We demonstrated the effectiveness of our method through numerical examples for high-dimensional Kolmogorov PDEs.
More accurate deep learning-based implementations based on the method of this paper should be studied as a next research topic. We believe that higher order asymptotic expansions or higher order weak approximations (discretizations) will give robust computation schemes without the curse of dimensionality, which should be proved mathematically in future work. Also, applying our method to nonlinear problems as in [14,15] will be a challenging and important task.

Data Availability Statement
The manuscript has no associated real data.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.