Multilevel Picard iterations for solving smooth semilinear parabolic heat equations

We introduce a new family of numerical algorithms for approximating solutions of general high-dimensional semilinear parabolic partial differential equations at single space-time points. The algorithm is obtained through a delicate combination of the Feynman-Kac and the Bismut-Elworthy-Li formulas, and an approximate decomposition of the Picard fixed-point iteration with multilevel accuracy. The algorithm has been tested on a variety of semilinear partial differential equations that arise in physics and finance, with very satisfactory results. Analytical tools needed for the analysis of such algorithms, including a semilinear Feynman-Kac formula, a new class of semi-norms and their recursive inequalities, are also introduced. They allow us to prove for semilinear heat equations with gradient-independent nonlinearity that the computational complexity of the proposed algorithm is bounded by $O(d\,\varepsilon^{-(4+\delta)})$ for any $\delta \in (0,\infty)$ under suitable assumptions, where $d\in \mathbb{N}$ is the dimensionality of the problem and $\varepsilon\in(0,\infty)$ is the prescribed accuracy.


Introduction and main results
High-dimensional partial differential equations (PDEs) arise naturally in many important areas including quantum mechanics, statistical physics, financial engineering, economics, etc. Yet developing efficient and practical algorithms for these high-dimensional PDEs has been a long-standing problem and indeed one of the most challenging tasks in mathematics. The difficulty lies in the "curse of dimensionality" [4], i.e., the complexity of the problem goes up exponentially as a function of dimension, which is a well-known obstacle that is also at the heart of many other important subjects such as high-dimensional statistics and the modeling of many-body systems.
For linear parabolic PDEs, the Feynman-Kac formula establishes an explicit representation of the solution of the PDE as the expectation of the solution of an appropriate stochastic differential equation (SDE). Monte Carlo methods together with suitable discretizations of the SDE (see, e.g., [34,33,30,29]) then allow to approximate the solution at any single point in space-time with a computational complexity that grows as O(dε −(2+δ) ) for any δ > 0 where d is the dimensionality of the problem and ε is the accuracy required (cf., e.g., [22,19,24,25]).
In the seminal papers [36,37,35], Pardoux & Peng established a generalized nonlinear Feynman-Kac formula that gives an explicit representation of the solutions of a semilinear parabolic PDE through the solution of an appropriate backward stochastic differential equation (BSDE). Solving the BSDEs numerically, however, requires in general suitable discretizations of nested conditional expectations (see, e.g., [7,42]) and the straightforward Monte Carlo method applied to these nested conditional expectations results in an algorithm with a computational complexity that grows polynomially in d but exponentially in ε −1 . Other discretization methods for the nested conditional expectations proposed in the literature include the quantization tree method (see [3]), the regression method based on Malliavin calculus or based on kernel estimation (see [7]), the projection on function spaces method (see [21]), the cubature on Wiener space method (see [12]), and the Wiener chaos decomposition method (see [8]). None of these algorithms meets the requirement that the computational complexity grows at most polynomially both in d and ε −1 (see [16,] for a detailed discussion of these approximation methods).
Another probabilistic representation for the solutions of some semilinear parabolic PDEs with polynomial nonlinearity has been established in Skorohod [40] by means of branching diffusion processes. Recently this classical representation has been extended to more general analytic nonlinearities [26,28,27]. This probabilistic representation has been successfully used to obtain a Monte Carlo approximation method for semilinear parabolic PDEs with a computational complexity that grows polynomially both in d and ε −1 . However, not only is this method only applicable to a special class of PDEs, it also requires the terminal/initial condition to be quite small (see [16,Subsection 6.7] for a detailed discussion).
In this paper we propose a new family of numerical algorithms for approximating solutions of general highdimensional semilinear parabolic PDEs (and BSDEs) at single space-time points; see (12) below for the definition of our approximations. For semilinear heat equations with gradient-independent nonlinearities we prove that the computational complexity (see Corollary 3.18 below for the precise meaning hereof) of our proposed algorithm is O(d ε −(4+δ) ) for any δ > 0 under suitable assumptions including the strong smoothness assumption that the constant in (86) below is finite; see Corollary 3.18 below for details. Under the assumptions of Corollary 3.18, to the best of our knowledge, no implementable approximation method was known in the literature to overcome the curse of dimensionality. The analysis of more general coefficient functions and nonlinearities is deferred to future publications. The algorithm, which we will call "multilevel Picard iteration", is a delicate combination of the Feynman-Kac and Bismut-Elworthy-Li formulas, and a decomposition of the Picard iteration with multilevels of accuracy. The efficiency and accuracy of the proposed algorithm has been tested on a variety of semilinear parabolic PDEs that arise in physics and finance. These details are presented in [16]. To get a feeling about the performance of the algorithm: To evaluate u(1, 0) for the solution of with d = 100, ε = 0.01, u(0, x) = (1 + max{|x 1 | 2 , ..., |x 100 | 2 ) −1 requires 10 seconds of runtime on a 2.8 GHz Intel i7 processor with 16 GB RAM. We also introduce the tools needed to analyze these high-dimensional algorithms. Some of these tools are quite non-standard (e.g. the semi-norms (18) and the recursive inequality (54) involving different semi-norms). Using these tools, we are able to establish rigorously the bounds for the computational complexity mentioned above.
2 Multilevel Picard iteration for semilinear parabolic PDEs

A fixed-point equation for semilinear PDEs
Inv be sufficiently regular functions, assume that u(T, x) = g(x) and ∂ t u + f (u, σ * ∇u) + µ, ∇u + 1 2 Trace(σσ * Hess u) = 0, for t ∈ [0, T ), x ∈ Ê d , let (Ω, F , P, (F t ) t∈[0,T ] ) be a stochastic basis (cf., e.g., [ (cf., e.g., [31,Chapter 5], [23], or [2] for existence and uniqueness results for stochastic differential equations of the form (3)). For every s ∈ [0, T ] the processes D s,x , x ∈ Ê d , are in a suitable sense the derivative processes of X s,x , x ∈ Ê d , with respect to x ∈ Ê d . Using the Feynman-Kac formula, we have from (2) for all (s, x) ∈ [0, T ) × Ê d . In (4) the derivative of u appears on the right-hand side and, therefore, (4 Combining (6) with (4) and (5) gives Next we define a sequence of Picard iterations associated to (6), for all k ∈ AE, s ∈ [0, T ), x ∈ Ê d . This sequence of Picard iterations has already been studied in the literature; see, e.g. Thereom 7.3.4 in [41] or [5]. Under suitable assumptions, e.g., Thereom 7.3.4 in [41] ensures that for Next we incorporate a zero expectation term to slightly reduce the variance when approximating the expectation involving g by Monte Carlo approximations. More precisely, for all k ∈ AE, s ∈ [0, T ), x ∈ Ê d it holds that In this telescope expansion, we will apply a fundamental idea of Heinrich [24,25] and Giles [18] (control variates were also used, e.g., in [32,20]) and approximate the continuous quantities (expectation and time integral) by discrete ones (Monte Carlo averages and quadrature formulas respectively) with different degrees of accuracy at different levels of the Picard iteration. Since for large l ∈ AE the difference between u l and u l−1 is small, say ρ −l , it suffices to approximate the expectation and the time integral with lower accuracy, say ρ −(k−l) , at level l ∈ {0, . . . , k − 1} for the k-th approximation. More precisely, we denote by (q k,ρ s ) k∈AE0,ρ∈(0,∞),s∈[0,T ) ⊆ Q T a family of quadrature formulas on C([0, T ], Ê) that we employ to approximate the time integrals T s . . . dt, s ∈ [0, T ], appearing on the right-hand side of (10). We denote by Θ = ∪ n∈AE Ê n a set that allows to index families of independent random variables which we need for the Monte Carlo approximations. We denote by (m k,ρ ) k∈AE0,ρ∈(0,∞) and (m k,ρ ) k∈AE0,ρ∈(0,∞) ⊆ AE families of natural numbers that specify the number of Monte Carlo samples for approximating the expectations involving g and f on the right-hand side of (10). In Section 3.1 we will take m k,ρ = m k,ρ = ρ k for every k ∈ AE 0 , ρ ∈ (0, ∞) and we take q k,ρ as the Gauß-Legendre quadrature rule with ⌊ρ⌋ nodes. Furthermore, for every k ∈ AE 0 , ρ ∈ (0, ∞), θ ∈ Θ, (s, x) ∈ [0, T ] × Ê d we denote by (X θ k,ρ (s, x, t)) t∈[s,T ] and (I θ k,ρ (s, x, t)) t∈(s,T ] the stochastic processes that we employ to approximate the processes (X s,x t ) t∈[s,T ] and 1, [σ(s,x)] * t−s ∫ t s σ(r, X s,x r ) −1 D s,x r * dW r t∈(s,T ] . More specifically, we choose for and (I θ k,ρ (s, x, t)) t∈(s,T ] such that for all t ∈ (s, T ],

The approximation scheme
Observe that the approximation scheme (12) employs Picard fixed-point iteration (cf., e.g., [5]), multilevel/multigrid techniques (see, e.g., [24,25,19,11]), discretizations of the SDE system (3), as well as quadrature approximations for the time integrals. The numerical approximations (12) are full history recursive in the sense that for every (k, ρ) ∈ AE × (0, ∞) the full history U (·) k−1,ρ needs to be computed recursively in order to compute U (·) k,ρ . In this sense the numerical approximations (12) are full history recursive multilevel Picard approximations. Finally we remark that all multilevel Picard approximations on the right-hand side of (12) are independent since all Brownian motions W θ , θ ∈ Θ, are independent. This independence is useful for the mathematical analysis and allows an implementation with a simple recursive structure (cf. Subsection 3.2).

Numerical simulations of high-dimensional semilinear PDEs
We applied the algorithm (12) to approximate the solutions at single space-time points of several semilinear PDEs from physics and financial mathematics such as (i) a PDE arising from the recursive pricing model with default risk due to Duffie, Schroder, & Skiadas [15], (ii) a PDE arising from the valuation of derivative contracts with counterparty credit risk (see, e.g., Burgard & Kjaer [9] and Henry-Labordère [26] for derivations of the PDE), (iii) a PDE arising from pricing models for financial markets with different interest rates for borrowing and lending due to Bergman [6], (iv) a version of the Allen-Cahn equation with a double well potential, and (v) a PDE with an explicit solution whose three-dimensional version has been considered in Chassagneux [10].
We took d = 100. All simulations are performed on a computer with a 2.8 GHz Intel i7 processor and 16 GB RAM. We refer to [16] for the simulation results, Matlab codes and further details concerning the numerical simulations. These results suggest that the proposed algorithm is highly efficient and quite practical for dealing with these high-dimensional PDEs.

Convergence rate for the multilevel Picard iteration
In this section we establish the convergence rate for semilinear heat equations in the case where the nonlinearity is independent of the gradient of the solution and satisfies the Lipschitz-type condition (13) below and when the Gauß-Legendre formula (15) (see, e.g., [14] for more details) is used as the quadrature rule.

Pseudocode
In this subsection a mathematical style pseudocode illustrates that the multilevel Picard approximations (17) can be easily implemented. We assume that the time horizon T ∈ (0, ∞), the dimension d ∈ AE, the terminal condition g : the basis for the number of Monte-Carlo samples M ∈ AE, the number of quadrature nodes Q ∈ AE, increasingly ordered roots c ∈ [−1, 1] Q of the Q-th Legendre polynomial, and the corresponding Legendre quadrature weights w ∈ [0, ∞) Q are global variables. For an implementation in Matlab see [16].

Algorithm 1 Multilevel Picard approximation
⊲ Increments between consecutive quadrature nodes 4: . . , M n }, of independent standard normally distributed random vectors; for l ← 0 to (n − 1) do 8: for k ← 1 to Q do 10: . . , M n−l }, of independent standard normally distributed random vectors; 11: for all s ∈ [0, T ], x ∈ Ê d (see Lemma 3.10 below). Moreover, the approximations admit the following Feynman- for all s ∈ [0, T ], x ∈ Ê d (see Lemma 3.9 below). This, (19) and the Lipschitz-type assumption (13) show that the time discretization error is bounded from above by the error of the (N − 1)-th approximation U 0 N −1,M,Q − u ∞ n+1,Q and the error of the Gauß-Legendre quadrature rule applied to the function [s, Combining this with the established bound for the Monte Carlo error (see (50) below) results in the recursive inequality for the global error (54) that can be handled using a discrete Gronwalltype inequality. The error representation for Gauß-Legendre quadrature rules allows to further simplify the global error under suitable regularity assumptions (see Corollary 3.14 below). In Section 3.
Proof. Note that (15) and the integral transformation theorem with the substitution [s, Observe that the fact that t ≤ s and the fact that ∀ i ∈ {1, . . . , n} : c n i ∈ [−1, 1] ensure that for all i ∈ {1, . . . , n} it holds that T −s 2 c n i + T +s 2 ≥ T −t 2 c n i + T +t 2 . This and the fact that ψ is non-increasing imply for all i ∈ {1, . . . , n} that ψ( T −s 2 c n i + T +s 2 ) ≤ ψ( T −t 2 c n i + T +t 2 ). Combining this with (22), (15), and the fact that T −s Lemma 3.2. Assume the setting in Subsection 3.1 and let Q ∈ AE. Then, for all n ∈ AE 0 , k ∈ AE 0 ∩ [0, 2Q − n) we have Proof. First, note that the fact that the Gauß-Legendre quadrature rule C([0, T ], Ê) ∋ ϕ → t∈[0,T ] q Q,[0,T ] (t)ϕ(t) ∈ Ê integrates polynomials of order less than 2Q exactly implies that for all s ∈ [0, T ], We now prove (24) by induction on n ∈ AE 0 . For the base case n = 0 we note that for all k ∈ AE 0 it holds that This establishes (24) in the base case n = 0. For the induction step AE 0 ∋ n → n + 1 ∈ AE we observe that (25) and the induction hypothesis imply that for all k ∈ AE 0 ∩ [0, 2Q − n − 1) it holds that This finishes the induction step AE 0 ∋ n → n + 1 ∈ AE. Induction hence establishes (24). The proof of Lemma 3.2 is thus completed.

Preliminary results for the semi-norms
We refer to a [0, ∞]-valued function as semi-norm if it is subadditive and absolutely homogeneous. In particular, we do not require semi-norms to have finite values. The proof of the following lemma is clear and therefore omitted.

Lemma 3.3 (Seminorm property).
Assume the setting in Subsection 3.1 and let k ∈ AE 0 . Then the function is a semi-norm in the sense that it is subadditive, nonnegative, and absolutely homogeneous.
The following lemma implies that Monte Carlo averages converge in our semi-norms with rate 1/2.
Lemma 3.4 (Linear combinations of iid random variables). Assume the setting in Subsection 3.1, let k ∈ AE 0 , n, Q ∈ AE, r 1 , . . . , r n ∈ Ê, and let V 1 , . . . , s, x), . . . , V n (s, x) are integrable random variables which are independent and identically distributed and which are independent of W 0 . Then Proof. The definition (18) of the semi-norm and the fact that for all (s, x) ∈ [0, T ] × Ê d it holds that (V i (s, x)) i∈{1,...,n} are independent of W 0 and are independent and identically distributed imply that Lemma 3.5 (Lipschitz property). Assume the setting in Subsection 3.1, let k ∈ AE 0 , Q ∈ AE, and let U, Proof. The definition (18) of the semi-norm and the global Lipschitz property (13) of F imply that Lemma 3.6. Assume the setting in Subsection 3.1, let k ∈ AE 0 , Q ∈ AE, and let U ∈ M(B([0, T ]×Ê d )⊗F , B(Ê)) satisfy for all (s, x) ∈ [0, T ] × Ê d that U (s, x) and W 0 are independent. Then Proof. The definition (18) of the semi-norm, the triangle inequality, independence, Lemma 3.1, and the definition (16) ofq k+1,Q yield that Proof. The definition (18) of the semi-norm, Jensen's inequality, and the hypothesis that |U | ≤ |V | imply that The following lemma specifies the values of our semi-norms of constant functions. It follows directly from the definition (18) of the semi-norms and from Lemma 3.2. Its proof is therefore omitted.

Error analysis for multilevel Picard iteration
and (ii) for all n ∈ AE 0 , θ ∈ Θ, s ∈ [0, T ] it holds that Proof. We prove (i) by induction on n ∈ AE 0 . For the base case n = 0 we note that for all θ ∈ Θ, s ∈ [0, T ], This establishes (i) in the base case n = 0. For the induction step AE 0 ∋ n → n + 1 ∈ AE let n ∈ AE 0 and assume that (i) holds for n = 0, n = 1, . . ., n = n. The induction hypothesis and (17) imply that for all θ ∈ Θ, Combining this with (13) This finishes the induction step AE 0 ∋ n → n + 1 ∈ AE. Induction hence establishes (i). Next we note that (17), the fact that (U θ n,M,Q ) n∈AE0 , θ ∈ Θ, are identically distributed, and a telescope argument yield that for all This establishes (ii). The proof of Lemma 3.9 is thus completed.
and (ii) for all s ∈ [0, T ] it holds that Proof. Note that (13) and (42) imply (i). Next Itô's formula and the PDE (14) ensure that for all s ∈ [0, T ], t ∈ [s, T ] it holds P-a.s. that This and (43) show that for all s This finishes the proof of Lemma 3.10.
Then we have .
Proof. Throughout this proof assume w.l.o.g. that the right-hand side of (48) is finite, assume w.l.o.g. that θ = 0 (the case θ = 0 follows from the case θ = 0), let ε ∈ [0, ∞) be the real number given by , and let (e n ) n∈{0,1,...,N } ⊆ [0, ∞] be the extended real numbers which satisfy for all n ∈ {0, 1, . . . , N } that First, we analyze the Monte Carlo error. Item (i) of Lemma 3.9 shows for all n ∈ AE 0 , (t, z) The triangle inequality, independence, Lemma 3.4, Lemma 3.7, Lemma 3.6, Lemma 3.8, and Lemma 3.5 imply that for all n ∈ AE, k ∈ AE 0 it holds that Next we analyze the time discretization error. Item (ii) of Lemma 3.9 and Item (ii) of Lemma 3.10 ensure that for all n ∈ AE, s ∈ [0, T ], z ∈ Ê d it holds P-a.s. that This, the triangle inequality, Lemma 3.7, Lemma 3.6, Lemma 3.5, and Lemma 3.8 demonstrate for all n ∈ AE, In the next step we combine the established bounds for the Monte Carlo error and the time discretization error to obtain a bound for the global error. More formally, observe that (50) and (52) ensure that for all n ∈ AE, Hence, we obtain that for all j ∈ AE 0 , n ∈ AE, k ∈ AE 0 ∩ [0, 2Q − 1] it holds that This shows for all n ∈ {1, 2, . . . , N } that Combining this with the discrete Gronwall-type inequality in Agarwal [1, Corollary 4.1.2] proves that This completes the proof of Theorem 3.11.
In the proof of the following result, Corollary 3.12, an upper bound for the quadrature error on the right-hand side of (48) is derived under the hypothesis that the solution of the PDE is sufficiently smooth and regular.

This and (58) show that for all
and Fubini's theorem show that for all Equation (62) (with k = 1) together with (60) (with k = 2) implies for every Induction, (60), and (62) prove that for every Equation (14) This and (63) prove that Theorem 3.11 together with (65) implies (59). The proof of Corollary 3.12 is thus completed.
The following result, Corollary 3.13, establishes an upper bound for the L 2 -error between the solution of the PDE and our approximations (17) if the sup-norm of the n-th derivative of the solution of the PDE grows sufficiently slowly as AE ∋ n → ∞.
Then it holds for all M, Q ∈ AE, N ∈ AE ∩ [0, 2Q) that This establishes (67). The proof of Corollary 3.13 is thus completed.
The next result, Corollary 3.14, provides an upper bound for the L 2 -error between the solution of the PDE and our approximations (17) if the parameters N, M, Q ∈ AE satisfy N = M = Q. Corollary 3.14 is a direct consequence of Corollary 3.13.
and let C ∈ [0, ∞] be the extended real number given by Then it holds for all N ∈ AE that Hence, we obtain that for all N ∈ AE ∩ [2, ∞) it holds that This and the fact that RN 1,1,1 ≤ 2d complete the proof of Lemma 3.15.
Then for all N ∈ AE, we have The proof of Lemma 3.16 is analogous to the proof of Lemma 3.15 and therefore omitted. In the proof of Corollary 3.17 below we combine Lemma 3.15 and Lemma 3.16 with Corollary 3.14 to obtain a bound for the computational complexity of our scheme (17) in terms of the space dimension and the prescribed approximation accuracy.
The next result, Corollary 3.17, proves under suitable assumptions that if ε ∈ (0, ∞) is the prescribed approximation accuracy and if d ∈ AE is the dimension of the considered PDE, then for every α ∈ (0, 1 /4] and every δ ∈ (0, ∞) it holds that the computational effort of the approximation method (number of function evaluations of the coefficient functions of the considered PDE and number of used independent scalar standard normal random variables, cf. Section 3.7) is at most O(d ε −( 1 α +δ) ).    . (89)