A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations

Deep neural networks and other deep learning methods have been applied with great success to the numerical approximation of high-dimensional nonlinear parabolic partial differential equations (PDEs), which are widely used in finance, engineering, and the natural sciences. In particular, simulations indicate that algorithms based on deep learning overcome the curse of dimensionality in the numerical approximation of solutions of semilinear PDEs. For certain linear PDEs this has also been proved mathematically. The key contribution of this article is to rigorously prove this for the first time for a class of nonlinear PDEs. More precisely, we prove, in the case of semilinear heat equations with gradient-independent nonlinearities, that the number of parameters of the employed deep neural networks grows at most polynomially in both the PDE dimension and the reciprocal of the prescribed approximation accuracy. Our proof relies on recently introduced multilevel Picard approximations of semilinear PDEs.


Introduction
Deep neural networks (DNNs) have revolutionized a number of computational problems; see, e.g., the references in Grohs et al. [11]. In 2017, deep learning-based approximation algorithms for certain parabolic partial differential equations (PDEs) were proposed in Han et al. [5,12], and based on these works there is now a series of deep learning-based numerical approximation algorithms for a large class of different kinds of PDEs in the scientific literature; see, e.g., [1,2,3,8,9,10,11,13,17,18,19,22,23]. There is empirical evidence that deep learning-based methods work exceptionally well for approximating solutions of high-dimensional PDEs and that they do not suffer from the curse of dimensionality; see, e.g., the simulations in [5,12,2,1]. There exist, however, only a few theoretical results which prove that DNN approximations of solutions of PDEs do not suffer from the curse of dimensionality: the recent articles [11,4,16,9] prove rigorously that DNN approximations overcome the curse of dimensionality in the numerical approximation of solutions of certain linear PDEs.
The main result of this article, Theorem 4.1 below, proves for semilinear heat equations with gradient-independent nonlinearities that the number of parameters of the approximating DNN grows at most polynomially in both the PDE dimension d ∈ ℕ and the reciprocal of the prescribed accuracy ε > 0. We thereby establish for the first time that there exist DNN approximations of solutions of such PDEs which indeed overcome the curse of dimensionality. To illustrate Theorem 4.1, we formulate the following special case of Theorem 4.1 using the above notation on DNNs and the notation from Subsection 1.1.
Then there exist (Ψ_{d,ε})_{d∈ℕ, ε∈(0,1]} ⊆ N, η ∈ (0, ∞), and C : (0, 1] → (0, ∞) such that for all d ∈ ℕ, ε ∈ (0, 1], and γ ∈ (0, 1] the realization of Ψ_{d,ε} approximates the PDE solution to accuracy ε and the number of parameters of Ψ_{d,ε} is bounded polynomially in d and ε⁻¹.

Theorem 1.1 is an immediate consequence of Theorem 4.1 below. In the manner of the proofs of Theorem 3.14 in [11] and of Theorem 6.3 in [16], the proof of Theorem 4.1 below uses probabilistic arguments on a suitable artificial probability space. Moreover, the proof of Theorem 4.1 relies on the recently introduced (full-history) multilevel Picard approximations, which have been proved to overcome the curse of dimensionality in the numerical approximation of solutions of semilinear heat equations at single space-time points; see [6,7,15,14]. A key step in our proof is that realizations of these random approximations can be represented by DNNs; see Lemma 3.10 below.
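To convey the flavor of these approximations, the following minimal Python sketch implements a full-history multilevel Picard recursion for the fixed-point formulation u(t, x) = E[g(x + W_{T−t})] + ∫_t^T E[f(u(s, x + W_{s−t}))] ds of a semilinear heat equation with terminal condition g, where W is a Brownian motion scaled so that its generator is the Laplacian. It is an illustration in the spirit of the estimators in [6,7,15,14] rather than the exact scheme analyzed below; the function name mlp, the scaling choice, and the test nonlinearity are ours.

    import numpy as np

    def mlp(n, M, t, x, T, f, g, rng):
        # Full-history multilevel Picard estimate of u(t, x) for the fixed point
        # u(t,x) = E[g(x + W_{T-t})] + int_t^T E[f(u(s, x + W_{s-t}))] ds
        # (a hedged sketch in the spirit of [6,7,15,14], not the exact estimator).
        sigma = np.sqrt(2.0)   # Brownian scaling so that the generator is the Laplacian
        d = x.shape[0]
        if n == 0:
            return 0.0         # the base level U_{0,M} is identically zero
        # Monte Carlo term for the terminal condition g
        s = sum(g(x + sigma * np.sqrt(T - t) * rng.standard_normal(d))
                for _ in range(M ** n)) / M ** n
        # multilevel correction terms over the full history of lower levels
        for l in range(n):
            acc = 0.0
            for _ in range(M ** (n - l)):
                r = t + (T - t) * rng.uniform()   # uniformly sampled intermediate time
                y = x + sigma * np.sqrt(r - t) * rng.standard_normal(d)
                acc += f(mlp(l, M, r, y, T, f, g, rng))
                if l > 0:                         # telescoping difference of levels
                    acc -= f(mlp(l - 1, M, r, y, T, f, g, rng))
            s += (T - t) * acc / M ** (n - l)
        return s

    # usage: a 10-dimensional example with a Lipschitz nonlinearity
    rng = np.random.default_rng(0)
    u_est = mlp(n=3, M=3, t=0.0, x=np.zeros(10), T=1.0,
                f=np.sin, g=lambda y: 1.0 / (1.0 + y @ y), rng=rng)

The telescoping over the levels l is the point of the scheme: coarse levels receive many Monte Carlo samples and fine levels few, which is what keeps the overall cost polynomial in the reciprocal accuracy.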
The remainder of this article is organized as follows. In Section 2 we provide auxiliary results on multilevel Picard approximations which ensure that these approximations are stable under perturbations of the nonlinearity f and of the terminal condition g of the PDE (2). In Section 3 we show that multilevel Picard approximations can be represented by DNNs and we provide bounds for the number of parameters of the representing DNNs. We use the results of Sections 2 and 3 to prove the main result, Theorem 4.1, in Section 4.

A stability result for multilevel Picard approximations
Throughout this section let Θ = ⋃_{n∈ℕ} ℤ^n and let u^θ : Ω → [0, 1], θ ∈ Θ, be independent random variables which are uniformly distributed on [0, 1].

Proof of Lemma 2.2. The integral transformation theorem, (10), and the triangle inequality show for all t ∈ [0, T] that

Next, Jensen's inequality, Fubini's theorem, (13), the fact that W has independent and stationary increments, and (6) demonstrate for all t ∈ [0, T] that

Furthermore, Jensen's inequality, Fubini's theorem, (13), the fact that W has independent and stationary increments, the triangle inequality, (5), and (6) demonstrate for all t ∈ [0, T] that

Combining this with (14) and (15) implies for all t ∈ [0, T] that

Next, [14, Corollary 3.11] shows that

This, the triangle inequality, and the fact that

This, Gronwall's integral inequality, and (17) establish the claim. The proof of Lemma 2.2 is thus completed.
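For the reader's convenience, the version of Gronwall's integral inequality invoked above is the standard one: if α, L ∈ [0, ∞) and e : [0, T] → [0, ∞) is bounded and measurable with e(t) ≤ α + L ∫₀ᵗ e(s) ds for all t ∈ [0, T], then e(t) ≤ α e^{Lt} for all t ∈ [0, T].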
Proof of Lemma 2.3. First, (10), the triangle inequality, and the fact that W has stationary increments show for all s ∈ [0, T], z ∈ ℝ^d that

This, Fubini's theorem, the fact that W has independent increments, and the Lipschitz condition

This, Gronwall's lemma, and Lemma 2.2 yield for all x ∈ ℝ^d that

Furthermore, (7), the triangle inequality, and Lemma 2.2 imply for all

This, (24), and the triangle inequality yield that

This, [14, Theorem 3.5] (applied with ξ = x, F = F₂, g = g₂, and u = u₂ in the notation of [14, Theorem 3.5]), (6), and the triangle inequality ensure that

Furthermore, Lemma 2.3 shows that

This, the triangle inequality, (28), the fact that B ≤ B^q + 1, the assumption that q ≥ 2, and Jensen's inequality show that

The proof of Corollary 2.4 is thus completed.

Deep neural networks representing multilevel Picard approximations
The main result of this section, Lemma 3.10 below, shows that multilevel Picard approximations can be represented by DNNs. The central tools for the proof of Lemma 3.10 are Lemmas 3.8 and 3.9, which show that DNNs are stable under composition and summation. We formulate Lemmas 3.8 and 3.9 in terms of the operators defined in (39) and (40) below, whose properties are studied in Lemmas 3.3, 3.4, and 3.5.
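To make the summation stability concrete, here is a small self-contained numpy sketch, ours and not the formal operator from (40): two fully connected ReLU networks of equal depth and equal input and output dimensions are summed by stacking their layers block-diagonally and merging the two output blocks, so the depth is unchanged and the parameter count is at most a fixed multiple of the sum of the two individual counts.

    import numpy as np

    def relu(v):
        return np.maximum(v, 0.0)

    def realize(params, x):
        # realization of a fully connected ReLU network, params = [(W1,b1),...,(WL,bL)]
        for W, b in params[:-1]:
            x = relu(W @ x + b)
        W, b = params[-1]
        return W @ x + b

    def sum_networks(p1, p2):
        # network realizing x -> realize(p1, x) + realize(p2, x); assumes equal
        # depth, input dimension, and output dimension (the setting of the lemma)
        out = []
        for k, ((W1, b1), (W2, b2)) in enumerate(zip(p1, p2)):
            if k == 0:                       # both copies read the same input
                W = np.vstack([W1, W2])
            else:                            # later layers act block-diagonally
                W = np.block([[W1, np.zeros((W1.shape[0], W2.shape[1]))],
                              [np.zeros((W2.shape[0], W1.shape[1])), W2]])
            out.append((W, np.concatenate([b1, b2])))
        WL, bL = out[-1]                     # merge the stacked outputs by summation
        m = WL.shape[0] // 2
        S = np.hstack([np.eye(m), np.eye(m)])
        out[-1] = (S @ WL, S @ bL)
        return out

    # quick check that the realization is indeed the sum
    rng = np.random.default_rng(1)
    p1 = [(rng.standard_normal((5, 3)), rng.standard_normal(5)),
          (rng.standard_normal((1, 5)), rng.standard_normal(1))]
    p2 = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
          (rng.standard_normal((1, 4)), rng.standard_normal(1))]
    x = rng.standard_normal(3)
    assert np.allclose(realize(sum_networks(p1, p2), x),
                       realize(p1, x) + realize(p2, x))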

Results on deep neural networks
Let N and D be the sets which satisfy

and let ⊙ : D × D → D be the binary operation with the property that for all H₁, H₂ ∈ ℕ,

Proof of Lemma 3.3. Throughout this proof let

The definition of ⊙ in (37) then shows that

The proof of Lemma 3.3 is thus completed.

Proof of Lemma 3.4.

and

The proof of Lemma 3.4 is thus completed.
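Lemmas 3.3 and 3.4 record elementary properties of the composition operation ⊙. Composition itself can be illustrated in the same conventions as the earlier sketch (reusing realize and rng from there). The construction below fuses the inner network's output layer with the outer network's input layer into a single affine map, since no ReLU sits between them; this is one standard construction and not necessarily the one encoded by ⊙ in (37), but it shows why depths essentially add under composition.

    def compose_networks(outer, inner):
        # network realizing x -> realize(outer, realize(inner, x)); the two
        # adjacent affine maps (with no ReLU between them) are fused into one
        Wi, bi = inner[-1]
        Wo, bo = outer[0]
        fused = (Wo @ Wi, Wo @ bi + bo)
        return inner[:-1] + [fused] + outer[1:]

    # quick check: inner maps R^2 -> R^3, outer maps R^3 -> R
    inner = [(rng.standard_normal((4, 2)), rng.standard_normal(4)),
             (rng.standard_normal((3, 4)), rng.standard_normal(3))]
    outer = [(rng.standard_normal((5, 3)), rng.standard_normal(5)),
             (rng.standard_normal((1, 5)), rng.standard_normal(1))]
    z = rng.standard_normal(2)
    assert np.allclose(realize(compose_networks(outer, inner), z),
                       realize(outer, realize(inner, z)))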
This completes the proof of Lemma 3.5.
Proof of Lemma 3.10. We prove Lemma 3.10 by induction on n ∈ ℕ₀. For the base case n = 0 note that the fact that ∀ t ∈ [0, T], θ ∈ Θ : U^θ_{0,M}(t, ·) = 0, the fact that the function 0 can be represented by a DNN with depth dim(L(Φ_g)), and (78) imply that there exists

This proves the base case n = 0.
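To make the base case concrete: taking every weight and bias equal to zero realizes the zero function at any prescribed depth, in particular at depth dim(L(Φ_g)), which is what allows this network to be combined with the others in the induction step. A minimal sketch in the conventions of the earlier code (realize as before; the hidden width 1 is an arbitrary choice of ours):

    def zero_network(d, depth, width=1):
        # all-zero ReLU network R^d -> R of the prescribed depth; its
        # realization is identically 0 whatever the width
        dims = [d] + [width] * (depth - 1) + [1]
        return [(np.zeros((dims[k + 1], dims[k])), np.zeros(dims[k + 1]))
                for k in range(depth)]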