Deep learning approximations for non-local nonlinear PDEs with Neumann boundary conditions


Abstract
Nonlinear partial differential equations (PDEs) are used to model dynamical processes in a large number of scientific fields, ranging from finance to biology. In many applications, standard local models are not sufficient to accurately account for certain non-local phenomena such as interactions at a distance. In order to properly capture these phenomena, non-local nonlinear PDE models are frequently employed in the literature. In this article we propose two numerical methods, based on machine learning and on Picard iterations, respectively, to approximately solve non-local nonlinear PDEs. The proposed machine learning-based method is an extended variant of a deep learning-based splitting-up type approximation method previously introduced in the literature and utilizes neural networks to provide approximate solutions on a subset of the spatial domain of the solution. The Picard iterations-based method is an extended variant of the so-called full history recursive multilevel Picard approximation scheme previously introduced in the literature and provides an approximate solution at a single point of the domain. Both methods are mesh-free and allow non-local nonlinear PDEs with Neumann boundary conditions to be solved in high dimensions. In the two methods, the numerical difficulties arising due to the dimensionality of the PDEs are avoided by (i) using the correspondence between the expected trajectory of reflected stochastic processes and the solution of PDEs (given by the Feynman-Kac formula) and by (ii) using a plain vanilla Monte Carlo integration to handle the non-local term. We evaluate the performance of the two methods on five different PDEs arising in physics and biology. In all cases, the methods yield good results in up to 10 dimensions with short run times. Our work extends recently developed methods to overcome the curse of dimensionality in solving PDEs.

Introduction
In this article, we derive numerical schemes to approximately solve high-dimensional non-local nonlinear partial differential equations (PDEs) with Neumann boundary conditions. Such PDEs have been used to describe a variety of processes in physics, engineering, finance, and biology, but they generally cannot be solved analytically, so that numerical methods are required to provide approximate solutions. However, traditional numerical methods are for the most part computationally infeasible for high-dimensional problems, calling for the development of novel approximation methods.
The need for solving non-local nonlinear PDEs has been expressed in various fields, as they provide a more general description of dynamical systems than their local counterparts [63,29,90]. In physics and engineering, non-local nonlinear PDEs are found, e.g., in models of Ohmic heating production [66], in the investigation of the fully turbulent behavior of real flows [20], in phase field models allowing non-local interactions [7,41,25,49], or in phase transition models with conservation of mass [86,88]; see [63] for further references. In finance, non-local PDEs are used, e.g., in jump-diffusion models for the pricing of derivatives where the dynamics of stock prices are described by stochastic processes experiencing large jumps [74,22,65,1,15,90,28,26]. Nonlinear terms can further be introduced into non-local PDEs by penalty methods for pricing American put options, such as in Kou's jump-diffusion model [58,42], by considering large investors whose policy affects asset prices [5,1], or by considering default risks [83,55]. In economics, non-local nonlinear PDEs appear, e.g., in evolutionary game theory with the so-called replicator-mutator equation capturing continuous strategy spaces [79,62,50,3,4] or in growth models where consumption is non-local [6]. In biology, non-local nonlinear PDEs are used, e.g., to model processes determining the interaction and evolution of organisms. Examples include models of morphogenesis and cancer evolution [71,24,91], models of gene regulatory networks [80], population genetics models with the non-local Fisher-Kolmogorov-Petrovsky-Piskunov (Fisher-KPP) equations [38,51,18,82,17,57,92], and quantitative genetics models where populations are structured on a phenotypic and/or a geographical space [19,43,16,78,77,85,30,76]. In such models, Neumann boundary conditions are used, e.g., to model the effect of the borders of the geographical domain on the movement of the organisms.
Real-world systems such as those just mentioned may be of considerable complexity, and accurately capturing the dynamics of these systems may require models of high dimensionality [30], leading to complications in obtaining numerical approximations. For example, the number of dimensions of the PDEs may correspond, in finance, to the number of financial assets (such as stocks, commodities, exchange rates, and interest rates) in the involved portfolio; in evolutionary dynamics, to the dimension of the strategy space; and in biology, to the number of genes modelled [80] or to the dimension of the geographical or the phenotypic space over which the organisms are structured. Standard approximation methods for PDEs such as finite difference approximation methods, finite element methods, spectral Galerkin approximation methods, and sparse grid approximation methods all suffer from the so-called curse of dimensionality [14], meaning that their computational costs increase exponentially in the number of dimensions of the PDE under consideration.
Numerical methods exploiting stochastic representations of the solutions of PDEs can in some cases overcome the curse of dimensionality. Specifically, simple Monte Carlo averages of the associated stochastic processes have long been used to solve high-dimensional linear PDEs such as Black-Scholes and Kolmogorov PDEs [75,8]. Recently, two novel classes of methods have proved successful in dealing with high-dimensional nonlinear PDEs, namely deep learning-based and full history recursive multilevel Picard approximation methods (in the following we abbreviate full history recursive multilevel Picard by MLP). The explosive success of deep learning in recent years across a wide range of applications [69] has inspired a variety of neural network-based approximation methods for high-dimensional PDEs; see [12] for an overview. One class of such methods is based on reformulating the PDE as a stochastic learning problem through the Feynman-Kac formula [32,52,10]. In particular, the deep splitting scheme introduced in [9] relies on splitting the differential operator into a linear and a nonlinear part and in that sense belongs to the class of splitting-up methods [27,48,56]. The PDE approximation problem is then decomposed along the time axis into a sequence of separate learning problems. The deep splitting approximation scheme has proved capable of computing reasonable approximations to the solutions of nonlinear PDEs in up to 10000 dimensions. The MLP approximation method, introduced in [34,60,36], in turn utilizes the Feynman-Kac formula to reformulate the PDE problem as a fixed point equation and reduces the complexity of the numerical approximation of the time integral through a multilevel Monte Carlo approach. However, neither the deep splitting method nor the MLP method has, until now, been able to account for non-localness and Neumann boundary conditions. The goal of this article is to overcome these limitations, and we thus generalize the deep splitting method and the MLP approximation method to approximately solve non-local nonlinear PDEs with Neumann boundary conditions. We handle the non-local term by a plain vanilla Monte Carlo integration and address Neumann boundary conditions by constructing reflected stochastic processes. While the MLP method can, in one run, only provide an approximate solution at a single point x ∈ D of the spatial domain D ⊆ R^d, where d ∈ N = {1, 2, …}, the machine learning-based method can in principle provide an approximate solution on a full subset of the spatial domain D (however, cf., e.g., [53,54,46] for results on limitations on the performance of such approximation schemes). We use both methods to solve five non-local nonlinear PDEs arising in models from biology and physics and cross-validate the results of the simulations. We manage to solve the non-local nonlinear PDEs with reasonable accuracy in up to 10 dimensions.
For an account of classical numerical methods for solving non-local PDEs, such as finite differences, finite elements, and spectral methods, we refer the reader to the recent survey [29]. Several machine learning-based schemes for solving non-local PDEs can also be found in the literature. In particular, the physics-informed neural network and deep Galerkin approaches [84,87], based on representing an approximation of the whole solution of the PDE as a neural network and using automatic differentiation to perform a least-squares minimization of the residual of the PDE, have been extended to fractional PDEs and other non-local PDEs [81,72,47,2,94]. While some of these approaches use classical methods susceptible to the curse of dimensionality for the non-local part [81,72], mesh-free methods suitable for high-dimensional problems have also been investigated [47,2,94].
The literature also contains approaches that are more closely related to the machine learning-based algorithm presented here. Frey & Köck [39,40] propose an approximation method for non-local semilinear parabolic PDEs with Dirichlet boundary conditions based on and extending the deep splitting method in [9] and carry out numerical simulations for example PDEs in up to 4 dimensions. Castro [21] proposes a numerical scheme for approximately solving non-local nonlinear PDEs based on [59] and proves convergence results for this scheme. Finally, Gonon & Schwab [45] provide theoretical results showing that neural networks with ReLU activation functions have sufficient expressive power to approximate solutions of certain high-dimensional non-local linear PDEs without the curse of dimensionality.
There is a more extensive literature on machine learning-based methods for approximately solving standard PDEs without non-local terms but with various boundary conditions, going back to early works by Lagaris et al. [68,67] (see also [73]), which employed a grid-based method based on least-squares minimization of the residual and shallow neural networks to solve low-dimensional ODEs and PDEs with Dirichlet, Neumann, and mixed boundary conditions. More recently, approximation methods for PDEs with Neumann (and other) boundary conditions have been proposed using, e.g., physics-informed neural networks [72,89,93], the deep Ritz method (based on a variational formulation of certain elliptic PDEs) [37,70,23], or adversarial networks [95].
The remainder of this article is organized as follows. Section 2 discusses a special case of the proposed machine learning-based method, in order to provide a readily comprehensible exposition of the key ideas of the method. Section 3 discusses the general case, which is flexible enough to cover a larger class of PDEs and to allow more sophisticated optimization methods. Section 4 presents our extension of the MLP approximation method to non-local nonlinear PDEs, which we use to obtain reference solutions in Section 5. Section 5 provides numerical simulations for five concrete examples of (non-local) nonlinear PDEs. Section 6 provides the source codes used for the computations in Section 5.

Machine learning-based approximation method in a special case
In this section, we present in Framework 2.2 in Subsection 2.3 below a simplified version of our general machine learning-based algorithm for approximating solutions of non-local nonlinear PDEs with Neumann boundary conditions proposed in Section 3 below. This simplified version applies to a smaller class of non-local heat PDEs, specified in Subsection 2.1 below. In Subsection 2.2 we introduce some notation related to the reflection of straight lines on the boundaries of a suitable subset D ⊆ R d where d ∈ N, which will be used to describe time-discrete reflected stochastic processes that are employed in our approximations throughout the rest of the article. The simplified algorithm described in Subsection 2.3 below is limited to using neural networks of a particular architecture that are trained using plain vanilla stochastic gradient descent, whereas the full version proposed in Framework 3.1 in Subsection 3.2 below is formulated in such a way that it encompasses a wide array of neural network architectures and more sophisticated training methods, in particular Adam optimization, minibatches, and batch normalization. Stripping away some of these more intricate aspects of the full algorithm is intended to exhibit more acutely the central ideas in the proposed approximation method. The simplified algorithm described in this section as well as the more general version proposed in Framework 3.1 in Subsection 3.2 below are based on the deep splitting method introduced in Beck et al. [9], which combines operator splitting with a previous deep learning-based approximation method for Kolmogorov PDEs [10]; see also Beck et al. [12,Sections 2 and 3] for an exposition of these methods.

Partial differential equations (PDEs) under consideration
Let T ∈ (0, ∞), d ∈ N, let D ⊆ R^d be a closed set with sufficiently smooth boundary ∂D, let n : ∂D → R^d be an outer unit normal vector field associated to D, let g ∈ C(D, R), let ν_x : B(D) → [0, 1], x ∈ D, be probability measures, let f : R × R → R be measurable, let u = (u(t, x))_{(t,x) ∈ [0,T] × D} ∈ C^{1,2}([0, T] × D, R) have at most polynomially growing partial derivatives, assume¹ for every t ∈ (0, T], x ∈ ∂D that ⟨n(x), (∇_x u)(t, x)⟩ = 0, and assume for every t ∈ [0, T], x ∈ D that u(0, x) = g(x) and

    (∂u/∂t)(t, x) = (Δ_x u)(t, x) + ∫_D f(u(t, x), u(t, ξ)) ν_x(dξ).    (1)

Our goal in this section is to approximately calculate under suitable hypotheses the solution u : [0, T] × D → R of the PDE in (1).

¹ Throughout this article we denote by ⟨·, ·⟩ : ∪_{n∈N} (R^n × R^n) → R and ‖·‖ : ∪_{n∈N} R^n → R the functions which satisfy for all n ∈ N, x = (x_1, …, x_n), y = (y_1, …, y_n) ∈ R^n that ⟨x, y⟩ = Σ_{i=1}^n x_i y_i and ‖x‖ = (Σ_{i=1}^n |x_i|^2)^{1/2}.
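The non-local term in (1) is handled by plain vanilla Monte Carlo integration: for K ∈ N i.i.d. samples Z_1, …, Z_K with distribution ν_x one approximates ∫_D f(u(t, x), u(t, ξ)) ν_x(dξ) ≈ (1/K) Σ_{k=1}^K f(u(t, x), u(t, Z_k)). The following minimal Julia sketch illustrates this idea for the assumed special case D = [0, 1]^d with ν_x the uniform distribution on D; the helper name and this choice of ν_x are illustrative assumptions, not part of the method's specification.

    # Plain vanilla Monte Carlo approximation of the non-local term:
    # (1/K) * sum_{k=1..K} f(u(x), u(Z_k)) with Z_k ~ nu_x, here assumed
    # to be the uniform distribution on D = [0,1]^d for illustration.
    function nonlocal_term(f, u, x, K)
        acc = 0.0
        for _ in 1:K
            Z = rand(length(x))   # i.i.d. uniform sample on [0,1]^d
            acc += f(u(x), u(Z))
        end
        return acc / K
    end

    # Toy usage with d = 10:
    # nonlocal_term((a, b) -> a * (1 - b), x -> exp(-sum(abs2, x)), fill(0.5, 10), 1000)

Note that the cost of this approximation grows linearly in K and, for fixed K, only linearly in d, which is what makes the non-local term tractable in high dimensions.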

Reflection principle for the simulation of time discrete reflected processes
Framework 2.1 (Reflection principle for the simulation of time discrete reflected processes). Let d ∈ N, let D ⊆ R^d be a closed set with sufficiently smooth boundary ∂D, let n : ∂D → R^d be a suitable outer unit normal vector field associated to D, let c : …

Description of the proposed approximation method in a special case

Framework 2.2 (Special case of the machine learning-based approximation method).
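The time-discrete reflected processes built from Frameworks 2.1 and 2.2 reflect straight-line increments at the boundary ∂D. As a minimal sketch, we illustrate the reflection principle in the assumed special case of a box domain D = [a, b]^d, where the reflection can be computed coordinatewise by folding; reflect_box is a hypothetical helper for illustration, not the general construction of Framework 2.1.

    # Coordinatewise reflection of a point y into the box D = [a,b]^d:
    # each coordinate is folded back at the faces of the box, mimicking
    # the reflection of a straight line on the boundary of D.
    function reflect_box(y, a, b)
        w = b - a
        map(y) do yi
            r = mod(yi - a, 2w)          # fold with period 2(b - a)
            a + (r <= w ? r : 2w - r)    # mirror the upper half
        end
    end

    # Example: reflect_box([1.3, -0.2], 0.0, 1.0) ≈ [0.7, 0.2]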
The goal of the optimization algorithm in Framework 2.2 above is to find a suitable parameter vector θ ∈ R^d such that for every n ∈ {1, 2, …, N} the neural network R^d ∋ x ↦ V_n(θ, x) ∈ R is a good approximation of the solution R^d ∋ x ↦ u(t_n, x) ∈ R of the PDE in (12) at time t_n. This is done by performing successively for each n ∈ {1, 2, …, N} a plain vanilla stochastic gradient descent (SGD) optimization on a suitable loss function (cf. (11)).
Observe that for every n ∈ {1, 2, …, N} the stochastic process Θ^n : N_0 × Ω → R^d describes the successive estimates computed by the SGD algorithm for the parameter vector that represents (via V_n : R^d × R^d → R) a suitable approximation to the solution R^d ∋ x ↦ u(t_n, x) ∈ R of the PDE in (12) at time t_n. Next note that M ∈ N in Framework 2.2 above denotes the number of gradient descent steps taken for each n ∈ {1, 2, …, N} and that γ ∈ (0, ∞) denotes the learning rate employed in the SGD algorithm. Moreover, observe that for every n ∈ {1, 2, …, N}, m ∈ {1, 2, …, M} the function φ_{n,m} : R^d × Ω → R denotes the loss function employed in the m-th gradient descent step during the approximation of the solution of the PDE in (12) at time t_n (cf. (10)). The loss functions employ a family of i.i.d. time-discrete stochastic processes Y^m : {0, 1, …, N} × Ω → R^d, m ∈ N, which we think of as discretizations of suitable reflected Brownian motions (cf. (6)). In addition, for every n ∈ {1, 2, …, N}, m ∈ {1, 2, …, M}, x ∈ D the loss function φ_{n,m} : R^d × Ω → R employs a family of i.i.d. random variables Z^{n,m}_{x,k} : Ω → D, k ∈ N, which are used for the Monte Carlo approximation of the non-local term in the PDE in (12) whose solution we are trying to approximate. The number of samples used in these Monte Carlo approximations is denoted by K ∈ N in Framework 2.2 above.
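To make the structure of these loss functions concrete, the following sketch simulates one step of a time-discrete reflected Brownian motion on a uniform time grid and evaluates a squared-error target of the kind minimized at time t_n. It reuses the hypothetical helpers reflect_box and nonlocal_term from above; Vn and Vprev stand for the networks at times t_n and t_{n-1}. This is a schematic illustration under the box-domain assumption, not the exact loss in (10) and (11).

    # One Euler step of a time-discrete reflected Brownian motion:
    # propagate by a Brownian increment, then reflect back into D.
    function reflected_bm_step(y, dt, a, b)
        return reflect_box(y .+ sqrt(dt) .* randn(length(y)), a, b)
    end

    # Schematic one-sample loss for training V_n: the network value at the
    # reflected point is matched to the previous time step's value plus the
    # time increment times the Monte Carlo estimate of the non-local term.
    function one_step_loss(Vn, Vprev, f, y_prev, dt, K, a, b)
        y = reflected_bm_step(y_prev, dt, a, b)
        target = Vprev(y_prev) + dt * nonlocal_term(f, Vprev, y_prev, K)
        return (Vn(y) - target)^2
    end

In the actual method, the SGD iterates Θ^n_m minimize the empirical average of such one-sample losses over many simulated paths.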
Finally, for sufficiently large N, M, K ∈ N and sufficiently small γ ∈ (0, ∞) the algorithm in Framework 2.2 above yields for every n ∈ {1, 2, …, N} a (random) parameter vector Θ^n_M : Ω → R^d which represents a function R^d × Ω ∋ (x, ω) ↦ V_n(Θ^n_M(ω), x) ∈ R that we think of as providing for every x ∈ D a suitable approximation of u(t_n, x).

Machine learning-based approximation method in the general case

In this section we describe in Framework 3.1 in Subsection 3.2 below the full version of our deep learning-based method for approximating solutions of non-local nonlinear PDEs with Neumann boundary conditions (see Subsection 3.1 for a description of the class of PDEs our approximation method applies to), which generalizes the algorithm introduced in Framework 2.2 in Subsection 2.3 above and which we apply in Section 5 below to several examples of non-local nonlinear PDEs.

PDEs under consideration
Let T ∈ (0, ∞), d ∈ N, let D ⊆ R^d be a closed set with sufficiently smooth boundary ∂D, let n : ∂D → R^d be an outer unit normal vector field associated to D, let g : … have at most polynomially growing partial derivatives, assume for every … Our goal is to approximately calculate under suitable hypotheses the solution u : [0, T] × D → R of the PDE in (15).

Description of the proposed approximation method in the general case
Framework 3.1 (General case of the machine learning-based approximation method).
Let Z^{n,m,j}_{x,k} : Ω → D, k, n, m, j ∈ N, be i.i.d. random variables which satisfy for every A ∈ B(D) that P(Z^{1,1,1}_{x,1} ∈ A) = ν_x(A), let …, n ∈ {1, 2, …, N}, be functions, for every n ∈ {1, 2, …, N}, m ∈ N let ψ^n_m : R → R^d and Ψ^n_m : R × R^d → R be functions, and for every n ∈ {1, 2, …, N} let S^n : N_0 × Ω → R^ς and Ξ^n : N_0 × Ω → R be stochastic processes which satisfy for every m ∈ N that …

In the setting of Framework 3.1 above we think under suitable hypotheses for sufficiently … is a function with at most polynomially growing derivatives which satisfies for every t ∈ (0, T], x ∈ ∂D that … (cf. (15)). Compared to the simplified algorithm in Framework 2.2 above, the major new elements introduced in Framework 3.1 are the following:

(a) The numbers of gradient descent steps taken to compute approximations for the solution of the PDE at the times t_n, n ∈ {1, 2, …, N}, are allowed to vary with n, and so are specified by a sequence (M_n)_{n ∈ N_0} ⊆ N in Framework 3.1 above.
(b) The numbers of samples used for the Monte Carlo approximation of the non-local term in the approximation of the solution of the PDE at the times t_n, n ∈ {1, 2, …, N}, are allowed to vary with n, and so are specified by a sequence (K_n)_{n ∈ N_0} ⊆ N in Framework 3.1 above.
(c) The approximating functions V^{j,s}_n, (j, s, n) ∈ N × R^ς × {0, 1, …, N}, in Framework 3.1 above are not specified concretely in order to allow for a variety of neural network architectures. For the concrete choice of these functions employed in our numerical simulations, we refer the reader to Section 5.

(d)–(e) … (17)). We refer the reader to (44) …

(f) For every n ∈ {1, 2, …, N}, m ∈ N the optimization step in (21) in Framework 3.1 above is specified generically in terms of the functions ψ^n_m : R → R^d and Ψ^n_m : R × R^d → R and the random variable Ξ^n_m : Ω → R. This generic formulation covers a variety of SGD-based optimization algorithms such as Adagrad [31], RMSprop, or Adam [64]. For example, in order to implement the Adam optimization algorithm, for every n ∈ {1, 2, …, N}, m ∈ N the random variable Ξ^n_m can be used to hold suitable first and second moment estimates (see (42) and (43) in Section 5 below for the concrete specification of these functions implementing the Adam optimization algorithm, and the sketch after this list for an illustration).
(g) The processes S^n : N_0 × Ω → R^ς, n ∈ {1, 2, …, N}, and the functions S^n : …, n ∈ {1, 2, …, N}, in Framework 3.1 above can be used to implement batch normalization; see [61] for details. Loosely speaking, for every n ∈ {1, 2, …, N}, m ∈ N the random variable S^n_m : Ω → R^ς then holds mean and variance estimates of the outputs of each layer of the approximating neural networks related to the minibatches that are used as inputs to the neural networks in computing the loss function at the corresponding gradient descent step.
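To make item (f) concrete, the following minimal sketch shows one plain Adam update of the kind that the generic functions ψ^n_m and Ψ^n_m can encode, with Ξ^n_m holding the first and second moment estimates. The hyperparameter values below are common defaults and are illustrative assumptions; the concrete specification used in our simulations is given in (42) and (43) in Section 5 below.

    # One Adam step: grad is a stochastic gradient of the loss at the
    # current parameter vector theta; (m, v) are the first and second
    # moment estimates carried along as the optimizer state.
    function adam_step(theta, grad, m, v, step;
                       gamma=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8)
        m = beta1 .* m .+ (1 - beta1) .* grad        # first moment update
        v = beta2 .* v .+ (1 - beta2) .* grad .^ 2   # second moment update
        mhat = m ./ (1 - beta1^step)                 # bias corrections
        vhat = v ./ (1 - beta2^step)
        theta = theta .- gamma .* mhat ./ (sqrt.(vhat) .+ epsilon)
        return theta, m, v
    end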

Multilevel Picard approximation method for non-local PDEs
In this section we introduce in Framework 4.1 in Subsection 4.1 below our extension of the full history recursive multilevel Picard approximation method for approximating solutions of non-local nonlinear PDEs with Neumann boundary conditions. The MLP method was first introduced in E et al. [36] and Hutzenthaler et al. [60] and later extended in a number of directions; see E et al. [33] and Beck et al. [12] for recent surveys. We also refer the reader to Becker et al. [13] and E et al. [35] for numerical simulations illustrating the performance of MLP methods across a range of example PDE problems. In Subsection 4.2 below, we will specify five concrete examples of (non-local) nonlinear PDEs and describe how Framework 4.1 can be specialized to compute approximate solutions to these example PDEs. These computations will be used in Section 5 to obtain reference values to compare the deep learning-based approximation method proposed in Section 3 above against.

Numerical simulations
In this section we illustrate the performance of the machine learning-based approximation method proposed in Framework 3.1 in Subsection 3.2 above by means of numerical simulations for five concrete (non-local) nonlinear PDEs; see Subsections 5.1, 5.2, 5.3, 5.4, and 5.5 below. In each of these numerical simulations we employ the general machine learning-based approximation method proposed in Framework 3.1 with certain 4-layer neural networks and the Adam optimizer (cf. (42) and (43)); in Subsection 5.5 below we employ multidimensional versions of the ReLU function R ∋ x ↦ max{x, 0} ∈ R as activation functions. In addition, in Subsections 5.1, 5.2, and 5.4 we use the square function R ∋ x ↦ x² ∈ R as activation function just in front of the output layer and in Subsections 5.3 and 5.5 we use the identity function R ∋ x ↦ x ∈ R as activation function just in front of the output layer. Furthermore, we employ Xavier initialization to initialize all neural network parameters; see Glorot & Bengio [44] for details. We did not employ batch normalization in our simulations. Each of the numerical experiments presented below was performed with the Julia library HighDimPDE.jl on an NVIDIA TITAN RTX GPU with 1350 MHz core clock and 24 GB GDDR6 memory with 7000 MHz clock rate, where the underlying system consisted of an AMD EPYC 7742 64-core CPU with 2 TB memory running Julia 1.7.2 on Ubuntu 20.04.3. We refer to Section 6 below for the employed Julia source codes.
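For orientation, the following Flux.jl sketch builds a small fully connected network of the kind described above, with Xavier (Glorot) initialization and the square function just in front of the output layer as used in Subsections 5.1, 5.2, and 5.4. The hidden widths and the tanh hidden activation are illustrative assumptions; the exact architectures are specified in the source codes in Section 6.

    using Flux

    # A fully connected network R^d -> R with Glorot (Xavier) uniform
    # initialization. abs2 (the square function x -> x^2) is applied
    # just in front of the linear output layer; widths and the tanh
    # hidden activation are assumptions for illustration.
    function make_network(d; width = d + 10)
        return Chain(
            Dense(d => width, tanh; init = Flux.glorot_uniform),
            Dense(width => width, tanh; init = Flux.glorot_uniform),
            Dense(width => width, abs2; init = Flux.glorot_uniform),
            Dense(width => 1; init = Flux.glorot_uniform),
        )
    end

    # Example: net = make_network(10); net(rand(Float32, 10))  # 1-element output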

Non-local sine-Gordon type PDEs
In this subsection we use the machine learning-based approximation method in Framework 5.1 to approximately calculate the solutions of non-local sine-Gordon type PDEs (cf., e.g., Hairer & Shen [49], Barone et al. [7], and Coleman [25]).

Replicator-mutator PDEs
In this subsection we use the machine learning-based approximation method in Framework 5.1 to approximately calculate the solutions of certain replicator-mutator PDEs describing the dynamics of a phenotype distribution under the combined effects of selection and mutation (cf., e.g., Hamel et al. [50]).