1 Introduction

In this article, we study numerical approximations of mean field games (MFGs) with fractional and general nonlocal diffusions. We consider the mean field game system

$$\begin{aligned} {\left\{ \begin{array}{ll} -u_t - {\mathcal {L}} u + H(x,Du) = F (x, m(t)), \quad &{}\text { in } (0,T)\times {\mathbb {R}}^d, \\ m_t - {\mathcal {L}}^*m - \text {div} (m D_p H(x,Du)) = 0 \quad &{}\text { in } (0,T)\times {\mathbb {R}}^d, \\ u (T,x) = G(x,m(T)), \ m(0) = m_0 \quad &{}\text { in } {\mathbb {R}}^d, \end{array}\right. } \end{aligned}$$
(1)

where

$$\begin{aligned} {\mathcal {L}} \phi (x)= \int _{\vert z \vert >0} \big [ \phi (x+z) - \phi (x) - \mathbbm {1}_{\{\vert z\vert <1\}} D \phi (x) \cdot z \big ] \hbox {d}\nu (z), \end{aligned}$$
(2)

is a nonlocal diffusion operator (possibly degenerate), \(\nu \) is a Lévy measure (see assumption \((\nu \)0)), and the adjoint \({\mathcal {L}}^{*}\) is defined as \( ( {\mathcal {L}}^{*} \phi , \psi )_{L^{2}} = ( \phi , {\mathcal {L}} \psi )_{L^2 }\) for \(\phi ,\psi \in C_{c}^{\infty } ( {\mathbb {R}}^d)\).
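To make (2) concrete, the following sketch (an illustration, not code from the article) evaluates \({\mathcal {L}}\phi \) by quadrature in \(d=1\) for the \(\sigma \)-stable density \(\hbox {d}\nu /\hbox {d}z = \vert z\vert ^{-1-\sigma }\) (the normalizing constant is set to 1 for simplicity), applied to a Gaussian test function:

```python
# Sketch: quadrature evaluation of the nonlocal operator (2) in d = 1 for a
# sigma-stable Levy measure dnu/dz = |z|^{-1-sigma} (illustrative choice).
import numpy as np
from scipy.integrate import quad

sigma = 1.5

def nu_density(z):
    return 1.0 / abs(z) ** (1.0 + sigma)

def L_phi(phi, dphi, x):
    def small(z):                      # compensated part, |z| < 1
        if z == 0.0:
            return 0.0                 # integrand is O(|z|^{1-sigma}): integrable
        return (phi(x + z) - phi(x) - dphi(x) * z) * nu_density(z)
    def large(z):                      # uncompensated part, |z| >= 1
        return (phi(x + z) - phi(x)) * nu_density(z)
    s, _ = quad(small, -1.0, 1.0, points=[0.0], limit=200)
    l1, _ = quad(large, 1.0, np.inf)
    l2, _ = quad(large, -np.inf, -1.0)
    return s + l1 + l2

phi = lambda y: np.exp(-y ** 2)
dphi = lambda y: -2.0 * y * np.exp(-y ** 2)
val = L_phi(phi, dphi, 0.0)
print(val)   # negative, since phi has a strict maximum at 0
```

The compensated small-jump integrand is \(O(\vert z\vert ^{1-\sigma })\) near zero, so the singular part of the measure is integrable there.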

The first equation in (1) is a backward in time Hamilton–Jacobi–Bellman (HJB) equation with terminal data G, and the second equation is a forward in time Fokker–Planck–Kolmogorov (FPK) equation with initial data \(m_0\). Here, H is the Hamiltonian, and the system is coupled through the cost functions F and G. There are two different types of couplings: (i) Local couplings where F and G depend on point values of m, and (ii) nonlocal or smoothing couplings where they depend on distributional properties induced from m through integration or convolution. Here, we work with nonlocal couplings.

A mathematical theory of MFGs was introduced by Lasry and Lions [51] and Huang et al. [46], and describes the limiting behavior of N-player stochastic differential games when the number of players N tends to \(\infty \) [19]. In recent years, there has been significant progress on MFG systems with local (or no) diffusion, e.g., modeling, well-posedness, numerical approximations, long time behavior, convergence of Nash equilibria, and various control and game theoretic questions, see, e.g., [5, 13, 19, 29, 41, 45] and references therein. The study of MFGs with “nonlocal diffusion” is quite recent, and few results exist so far. Stationary problems with fractional Laplacians were studied in [32], and parabolic problems including (1), in [35, 39]. We refer to [50] and references therein for some development using probabilistic methods.

The difference between problem (1) and standard MFG formulations lies in the type of noise driving the underlying controlled stochastic differential equations (SDEs). Usually, Gaussian noise is considered [5, 21, 28, 51, 53], or there is no noise (the first-order case) [18, 20]. Here, the underlying SDEs are driven by pure jump Lévy processes, which leads to the nonlocal operators (2) in the MFG system. In many real-world applications, jump processes model the observed noise better than Gaussian processes [9, 36, 52, 56]. Prototypical examples are symmetric \(\sigma \)-stable processes and their generators, the fractional Laplace operators \((-\triangle )^{\frac{\sigma }{2}}\). In economics and finance, the observed noise is typically not symmetric and \(\sigma \)-stable, but rather nonsymmetric and tempered. A typical example is the one-dimensional CGMY process where \(\frac{\hbox {d}\nu }{\hbox {d}z}(z)=\frac{C}{\vert z\vert ^{1+Y}}\hbox {e}^{-Gz^+-Mz^-}\) for \(C,G,M>0\) and \(Y\in (0,2)\) (see, e.g., [36, Chapter 4.5]). Such models are covered by the results of this article. Our assumptions on the nonlocal operators (cf. (\(\nu \)1)) are quite general, allowing for degenerate operators and placing no restrictions on the tail of the Lévy measure \(\nu \).
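As an illustration (with hypothetical parameter values), the CGMY density above can be checked numerically against the Lévy condition \(\int 1\wedge \vert z\vert ^2 \,\hbox {d}\nu < \infty \) of assumption \((\nu \)0):

```python
# Sketch: check the Levy condition (nu0) for the CGMY density with
# hypothetical parameters (not taken from the article).
import numpy as np
from scipy.integrate import quad

C_, G_, M_, Y_ = 1.0, 5.0, 5.0, 0.8

def cgmy_density(z):
    # dnu/dz = C / |z|^{1+Y} * exp(-G z^+ - M z^-)
    zp, zm = max(z, 0.0), max(-z, 0.0)
    return C_ / abs(z) ** (1.0 + Y_) * np.exp(-G_ * zp - M_ * zm)

def nu0_integrand(z):
    # integrand of (nu0); near 0 it behaves like |z|^{1-Y}, hence integrable
    return 0.0 if z == 0.0 else min(1.0, z * z) * cgmy_density(z)

I1, _ = quad(nu0_integrand, -1.0, 1.0, points=[0.0], limit=200)
I2, _ = quad(nu0_integrand, 1.0, np.inf)
I3, _ = quad(nu0_integrand, -np.inf, -1.0)
total = I1 + I2 + I3
print(total)   # finite, so (nu0) holds for these parameters
```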

There has been some development on numerical approximations for MFG systems with local operators. Finite difference schemes for nondegenerate second-order equations have been designed and analyzed, e.g., by Achdou et al. [1,2,3,4, 6,7,8] and Guéant [42,43,44]. Semi-Lagrangian (SL) schemes for the MFG system have been developed by Carlini–Silva both for first-order equations [24] and possibly degenerate second-order equations [25]. Other numerical schemes for MFGs include recent machine learning methods [30, 31, 54] for high-dimensional problems. We refer to the survey article [6] for recent developments on numerical methods for MFGs. We know of no prior schemes or numerical analysis for MFGs with fractional or nonlocal diffusions.

In this paper, we will focus on SL schemes. They are monotone and stable, connected to the underlying control problem, easily handle degenerate and arbitrarily directed diffusions, and allow large time steps. Although SL schemes for HJB equations have been studied for some time (see, e.g., [15, 17, 37, 40]), there are few results for FPK equations (but see [27]) and the coupled MFG system. For nonlocal problems, we only know of the results in [16] for HJB equations.

1.1 Our Contributions

A. Derivation  We construct fully discrete monotone numerical schemes for the MFG system (1). These dual SL schemes are closely related to the underlying control formulation of the MFG, which in our case is based on the following controlled SDE:

$$\begin{aligned} \hbox {d}X_t = -\alpha _t \, \hbox {d}t + \hbox {d}L_t, \end{aligned}$$

where \(\alpha _t\) is the control and \(L_t\) a pure jump Lévy process [cf. (6)]. Note that \(L_t\) can be decomposed into small and large jumps, where the small jumps may have infinite intensity. We derive our approximation in several steps:

1. (Approximate small jumps) The small jumps are approximated by Brownian motion [see (7)], following, e.g., [10, 16, 38]. This avoids infinitely many jumps per time interval and singular integrals, and gives a better approximation than simply neglecting these terms.

2. (SL scheme for HJB) We discretize the SDE resulting from step 1 in time and approximate the noise by random walks and approximate compound Poisson processes in the spirit of [16] (Sect. 3.1). From the corresponding discrete-time optimal control problem, dynamic programming, and interpolation, we construct an SL scheme for the HJB equation (Sect. 3.2).

3. (Approximate control) We define an approximate optimal feedback control for the SL scheme in step 2 from the continuous optimal feedback control as in [24, 25]: \(\alpha ^*_{approx } = D_p H(\cdot , Du_d^\epsilon )\), where \(u_d^{\epsilon }\) is a regularization of the (interpolated) solution from step 2 (Sect. 3.3).

4. (Dual SL scheme for FPK) The control from step 3 and the scheme in step 2 define a controlled approximate SDE with a corresponding discrete FPK equation for the densities of the solutions. We explicitly derive this FPK equation in weak form and obtain the final dual SL scheme by taking the test functions to be linear interpolation basis functions (Sect. 3.4).

See (18) and (24) in Sect. 3 for the specific form of our discretizations. These seem to be the first numerical approximations of MFG systems with nonlocal or fractional diffusion, and the first SL approximations of nonlocal FPK equations. Our dual SL schemes extend the schemes in [24,25,26,27] to the nonlocal case, and we give a clear derivation of this type of scheme in Sect. 3. The schemes take the form of nonlinear coupled systems (27) that need to be solved numerically. We prove existence of solutions using fixed point arguments, see Proposition 3.4.

B. Analysis  We establish a range of properties for the scheme including monotonicity, consistency, stability, (discrete) regularity, convergence of individual equations, and convergence to the full MFG system.

1. (HJB approximation) For the approximation of the HJB equation, we prove pointwise consistency and uniform discrete \(L^\infty \), Lipschitz, and semiconcavity bounds. Convergence to a viscosity solution is obtained via the half-relaxed limit method [12].

2. (FPK approximation) We prove consistency in the sense of distributions, preservation of mass and positivity, \(L^1\)-stability, tightness, and equicontinuity in time. In dimension \(d=1\), we also prove uniform \(L^p\)-estimates for all \(p\in (1,\infty ]\). Convergence is obtained from compactness and stability arguments.

3. (The full MFG approximation) We prove convergence along subsequences to viscosity-very weak solutions of the MFG system in two cases: (i) degenerate equations in dimension \(d=1\), and (ii) nondegenerate equations in \({\mathbb {R}}^d\) under the assumption that solutions of the HJB equation are \(C^1\) in space. Full convergence follows for MFGs with unique solutions, and convergence to classical solutions follows under certain regularity and weak uniqueness conditions. Applying these results in the setting of [39], we obtain full convergence to classical solutions in that case.

Because of the nonlocal or smoothing couplings, the HJB approximation can be analyzed almost independently of the FPK approximation. The analysis of the FPK scheme, on the other hand, strongly depends on boundedness and regularity properties of solutions of the HJB scheme. Compactness in measure is enough in the nondegenerate case when the HJB equation has \(C^1\) solutions, while stronger weak (\(*\)) compactness in \(L^p\) for some \(p\in (1,\infty ]\) is needed in the degenerate case. This way of reasoning is inspired by and similar to [24, 25, 27]. As in [24], we are only able to prove this latter compactness in dimension \(d=1\). A priori estimates and convergence for \(p\in (1,\infty )\) seem to be new also for local MFGs.

In this paper, we study general Lévy jump processes and nonlocal operators. The underlying stochastic processes may then fail to have first moments regardless of the initial distribution (e.g., \(\sigma \)-stable processes with \(\sigma <1\)), so we can no longer work in the commonly used Wasserstein-1 space \((P_1,d_1)\) for the FPK equations. Instead, we work in the space \((P,d_0)\) of probability measures under weak convergence, metrized by the Kantorovich–Rubinstein metric \(d_0\) (see Sect. 2). Surprisingly, a result from [33] (Proposition 6.1) allows us to prove tightness and compactness in this space without any moment assumptions! We refer to Sect. 4.3 for a more detailed discussion, along with convergence results in the traditional \((P_1,d_1)\) topology when first moments are available.

This \((P,d_0)\) setting can be adapted to local problems, yielding results without moment assumptions there as well. Finally, we note that our results for degenerate problems cover first-order equations and improve on [24] in the sense that more general initial distributions \(m_0\) are allowed: \(P\cap L^p\) for some \(p\in (1,\infty ]\) instead of \(P_{1+\delta }\cap L^\infty \) for some \(\delta >0\), where \(P_{1+\delta }\) is the set of probability measures with finite \((1+\delta )\)-moments.

C. Testing  We provide several numerical simulations. In Examples 1 and 2, we use a similar setup as in [25], comparing the effects of a range of different diffusion operators: fractional Laplacians of different powers, CGMY-diffusions, a degenerate diffusion, a spectrally one-sided diffusion, as well as classical local diffusion and the case of no diffusion. In Example 3, we solve the MFG system on a long time horizon and observe the turnpike property in a nonlocal setting. Finally, in Example 4 we study the convergence of the scheme.

1.2 Outline of the Paper

In Sect. 2, we list our assumptions and state mostly known results for the MFG system (1) and its individual HJB and FPK equations. In Sect. 3, we construct the discrete schemes for the HJB, FPK, and full MFG equations from the underlying stochastic control problem/game. The convergence results are given in Sect. 4, along with extensions and a discussion. In Sects. 5 and 6, we analyze the discretizations of the HJB and FPK equations, respectively, establishing a priori estimates, stability, and consistency results. Using these results, we prove the convergence results of Sect. 4 in Sect. 7. In Sect. 8, we provide and discuss numerical simulations of various nonlocal MFG systems. Finally, two appendices contain proofs of technical results.

2 Assumptions and Preliminaries

We start with some notation. By C and K, we denote various constants which may change from line to line. The Euclidean norm on any \({\mathbb {R}}^d\)-type space is denoted by \(\vert \cdot \vert \). For any subset \(Q\subseteq {\mathbb {R}}^d\) or \(Q \subseteq [0,T] \times {\mathbb {R}}^d\), and for bounded, possibly vector-valued functions on Q, we consider the \(L^p\)-spaces \(L^{p}(Q)\) and the spaces \(C_b(Q)\) of bounded continuous functions. We often use \(\Vert \cdot \Vert _0\) as alternative notation for the norms in \(C_b\) or \(L^\infty \). The space \(C^m_b(Q)\) is the subset of \(C_b(Q)\) with m bounded and continuous derivatives, and for \(Q \subseteq [0,T] \times {\mathbb {R}}^d\), \(C^{l,k}_b(Q)\) is the subset of \(C_b(Q)\) with l bounded and continuous derivatives in time and k in space. By \(P({\mathbb {R}}^d)\), we denote the set of probability measures on \({\mathbb {R}}^d\). The Kantorovich–Rubinstein distance \(d_0(\mu _1,\mu _2)\) on the space \(P({\mathbb {R}}^d)\) is defined as

$$\begin{aligned} d_0(\mu _1,\mu _2) := \sup _{f\in \text{ Lip}_{1,1}({\mathbb {R}}^d)}\Big \{\int _{{\mathbb {R}}^d}f(x) \hbox {d}(\mu _1-\mu _2)(x)\Big \}, \end{aligned}$$

where \(\text{ Lip}_{1,1}({\mathbb {R}}^d) = \Big \{f : f \, \text{ is } \text{ Lipschitz } \text{ continuous } \text{ and } \, \Vert f\Vert _{0}, \Vert Df\Vert _{0}\le 1 \Big \}\). The \(1\)-Wasserstein metric \(d_1\) on the space \(P_1({\mathbb {R}}^d)\) of probability measures with finite first moment can be defined as

$$\begin{aligned} d_1(\mu _1,\mu _2) := \sup _{f\in \text{ Lip}_{1}({\mathbb {R}}^d)}\Big \{\int _{{\mathbb {R}}^d}f(x) \hbox {d}(\mu _1-\mu _2)(x)\Big \}, \end{aligned}$$

where \(\text{ Lip}_{1}({\mathbb {R}}^d)= \Big \{f : f \, \text{ is } \text{ Lipschitz } \text{ continuous } \text{ and } \, \Vert Df\Vert _{0}\le 1 \Big \}\).
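A worked example (not from the article): in \(d=1\), both metrics have closed forms on Dirac masses, which also illustrates why \(d_0\) remains useful without moment assumptions while \(d_1\) does not:

```python
# Sketch: d_1 and d_0 between Dirac masses delta_a, delta_b in d = 1.
from scipy.stats import wasserstein_distance

def d1_dirac(a, b):
    # sup over Lip-1 functions of f(a) - f(b), attained by f(x) = +/- x
    return abs(a - b)

def d0_dirac(a, b):
    # the extra constraint ||f||_0 <= 1 caps the supremum at 2
    return min(abs(a - b), 2.0)

# scipy's 1-d Wasserstein distance agrees with d_1 on point masses:
assert abs(wasserstein_distance([0.0], [5.0]) - d1_dirac(0.0, 5.0)) < 1e-12
# d_0 saturates as the masses separate, while d_1 grows without bound:
assert d0_dirac(0.0, 5.0) == 2.0
print(d1_dirac(0.0, 5.0), d0_dirac(0.0, 5.0))
```

So \(d_1(\delta _0,\delta _n)=n\rightarrow \infty \) while \(d_0(\delta _0,\delta _n)\le 2\), consistent with \(d_0\) metrizing weak convergence without first moments.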

We define the Legendre transform L of H as:

$$\begin{aligned} L (x,q) := \sup _{p \in {\mathbb {R}}^d} \big \{ p\cdot q - H(x,p) \big \}. \end{aligned}$$
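As a quick numerical sanity check (an illustration under the simplifying assumption that H is x-independent), the supremum can be evaluated on a grid; for \(H(p)=\vert p\vert ^2/2\) one recovers \(L(q)=\vert q\vert ^2/2\):

```python
# Sketch: grid-based Legendre transform for the x-independent example
# H(p) = |p|^2 / 2, whose transform is L(q) = |q|^2 / 2.
import numpy as np

p_grid = np.linspace(-10.0, 10.0, 20001)   # sup taken over a bounded grid

def H(p):
    return 0.5 * p ** 2

def legendre(q):
    # L(q) = sup_p { p*q - H(p) }
    return np.max(p_grid * q - H(p_grid))

for q in (0.0, 1.0, -2.5):
    assert abs(legendre(q) - 0.5 * q ** 2) < 1e-6
print("maximizing p for q=1:", p_grid[np.argmax(p_grid * 1.0 - H(p_grid))])
```

The maximizing p satisfies \(q = D_pH(x,p)\), in line with the duality used later in the text.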

We use the following assumptions for Eq. (1):

\((\nu \)0):

(Lévy condition) \(\nu \) is a positive Radon measure that satisfies

$$\begin{aligned} \int _{{\mathbb {R}}^d} 1 \wedge \vert z\vert ^2 \hbox {d}\nu (z) < \infty . \end{aligned}$$
(\(\nu \)1):

(Growth near singularity) \(\nu \) is absolutely continuous for \(\vert z\vert <1\), and there exist constants \(\sigma \in (0,2)\) and \(C >0\) such that

$$\begin{aligned} 0 \le \frac{\hbox {d}\nu }{\hbox {d}z} \le \frac{C}{\vert z\vert ^{d+\sigma }}, \quad \vert z\vert < 1. \end{aligned}$$
(L0):

(Continuity and local boundedness) The function \(L: {\mathbb {R}}^d\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}\) is continuous in \((x,q)\), and for any \(K>0\), there exists \(C_{L}(K)>0\) such that

$$\begin{aligned} \sup _{\vert q\vert \le K}\vert L(x,q)\vert \le C_L(K), \qquad x \in {\mathbb {R}}^d. \end{aligned}$$
(L1):

(Convexity and growth) The function \(L(x,q)\) is convex in q and satisfies

$$\begin{aligned} \lim _{\vert q\vert \rightarrow +\infty } \frac{L(x,q)}{\vert q\vert } = +\infty , \qquad x \in {\mathbb {R}}^d. \end{aligned}$$
(L2):

(Lipschitz regularity) There exists a constant \(L_L >0\) independent of q, such that

$$\begin{aligned} \vert L (x,q) - L (y,q) \vert \le L_L \vert x-y\vert . \end{aligned}$$
(L3):

(Semi-concavity) There exists a constant \(c_L >0\) independent of q, such that

$$\begin{aligned} L(x+y,q) - 2 L(x,q) + L(x-y,q) \le c_L \vert y\vert ^2. \end{aligned}$$
(F1):

(Uniform bounds) There exist constants \(C_{F}, C_{G} >0\) such that

$$\begin{aligned} \vert F ( x,\mu ) \vert \le C_{F},\qquad \vert G ( x,\mu ) \vert \le C_{G}, \qquad x \in {\mathbb {R}}^d,\ \mu \in P ({\mathbb {R}}^d). \end{aligned}$$
(F2):

(Lipschitz assumption) There exist constants \(L_F, L_G >0\) such that

$$\begin{aligned}&\vert F (x,\mu _1 ) - F(y,\mu _2) \vert \le L_F \big [ \vert x-y\vert + d_0(\mu _1,\mu _2) \big ], \\&\vert G (x,\mu _1 ) - G(y,\mu _2) \vert \le L_G \big [ \vert x-y\vert + d_0(\mu _1,\mu _2) \big ]. \end{aligned}$$
(F3):

(Semi-concavity) There exist constants \(c_{F}, c_{G} >0\) such that

$$\begin{aligned}&F (x+y,\mu ) - 2 F(x,\mu ) + F ( x-y,\mu ) \le c_{F} \vert y\vert ^2, \\&G (x+y,\mu ) - 2 G(x,\mu ) + G ( x-y,\mu ) \le c_{G} \vert y\vert ^2. \end{aligned}$$
(M):

(Initial condition) We assume \(m_0\in P({\mathbb {R}}^d)\).

(M’):

The dimension \(d=1\), and \(m_0\in P({\mathbb {R}}) \cap L^p({\mathbb {R}})\) for some \(p\in (1, \infty ]\).

By (L1), the Legendre transform \(H= L^*\) is well defined, and the optimal q is \(q^* = D_p H(x,p)\). To study the convergence of the numerical schemes, we further assume local uniform bounds on the derivatives of the Hamiltonian:

(H1):

The function \(D_p H \in C ({\mathbb {R}}^d\times {\mathbb {R}}^d)\), and for every \(R >0\), there is a constant \(C_R>0\) such that for every \(x\in {\mathbb {R}}^d\) and \(p\in B_R\) we have \(\vert D_p H (x,p) \vert \le C_R\).

(H2):

The function \(D_p H\in C^1({\mathbb {R}}^d\times {\mathbb {R}}^d)\). For every \(R>0\), there exists a constant \(C_R>0\) such that for every \(x\in {\mathbb {R}}^d\) and \(p\in B_R\), we have

$$\begin{aligned} \vert D_{pp}H(x,p)\vert + \vert D_{px} H(x,p)\vert \le C_R. \end{aligned}$$

Remark 2.1

(i) We impose most of the conditions on L, and not on H, since L appears in the optimal control problem, which is the basis of our semi-Lagrangian approximation. Assumptions (L1) and (L2) (but not (L3)!) carry over directly to the corresponding Hamiltonian H by the definition of the Legendre transform. In contrast to the other assumptions, (H1)–(H2) must be imposed on H directly, as they do not follow from the conditions on L in general. However, when the Lagrangian L behaves like \(\vert \cdot \vert ^{r}\) in the q variable for large q and \(r>1\), the corresponding Hamiltonian H grows like \(\vert \cdot \vert ^{\frac{r}{r-1}}\) in the p variable for large p (cf. [34, Proposition 2.1]). The growth of the derivatives of H for large p can be computed similarly, leading to conditions like (H1)–(H2).

(ii) Couplings satisfying (F1)–(F3) are, e.g., given by

    $$\begin{aligned} F(x,\mu ) = f(x,(\rho *\mu )(x)) \end{aligned}$$

    where \(f\in C^2_b\) and \(\rho \in C^2_b\). These conditions can be relaxed in several directions.

In most of this paper, solutions of the HJB equation in (1) are interpreted in the viscosity sense; we refer to [48] and references therein for the general definition and well-posedness results. Solutions of the FPK equation in (1) are considered in the very weak sense defined as follows:

Definition 2.2

(a) If \(u\in C^{0,1}_b((0,T)\times {\mathbb {R}}^d)\), then \(m\in C([0,T],P({\mathbb {R}}^d))\) is a very weak solution of the FPK equation in (1), if for every \(\phi \in C_c^{\infty }({\mathbb {R}}^d)\) and \(t\in [0,T]\)

    $$\begin{aligned} \begin{aligned}&\int _{{\mathbb {R}}^d} \phi (x) \,\text{ d }m(t)(x) -\int _{{\mathbb {R}}^d} \phi (x) \,\text{ d }m_0(x) \\ {}&\quad = \int _{0}^t \int _{{\mathbb {R}}^d} \Big ({\mathcal {L}}[\phi ](x) -D_pH(x,Du)\cdot D\phi (x)\Big ) \text{ d }m(s)(x)\, \text{ d }s. \end{aligned} \end{aligned}$$
    (3)
(b) If \(u\in L^\infty (0,T; W^{1,\infty }({\mathbb {R}}^d))\) and \(p\in [1,\infty ]\), a function \(m\in C([0,T],P({\mathbb {R}}^d))\cap L^p([0,T]\times {\mathbb {R}}^d)\) is a very weak solution of the FPK equation in (1), if (3) holds for every \(\phi \in C_c^{\infty }({\mathbb {R}}^d)\) and \(t\in [0,T]\).

Remark 2.3

Identity (3) holding for every \(\phi \in C_c^{\infty }({\mathbb {R}}^d)\) and \(t\in [0,T]\) is equivalent to

$$\begin{aligned} \begin{aligned}&\int _{{\mathbb {R}}^d} \phi (T,x) \,\text{ d }(m(T))(x) -\int _{{\mathbb {R}}^d} \phi (0,x) \,\text{ d }m_0(x)\\ {}&\quad = \int _{0}^T \int _{{\mathbb {R}}^d} \Big (\phi _t(s,x) + {\mathcal {L}}[\phi ](s,x) -D_pH(x,Du)\cdot D\phi (s,x)\Big ) \text{ d }m(s)(x)\,\text{ d }s, \end{aligned} \end{aligned}$$

holding for every \(\phi \in C^{1,2}_b([0,T]\times {\mathbb {R}}^d)\) (cf. [33, Lemma 6.1]).

Definition 2.4

A pair (u, m) is a viscosity-very weak solution of the MFG system (1) if u is a viscosity solution of the HJB equation and m is a very weak solution of the FPK equation (see Definition 2.2).

We first give a well-posedness result for the HJB equation in (1) with fixed m.

Proposition 2.5

Fix \(\mu \in C([0,T],P({\mathbb {R}}^d))\) and let \((\nu \)0), (L2), and (F1) hold.

(a) (Comparison principle) If u is a viscosity subsolution and v is a viscosity supersolution of the HJB equation in (1) with \(u(T,\cdot )\le v(T,\cdot )\), then \(u\le v\).

(b) There exists a unique bounded viscosity solution \(u \in C_b([0,T]\times {\mathbb {R}}^d)\) of the HJB equation in (1), and for any \(t\in [0,T]\) we have \(\Vert u(t)\Vert _{0} \le C_FT + C_G\).

(c) If (L2) and (F2) hold, then the viscosity solution u is Lipschitz continuous in the space variable, and for every \(t\in [0,T]\) and \(x,y \in {\mathbb {R}}^d\) we have

    $$\begin{aligned} \vert u(t,x) - u(t,x+y)\vert \le \big (T(L_L+L_F) + L_G\big ) \, \vert y\vert . \end{aligned}$$

In addition, if (L3) and (F3) hold, then u is semiconcave in the space variable, and for every \(t\in [0,T]\) and \(x,y \in {\mathbb {R}}^d\) we have

    $$\begin{aligned} u(t,x+y) + u(t,x-y) - 2u(t,x) \le \big (T(c_L+c_F) + c_G\big ) \, \vert y\vert ^2. \end{aligned}$$

Proof

These results are by now standard: (a) follows by an argument similar to that of [48, Theorem 3.1], (b) follows by, e.g., Perron’s method, and (c) by adapting the comparison arguments of [48] in a standard way. We omit the details. Under some extra assumptions, (b) and (c) also follow from Theorem 5.4 and Lemma 5.3. \(\square \)

We also need a well-posedness result for the FPK equation in (1) for fixed u.

Proposition 2.6

Assume \((\nu \)0), (\(\nu \)1), (H1), and (M).

(a) If \(u \in C([0,T] ; C^1_b({\mathbb {R}}^d))\), then there exists a very weak solution \(m \in C([0,T];P({\mathbb {R}}^d))\) of the FPK equation in (1).

(b) If \(d=1\), \(u \in C([0,T] ; W^{1,\infty }({\mathbb {R}}))\) is semi-concave in space, and (M’) holds, then there exists a very weak solution \(m \in C([0,T];P({\mathbb {R}})) \cap L^{p}([0,T]\times {\mathbb {R}})\) of the FPK equation in (1). Moreover, \(\Vert m(t)\Vert _{L^p({\mathbb {R}})} \le \hbox {e}^{CT}\Vert m_0\Vert _{L^p({\mathbb {R}})}\) for some constant \(C>0\) and all \(t\in [0,T]\).

Proof

The results follow from the convergence of the discrete scheme in this article. The proof of (a) follows the proof of Theorem 4.3, setting \(Du_{\rho ,h} = Du\). The proof of (b) follows the proof of Theorems 4.1 and 6.7, setting \(Du_{\rho ,h} = Du\). Note that semi-concavity of u is crucial for the \(L^p\)-bound of Theorem 6.7. \(\square \)

Existence and uniqueness results are given in [39] for classical solutions of MFGs with nonlocal diffusions under additional assumptions:

(\(\nu \)2):

(Growth near singularity) There exist constants \(\sigma \in (1,2)\) and \(c>0\) such that the density of \(\nu \) for \(\vert z\vert <1\) satisfies

$$\begin{aligned} \frac{c}{\vert z\vert ^{d+\sigma }} \le \frac{\hbox {d}\nu }{\hbox {d}z}, \text { for } \vert z\vert < 1. \end{aligned}$$
(F4):

There exist constants \(C_{F}, C_{G} > 0\), such that \(\Vert F ( \cdot ,m ) \Vert _{C_{b}^{2}} \le C_{F} \) and \(\Vert G ( \cdot , {\tilde{m}} ) \Vert _{C_{b}^{3}} \le C_{G}\) for all \(m, {\tilde{m}} \in P ( {\mathbb {R}}^d)\).

(F5):

F and G satisfy monotonicity conditions:

$$\begin{aligned} \int _{{\mathbb {R}}^d} \left( F \left( x, m_1 \right) - F \left( x, m_2 \right) \right) \hbox {d} \left( m_1 -m_2 \right) \left( x \right)&\ge 0 \qquad \forall m_1,m_2 \in P ( {\mathbb {R}}^d), \\ \int _{{\mathbb {R}}^d} \left( G \left( x, m_1 \right) - G \left( x, m_2 \right) \right) \hbox {d} \left( m_1 -m_2 \right) \left( x \right)&\ge 0 \qquad \forall m_1,m_2 \in P ( {\mathbb {R}}^d). \end{aligned}$$
(H3):

The Hamiltonian \(H \in C^3 ( {\mathbb {R}}^d\times {\mathbb {R}}^d) \), and for every \(R>0\) there is \(C_{R} >0\) such that for all \(x \in {\mathbb {R}}^d\), \(p \in B_{R}\), and multi-indices \(\alpha \in {\mathbb {N}}_{0}^{N}\) with \( \vert \alpha \vert \le 3\), we have \(\vert D^{\alpha } H ( x,p ) \vert \le C_{R}\).

(H4):

For every \(R > 0\) there is \(C_R >0\) such that for \(x,y \in {\mathbb {R}}^d, u \in \left[ -R,R \right] , p \in {\mathbb {R}}^d\): \(\vert H \left( x,u,p \right) - H \left( y,u,p \right) \vert \le C_R \left( \vert p\vert +1 \right) \vert x-y\vert \).

(H5):

(Uniform convexity) There exists a constant \(C >0\) such that \(\frac{1}{C} I_d \le D_{pp}^2 H \left( x,p \right) \le C I_d\).

(M”):

The probability measure \(m_{0}\) has a density (also denoted by \(m_0\)) \(m_{0} \in C_{b}^{2}\).

Theorem 2.7

Assume \((\nu \)0), (\(\nu \)1), (\(\nu \)2), (F2), (F4), (H3), (H4), and (M”).

(a) There exists a classical solution (u, m) of (1) such that \(u\in C^{1,3}_b((0,T)\times {\mathbb {R}}^d)\) and \(m\in C^{1,2}_b((0,T)\times {\mathbb {R}}^d)\cap C(0,T; P({\mathbb {R}}^d))\).

(b) If in addition (F5) and (H5) hold, then the classical solution is unique.

This is a consequence of [39, Theorems 2.5 and 2.6]. We refer to [39] for more general results, where in particular assumptions (\(\nu \)1) and (\(\nu \)2) can be relaxed to allow for a much larger class of nonlocal operators \({\mathcal {L}}\). In the nondegenerate case, we also have uniqueness of viscosity-very weak solutions and existence of classical solutions for the individual equations in (1). Uniqueness for HJB equations and existence for HJB and FPK equations follow from Theorem 5.3, Theorem 5.5, and Proposition 6.8 in [39]. We prove uniqueness for very weak solutions of FPK equations here.

Proposition 2.8

(Uniqueness for the FPK equation) Assume \((\nu \)0), (\(\nu \)1), (\(\nu \)2), and \(D_p H (x,Du (t)) \in C_b^{0,2} ((0,T)\times {\mathbb {R}}^d)\). Then, there is at most one very weak solution of the FPK equation in (1).

Proof

Let \(m_1,m_2\) be two very weak solutions, define \({{\tilde{m}}} := m_1 - m_2\) and take any \(\psi \in C_c^{\infty } \left( {\mathbb {R}}^d\right) \). For any \(\tau \in (0,T)\), the terminal value problem

$$\begin{aligned}&\partial _t \phi + {\mathcal {L}} \phi - D \phi \cdot D_p H (x, Du) = 0 \quad \text {in} \quad {\mathbb {R}}^d\times (0,\tau ) \quad \text {and} \quad \\&\phi (x,\tau ) = \psi (x)\quad \text {in} \quad {\mathbb {R}}^d, \end{aligned}$$

has a unique classical solution \(\phi \in C_b^{1,2} ( (0,\tau )\times {\mathbb {R}}^d) \) essentially by [39, Theorem 5.5] (the result follows from Proposition 5.8 with \(k=2\) and the observation that the proof of Theorem 5.5 also holds for \(k=2\)). Using the definition of very weak solution (see Remark 2.3), we get

$$\begin{aligned} \int _{{\mathbb {R}}^d} \psi (x) \, \hbox {d}{{\tilde{m}}}(\tau )(x) = \int _0^\tau \int _{{\mathbb {R}}^d} \big (\partial _t \phi + {\mathcal {L}} \phi - D \phi \cdot D_p H (x,Du) \big ) \, \hbox {d}{{\tilde{m}}}(t)(x) \, \hbox {d}t = 0, \end{aligned}$$

for any \(\tau \in [0,T]\). Since \(\psi \) was arbitrary, it follows that \({{\tilde{m}}} (\tau ) = 0\) in \(P ({\mathbb {R}}^d) \) for every \(\tau \in [0,T]\), and uniqueness follows. \(\square \)

3 Discretization of the MFG System

To discretize the MFG system (1), we first follow [16] and derive a semi-Lagrangian approximation of the HJB equation in (1). Using this approximation and the optimal control of the original problem, we derive an approximation of the FPK equation in (1) which is in (approximate) duality with the approximation of the HJB equation.

This derivation is based on the following control interpretation of the HJB equation. For a fixed density \(m= \mu \), the solution u of the HJB equation in (1) is the value function of the optimal stochastic control problem:

$$\begin{aligned} u (t,x) = \inf _{\alpha } J \big ( x,t, \alpha \big ), \end{aligned}$$
(4)

where \(\alpha _t\) is an admissible control, J is the total cost to be minimized,

$$\begin{aligned} J \big ( x,t, \alpha \big ) = {\mathbb {E}} \bigg [ \int _t^T \Big ( L ({{\tilde{X}}}_s, \alpha _s ) + F ({{\tilde{X}}}_s, \mu _s ) \Big ) \hbox {d}s + G ({{\tilde{X}}}_T, \mu _T)\bigg ], \end{aligned}$$
(5)

and \({{\tilde{X}}}_s={{\tilde{X}}}_s^{x,t}\) solves the controlled stochastic differential equation (SDE)

$$\begin{aligned} {\left\{ \begin{array}{ll} \hbox {d}{{\tilde{X}}}_s = -\alpha _s\, \hbox {d}s + \int _{\vert z\vert <1} z {\tilde{N}} (\hbox {d}z,\hbox {d}s) + \int _{\vert z\vert \ge 1} z N (\hbox {d}z,\hbox {d}s), \quad s>t,\\ {{\tilde{X}}}_t = x, \end{array}\right. } \end{aligned}$$
(6)

where N is a Poisson random measure with intensity/Lévy measure \(\nu (\hbox {d}z) \hbox {d}s\), and \({\tilde{N}} (\hbox {d}z,\hbox {d}s) = N (\hbox {d}z,\hbox {d}s) - \nu (\hbox {d}z) \hbox {d}s\) is the compensated Poisson random measure.

3.1 Approximation of the Underlying Controlled SDE

3.1.1 A. Approximate Small Jumps by Brownian Motion

First, we approximate the small jumps in (6) by (vanishing) Brownian motion (cf. [10]): For \(r\in (0,1)\), let \(X_s=X_s^{x,t}\) solve

$$\begin{aligned} {\left\{ \begin{array}{ll} \hbox {d}X_s = \bar{b} (\alpha _s ) \hbox {d}s + \sigma _r \, \hbox {d}W_s + \int _{\vert z\vert \ge r} z N (\hbox {d}z,\hbox {d}s), \quad s>t\\ X_t = x, \end{array}\right. } \end{aligned}$$
(7)

where \(W_s\) is a standard Brownian motion, \(\bar{b} ( \alpha _s) = -\,\alpha _s - b_{r}^{\sigma }\), and

$$\begin{aligned}&b_r^{\sigma } := \int _{r<\vert z\vert < 1} z\, \nu (\hbox {d}z), \end{aligned}$$
(8)
$$\begin{aligned}&\sigma _r := \bigg ( \frac{1}{2} \int _{\vert z\vert <r} zz^T \nu (\hbox {d}z) \bigg )^{1/2} . \end{aligned}$$
(9)

The last integral in (7) is a compound Poisson process (cf., e.g., [9]): For any \(t\ge 0\),

$$\begin{aligned} \int _0^{t}\int _{\vert z\vert \ge r} z N (\hbox {d}z,\hbox {d}s) = \sum _{j=1}^{{\hat{N}}_t} J_j \end{aligned}$$
(10)

where the number of jumps up to time t is \({\hat{N}}_t \sim Poisson (t\lambda _r)\), the jumps \(\{J_j\}_{j}\) are i.i.d. random variables in \({\mathbb {R}}^d\) with distribution \(\nu _r/\lambda _r\) and \(J_0=0\), and for \(r\in (0,1]\),

$$\begin{aligned} \nu _r := \nu \mathbbm {1}_{\vert z\vert >r} \qquad \text {and}\qquad \lambda _r := \int _{{\mathbb {R}}^d} \nu _r(\hbox {d}z). \end{aligned}$$
(11)
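A simulation sketch of (10)–(11) (with hypothetical parameters; the jump law is taken to be the normalized measure \(\nu _r/\lambda _r\)) for the symmetric \(\sigma \)-stable density \(\hbox {d}\nu /\hbox {d}z=\vert z\vert ^{-1-\sigma }\) truncated at r:

```python
# Sketch: sample the compound Poisson part (10)-(11) for the truncated
# symmetric sigma-stable measure nu_r (illustrative parameter values).
import numpy as np

rng = np.random.default_rng(0)
sigma, r, t = 0.5, 0.1, 1.0

# lambda_r = int_{|z|>r} |z|^{-1-sigma} dz = 2 r^{-sigma} / sigma (closed form)
lam_r = 2.0 * r ** (-sigma) / sigma

def sample_jump():
    # Under nu_r / lambda_r, |J| is Pareto: P(|J| > s) = (s/r)^{-sigma}, s >= r;
    # the sign is symmetric. Inverse-CDF sampling:
    u = rng.random()
    return rng.choice([-1.0, 1.0]) * r * u ** (-1.0 / sigma)

Nt = rng.poisson(t * lam_r)          # number of jumps up to time t, cf. (10)
X = sum(sample_jump() for _ in range(Nt))
print(Nt, X)
```

Note how small r inflates \(\lambda _r\) like \(r^{-\sigma }\), which is exactly why jumps below r are replaced by Brownian motion rather than simulated.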

The infinitesimal generators \({\mathcal {L}}^{\alpha }\) and \(\hat{{\mathcal {L}}}^{\alpha }\) of the SDEs (6) and (7) are (cf. [9])

$$\begin{aligned}&{\mathcal {L}}^{\alpha } \phi (x)= -\alpha _{t} \cdot \nabla \phi (x) + {\mathcal {L}}_1 \phi ( x ) + {\mathcal {L}}^1 \phi ( x ), \\&\hat{{\mathcal {L}}}^{\alpha } \phi (x)= \, \bar{b}(\alpha _{t}) \cdot \nabla \phi (x) + tr \big ( \sigma _{r}^T \cdot D^{2} \phi ( x )\cdot \sigma _{r} \big ) + {\mathcal {L}}^{r} \phi (x) \end{aligned}$$

for \(\phi \in C^2_b({\mathbb {R}}^d)\), where the operator \({\mathcal {L}}\) in (2) can be rewritten as follows

$$\begin{aligned} \begin{aligned} {\mathcal {L}} \phi ( x )&= \bigg (\int _{\vert z\vert< r}+ \int _{\vert z\vert > r}\bigg ) \Big (\phi ( x+z ) - \phi ( x ) - \mathbbm {1}_{\{\vert z\vert <1\}} D \phi ( x )\cdot z \Big ) \hbox {d} \nu ( z ) \\&:= {\mathcal {L}}_r \phi ( x ) + {\mathcal {L}}^r \phi ( x ). \end{aligned} \end{aligned}$$
(12)

The operator \(\hat{{\mathcal {L}}}^{\alpha }\) is an approximation of \({\mathcal {L}}^{\alpha }\).

Lemma 3.1

([49]) If (\(\nu \)1) holds and \(\phi \in C_{b}^{3}({\mathbb {R}}^d)\), then for \({\mathcal {L}}_r\) and \(\sigma _r\) defined in (12) and (9), respectively, we have

$$\begin{aligned} \vert {\mathcal {L}}_r \phi ( x ) - tr \big ( \sigma _{r}^T \cdot D^{2} \phi ( x )\cdot \sigma _{r} \big ) \vert \le C r^{3-\sigma } \Vert D^{3} \phi \Vert _{0}. \end{aligned}$$

If in addition \(\phi \in C_{b}^{4}({\mathbb {R}}^d)\) and the Lévy measure \(\nu \) is symmetric, then

$$\begin{aligned} \vert {\mathcal {L}}_r \phi ( x ) - tr \big ( \sigma _{r}^T \cdot D^{2} \phi ( x )\cdot \sigma _{r} \big ) \vert \le C r^{4-\sigma } \Vert D^{4} \phi \Vert _{0}. \end{aligned}$$
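The rates in Lemma 3.1 are easy to observe numerically. The sketch below (an illustration only, not part of the scheme) takes \(d=1\), the symmetric measure \(\hbox {d}\nu (z)=\vert z\vert ^{-1-\sigma }\hbox {d}z\) with the assumed value \(\sigma =1/2\), and \(\phi =\cos \); it compares \({\mathcal {L}}_r\phi \) (computed by midpoint quadrature) with the second-order term \(\sigma _r^2\phi ''\), where \(\sigma _r^2=r^{2-\sigma }/(2-\sigma )\) in closed form, and checks the predicted \(O(r^{4-\sigma })\) decay of the error when \(r\) is halved.

```python
import numpy as np

# Check of Lemma 3.1 (symmetric case) in d = 1 for the illustrative measure
# nu(dz) = |z|^{-1-sigma} dz.  For symmetric nu,
#   L_r phi(x) = int_0^r (phi(x+z) + phi(x-z) - 2 phi(x)) z^{-1-sigma} dz,
#   sigma_r^2  = (1/2) int_{|z|<r} z^2 nu(dz) = r^{2-sigma} / (2 - sigma).

def L_r(phi, x, r, sigma, n=200_000):
    # midpoint rule on (0, r); the integrand behaves like z^{1-sigma} near 0
    z = (np.arange(n) + 0.5) * (r / n)
    integrand = (phi(x + z) + phi(x - z) - 2.0 * phi(x)) * z**(-1.0 - sigma)
    return integrand.sum() * (r / n)

def second_order_term(d2phi, x, r, sigma):
    sig2 = r**(2.0 - sigma) / (2.0 - sigma)   # sigma_r^2 in closed form
    return sig2 * d2phi(x)

phi, d2phi = np.cos, lambda x: -np.cos(x)
x, sigma = 0.3, 0.5
err  = abs(L_r(phi, x, 1.0, sigma) - second_order_term(d2phi, x, 1.0, sigma))
err2 = abs(L_r(phi, x, 0.5, sigma) - second_order_term(d2phi, x, 0.5, sigma))
# symmetric case of Lemma 3.1: err = O(r^{4-sigma}), so halving r should
# reduce the error by roughly 2^{4-sigma} (about 11.3 for sigma = 1/2)
```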

3.1.2 B. Time Discretization of the Approximate SDE

Fix a time step \(h=\frac{T}{N}\in (0,1)\) for some \(N\in {\mathbb {N}}\) and discrete times \(t_k=kh\) for \(k\in \{0,1,\dots ,N\}\). Following [16], we propose the following Euler–Maruyama discretization of the SDE (7): Let \(X_n^{t_l,x}\approx X^{t_l,x}_{t_n}\), where \(X_n=X^{t_l,x}_n\) solves

$$\begin{aligned} {\left\{ \begin{array}{ll} X_l = x \\ X_n = X_{n-1} + h \bar{b} (\alpha _{n-1}) + \sqrt{h} \displaystyle \sum _{m=1}^d \sigma _r^m \xi _{n-1}^m, \ \ n-l=N_i+1, \dots ,N_{i+1}-1,\\ X_{l+N_{i+1}} = X_{l+N_{i+1}-1} + J_i. \end{array}\right. } \end{aligned}$$
(13)

Here, the control \(\alpha _n\) is constant on each time interval, \(\sigma _r^m\) is the \(m\)th column of \(\sigma _r\), and \(\xi _n = ( \xi _{n}^{1}, \ldots , \xi _{n}^{d} )\) is a random walk in \({\mathbb {R}}^d\) with

$$\begin{aligned} {\mathbf {P}} \big ( \xi ^i_n = \pm 1 \big ) = \frac{1}{2d}. \end{aligned}$$

The processes \(J_k\) and \(N_k\) define an approximation of the compound Poisson part of (7) through equation (10) where \({\hat{N}}_t\) is replaced by an approximation

$$\begin{aligned} {{\tilde{N}}}_t = \max \{k:\Delta T_1+\Delta T_2+\dots +\Delta T_k\le t\}, \end{aligned}$$

where the exponentially distributed waiting times (times between jumps) are replaced by time grid approximations \(\{\Delta T_k\}_{k\in {\mathbb {N}}}\): \(\Delta T_k=h \Delta N_k= h(N_{k}-N_{k-1})\), where \(N_k:\Omega \rightarrow {\mathbb {N}}\cup \{0\}\), \(N_0=0\), and the \(\Delta N_k\) are i.i.d. with approximate \(h \lambda _{r}\)-exponential distribution given by

$$\begin{aligned} {\mathbf {P}} [\Delta N_k > j ] = \hbox {e}^{-h \lambda _rj} \quad \text{ for } \quad j=0,1,2,\dots . \end{aligned}$$

Then for \(p_{j} := P [ \Delta N_k = j ]\), \(p_0=0\) and \(p_j = P [ \Delta N_k> j-1 ]-P [ \Delta N_k > j ] = \hbox {e}^{- jh \lambda _{r}} ( \hbox {e}^{h \lambda _{r}} -1)\) for \(j>0\). We find that \(\sum _{j=0}^{\infty } p_{j} =1\) and \(E(\Delta N_k)=\sum _{j=0}^\infty \hbox {e}^{- jh \lambda _{r}} =\frac{\hbox {e}^{h \lambda _{r}}}{\hbox {e}^{h \lambda _{r}}-1}\). Note that in each time interval, approximation (13) either diffuses (the second equation) or jumps (the third equation), and that by definition of \(N_k\), there is at most one jump in this interval. For the scheme to converge, we will see that we need to send both \(h\rightarrow 0\) and \(h\lambda _r\rightarrow 0\). In this case, \(E(\Delta N_k)\rightarrow \infty \) and the jumps become less and less frequent compared to the random walk (which is natural in view of the limiting processes).
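Sampling the discrete waiting counts \(\Delta N_k\) is simple: since \({\mathbf {P}} [\Delta N_k > j ] = \hbox {e}^{-h \lambda _r j}\), the \(\Delta N_k\) are geometric on \(\{1,2,\dots \}\) with success probability \(1-\hbox {e}^{-h\lambda _r}\). The sketch below (with assumed illustrative values of \(h\) and \(\lambda _r\)) draws such samples and verifies the formula \(E(\Delta N_k)=\hbox {e}^{h\lambda _r}/(\hbox {e}^{h\lambda _r}-1)\) empirically.

```python
import numpy as np

# Sample the discrete inter-jump counts Delta N_k of Sect. 3.1 B: since
# P[Delta N_k > j] = exp(-h*lam_r*j), Delta N_k is geometric on {1,2,...}
# with success probability p = 1 - exp(-h*lam_r)  (h, lam_r illustrative).
h, lam_r = 0.01, 2.0
p = 1.0 - np.exp(-h * lam_r)

rng = np.random.default_rng(0)
dN = rng.geometric(p, size=200_000)            # i.i.d. Delta N_k samples
T_jump = h * np.cumsum(dN[:100])               # approximate jump times sum_k Delta T_k

mean_expected = np.exp(h * lam_r) / (np.exp(h * lam_r) - 1.0)  # = E(Delta N_k)
```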

3.2 Semi-Lagrangian Approximation of the HJB Equation

3.2.1 A. Control Approximation of the HJB Equation

We approximate the control problem (4)–(6) by a discrete time control problem: Define the value function

$$\begin{aligned} {{\tilde{u}}}_h (t_l , x ) = \inf _{ \{\alpha _n \} } J_h \big ( x,t_l , \{\alpha _n \} \big ), \end{aligned}$$
(14)

where \(t_l= l h\) for \(l \in \{0,1,\dots ,N-1\}\) and the controls \(\{\alpha _n\}\) are piecewise constant in time, the cost function \(J_h\) is given by

$$\begin{aligned} J_h \big ( x,t_l , \{\alpha _n \} \big ) = {\mathbb {E}} \bigg [ \sum _{n=l}^{N-1} \Big ( L ( X_{n}, \alpha _n ) + F ( X_n, \mu (t_n) ) \Big ) h + G (X_N,\mu (t_N)) \bigg ], \end{aligned}$$
(15)

and the controlled discrete time process \(X_n=X_n^{t_l,x}\) is the solution of (13). By the (discrete time) dynamic programming principle, it follows that

$$\begin{aligned} {{\tilde{u}}}_h (t_l ,x ) = \inf _{\alpha _n}{\mathbb {E}} \bigg [ \sum _{n=l}^{l+p} \Big ( L ( X_{n}^{t_l ,x}, \alpha _n ) + F ( X_n^{t_l ,x}, \mu (t_n) ) \Big ) h + {{\tilde{u}}}_h (t_{l+p+1}, X_{l+p+1}^{t_l,x} ) \bigg ], \end{aligned}$$

for any fixed \(p \in \{0, \ldots , N-(l+1)\}\). Taking \(p=0\) and computing the expectation using conditional probabilities (the probability of a jump in the first time interval is \(p_1=1-\hbox {e}^{-h\lambda _r}\)), we find a (discrete time) HJB equation

$$\begin{aligned}&{{\tilde{u}}}_h (t_l ,x) \nonumber \\&\quad = \inf _{\alpha } \bigg \{ h F ( x, \mu (t_l ) ) + h L (x,\alpha ) + \Big [ \frac{\hbox {e}^{-h \lambda _r}}{2d} \sum _{m=1}^d \big ({{\tilde{u}}}_h (t_{l+1}, x + h \bar{b} (\alpha ) + \sqrt{hd} \sigma _r^m ) \nonumber \\&\qquad + {{\tilde{u}}}_h(t_{l+1}, x+ h\bar{b} (\alpha ) - \sqrt{hd} \sigma _r^m ) \big ) \Big ] + \frac{1-\hbox {e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert \ge r} {{\tilde{u}}}_h (t_{l+1},x+z) \nu (\hbox {d}z) \bigg \}. \end{aligned}$$
(16)

Note that this is an explicit backward in time one-step scheme.
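One backward step of (16) can be sketched in \(d=1\). In the snippet below, all model data (the running cost, the coupling term with \(\mu \) frozen, the value function at \(t_{l+1}\), the finite control set, and the truncation of \(\nu (\hbox {d}z)=\vert z\vert ^{-1-\sigma }\hbox {d}z\) to \(r\le \vert z\vert \le Z\)) are illustrative assumptions; since this \(\nu \) is symmetric, the compensator drift \(b_r^{\sigma }\) vanishes and we take \(\bar{b}(\alpha )=-\alpha \).

```python
import numpy as np

# One explicit backward step of the discrete HJB equation (16) in d = 1,
# with illustrative model data and a truncated fractional measure.
sigma, r, Z, h, d = 0.5, 0.1, 5.0, 0.01, 1

u_next = lambda x: np.cos(x)                 # stand-in for u(t_{l+1}, .)
L_run  = lambda x, a: 0.5 * a**2             # running cost L(x, alpha)
F_run  = lambda x: 0.1 * x**2                # coupling F(x, mu(t_l)), mu frozen

# quadrature nodes/weights for the jump integral over r <= |z| <= Z
zq = np.linspace(r, Z, 4000)
w  = zq**(-1.0 - sigma) * (zq[1] - zq[0])    # one-sided nu-weights
lam_r = 2.0 * w.sum()                        # int_{|z|>=r} nu(dz)

sig_r = np.sqrt(r**(2.0 - sigma) / (2.0 - sigma))  # sigma_r from (9)

def hjb_step(x, controls):
    # jump term of (16): independent of the control
    jump = (1 - np.exp(-h * lam_r)) / lam_r * ((u_next(x + zq) + u_next(x - zq)) * w).sum()
    vals = []
    for a in controls:
        y = x - h * a                        # x + h*bbar(alpha), bbar(alpha) = -alpha
        diff = np.exp(-h * lam_r) / (2 * d) * (u_next(y + np.sqrt(h * d) * sig_r)
                                               + u_next(y - np.sqrt(h * d) * sig_r))
        vals.append(h * F_run(x) + h * L_run(x, a) + diff + jump)
    return min(vals)                         # infimum over the discrete control set

controls = np.linspace(-2.0, 2.0, 81)
u_here = hjb_step(0.5, controls)
```

By definition of the infimum, enlarging the control set can only decrease the value, which gives a simple consistency check.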

3.2.2 B. Interpolation and the Fully Discrete Scheme

For \(\rho >0\), we fix a grid \({\mathcal {G}}_{\rho } = \{i \rho : i\in {\mathbb {Z}}^d\}\) and a linear/multilinear \({\mathcal {G}}_{\rho }\)-interpolation I. For functions \(f: {\mathcal {G}}_{\rho } \rightarrow {\mathbb {R}}\),

$$\begin{aligned} I[f] (x) := \sum _{i\in {\mathbb {Z}}^d} f(x_i) \beta _i (x), \qquad x \in {\mathbb {R}}^d, \end{aligned}$$
(17)

where the \(\beta _j\)’s are piecewise linear/multilinear basis functions satisfying

$$\begin{aligned} \beta _j\ge 0 , \quad \beta _j (x_i) = \delta _{j,i}, \quad \sum _{j} \beta _j (x) = 1, \quad \text {and} \quad \Vert I[\phi ] -\phi \Vert _{0} \le \Vert D^2\phi \Vert _0\,\rho ^2 \end{aligned}$$

for any \(\phi \in C^2_b({\mathbb {R}}^d)\). A fully discrete scheme is then obtained from (16) as follows:

$$\begin{aligned} {{\tilde{u}}}_{i,k}[\mu ] = S_{\rho ,h,r} [\mu ] ({{\tilde{u}}}_{\cdot ,k+1},i,k), \ k<N, \quad \text {and} \quad {{\tilde{u}}}_{i,N}[\mu ] = G (x_i, \mu (t_N)),&\end{aligned}$$
(18)

where

$$\begin{aligned}&S_{\rho ,h,r} [\mu ] (v,i,k) \nonumber \\&\quad = \inf _{\alpha } \Bigg \{ h F ( x_i, \mu (t_k) ) + h L ( x_i, \alpha ) \nonumber \\&\qquad + \frac{\hbox {e}^{-h \lambda _r}}{2d} \sum _{m=1}^d \Big ( I [ v ] (x_i + h \bar{b} (\alpha ) + \sqrt{hd} \sigma _r^m) + I [ v] (x_i+ h \bar{b} (\alpha ) - \sqrt{hd} \sigma _r^m) \Big ) \nonumber \\&\qquad + \frac{1-\hbox {e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert \ge r} I [ v ] (x_i + z ) \nu (\hbox {d}z) \Bigg \}. \end{aligned}$$
(19)

Finally, we extend the solution of the discrete scheme \({{\tilde{u}}}_{i,k}[\mu ]\) to the whole \({\mathbb {R}}^d\times [0,T]\) by linear interpolation in x and piecewise constant interpolation in t:

$$\begin{aligned} {{\tilde{u}}}_{\rho ,h}[\mu ](t,x) = I\big ({{\tilde{u}}}_{\cdot ,[\frac{t}{h}]}[\mu ]\big )(x)= \sum _{i\in {\mathbb {Z}}^d} \beta _i(x) \, {{\tilde{u}}}_{i,[\frac{t}{h}]}[\mu ] \quad \text{ for } \, \, (t,x)\in [0,T)\times {\mathbb {R}}^d. \end{aligned}$$
(20)
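The basis functions \(\beta _i\) in (17) are ordinary hat functions, so in \(d=1\) the interpolation \(I\) coincides with standard piecewise linear interpolation. The sketch below (with assumed grid sizes and test function) checks the second-order interpolation error used above; `np.interp` plays the role of \(I\).

```python
import numpy as np

# 1D sketch of the interpolation (17): hat-function (piecewise linear)
# interpolation is nonnegative, a partition of unity, and O(rho^2) accurate
# for smooth phi.  Grid sizes and the test function phi are illustrative.
def interp(f_vals, grid, x):
    # I[f](x) = sum_i f(x_i) beta_i(x) with hat functions beta_i
    return np.interp(x, grid, f_vals)

phi = np.cos
errs = []
for rho in (0.2, 0.1):
    grid = np.arange(-10.0, 10.0 + rho / 2, rho)   # G_rho restricted to a box
    x = np.linspace(-5.0, 5.0, 20_001)
    errs.append(np.max(np.abs(interp(phi(grid), grid, x) - phi(x))))
# halving rho should reduce the sup error by a factor of about 4
```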

3.3 Approximate Optimal Feedback Control

From the HJB equation in (1), satisfied by the value function (4), it follows that the optimal feedback control is

$$\begin{aligned} \alpha (t,x) = D_p H(x, Du[\mu ](t,x)). \end{aligned}$$

Based on this feedback law, we define an approximate feedback control for the discrete time optimal control problem (13)–(15) in the following way: For \(h,\rho ,\epsilon >0\) and \((t,x)\in [0,T]\times {\mathbb {R}}^d\),

$$\begin{aligned} \alpha _{\text {num}}(t,x) := D_p H(x, D{{\tilde{u}}}^{\epsilon }_{\rho ,h}[\mu ](t,x)), \end{aligned}$$
(21)

where \({{\tilde{u}}}_{\rho ,h}[\mu ]\) is given by (20),

$$\begin{aligned} {{\tilde{u}}}_{\rho ,h}^{\epsilon }[\mu ](t,x) = {{\tilde{u}}}_{\rho ,h}[\mu ](t, \cdot )*\rho _{\epsilon }(x), \end{aligned}$$
(22)

and the mollifier \(\rho _{\epsilon }(x)= \frac{1}{\epsilon ^{d}}\rho \big (\frac{x}{\epsilon }\big )\) for \(0\le \rho \in C_c^{\infty }({\mathbb {R}}^d)\) with \(\int _{{\mathbb {R}}^d} \rho (x)\hbox {d}x=1\). We state a standard result on mollification.

Lemma 3.2

Let \(u\in W^{1,\infty }({\mathbb {R}}^d)\), \(\epsilon >0\), and \(u^{\epsilon }=u*\rho _\epsilon \). Then \(u^{\epsilon } \in C_b^\infty ({\mathbb {R}}^d)\), and there exists a constant \(c_\rho >0\) such that for all \(\epsilon >0\),

$$\begin{aligned} \Vert u^{\epsilon } -u\Vert _0 \le \Vert Du\Vert _0\, \epsilon \qquad \text{ and } \qquad \Vert D^p u^{\epsilon }\Vert _0 \le c_\rho \Vert Du\Vert _0 \,\epsilon ^{1-p} \ \ \text {for any} \ \ p \in {\mathbb {N}}. \end{aligned}$$
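The first estimate is easy to verify numerically in \(d=1\). The sketch below uses the standard bump mollifier (an assumed concrete choice of \(\rho \)) and the Lipschitz function \(u=\sin \), for which \(\Vert Du\Vert _0=1\), so Lemma 3.2 predicts \(\Vert u^{\epsilon }-u\Vert _0\le \epsilon \).

```python
import numpy as np

# Numerical check of the first estimate of Lemma 3.2 in d = 1:
# ||u^eps - u||_0 <= ||Du||_0 * eps, with the standard bump mollifier
# rho(s) = C * exp(-1/(1-s^2)) on (-1,1), normalized by a Riemann sum.
s  = np.linspace(-1 + 1e-9, 1 - 1e-9, 4001)
ds = s[1] - s[0]
bump = np.exp(-1.0 / (1.0 - s**2))
bump /= bump.sum() * ds                       # int rho = 1

def mollify(u, y, eps):
    # u^eps(y) = int u(y - eps*s) rho(s) ds   (change of variables z = eps*s)
    return (u(y[:, None] - eps * s[None, :]) * bump[None, :]).sum(axis=1) * ds

u   = np.sin                                  # Lipschitz with ||Du||_0 = 1
y   = np.linspace(-3.0, 3.0, 601)
eps = 0.1
gap = np.max(np.abs(mollify(u, y, eps) - u(y)))   # Lemma 3.2: gap <= eps
```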

By construction, we expect \(\alpha _{\text {num}}\) to be an approximation of the optimal feedback control for the approximate control problem with value function (14) when \(h,\rho ,\epsilon \) are small and \({{\tilde{u}}}^{\epsilon }_{\rho ,h}\) is close to u.

3.4 Dual SL Discretization of the FPK Equation

3.4.1 A. Dual Approximation of the FPK Equation

First note that if \({{\tilde{X}}}_s={{\tilde{X}}}^{0,Z_0}_s\) solves (6) with \(t=0\) and \(X_0 = Z_0\), a random variable with distribution \(m_0\), then the FPK equation for \({{\tilde{m}}}:=Law({{\tilde{X}}}_s)\) is

$$\begin{aligned} {\left\{ \begin{array}{ll} {{\tilde{m}}}_t - {\mathcal {L}}^*{{\tilde{m}}} - \text {div} ( {{\tilde{m}}} \alpha ) = 0, \\ {{\tilde{m}}} ( 0 ) = m_{0}. \end{array}\right. } \end{aligned}$$

Setting \(\alpha = \alpha _{\text {num}}\), this equation becomes an approximation of the FPK equation in (1). With this choice of \(\alpha \), we further approximate \({{\tilde{m}}}\) by \({{\tilde{m}}}_k:= Law(X_k)\), the law of the approximate process \(X_k=X_k^{0,Z_0}\) solving (13) with \(l=0\) and \(X_0=Z_0\).

We now derive an FPK equation for \({{\tilde{m}}}_k\), which in discretized form will serve as our approximation of the FPK equation in (1). To simplify, we consider dimension \(d=1\). By definition of \({{\tilde{m}}}_k\),

$$\begin{aligned} {\mathbb {E}}[\phi (X_{k+1})] = \int _{{\mathbb {R}}} \phi (x) \, \hbox {d}{{\tilde{m}}}_{k+1}(x), \end{aligned}$$

for \(\phi \in C_b({\mathbb {R}}^d)\) and \(k \in {\mathbb {N}}\cup \{0\}\). Let \(A_{k}\) be the event of at least one jump in \([t_k,t_{k+1})\), i.e., \(A_{k}= \{\omega : N_{k+1}(\omega )- N_{k}(\omega )\ge 1 \}\) where \(N_k\) is the random jump time defined in Sect. 3.1 B. Then by the definition of \(X_k\) in (13), the fact that \(N_k\), \(J_k\), and \(\xi _k\) are i.i.d. and hence independent of \(X_k\), and conditional expectations, we find that

$$\begin{aligned}&\int _{{\mathbb {R}}} \phi (x) \, \text{ d }{{\tilde{m}}}_{k+1}(x) \\ {}&\quad = {\mathbb {E}}[\phi (X_{k+1})] \\ {}&\quad = {\mathbb {E}}[\phi (X_{k+1})\vert A_{k}^c] \, P(A_{k}^c) + {\mathbb {E}}[\phi (X_{k+1})\vert A_{k}] \, P(A_{k}) \\ {}&\quad = \text{ e}^{-h \lambda _r} {\mathbb {E}}(\phi (X_k + h \bar{b}(\alpha _{\text{ num }}) + \sqrt{h}\sigma _r \xi _{k})) + (1-\text{ e}^{-h \lambda _r}) {\mathbb {E}}(\phi (X_k + J_i)) \\ {}&\quad = \frac{\text{ e}^{-h \lambda _r}}{2} \int _{{\mathbb {R}}} \big ( \phi (x+h \bar{b}(\alpha _{\text{ num }}) + \sqrt{h}\sigma _r) + \phi (x+h \bar{b}(\alpha _{\text{ num }}) - \sqrt{h}\sigma _r)\big ) {{\tilde{m}}}_k(\text{ d }x) \\ {}&\ \qquad + (1-\text{ e}^{-h \lambda _r}) \int _{{\mathbb {R}}} \int _{\vert z\vert >r} \phi (x+z) \frac{\nu (\text{ d }z)}{\lambda _r} {{\tilde{m}}}_k(\text{ d }x). \end{aligned}$$

Let \( E_i:= \big (x_i- \frac{\rho }{2}, x_i + \frac{\rho }{2}\big )\) and \({{\tilde{m}}}_{j,k} = \int _{E_j} {{\tilde{m}}}_k(\hbox {d}x)\). We approximate the above expression by the midpoint (quadrature) rule, i.e., \(\int _{E_j} f(x) {{\tilde{m}}}_{k}(\hbox {d}x) \approx f(x_j) {{\tilde{m}}}_{j,k}\). Then, choosing \(\phi (x) = \beta _j(x)\) (the linear interpolation basis function) for \(j\in {\mathbb {Z}}\) and using \(\beta _j(x_i)= \delta _{j,i}\), we get the fully discrete approximation

$$\begin{aligned} {\tilde{m}}_{j,k+1}&\approx \sum _{i\in {\mathbb {Z}}} {\tilde{m}}_{i,k}\Big [ \frac{\hbox {e}^{-h \lambda _r}}{2} \Big ( \beta _j(x_i+h \bar{b}(\alpha _{\text {num}}) + \sqrt{h}\sigma _r) \\&\quad + \beta _j(x_i+h \bar{b}(\alpha _{\text {num}}) - \sqrt{h}\sigma _r)\Big ) + \frac{1-\hbox {e}^{-h \lambda _r}}{\lambda _r} \int _{\vert z\vert >r} \beta _j(x_i+z) \nu (\hbox {d}z)\Big ] . \end{aligned}$$

In arbitrary dimension d, we denote

$$\begin{aligned} \Phi ^{\epsilon , \pm }_{j,k,p} := x_j - h\,\big ( H_{p} ( x_j, D{{\tilde{u}}}_{\rho ,h}^{\epsilon }[\mu ] (t_k,x_j) ) + b_r^{\sigma } \big ) \pm \sqrt{hd} \sigma _r^p \end{aligned}$$
(23)

for \(j \in {\mathbb {Z}}^d\), \(k=0,\ldots , N\), \(p=1, \ldots ,d\). Redefining \(E_i := x_i + \frac{\rho }{2} (-1,1)^d\) and reasoning as for \(d=1\) above, we get the following discrete FPK equation

$$\begin{aligned} {\left\{ \begin{array}{ll} {{\tilde{m}}}_{i,k+1} [ \mu ] &{} := \displaystyle \sum _{j\in {\mathbb {Z}}^d} {{\tilde{m}}}_{j,k}[\mu ] \, {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , D{{\tilde{u}}}_{\rho ,h}^{\epsilon } [\mu ] ) ] ( i,j,k ), \\ {{\tilde{m}}}_{i,0} &{} = \displaystyle \int _{E_i} \hbox {d}m_0(x), \end{array}\right. } \end{aligned}$$
(24)

where

$$\begin{aligned} \begin{aligned} {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , D{{\tilde{u}}}_{\rho ,h}^{\epsilon } [\mu ] ) ] ( i,j,k ):=&\bigg [ \frac{\text{ e}^{-\lambda _r h} }{2d} \sum _{p=1}^d \Big (\beta _i \big ( \Phi ^{\epsilon , +}_{j,k,p} \big ) + \beta _i\big ( \Phi ^{\epsilon , -}_{j,k,p} \big ) \Big ) \\ {}&\ \ + \frac{1-\text{ e}^{-\lambda _r h}}{\lambda _r} \int _{\vert z\vert > r} \beta _i (x_j+z) \nu (\text{ d }z) \bigg ]. \end{aligned} \end{aligned}$$
(25)

For each \(k\in {\mathcal {N}}_h:=\{0,\dots ,N\}\), the solution of (24) is a probability distribution on \({\mathcal {G}}_\rho \):

Lemma 3.3

Let \(({{\tilde{m}}}_{i,k})\) be the solution of (24). If \(m_0\in P({\mathbb {R}}^d)\), then \(({{\tilde{m}}}_{i,k})_i\in P({\mathbb {Z}}^d)\), i.e., \({{\tilde{m}}}_{i,k}\ge 0\), \(i\in {\mathbb {Z}}^d\), and \(\sum _{j\in {\mathbb {Z}}^d} {{\tilde{m}}}_{j,k} =1\) for all \(k\in {\mathcal {N}}_h\).

Proof

First note that \({{\tilde{m}}}_{i,k}\ge 0\) follows directly from the definition of the scheme and \(m_{i,0}\ge 0\). Changing the order of summation and using that \(\sum _i {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , D{{\tilde{u}}}_{\rho ,h}^{\epsilon } [\mu ] ) ] ( i,j,k ) =1\), we find that

$$\begin{aligned} \sum _i {{\tilde{m}}}_{i,k+1} = \sum _{i} \sum _{j} {{\tilde{m}}}_{j,k} {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , D{{\tilde{u}}}_{\rho ,h}^{\epsilon } [\mu ] ) ] ( i,j,k ) = \sum _{j} {{\tilde{m}}}_{j,k}. \end{aligned}$$

The result follows by iteration since \(\sum _{j}{{\tilde{m}}}_{j,0}=1\). \(\square \)

We extend \(({{\tilde{m}}}_{i,k}[\mu ])\) to \({\mathbb {R}}^d\) by piecewise constant interpolation in x and then to [0, T] by linear interpolation in t: For \(t\in [t_k,t_{k+1}]\) and \(k\in {\mathcal {N}}_h\),

$$\begin{aligned} {{\tilde{m}}}_{\rho ,h}^{\epsilon }[\mu ](t,x)&:= \frac{t-t_k}{h} {{\tilde{m}}}_{\rho ,h}^{\epsilon }[\mu ](t_{k+1},x)+ \frac{t_{k+1}-t}{h} {{\tilde{m}}}_{\rho ,h}^{\epsilon }[\mu ](t_{k},x), \end{aligned}$$
(26)

where \({{\tilde{m}}}_{\rho ,h}^{\epsilon }[\mu ](t_k,x) := \frac{1}{\rho ^d} \sum _{i\in {\mathbb {Z}}^d} {{\tilde{m}}}_{i,k}[\mu ] \, \mathbbm {1}_{E_i}(x)\). Note that \({{\tilde{m}}}_{\rho ,h}^{\epsilon }[\mu ] \in C([0,T],P({\mathbb {R}}^d))\), and note the duality with the piecewise linear in \(x\)/piecewise constant in \(t\) interpolation used for \({\tilde{u}}_{\rho ,h}\) in (20).

3.5 Discretization of the Coupled MFG System

The discretization of the MFG system is obtained by coupling the two discretizations above by setting \(\mu ={{\tilde{m}}}^\epsilon _{\rho ,h}[\mu ]\). With this choice and \(u={{\tilde{u}}}[\mu ]\) and \(m={{\tilde{m}}}[\mu ]\), we get the following discretization of (1):

$$\begin{aligned} {\left\{ \begin{array}{ll} u_{i,k} = S_{\rho ,h,r} [m^\epsilon _{\rho ,h}] (u_{\cdot ,k+1},i,k), \\ u_{i,N} = G (x_i, m^\epsilon _{\rho ,h} (t_N)), \\ m_{i,k+1} = \sum _{j\in {\mathbb {Z}}^d} m_{j,k} \, {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , Du_{\rho ,h}^{\epsilon } ) ] ( i,j,k ), \\ m_{i,0} = \int _{E_i} \hbox {d}m_0(x), \end{array}\right. } \end{aligned}$$
(27)

where \(S_{\rho ,h,r}, {\mathbf {B}}_{\rho ,h,r}, u_{\rho ,h}^{\epsilon }, m^\epsilon _{\rho ,h}\) are defined above.

The individual discretizations are explicit, but due to the forward-backward nature of the coupling, the total discretization is not. It yields a nonlinear system that must be solved by some method, e.g., a fixed-point iteration or a Newton-type method.
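The overall solution strategy can be sketched as a damped fixed-point iteration on the density: freeze \(\mu \), solve the HJB part backward, solve the FPK part forward, and mix the new density into \(\mu \). In the snippet below the two "solvers" are toy stand-ins (not the schemes \(S_{\rho ,h,r}\) and \({\mathbf {B}}_{\rho ,h,r}\) of this paper); only the iteration pattern is the point.

```python
import numpy as np

# Structural sketch of a damped fixed-point iteration for (27).
def solve_hjb_backward(mu):
    # toy stand-in for the feedback control the backward HJB solve produces
    return -0.2 * mu

def solve_fpk_forward(alpha, m0):
    # toy stand-in for the forward FPK solve driven by the control alpha
    return m0 + alpha

theta = 0.5                                    # damping parameter
m0 = np.array([0.3, 0.7])
mu = m0.copy()
for it in range(100):
    alpha = solve_hjb_backward(mu)             # backward sweep given mu
    m_new = solve_fpk_forward(alpha, m0)       # forward sweep given alpha
    if np.max(np.abs(m_new - mu)) < 1e-12:     # fixed point mu = m[mu] reached
        break
    mu = (1 - theta) * mu + theta * m_new      # damped update
```

For these toy maps the fixed point is \(\mu ^* = m_0/1.2\), which the iteration reaches in a few dozen sweeps.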

The approximation scheme (27) has at least one solution:

Proposition 3.4

(Existence for the discrete MFG system) Assume \((\nu \)0), (\(\nu \)1), (L1)–(L2), (F1)–(F2), (H1), and (M).

Then, there exists a pair \( ( u_{\rho ,h} , \ m_{\rho ,h}^{\epsilon })\) solving (27).

The proof of this result is non-constructive and given in Sect. 7.

4 Convergence to the MFG System

In this section, we give the main theoretical results of this paper: various convergence results as \(h,\rho ,\epsilon ,r \rightarrow 0\) under CFL conditions. The proofs will be given in Sect. 7 and require results for the individual schemes given in Sects. 5 and 6.

4.1 Convergence to Viscosity-Very Weak Solutions

We consider degenerate and nondegenerate cases separately. For the degenerate case, the convergence holds only in dimension \(d=1\).

Theorem 4.1

(Degenerate case, \(d=1\)) Assume \((\nu \)0), (\(\nu \)1), (L1)–(L3), (F1)–(F3), (H1)–(H2), (M’), and let \(\{(u_{\rho ,h}, m^{\epsilon }_{\rho ,h})\}_{\rho ,h,\epsilon >0}\) be solutions of the discrete MFG system (27). If \(\rho _n,h_n,\epsilon _n,r_n\rightarrow 0\) under the CFL conditions \(\frac{\rho _n^2}{h_n},\frac{h_n}{r_n^{\sigma }},\frac{\sqrt{h_n}}{\epsilon _n}=o(1)\), then:

  1. (i)

    \(\{u_{\rho _n,h_n}\}_n\) is precompact in \(C_b([0,T]\times K)\) for every compact set \(K \subset {\mathbb {R}}\).

  2. (ii)

    \(\{m^{\epsilon _n}_{\rho _n,h_n}\}_n\) is sequentially precompact in \(C ( [ 0,T ], P ( {\mathbb {R}}) )\), and (a) in \(L^1\) weak if \(p \in ( 1,\infty )\) in (M’), or (b) in \(L^{\infty }\) weak \(*\) if \(p= \infty \) in (M’).

  3. (iii)

    If \((u,m)\) is a limit point of \(\{(u_{\rho _n,h_n}, m^{\epsilon _n}_{\rho _n,h_n})\}_n\), then \((u,m)\) is a viscosity-very weak solution of the MFG system (1).

Note that \(\{m^{\epsilon }_{\rho ,h}\}\) is precompact in \(C([0,T], P({\mathbb {R}}^d))\) just by assuming (M) for the initial distribution. But in the degenerate case, this is not enough for convergence of the MFG system, due to the lower regularity of solutions of the HJB equation (no longer \(C^1\)). Therefore, we need assumption (M’) and the stronger compactness given by Theorem 4.1(ii) part (a) or (b). We are only able to show this latter result in \(d=1\).

In arbitrary dimensions, we assume more regularity on solutions of the HJB equation in (1):

(U):

For any \(m \in C([0,T],P({\mathbb {R}}^d))\), the viscosity solution \(u[m]\) of the HJB equation in (1) satisfies \(u[m](t)\in C^1({\mathbb {R}}^d)\) for all \(t\in (0,T)\).

Remark 4.2

Assumption (U) holds in nondegenerate cases, e.g., under assumption (\(\nu \)2), see Theorem 2.7 and the discussion below.

We have the following convergence result in arbitrary dimensions.

Theorem 4.3

(Nondegenerate case) Assume \((\nu \)0), (\(\nu \)1), (L1)–(L3), (F1)–(F3), (H1)–(H2), (U), (M), and let \(\{(u_{\rho ,h}, m^{\epsilon }_{\rho ,h})\}_{\rho ,h,\epsilon >0}\) be solutions of the discrete MFG system (27). If \(\rho _n,h_n,\epsilon _n,r_n\rightarrow 0\) under the CFL conditions \(\frac{\rho _n^2}{h_n},\frac{h_n}{r_n^{\sigma }},\frac{\sqrt{h_n}}{\epsilon _n}=o(1)\), then:

  1. (i)

    \(\{u_{\rho _n,h_n}\}_n\) is precompact in \(C_b([0,T]\times K)\) for every compact set \(K \subset {\mathbb {R}}^d\).

  2. (ii)

    \(\{m^{\epsilon _n}_{\rho _n,h_n}\}_n\) is precompact in \(C([0,T],P({\mathbb {R}}^d))\).

  3. (iii)

    If \((u,m)\) is a limit point of \(\{(u_{\rho _n,h_n}, m^{\epsilon _n}_{\rho _n,h_n})\}_n\), then \((u,m)\) is a viscosity-very weak solution of the MFG system (1).

These results give compactness of the approximations and convergence along subsequences. To be precise, by parts (i) and (ii) there are convergent subsequences, and by part (iii) the corresponding limits are solutions of the MFG system (1).

We immediately have existence for (1).

Corollary 4.4

(Existence of solutions of (1)) Under the assumptions of either Theorem 4.1 or Theorem 4.3, there exists a viscosity-very weak solution \((u,m)\) of the MFG system (1).

If in addition we have uniqueness for the MFG system (1), then we have full convergence of the sequence of approximations.

Corollary 4.5

Under the assumptions of either Theorem 4.1 or Theorem 4.3, if the MFG system (1) has at most one viscosity-very weak solution, then the whole sequence \(\{(u_{\rho _n,h_n}, m^{\epsilon _n}_{\rho _n,h_n}) \}_n\) converges to a limit \((u,m)\) which is the (unique) viscosity-very weak solution of the MFG system (1).

4.2 Convergence to Classical Solutions

When the individual equations are regularizing, we can obtain convergence to classical solutions of the MFG system. To be precise, we need:

  1. 1.

    (“Weak” uniqueness of individual PDEs) The HJB equation has a unique viscosity solution, and the FPK equation has a unique very weak solution.

  2. 2.

    (Smoothness of individual PDEs) Both equations have classical solutions.

This means that viscosity-very weak solutions of the MFG system are automatically (by uniqueness for the individual equations) classical solutions. If in addition

  1. 3.

    (Classical uniqueness for MFG) classical solutions of the MFG system are unique,

we get full convergence of the approximate solutions to the solution of the MFG system.

We now give a precise result in the setting of [39], see Theorem 2.7 in Sect. 2 for existence and uniqueness of classical solutions of (1).

Corollary 4.6

Assume \((\nu \)0)–(\(\nu \)2), (L1)–(L3), (F1)–(F4), (H3)–(H4), and (M”). Let \((u_{\rho ,h}, m^{\epsilon }_{\rho ,h})\) be solutions of the discrete MFG system (27). If \(\rho _n,h_n,\epsilon _n,r_n\rightarrow 0\) under the CFL conditions \(\frac{\rho _n^2}{h_n},\frac{h_n}{r_n^{\sigma }},\frac{\sqrt{h_n}}{\epsilon _n}=o(1)\), then:

  1. (a)

    \(\{(u_{\rho _n,h_n}, m^{\epsilon _n}_{\rho _n,h_n})\}_n\) has a convergent subsequence in the space \(C_{b,\text {loc} }([0,T]\times {\mathbb {R}}^d)\times C ( [ 0,T ], P ( {\mathbb {R}}^d) )\), and any limit point is a classical–classical solution of (1).

  2. (b)

    If in addition (F5) and (H5) hold, then the whole sequence in (a) converges to the unique classical–classical solution \((u,m)\) of (1).

Proof

  1. 1.

    Assumption (U) holds by Theorem 2.7, and then by Theorem 4.3, there is a convergent subsequence \(\{(u_{\rho _n, h_n}, m_{\rho _n,h_n}^{\epsilon _n} )\}_n\) such that \((u_{\rho _n, h_n}, m_{\rho _n,h_n}^{\epsilon _n} ) \rightarrow (u,m)\) and \((u,m)\) is a viscosity-very weak solution of (1).

  2. 2.

    Since \(m \in C ([0,T] , P ({\mathbb {R}}^d))\), the viscosity solution u is unique by Proposition 2.5 (b) (see also [39, Theorem 5.3]). Hence, it coincides with the classical \(C_b^{1,3} ((0,T) \times {\mathbb {R}}^d)\) solution given by [39, Theorem 5.5].

  3. 3.

    Now \(D_p H (x,Du (t)) \in C_b^2 ({\mathbb {R}}^d)\) by part 2 and (H3), and then by Proposition 2.8 there is at most one very weak solution of the FPK equation. Hence, it coincides with the classical \(C_b^{1,2}((0,T) \times {\mathbb {R}}^d)\) solution given by [39, Proposition 6.8].

  4. 4.

    In addition, if (F5) and (H5) hold, there is at most one classical solution \((u,m)\) by Theorem 2.7 (b).

  5. 5.

    By compactness, smoothness, and uniqueness, all convergent subsequences of \(\{(u_{\rho _n, h_n}, m_{\rho _n,h_n}^{\epsilon _n} )\}_n\) have the same limit, and thus the whole sequence converges to \((u,m)\), the unique classical solution of (1).\(\square \)

4.3 Extension and Discussion

4.4 Extension to More General Nonlocal Lévy Operators

The results of Theorems 4.1 and 4.3 hold under much more general assumptions on the Lévy operator \({\mathcal {L}}\). In [39], the authors use \((\nu \)0) together with the following assumptions:

(\(\nu \)1\('\)):

There exists a constant \(c>0\) such that for every \(r\in (0,1),\) \( \displaystyle r^{-2+\sigma }\int _{\vert z\vert<r} \vert z\vert ^2 \hbox {d}\nu + r^{-1+\sigma }\int _{r<\vert z\vert<1} \vert z\vert \hbox {d}\nu + r^{\sigma }\int _{r<\vert z\vert <1} \hbox {d}\nu \le c.\)

(\(\nu \)2\('\)):

There are \(\sigma \in (1,2)\) and \({\mathcal {K}} >0\) such that the heat kernels \(K_\sigma \) and \(K_\sigma ^*\) of \(\mathcal L\) and \({\mathcal {L}}^*\) satisfy for \(K=K_\sigma ,K_\sigma ^*\) : \(K\ge 0\), \(\Vert K(t,\cdot )\Vert _{L^1({\mathbb {R}}^d)}=1\), and

$$\begin{aligned} \Vert D^{\beta } K (t,\cdot ) \Vert _{L^p ({\mathbb {R}}^d)} \le {\mathcal {K}} t^{-\frac{1}{\sigma }\big (\vert \beta \vert +(1-\frac{1}{p})d\big )}\quad \text {for }t\in (0,T) \end{aligned}$$

and any \(p\in [1,\infty )\) and multi-index \(\beta \in ({\mathbb {N}}\cup \{0\})^{d}\),

where the heat kernel of the operator \({\mathcal {L}}\) is defined as the fundamental solution of the heat equation \(\partial _{t} u - {\mathcal {L}} u = 0\). These assumptions cover many new cases compared to \((\nu \)0), (\(\nu \)1), and (\(\nu \)2). New cases include (i) sums of operators satisfying (\(\nu \)1) on subspaces spanning \({\mathbb {R}}^d\), having possibly different orders, (ii) more general non-absolutely continuous Lévy measures, and (iii) Lévy measures supported on positive cones. An example of (i) (cf. [39]) is

$$\begin{aligned} {\mathcal {L}}=-\Big (\!-\frac{\partial ^2}{\partial x_1^2}\Big )^{\sigma _1/2}-\dots -\Big (\!-\frac{\partial ^2}{\partial x_d^2}\Big )^{\sigma _d/2}, \qquad \sigma _1,\dots ,\sigma _d\in (1,2), \end{aligned}$$

which satisfies (\(\nu \)1\('\)) with \(\sigma =\min _i\sigma _i\) and \(\hbox {d}\nu (z)= \sum _{i=1}^d\frac{dz_i}{\vert z_i\vert ^{1+\sigma _i}}\Pi _{j\ne i}\delta _0(dz_j)\). This is a sum of one-dimensional fractional Laplacians of different orders. An example of (iii) is given by the spectrally positive “fractional Laplacian” in one space dimension: \({\mathcal {L}} u = c_{\sigma } \int _{0}^{\infty } ( u ( x+z ) - u ( x ) - Du ( x ) \cdot z \mathbbm {1}_{\{z < 1\}} ) z^{-1-\sigma } \hbox {d}z\).
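For the basic 1D fractional measure \(\hbox {d}\nu (z)=\vert z\vert ^{-1-\sigma }\hbox {d}z\) with \(\sigma \in (1,2)\), the three scaled integrals in (\(\nu \)1\('\)) can be computed in closed form and are bounded uniformly in \(r\in (0,1)\). The sketch below records these closed forms and cross-checks them against direct quadrature (the choices \(\sigma =1.5\), \(r=0.3\) are assumptions of the sketch).

```python
import numpy as np

# (nu1') for the 1D fractional measure nu(dz) = |z|^{-1-sigma} dz,
# sigma in (1,2): closed forms of the three scaled integrals, all bounded
# uniformly in r in (0,1), so (nu1') holds for this measure.
def nu1prime_terms(r, sigma):
    t1 = 2.0 / (2.0 - sigma)                             # r^{sigma-2} int_{|z|<r} |z|^2 nu
    t2 = 2.0 * (1.0 - r**(sigma - 1.0)) / (sigma - 1.0)  # r^{sigma-1} int_{r<|z|<1} |z| nu
    t3 = 2.0 * (1.0 - r**sigma) / sigma                  # r^{sigma}   int_{r<|z|<1} nu
    return t1, t2, t3

def numeric_terms(r, sigma, n=100_000):
    # midpoint-rule quadrature of the same integrals, as a cross-check
    z1 = (np.arange(n) + 0.5) * (r / n)                  # nodes in (0, r)
    z2 = r + (np.arange(n) + 0.5) * ((1.0 - r) / n)      # nodes in (r, 1)
    i1 = 2.0 * (z1**2 * z1**(-1.0 - sigma)).sum() * (r / n)
    i2 = 2.0 * (z2 * z2**(-1.0 - sigma)).sum() * ((1.0 - r) / n)
    i3 = 2.0 * (z2**(-1.0 - sigma)).sum() * ((1.0 - r) / n)
    return r**(sigma - 2.0) * i1, r**(sigma - 1.0) * i2, r**sigma * i3

exact   = nu1prime_terms(0.3, 1.5)
numeric = numeric_terms(0.3, 1.5)
```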

We have the following generalization of the well-posedness result for classical solutions given in Theorem 2.7.

Theorem 4.7

([39]) Theorem 2.7 holds when (\(\nu \)1)–(\(\nu \)2) are replaced by (\(\nu \)1\('\))–(\(\nu \)2\('\)).

It follows that (U) holds whenever Theorem 4.7 holds. Since (\(\nu \)1) implies (\(\nu \)1\('\)) and the integrals in (\(\nu \)1\('\)) are exactly those appearing in the proofs, it is easy to check that all estimates in this paper remain true for Lévy measures satisfying (\(\nu \)1\('\)) instead of (\(\nu \)1). This means that under assumptions (\(\nu \)1\('\)) and (\(\nu \)2\('\)) we have the following extensions of Theorems 4.1 and 4.3 and Corollary 4.6.

Theorem 4.8

Theorem 4.1 holds when (\(\nu \)1) is replaced by (\(\nu \)1\('\)).

Theorem 4.9

Theorem 4.3 holds when (\(\nu \)1)–(\(\nu \)2) are replaced by (\(\nu \)1\('\))–(\(\nu \)2\('\)).

Corollary 4.10

Corollary 4.6 holds when (\(\nu \)1)–(\(\nu \)2) are replaced by (\(\nu \)1\('\))–(\(\nu \)2\('\)).

4.5 Extension to Mixed Local–Nonlocal Operators

The results of this article can be extended to MFG systems involving mixed local and nonlocal diffusion operators. In this case, the underlying process replacing (6) would be, e.g.,

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}\hbox {d}{{\tilde{X}}}_s = -\alpha _s\, \hbox {d}s + \int _{\vert z\vert <1} z {\tilde{N}} (\hbox {d}z,\hbox {d}s) + \int _{\vert z\vert \ge 1} z N (\hbox {d}z,\hbox {d}s) + a(s)\hbox {d}W_s, \quad s>t,\\ &{}{{\tilde{X}}}_t = x, \end{array}\right. } \end{aligned}$$

where \(W_s\) is a standard Brownian motion and a is continuous. The MFG system is then

$$\begin{aligned} {\left\{ \begin{array}{ll} -u_t - {\mathcal {L}} u - \frac{1}{2}\text {tr}[a(t) a(t)^T D^2u]+ H(x,Du) = F (x, m(t)), \, &{}\text { in } (0,T)\times {\mathbb {R}}^d, \\ m_t - {\mathcal {L}}^*m - \frac{1}{2}\text {tr}[a(t) a(t)^T D^2m]- \text {div} (m D_p H(x,Du)) = 0 \, &{}\text { in } (0,T)\times {\mathbb {R}}^d, \\ u (T,x) = G(x,m(T)), \ m(0) = m_0 \, &{}\text { in } {\mathbb {R}}^d, \end{array}\right. } \end{aligned}$$
(28)

where the operator \({\mathcal {L}}\) is defined in (2). A fully discrete approximation of (28) follows by combining Sect. 3 for the nonlocal part and papers [25, 27] for the local part. Under the assumptions of this paper, the proofs of existence and convergence follow in a similar way as here and in [25, 27]: In the degenerate case, the conclusion of Theorem 4.1 holds for the discretization of (28). In the nondegenerate case where \(a(t) a(t)^T \ge cI_d>0\), the solution of (28) is regular even without assuming (\(\nu \)2). Hence, in this case the conclusion of Theorem 4.3 holds for the discretization of (28).

4.6 The Wasserstein Metric \(d_1\) Versus our Metric \(d_0\)

The typical setting for the FPK equations in the MFG literature seems to be the metric space \( ( P_{1} ( {\mathbb {R}}^d) , d_{1} )\), that is, the 1-Wasserstein space \({\mathcal {W}}_{1}\) of probability measures with finite first moment. This is also the case in [27], where convergence results are given for SL schemes for local nondegenerate MFGs in \({\mathbb {R}}^d\). In this paper, we cannot assume finite first moments if we want to cover general nonlocal operators. An example is the fractional Laplacian \( -( - \Delta )^{\frac{\sigma }{2}}\) for \(\sigma < 1\), where the underlying \(\sigma \)-stable process only has finite moments of order less than \(\sigma \). Instead, we consider the weaker metric space \((P ( {\mathbb {R}}^d), d_{0} )\), which is just a metrization of the weak (weak-* in \(C_{b}\)) convergence of probability measures (see [14, Chapter 8.3]). In this topology, we can consider processes, probability measures, and solutions of FPK equations without any finite moments or any restrictions on the tail behavior of the corresponding Lévy measures.

Conversely, the following lemma shows that under additional moment assumptions, convergence in \(d_0\) implies convergence in \(d_1\).

Lemma 4.11

If \(m_{n}\) converges to \(m\) in \((P ( {\mathbb {R}}^d),d_0)\), and \(m_{n}\) and \(m\) have uniformly bounded \((1+\delta )\)-moments for some \(\delta >0\), then \(m_{n} \rightarrow m\) in \( ( P_{1} ( {\mathbb {R}}^d) , d_{1} ) \).

Convergence in \(P_{1} ( {\mathbb {R}}^d)\) [55, Definition 6.8] is by definition equivalent to weak convergence plus convergence of first moments, and the result follows from, e.g., Proposition 1.1 and Lemma 1.5 in [5].

We then have the following version of Theorems 4.1 and 4.3.

Corollary 4.12

Assume \(m_{0} \in P_{1+\delta } ( {\mathbb {R}} )\), \(\int _{{\mathbb {R}}^d\setminus B_{1}} \vert z\vert ^{1+\delta } d \nu ( z ) < \infty \) for some \(\delta >0\), and the assumptions of Theorems 4.1 and 4.3. Then, the statements of Theorems 4.1 and 4.3 hold if we replace \((P,d_0)\) by \((P_{1},d_1)\) in part (ii).

Note that the number of moments of \(m\) is determined by the number of moments of \(1_{\vert z\vert >1}\nu \) (and \(m_0\)); see, e.g., the discussion in Section 2.3 in [39]. Moreover, if \(1_{\vert z\vert >1}\nu \) has at most \(\alpha \) finite moments, then \({\mathcal {L}}u\) is well defined only if \(u\) has at most order \(\alpha \) growth at infinity. Hence, in the nonlocal case there is a “duality” between the moments of \(m\) and the growth of \(u\). Note that \(um\) will always be integrable, which is natural since then, e.g., \(Eu(X_t,t) =\int u(x,t)\,m(\hbox {d}x,t)\) is finite.

In our case, we assume no moments and have to work with bounded solutions u.

4.7 On Moments and Weak Compactness in \(L^p\) in the Degenerate Case

Previous results for semi-Lagrangian schemes in the first-order and the degenerate second-order cases [24, 25] cover the case \(m_{0} \in P_{1} ( {\mathbb {R}}^d ) \cap L^{\infty } ( {\mathbb {R}}^d )\), which means that \(m_{0}\) has a finite first moment. Our results assume \(m_{0} \in P ( {\mathbb {R}}^d ) \cap L^{p} ( {\mathbb {R}}^d )\) for \(p \in ( 1,\infty ]\), and hence no moment bounds and possibly unbounded \(m_{0}\). When \(p < \infty \), we have weak compactness in \(L^{1}\) instead of weak-* compactness in \(L^{\infty }\).

Since our results in the degenerate case allow for \({\mathcal {L}}=0\), they immediately extend the convergence results for first-order problems of [24] to this \(P \cap L^{p}\) setting. Moreover, the same conditions, arguments, and results also hold in the local diffusive case considered in [25].

5 On the SL Scheme for the HJB Equation

We prove results for the numerical approximation of the HJB equation, including monotonicity, consistency, and different uniform a priori stability and regularity estimates. Using the “half-relaxed” limit method [12], we then show convergence in the form of \(v_{\rho _{n}, h_{n}} [ \mu _{n} ] ( t_{n},x_{n} ) \rightarrow v [ \mu ] ( t,x )\), where \(v [ \mu ]\) is the (viscosity) solution of the continuous HJB equation. Let \(B ( {\mathcal {G}}_{\rho })\) be the set of all bounded functions defined on \({\mathcal {G}}_{\rho }\).

Theorem 5.1

Assume \((\nu \)0), (L1), \(\rho ,h,r >0\), \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\), and let \(S_{\rho ,h,r} [ \mu ]\) denote the scheme defined in (18).

  1. (i)

    (Bounded control) If \(\phi \in \text {Lip} ( {\mathbb {R}}^d )\), then \(S_{\rho ,h,r} [ \mu ] ( \phi ,i,k )\) has a minimal control \(\bar{\alpha }\) with \(\vert \bar{\alpha } \vert \le K\), where K only depends on \(\Vert D_{x} \phi \Vert _{0} \) and the growth of L as \(\vert x\vert \rightarrow \infty \).

  2. (ii)

    (Monotonicity) For all \(v,w \in B ( {\mathcal {G}}_{\rho } ) \) with \(v\le w\) we have,

    $$\begin{aligned} S_{\rho ,h,r} [\mu ] (v,i,k) \le S_{\rho ,h,r} [\mu ] (w,i,k) \text { for all } i\in {\mathcal {G}}_{\rho }, \ k= 0,\ldots , N-1. \end{aligned}$$
  3. (iii)

    (Commutation by constant) For every \(c\in {\mathbb {R}}\) and \(w \in B ({\mathcal {G}}_{\rho } )\),

    $$\begin{aligned} S_{\rho ,h,r} [\mu ] (w+c,i,k) = S_{\rho ,h,r} [\mu ] (w,i,k) +c \text { for all } i\in {\mathcal {G}}_{\rho }, \ k= 0,\ldots , N-1. \end{aligned}$$

Assume also (\(\nu \)1) and (F2).

  1. (iv)

    (Consistency) Let \(\rho _{n},h_{n}, r_{n} \xrightarrow {n \rightarrow \infty } 0\) under CFL conditions \(\frac{\rho _{n}^{2}}{h_{n}},\frac{h_{n}}{ r_{n}^{\sigma }} = o ( 1 )\), grid points \((t_{k_{n}}, x_{i_{n}}) \rightarrow (t,x)\), and \(\mu _{n},\mu \in C ( [0,T]; P ({\mathbb {R}}^d))\) such that \(\mu _{n} \rightarrow \mu \). Then, for every \(\phi \in C_{c}^{\infty } ({\mathbb {R}}^d\times [0,T))\),

    $$\begin{aligned}&\lim _{n \rightarrow \infty }\frac{1}{h_{n}} \big [ \phi ( t_{k_{n}+1}, x_{i_{n}}) - S_{ \rho _{n}, h_{n}, r_n} [ \mu _{n} ] ( \phi _{\cdot ,k_{n}+1}, i_{n}, k_{n} ) \big ] \\&\quad = - \partial _{t} \phi ( t,x ) - \inf _{\alpha \in {\mathbb {R}}^d } \big [ L ( x, \alpha ) - D \phi \cdot \alpha \big ] - {\mathcal {L}} \phi ( x ) - F ( x, \mu ( t ) ). \end{aligned}$$

Proof

(i) Since

$$\begin{aligned} g ( \alpha ) := \frac{\hbox {e}^{-h \lambda _{r}}}{2d} \sum _{m=1}^{d} \big ( I [ \phi ] ( x_{i} + h \bar{b} ( \alpha ) + \sqrt{hd}\, \sigma _{r}^{m}) + I [ \phi ] ( x_{i} + h \bar{b} ( \alpha ) - \sqrt{hd}\, \sigma _{r}^{m}) \big ) \end{aligned}$$

is Lipschitz in \(\alpha \) (at most linear growth at infinity) while \(L ( x, \alpha )\) is coercive (superlinear growth at infinity) by (L1), there exists a ball \(B_{R}\), where R depends on the Lipschitz constant of \(I [ \phi ]\) and the growth of L, such that the minimizing control \( \bar{\alpha }\) of \(S_{\rho ,h,r} [ \mu ] ( \phi ,i,k )\) belongs to \( B_{R}\).

(ii) and (iii) follow directly from the definition of the scheme.

(iv) For ease of notation, we write \(\rho ,h,r,\mu \) instead of \(\rho _{n}, h_{n}, r_{n}, \mu _{n}\). A 4th order Taylor expansion of \(\phi \) gives

$$\begin{aligned}&\phi (x+h \bar{b}(\alpha ) \pm \sqrt{hd} \sigma _{r}^{m}) = \phi (x) + D \phi (x) \cdot (h \bar{b}(\alpha ) \pm \sqrt{hd} \sigma _r^{m} ) \\&\quad + \frac{hd}{2} ( \sigma _{r}^{m})^T D^2 \phi (x) \sigma _r^{m} \pm \sqrt{d}\,h^{\frac{3}{2}} \bar{b}(\alpha )^{T} D^{2} \phi ( x ) \sigma _{r}^{m} + \frac{h^{2}}{2} \bar{b} ( \alpha )^{T} D^{2} \phi ( x ) \bar{b}( \alpha ) \\&\quad + \sum _{\vert \beta \vert = 3} \frac{D^{\beta } \phi ( x )}{\beta !} ( h \bar{b} ( \alpha ) \pm \sqrt{hd}\, \sigma _{r}^{m} )^{\beta } + \sum _{\vert \beta \vert = 4} \frac{D^{\beta } \phi ( \xi _{\pm } )}{\beta !} ( h \bar{b} ( \alpha ) \pm \sqrt{hd}\, \sigma _{r}^{m} )^{\beta }, \end{aligned}$$

for some \(\xi _{\pm } \in {\mathbb {R}}^{d}\). Using that \(\bar{b} ( \alpha ) = - \alpha - \int _{ r \le \vert z\vert \le 1 } z \nu ( \hbox {d}z ) \), and by (\(\nu \)1) \(\int _{r \le \vert z\vert \le 1} z \nu ( \hbox {d}z ) = O ( r^{1-\sigma } )\), we get that

$$\begin{aligned}&\phi (x+h \bar{b}(\alpha ) + \sqrt{hd} \sigma _{r}^{m}) + \phi (x+h \bar{b}(\alpha ) - \sqrt{hd} \sigma _{r}^{m}) -2\phi (x)\nonumber \\ {}&\quad = -2 h D \phi ( x ) \cdot \alpha - 2h \int _{ r< \vert z\vert <1 } D \phi ( x ) \cdot z \nu ( \text{ d }z ) + hd ( \sigma _{r}^{m})^T \cdot D^2\phi (x) \cdot \sigma _r^{m} \nonumber \\ {}&\, \qquad + {\mathcal {O}}\big (h^2r^{2-2 \sigma }\big ). \end{aligned}$$
(29)

We used that \(\frac{h^{2}}{2} \bar{b} ( \alpha )^{T} D^{2} \phi ( x ) \bar{b}( \alpha )\) is of order \({\mathcal {O}} ( h^{2} r^{2-2 \sigma } )\), the 3rd-order terms are of order \({\mathcal {O}} ( h^{3} r^{3- 3 \sigma } + h^{2} r^{1-\sigma })\), and the 4th-order terms are of order \(( \sqrt{hd}\, \sigma _{r} )^{4} = {\mathcal {O}} (h^2 r^{4-2 \sigma } )\). The error of the Taylor expansion is therefore \({\mathcal {O}} ( h^{2} r^{2- 2 \sigma } )\). Using Lemma 3.1,

$$\begin{aligned}&\phi ( x_{i} ) - S_{\rho ,h,r} [ \mu ] ( \phi , i,k )\nonumber \\&\quad = \phi ( x_{i} ) - \inf _{\alpha } \bigg [ h F ( x_{i}, \mu ( t_{k+1} ) ) + h L (x_{i}, \alpha ) + \frac{\hbox {e}^{-h \lambda _r}}{2d} \sum _{m=1}^{d} \Big ( 2\phi ( x_{i} ) - 2h D \phi ( x_{i} ) \cdot \alpha \nonumber \\&\qquad + h d ( \sigma _{r}^{m} )^{T} D^{2} \phi ( x_{i} ) \sigma _{r}^{m} - 2 h \int _{ r< \vert z\vert< 1 } D \phi ( x_{i} ) \cdot z \nu ( \hbox {d}z ) \Big ) \nonumber \\&\qquad + \frac{1-\hbox {e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert> r} \phi (x_{i} + z ) \nu (\hbox {d}z) + {\mathcal {O}} ( \rho ^{2} ) + {\mathcal {O}}(h^2 r^{2-2\sigma } )\bigg ] \nonumber \\&\quad = \ - h F ( x_{i}, \mu ( t_{k+1} ) ) - \inf _{\alpha } \bigg [ h L ( x_{i}, \alpha ) - h \hbox {e}^{-h \lambda _{r}} D \phi ( x_{i} ) \cdot \alpha \bigg ] + ( 1 - \hbox {e}^{-h \lambda _{r}}) \phi ( x_{i}) \nonumber \\&\qquad - h \hbox {e}^{-h \lambda _{r}} \Big ({\mathcal {L}}_{r} \phi ( x_{i} ) +{\mathcal {O}}(r^{3-\sigma })\Big ) + h \hbox {e}^{-h \lambda _{r}} \int _{ r< \vert z\vert < 1 } D \phi ( x_{i} ) \cdot z \nu ( \hbox {d}z ) \nonumber \\&\qquad - \frac{1-\hbox {e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert > r} \phi (x_{i} + z ) \nu (\hbox {d}z) + {\mathcal {O}} ( \rho ^{2} + h^2 r^{2-2\sigma }). \end{aligned}$$
(30)

Using that \(\int _{\vert z\vert <r} \vert z\vert ^2 \nu (\hbox {d}z)\le K r^{2-\sigma }\) (by (\(\nu \)1)), for the small jump operator \({\mathcal {L}}_{r}\) (defined in (12)) we have

$$\begin{aligned} \vert {\mathcal {L}}_r\phi (x_i) - \hbox {e}^{-h\lambda _r}{\mathcal {L}}_r\phi (x_i)\vert \le h\lambda _r \, r^{2-\sigma }\Vert D^2\phi \Vert _{0}. \end{aligned}$$
(31)

Again, as \(\int _{r<\vert z\vert <1} \vert z\vert \nu (\hbox {d}z)\le K r^{1-\sigma }\) and \(\int _{\vert z\vert >1} \nu (\hbox {d}z)\le K\), for the long jump operator \({\mathcal {L}}^{r}\) (defined in (12)) we have that

$$\begin{aligned}&\Big \vert {\mathcal {L}}^r\phi (x_i) +\hbox {e}^{-h \lambda _{r}} \int _{ r<\vert z\vert < 1} D \phi ( x_{i} ) \cdot z \nu ( \hbox {d}z ) \nonumber \\&\qquad - \frac{1-\hbox {e}^{-h\lambda _r}}{h \lambda _r} \int _{\vert z\vert > r} (\phi (x_{i} + z ) - \phi (x_i)) \nu (\hbox {d}z)\Big \vert \nonumber \\&\quad \le K (1-\hbox {e}^{-h \lambda _{r}}) r^{1-\sigma } \Vert D\phi \Vert _0 + K \Big (1-\frac{1-\hbox {e}^{-h\lambda _r}}{h \lambda _r}\Big ) \Big ( r^{1-\sigma } \Vert D\phi \Vert _0 + \Vert \phi \Vert _0 \Big ) \nonumber \\&\quad \le K \Big ( h \lambda _r r^{1-\sigma } \Vert D\phi \Vert _0 + h\lambda _r \Vert \phi \Vert _0\Big ). \end{aligned}$$
(32)

Recalling that \({\mathcal {L}} \phi ( x_{i} ) = {\mathcal {L}}_{r} \phi ( x_{i} ) + {\mathcal {L}}^{r} \phi ( x_{i} ) \), combining (30) with (31) and (32), we find

$$\begin{aligned} \phi ( x_{i} ) - S_{\rho ,h,r} [ \mu ] ( \phi , i,k )&= - h F ( x_{i}, \mu ( t_{k+1} ) ) - h \inf _{\alpha } \bigg [ L ( x_{i}, \alpha ) - D \phi ( x_{i} ) \cdot \alpha \bigg ] \\&\quad - h {\mathcal {L}} \phi ( x_{i} )\\&\quad + {\mathcal {O}} \big ( h^2\lambda _r + hr^{3-\sigma } + h^2 \lambda _r r^{1-\sigma } + \rho ^2 + h^2 r^{2-2\sigma }\big ). \end{aligned}$$

As \(\vert \lambda _r\vert \le C r^{-\sigma }\), we have

$$\begin{aligned}&\frac{\phi (t_{k},x_i)- \phi (t_{k+1},x_i)}{h} + \frac{1}{h} \Big (\phi (t_{k+1},x_i) - S_{ \rho , h, r} [ \mu ] ( \phi _{\cdot ,k+1}, i, k )\Big ) \\&\quad = - \partial _t \phi (t_{k},x_i) - {\mathcal {L}} \phi (t_{k+1}, x_{i} ) - F ( x_{i}, \mu ( t_{k+1} ) ) \\&\qquad - \inf _{\alpha } \bigg [ L (x_{i}, \alpha ) - D \phi ( t_{k+1},x_{i} ) \cdot \alpha \bigg ] \\&\qquad + {\mathcal {O}} \big (h+ h r^{-\sigma } + r^{3-\sigma } + h r^{1-2\sigma } + \frac{\rho ^2}{h} + h r^{2-2\sigma } \big ). \end{aligned}$$

Hence, the result follows by taking the limit \(n \rightarrow \infty \) with \(\frac{\rho _{n}^2}{h_{n}} ,\frac{h_{n}}{r_{n}^{\sigma }} = o ( 1 )\). \(\square \)
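As a quick numerical sanity check (not part of the proof), the cancellation of odd-order terms behind the symmetric expansion (29) can be verified for a generic smooth test function; here a and b are hypothetical stand-ins for \(h \bar{b} ( \alpha )\) and \(\sqrt{hd}\, \sigma _{r}^{m}\):

```python
import numpy as np

# Check: phi(x + a + b) + phi(x + a - b) - 2 phi(x)
#        = 2 a . Dphi(x) + b^T D^2phi(x) b + higher-order terms.
# Odd powers of b cancel in the symmetric sum, so the remainder is
# O(|a|^2 + |a||b|^2 + |b|^4), i.e., o(h) when a ~ h and b ~ sqrt(h).

def phi(x):
    return np.sin(x[0]) * np.cos(x[1])

def grad_phi(x):
    return np.array([np.cos(x[0]) * np.cos(x[1]),
                     -np.sin(x[0]) * np.sin(x[1])])

def hess_phi(x):
    d11 = -np.sin(x[0]) * np.cos(x[1])   # equals d22 for this phi
    d12 = -np.cos(x[0]) * np.sin(x[1])
    return np.array([[d11, d12], [d12, d11]])

x = np.array([0.3, -0.7])
h = 1e-3
a = h * np.array([1.0, 2.0])             # stand-in for h * bbar(alpha)
b = np.sqrt(h) * np.array([0.5, -0.4])   # stand-in for sqrt(h d) * sigma_r^m

lhs = phi(x + a + b) + phi(x + a - b) - 2 * phi(x)
rhs = 2 * np.dot(a, grad_phi(x)) + b @ hess_phi(x) @ b
print(abs(lhs - rhs) / h)  # remainder is o(h): this ratio is small
```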

Theorem 5.2

(Comparison) Assume \(\mu _1,\mu _2\in C([0,T],P({\mathbb {R}}^d))\), \((\nu \)0), and (L1). Let \(u_{\rho ,h}[\mu _1]\) and \({u}_{\rho ,h}[\mu _2] \) be defined by the scheme (20) for \(\mu =\mu _1,\mu _2\), respectively. Then,

$$\begin{aligned}&\Vert u_{\rho ,h}[\mu _1] - u_{\rho ,h}[\mu _2] \Vert _{0} \\&\quad \le T \Vert F(\cdot ,\mu _1) - F(\cdot ,\mu _2) \Vert _{0} + \Vert G(\cdot ,\mu _1(T)) - G(\cdot ,\mu _2(T)) \Vert _{0} . \end{aligned}$$

Proof

Let \(c^{\pm }_{m}(\alpha ) := h \bar{b} (\alpha ) \pm \sqrt{hd} \sigma _r^m\), and note that

$$\begin{aligned} I [u_{\cdot , k+1}[\mu _1]] ( x) - I[u _{\cdot ,k+1}[\mu _2]]( x) =&\sum _{p \in {\mathbb {Z}}^d} \beta _p (x) (u_{p,k+1}[\mu _1]- u _{p,k+1}[\mu _2] ) . \end{aligned}$$
(33)

By (18) and the definition of \(\inf \), for any \(\epsilon >0\), there is \(\alpha _{\epsilon } \in {\mathbb {R}}^d\) such that

$$\begin{aligned}&u_{i,k} [\mu _2] \ge \,h F ( x_i,\mu _2(t_{k}) ) + h L (x_{i}, \alpha _{\epsilon }) + \frac{\hbox {e}^{-h \lambda _r}}{2d} \sum _{m=1}^d \Big [ I [ u_{\cdot , k+1}[\mu _2] ] (x_i + c^{+}_{m}(\alpha _{\epsilon })) \nonumber \\&\quad + I [ u_{\cdot ,k+1}[\mu _2] ] (x_i+ c^{-}_{m}(\alpha _{\epsilon })) \Big ] + \frac{1-\hbox {e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert \ge r} I [ u_{\cdot ,k+1}[\mu _2] ] (x_i + z) \nu (\hbox {d}z) - \epsilon . \end{aligned}$$
(34)

We then find, using (18), (33), (34),

$$\begin{aligned}&u_{i,k}[\mu _1] - u_{i,k}[\mu _2] \le h \big ( F ( x_i,\mu _1(t_{k}) ) - F ( x_i,\mu _2(t_{k}) ) \big ) + h ( L (x_{i}, \alpha _{\epsilon }) - L (x_{i}, \alpha _{\epsilon }) ) \\&\qquad + \sum _{p \in {\mathbb {Z}}^d} \bigg [ \frac{\hbox {e}^{-h \lambda _r}}{2d} \sum _{m=1}^d \Big (\beta _p ( c^{+}_{m}(\alpha _{\epsilon })) + \beta _p( c^{-}_{m}(\alpha _{\epsilon })) \Big )\big (u_{p+i,k+1}[\mu _1] - u_{p+i,k+1}[\mu _2] \big ) \\&\qquad + \frac{1-\hbox {e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert \ge r} \beta _p (z) \big (u_{p+i,k+1}[\mu _1] - u_{p+i,k+1}[\mu _2]\big ) \nu (\hbox {d}z) \bigg ] + \epsilon \\&\quad \le h \Vert F(\cdot ,\mu _1) - F(\cdot ,\mu _2) \Vert _{0} + c\sup _i \vert u_{i,k+1}[\mu _1] - u_{i,k+1}[\mu _2]\vert + \epsilon , \end{aligned}$$

where since \(\sum _p \beta _p\equiv 1\),

$$\begin{aligned} c=&\frac{\text{ e}^{-h \lambda _r}}{2d} \sum _{m=1}^d\sum _{p \in {\mathbb {Z}}^d} \Big (\beta _p ( c^{+}_{m}(\alpha _{\epsilon })) + \beta _p( c^{-}_{m}(\alpha _{\epsilon })) \Big ) \\ {}&\ + \frac{1-\text{ e}^{-h\lambda _r}}{\lambda _r} \int _{\vert z\vert \ge r} \sum _{p \in {\mathbb {Z}}^d}\beta _p (z) \nu (\text{ d }z) =1. \end{aligned}$$

Since \(\vert u_{i,N}[\mu _1] - u_{i,N}[\mu _2] \vert \le \Vert G(\cdot ,\mu _1(t_N)) - G(\cdot ,\mu _2(t_N)) \Vert _0 \), a symmetry and iteration argument shows that

$$\begin{aligned}&\big \vert u_{i,k}[\mu _1] - u _{i,k}[\mu _2]\big \vert \\&\quad \le ( N-k ) h \Vert F(\cdot ,\mu _1) - F(\cdot ,\mu _2) \Vert _0 + \Vert G(\cdot ,\mu _1(t_N)) - G(\cdot ,\mu _2(t_N)) \Vert _0 . \end{aligned}$$

The result then follows from interpolation and \(T = Nh\). \(\square \)
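In more detail, with \(a_{k} := \sup _{i} \vert u_{i,k} [ \mu _{1} ] - u_{i,k} [ \mu _{2} ] \vert \), the estimate above and its analogue with \(\mu _{1}\) and \(\mu _{2}\) interchanged give (after sending \(\epsilon \rightarrow 0\))

$$\begin{aligned} a_{k} \le h \Vert F(\cdot ,\mu _1) - F(\cdot ,\mu _2) \Vert _{0} + a_{k+1}, \end{aligned}$$

and summing from k to \(N-1\) yields \(a_{k} \le (N-k) h \Vert F(\cdot ,\mu _1) - F(\cdot ,\mu _2) \Vert _{0} + a_{N}\).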

The SL scheme is very stable in the sense that we have boundedness, Lipschitz continuity, and semi-concavity of the solutions \(u_{i,k}\), uniformly in \(h, \rho \), and \(\mu \).

Lemma 5.3

Let \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\) and \(u_{i,k}[\mu ]\) be defined by the scheme (18).

  1. (a)

    (Lipschitz continuity) Assume \((\nu \)0), (L2), and (F2). Then,

    $$\begin{aligned} \frac{\vert u_{i,k} - u_{j,k} \vert }{\vert x_{i} - x_{j}\vert } \le ( L_{F} + L_{L} )(T-t_k) + L_G , \quad i,j \in {\mathbb {Z}}^{d}, \ k \in \{ 0,1, \ldots N \}. \end{aligned}$$
  2. (b)

    (Semi-concavity) Assume \((\nu \)0) , (L3) and (F3). Then

    $$\begin{aligned} \frac{u_{i+j,k} - 2 u_{i,k} + u_{i-j,k}}{ \vert x_j\vert ^2} \le ( c_F +c_L ) (T-t_k) + c_G, \quad i,j \in {\mathbb {Z}}^{d}, \ k \in \{ 0,1, \ldots N \}. \end{aligned}$$
  3. (c)

    (Uniformly bounded) Assume \((\nu \)0), (L0)–(L2), (F1), and (F2). Then,

    $$\begin{aligned} \vert u_{i,k}\vert \le (C_{F}+C_{L}(K))(T-t_k) + C_G, \quad i \in {\mathbb {Z}}^{d}, \ k \in \{ 0,1, \ldots N \}, \end{aligned}$$

    where K is defined in Theorem 5.1 (i).

Proof

(a) Note that since \(\beta _{m} (x_{j}+x) = \beta _{m-j} (x)\),

$$\begin{aligned} I [u_{\cdot , k+1}] (x_{j} + x) - I[u_{\cdot ,k+1}](x_i + x) =&\sum _{p \in {\mathbb {Z}}^d} \beta _p (x) (u_{p+j,k+1} - u_{p+i,k+1} ) . \end{aligned}$$
(35)

Then, by (L2), (F2), and similar computations as in Theorem 5.2, we find that

$$\begin{aligned} u_{j,k} - u_{i,k} \le h(L_F+L_L) \vert x_i-x_j\vert + \sup _{p} \vert u_{p+j,k+1}-u_{p+i,k+1}\vert + \epsilon . \end{aligned}$$

Since \(\vert u_{i,N} - u_{j,N} \vert = \vert G (x_i, \mu (t_{N}) ) - G (x_j, \mu (t_{N}) ) \vert \le L_G \vert x_i - x_j\vert \) by (F2), the result follows by iteration and sending \(\epsilon \rightarrow 0\).

(b) Similar to (35) we see

$$\begin{aligned}&I [u_{\cdot , k+1}] (x_{i+j} + x) - 2 I[u_{\cdot ,k+1}](x_i + x) + I [u_{\cdot , k+1}] (x_{i-j} + x) \\&\quad = \sum _{p \in {\mathbb {Z}}^d} \beta _p (x_i +x) (u_{p+j,k+1} - 2 u_{p,k+1} + u_{p-j, k+1} ) . \end{aligned}$$

Then, by (L3), (F3), and similar computations as in Theorem 5.2, we find that

$$\begin{aligned} u_{i+j,k} - 2 u_{i,k} + u_{i-j,k} \le ( c_L + c_{F} )h \vert x_j\vert ^2 + \sup _i(u_{i+j,k+1} - 2 u_{i,k+1} + u_{i-j,k+1}). \end{aligned}$$

Since \(u_{i+j,N} - 2 u_{i,N} + u_{i-j,N}\le c_G\vert x_j\vert ^2\) by (F3), the result follows by iteration.

(c) By part (a) and Theorem 5.1 (i), \(\vert \alpha \vert \le K\), and then a direct calculation shows that

$$\begin{aligned} - \sup _{\vert \alpha \vert \le K}\Big ( h (\vert F\vert + \vert L\vert ) + \sup _j\vert u_{j,k+1}\vert \Big )\le u_{i,k} \le \sup _{\vert \alpha \vert \le K}\Big ( h (\vert F\vert +\vert L\vert ) + \sup _j\vert u_{j,k+1}\vert \Big ). \end{aligned}$$

The result follows from (L1) and (F1). \(\square \)

Theorem 5.4

(Convergence of the HJB scheme) Assume \((\nu \)0), (\(\nu \)1), (F1), (F2), (L2), \(\rho _{n},h_{n}, r_{n} \xrightarrow {n \rightarrow \infty } 0\) under the CFL conditions \(\frac{\rho _{n}^{2}}{h_{n}} ,\frac{h_{n}}{r_{n}^{\sigma }} = o ( 1 )\), \(\mu _n\rightarrow \mu \) in \(C([0,T],P({\mathbb {R}}^d))\), and let \(u_{\rho _{n},h_{n}}[\mu _n]\) be the solution of the scheme (18) defined by (20). Then, there is a continuous bounded function \(u[\mu ]\) such that \(u_{\rho _{n},h_{n}}[\mu _n]\rightarrow u[\mu ]\) locally uniformly in \({\mathbb {R}}^d\times [ 0,T ]\), and \(u[\mu ]\) is the viscosity solution of the HJB equation in (1) for \(m=\mu \).

Proof

The result follows from the Barles–Perthame–Souganidis relaxed limit method [12], using the monotonicity, consistency, and \(L^\infty \)-stability properties of the scheme (cf. Theorem 5.1 (ii), (iii), and Lemma 5.3 (c)), and the strong comparison principle for the HJB equation in Proposition 2.5 (a).

We refer to the proof of [24, Theorem 3.3] for a standard but more detailed proof in a similar case. \(\square \)

We recall that the continuous extensions \(u_{\rho , h} [ \mu ] ( t,x )\) and \(u_{\rho , h}^{\epsilon } [ \mu ] ( t,x )\) are defined in (20) and (22), respectively. The results of Lemma 5.3 transfer to \(u_{\rho ,h}^{\epsilon } [ \mu ] ( t,x )\).

Lemma 5.5

Let \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\) and \(u_{\rho ,h}^{\epsilon } [ \mu ]\) be given by (22).

  1. (a)

    (Lipschitz continuity) Assume \((\nu \)0), (L2) and (F2). Then,

    $$\begin{aligned} \big \vert u_{\rho , h}^{\epsilon } [ \mu ] ( t,x ) - u_{\rho , h}^{\epsilon } [ \mu ] ( t,y ) \big \vert \le ( ( L_{L} + L_{F} )T + L_{G} ) \vert x-y\vert . \end{aligned}$$
  2. (b)

    (Approximate semiconcavity) Assume \((\nu \)0), (L2),(L3), (F2), and (F3). Then, there exists a constant \(c_1>0\), independent of \(\rho , h, \epsilon \) and \(\mu \), such that

    $$\begin{aligned}&u_{\rho , h}^{\epsilon } [ \mu ] ( t, x + y ) - 2 u_{\rho , h}^{\epsilon } [ \mu ] ( t,x ) + u_{\rho , h}^{\epsilon } [ \mu ] ( t,x - y )\\&\quad \le c_{1} ( \vert y\vert ^{2} + \rho ^{2} + \frac{\rho ^2}{\epsilon } ), \ \text{ and } \\&\langle D u_{\rho , h}^{\epsilon } [ \mu ] ( t,y ) - D u_{\rho , h}^{\epsilon } [ \mu ] ( t,x ) , y-x \rangle \le c_{1} \Big (\vert x-y\vert ^{2} + \frac{\rho ^2}{\epsilon ^2}\Big ). \end{aligned}$$
  3. (c)

    Assume \(d=1\), \((\nu \)0), (L3), and (F3). Then, there exists a constant \(c_{2} >0\), independent of \(\rho , h, \epsilon \) and \(\mu \), such that for each \(i,j \in {\mathbb {Z}}^d\) and \(k \in {\mathcal {N}}_h\)

    $$\begin{aligned} \langle D u_{\rho , h}^{\epsilon } [ \mu ] ( t_k,x_j ) - D u_{\rho , h}^{\epsilon } [ \mu ] ( t_k,x_i ) , x_j-x_i \rangle \le c_{2} \vert x_j-x_i\vert ^{2} . \end{aligned}$$

Proof

  1. (a)

    Since \(u_{i,k}\) satisfies the discrete Lipschitz bound of Lemma 5.3 (a), \(u_{\rho ,h} [ \mu ]\) is Lipschitz with the same Lipschitz constant as \(u_{i,k}\) by properties of linear interpolation, and \(u_{\rho ,h}^{\epsilon } [ \mu ]\) is Lipschitz with the same constant as \(u_{\rho ,h} [ \mu ]\) by properties of mollifiers (Lemma 3.2).

  2. (b)

    For \(i,j \in {\mathbb {Z}}^{d}\), we have by Lemma 5.3 (b), \(u_{i+j} + u_{i-j} - 2 u_{i} \le c \vert x_{j}\vert ^{2}.\) Multiplying both sides by \(\beta _{i} ( x )\), and summing over \({\mathbb {Z}}^{d}\), we get

    $$\begin{aligned} u_{\rho ,h} ( x + x_{j} ) + u_{\rho ,h} ( x-x_{j} ) -2 u_{\rho ,h} ( x ) \le c \vert x_{j}\vert ^{2}. \end{aligned}$$

    Letting \(x \rightarrow x - z\), multiplying by a positive mollifier \(\rho _{\epsilon } ( z )\) and integrating, we get

    $$\begin{aligned} u_{\rho ,h}^{\epsilon } ( x + x_{j} ) + u_{\rho ,h}^{\epsilon } ( x-x_{j} ) -2 u_{\rho ,h}^{\epsilon } ( x ) \le c \vert x_{j}\vert ^{2}. \end{aligned}$$

    We multiply both sides with \(\beta _{j} ( y )\), and sum over \({\mathbb {Z}}^{d}\),

    $$\begin{aligned} I [ u_{\rho ,h}^{\epsilon } ] ( x+y ) + I [ u_{\rho ,h}^{\epsilon } ] ( x-y ) - 2 I [ u_{\rho ,h}^{\epsilon } ] ( x ) \le c I [ \vert \cdot \vert ^{2} ] ( y ) \le c ( \vert y\vert ^{2} + \rho ^{2} ). \end{aligned}$$

    By Lemma 3.2 and part (a), we have that \(\vert I [ u_{\rho ,h}^{\epsilon } ] ( \xi ) - u_{\rho ,h}^{\epsilon } ( \xi )\vert \le K \Vert D^{2} u_{\rho ,h}^{\epsilon } \Vert _{0} \rho ^{2} \le K \frac{\rho ^{2}}{\epsilon }\), where the Lipschitz bound K depends on the constants in (L2) and (F2). Thus,

    $$\begin{aligned} u_{\rho ,h}^{\epsilon } ( x+y ) + u_{\rho ,h}^{\epsilon } ( x-y ) - 2 u_{\rho ,h}^{\epsilon } ( x ) \le c ( \vert y\vert ^{2} + \rho ^{2} + \frac{\rho ^{2}}{\epsilon } ). \end{aligned}$$

    The second part of (b) then follows as in [3, Remark 6].

  3. (c)

    The proof is given in [24, Lemma 3.6]. \(\square \)
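The mollifier properties used above (via Lemma 3.2) can be illustrated numerically; the following sketch (with an assumed bump kernel, not the paper's construction) checks, for the Lipschitz function \(f(x)=\vert x\vert \), that mollification at scale \(\varepsilon \) moves f by at most \(\varepsilon \, \mathrm {Lip}(f)\) in sup-norm and does not increase the Lipschitz constant:

```python
import numpy as np

# Two standard mollifier estimates (cf. Lemma 3.2), checked for f(x) = |x|:
#   (1) ||f * rho_eps - f||_0 <= eps * Lip(f),
#   (2) Lip(f * rho_eps) <= Lip(f).
eps = 0.1
z = np.linspace(-eps, eps, 2001)
dz = z[1] - z[0]
rho = np.cos(np.pi * z / (2 * eps)) ** 2   # smooth bump supported in [-eps, eps]
rho /= rho.sum() * dz                      # normalize so that int rho = 1

xs = np.linspace(-1.0, 1.0, 2001)
f = np.abs(xs)
# f_eps(x) = int f(x - z) rho(z) dz, evaluated by a Riemann sum on the z-grid
f_eps = np.array([np.sum(np.abs(x - z) * rho) * dz for x in xs])

sup_dist = np.max(np.abs(f_eps - f))                     # <= eps
lip_eps = np.max(np.abs(np.diff(f_eps) / np.diff(xs)))   # <= 1 (up to rounding)
print(sup_dist, lip_eps)
```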

Under our assumptions, the continuous HJB equation has a (viscosity) solution \(u ( t ) \in W^{1,\infty } ( {\mathbb {R}}^d )\); in particular, the derivative exists almost everywhere [39, Theorem 4.3]. We have the following result for \(Du_{\rho ,h}^{\epsilon } [ \mu ]\).

Theorem 5.6

Assume \((\nu \)0), (\(\nu \)1), (L1)–(L2), (F1)–(F2), \(\rho _{n},h_{n}, r_{n}, \epsilon _n \xrightarrow {n \rightarrow \infty } 0\) under CFL conditions \(\frac{\rho _{n}^{2}}{h_{n}} ,\frac{h_{n}}{r_{n}^{\sigma }} = o ( 1 )\), and \(\mu _n\rightarrow \mu \) in \(C([0,T],P({\mathbb {R}}^d))\). Let \(u_{\rho _{n},h_{n}}^{\epsilon _n}[\mu _n]\) be defined by (22) and \(u[\mu ]\) the viscosity solution of the HJB equation in (1) for \(m=\mu \). Then

  1. (i)

    \(u_{\rho _{n}, h_{n}}^{\epsilon _{n}} [ \mu _{n} ] \rightarrow u [ \mu ]\) locally uniformly,

  2. (ii)

    Assume also (L3), (F3), and \(\frac{\rho _n}{\epsilon _n}=o(1)\). Then \(Du_{\rho _{n}, h_{n}}^{\epsilon _{n}} [ \mu _{n} ] ( t,x ) \rightarrow Du [ \mu ] ( t,x )\) whenever \(Du ( t,x )\) exists, that is, the convergence holds almost everywhere.

  3. (iii)

    Assume also (L3), (F3), \(\frac{\rho _n}{\epsilon _n}=o(1)\), and (U). Then, \(Du_{\rho _{n}, h_{n}}^{\epsilon _{n}} [ \mu _{n} ] \rightarrow Du [ \mu ]\) locally uniformly.

Proof

  1. (i)

    This follows from the convergence result Theorem 5.4 and Lemma 3.2.

  2. (ii) and (iii)

    We refer to [24, Theorem 3.5] and [27, Proposition 5.1]; the estimates from Lemma 5.5 are needed. For completeness, we give the proof in Appendix A. \(\square \)

6 On the Dual SL Scheme for the FPK Equation

In this section, we establish more properties of the discrete FPK equation (24), including tightness, equicontinuity in time, \(L^1\)-stability of solutions with respect to \(\mu \), and \(L^p\)-bounds in dimension \(d=1\). To prove tightness, we will use a result from [33].

Proposition 6.1

Assume \((\nu \)0) and (M). Then, there exists a function \(0\le \Psi \in C^{2}({\mathbb {R}}^d)\) with \(\Vert D\Psi \Vert _{0},\) \(\Vert D^2\Psi \Vert _{0} < \infty \), and \(\displaystyle \lim _{\vert x\vert \rightarrow \infty } \Psi (x) = \infty \), such that

$$\begin{aligned} \sup _{x\in {\mathbb {R}}^d}\Big \vert \int _{\vert z\vert >1} \big (\Psi (x+z) - \Psi (x) \big )\nu (\hbox {d}z)\Big \vert< \infty \quad \text{ and } \quad \int _{{\mathbb {R}}^d} \Psi (x) \, m_0(\hbox {d}x)<\infty . \end{aligned}$$
(36)

Proof

We use [33, Lemma 4.9] on the family of measures \(\{\nu _1 , m_0\}\), where \(\nu _1\) is defined in (11), to get a function \(\Psi (x) = V_0(\sqrt{1+\vert x\vert ^2})\) such that \(V_0:[0,\infty )\rightarrow [0,\infty )\) is a non-decreasing sub-additive function, \(\Vert V_0'\Vert _{0}, \Vert V_0''\Vert _0<\infty \), \(\displaystyle \lim _{x\rightarrow \infty } V_0(x) = \infty \), and

$$\begin{aligned} \int _{{\mathbb {R}}^d} \Psi (x) \, \mu (\hbox {d}x)<\infty \qquad \text{ for } \qquad \mu \in \{\nu _1 , m_0\}. \end{aligned}$$

We immediately get the result except for the first part of (36). But this estimate follows from sub-additivity and \(\nu _1\)-integrability of \(V_0\), see [33, Lemma 4.13 (ii)]. \(\square \)

Remark 6.2

(a) If \(\frac{\hbox {d}\nu }{\hbox {d}z}\le \frac{C}{\vert z\vert ^{d+\sigma _1}}\) for \(\vert z\vert >1\) and \(\int _{{\mathbb {R}}^d} \vert x\vert ^{\sigma _2} \, m_0(\hbox {d}x)< \infty \) for \(\sigma _1,\sigma _2>0\), then \(\Psi (z)= \log (\sqrt{1+\vert z\vert ^2})\) is a possible explicit choice for the function in Proposition 6.1.

(b) Since \(\Psi \in C^2({\mathbb {R}}^d)\), the first part of (36) is equivalent to \(\Vert {\mathcal {L}}\Psi \Vert _0 < \infty \) (see [33, Lemma 4.13 (ii)]).
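As a numerical illustration of (a) and (b) (not from the paper; \(d=1\), the value \(\sigma _1=0.5\), and the truncation of the integral are assumptions), one can check that the tail part of \({\mathcal {L}}\Psi \) stays bounded in x for the choice \(\Psi (x)= \log (\sqrt{1+\vert x\vert ^2})\):

```python
import numpy as np

# Illustration (d = 1; sigma = 0.5 and the grid/truncation are assumptions):
# for Psi(x) = log(sqrt(1 + x^2)) and a fractional-type tail
# d(nu)/dz = |z|^(-1-sigma) on |z| > 1, the tail term
#   x |-> | int_{|z|>1} ( Psi(x+z) - Psi(x) ) nu(dz) |
# of L Psi stays bounded in x, in line with Remark 6.2 (b).
sigma = 0.5

def Psi(x):
    return 0.5 * np.log1p(x ** 2)

def trapezoid(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

zp = np.logspace(0, 4, 20000)   # z in [1, 1e4]; the neglected tail is small
w = zp ** (-1 - sigma)          # Levy density on the tail

vals = []
for x in np.linspace(-50.0, 50.0, 101):
    pos = trapezoid((Psi(x + zp) - Psi(x)) * w, zp)   # contribution of z > 1
    neg = trapezoid((Psi(x - zp) - Psi(x)) * w, zp)   # contribution of z < -1
    vals.append(abs(pos + neg))

print(max(vals))  # stays of moderate size, uniformly over the sampled x
```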

Lemma 6.3

Assume \(\{\mu _\alpha \}_{\alpha \in A}\subset P({\mathbb {R}}^d)\) and there exists a function \(0\le \psi \in C({\mathbb {R}}^d)\) such that \(\lim _{\vert x\vert \rightarrow \infty } \psi (x)=\infty \) and \(\sup _{\alpha }\int _{{\mathbb {R}}^d} \psi (x) \mu _\alpha (\hbox {d}x) \le C\). Then, \(\{\mu _\alpha \}_{\alpha }\) is tight.

This result is classical and can be proved by a Chebyshev-type argument.
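For completeness, here is the Chebyshev-type argument: for R large enough that \(\inf _{\vert x\vert > R} \psi > 0\),

$$\begin{aligned} \sup _{\alpha \in A} \mu _{\alpha } ( \vert x\vert> R ) \le \frac{\sup _{\alpha \in A}\int _{{\mathbb {R}}^d} \psi ( x ) \, \mu _{\alpha } ( \hbox {d}x )}{\inf _{\vert x\vert> R} \psi ( x )} \le \frac{C}{\inf _{\vert x\vert > R} \psi ( x )} \xrightarrow {R \rightarrow \infty } 0, \end{aligned}$$

which is exactly tightness of \(\{\mu _\alpha \}_{\alpha }\).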

Theorem 6.4

(Tightness) Assume \((\nu \)0), (\(\nu \)1), (L1)–(L2), (F2), (H1), (M), the CFL conditions \(\frac{\rho ^2}{h},hr^{1-2\sigma }={\mathcal {O}}(1)\), \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\), and let \(m^{\epsilon }_{\rho ,h}[\mu ]\) be defined by (26). Take \(\Psi \) as in Proposition 6.1. Then, there exists \(C>0\), independent of \(\rho , h, \epsilon \), and \(\mu \), such that

$$\begin{aligned} \int _{{\mathbb {R}}^d} \Psi (x) \, dm^{\epsilon }_{\rho ,h}[\mu ](t) \le C \qquad \text{ for } \text{ any } \qquad t\in [0,T]. \end{aligned}$$

Proof

Essentially we start by multiplying the scheme (24) by \(\Psi \) and integrating in space. By the definition of \(m^{\epsilon }_{\rho ,h}=m^{\epsilon }_{\rho ,h}[\mu ]\) in (26) and (24), we find that

$$\begin{aligned} \int _{{\mathbb {R}}^d} \Psi (x) \hbox {d}m^{\epsilon }_{\rho ,h}(t_{k+1})&= \frac{1}{\rho ^d}\sum _{i \in {\mathbb {Z}}^d} m_{i,k+1} \int _{E_i} \Psi (x) \hbox {d}x \\&= \sum _{i \in {\mathbb {Z}}^d} \frac{1}{\rho ^d} \int _{E_i} \Psi (x) \hbox {d}x \sum _{j} m_{j,k} \, {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , Du_{\rho ,h}^{\epsilon } ) ] ( i,j,k ). \end{aligned}$$

By the definition of \({\mathbf {B}}_{\rho ,h,r}\) in (25) and interchanging the order of summation and integration, we have

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \Psi (x) \hbox {d}\,m^{\epsilon }_{\rho ,h}(t_{k+1})\\&\quad = \sum _{j\in {\mathbb {Z}}^d} \frac{m_{j,k}}{\rho ^{d}} \bigg [ \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r}\int _{\vert z\vert >r} \sum _{i\in {\mathbb {Z}}^d} \int _{E_i}\Psi (x) \beta _i(x_j +z) \hbox {d}x \, \nu (\hbox {d}z) \\&\qquad +\frac{\hbox {e}^{-\lambda _r h}}{2d}\sum _{p=1}^d\sum _{i\in {\mathbb {Z}}^d} \int _{E_i} \Psi (x) \big (\beta _i(\Phi ^{\epsilon ,+}_{j,k,p})+ \beta _i(\Phi ^{\epsilon ,-}_{j,k,p})\big ) \hbox {d}x \bigg ]. \end{aligned}$$

Since \(\Psi \in C^2({\mathbb {R}}^d)\), by properties of the midpoint approximation and linear/multilinear interpolation, we have \(\big \vert \frac{1}{\rho ^d} \int _{E_i} \Psi (x) \hbox {d}x - \Psi (x_i)\big \vert + \big \vert \Psi (x) - \sum _{i\in {\mathbb {Z}}^d} \beta _i(x) \Psi (x_i)\big \vert = {\mathcal {O}}(\rho ^{2})\). Therefore

$$\begin{aligned} \int _{{\mathbb {R}}^d} \Psi (x) \hbox {d}\,m^{\epsilon }_{\rho ,h}(t_{k+1})&\le \sum _{j\in {\mathbb {Z}}^d} m_{j,k}\bigg [\frac{\hbox {e}^{-\lambda _r h}}{2d} \sum _{p=1}^d \big ( \Psi \big (\Phi ^{\epsilon ,+}_{j,k,p}\big ) + \Psi \big (\Phi ^{\epsilon ,-}_{j,k,p}\big ) \big ) \nonumber \\&\quad + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r}\int _{\vert z\vert >r} \Psi (x_j+z) \, \nu (\hbox {d}z)\bigg ] + {\mathcal {O}}(\rho ^{2}). \end{aligned}$$
(37)

We estimate the terms on the right-hand side. Let \(\Phi ^{\epsilon ,\pm }_{j,k,p}= x_j - a^{\pm }_{h,j}\), where

$$\begin{aligned}&a^{\pm }_{h,j}= h\,\Big ( D_p H \big (x_j, Du^{\epsilon }_{\rho ,h}(t_k,x_j)\big ) + B_r^{\sigma }\Big ) \pm \sqrt{h} \sigma _r^p. \end{aligned}$$
(38)

By the fundamental theorem of calculus,

$$\begin{aligned} \Psi (x_j-a^{+}_{h,j}) + \Psi (x_j-a^{-}_{h,j}) = 2 \Psi (x_j) - (a^{+}_{h,j} + a^{-}_{h,j})\cdot D \Psi (x_j) +E_1 \end{aligned}$$
(39)

where \(a^{+}_{h,j} + a^{-}_{h,j} = 2 h\,\big ( D_p H \big (x_j, Du^{\epsilon }_{\rho ,h}(t_k,x_j)\big ) + B_r^{\sigma }\big )\) and

$$\begin{aligned} E_1=&- \int _{0}^1 \Big [a^{+}_{h,j} \cdot \big (D\Psi (x_j-t a^{+}_{h,j}) - D \Psi (x_j)\big ) \\&\quad + a^{-}_{h,j} \cdot \big (D\Psi (x_j-t a^{-}_{h,j}) - D \Psi (x_j)\big )\Big ] \hbox {d}t. \end{aligned}$$

By Lemma 5.5 (a) and (H1), we find that \(\Vert D_p H(\cdot , Du_{\rho ,h}^{\epsilon })\Vert _0\le C_R\) with \(R=( L_{L} + L_{F} )T + L_{G}+1\), and then that

$$\begin{aligned} \vert E_1\vert&\le \Vert D^2 \Psi \Vert _{0} ( \vert a^{+}_{h,j}\vert ^2 + \vert a^{-}_{h,j}\vert ^2 ) \le 4 \Vert D^2 \Psi \Vert _{0} \big ( h^2 (C_R^2+\vert B_r^{\sigma }\vert ^2) + h\vert \sigma _r^{p}\vert ^2 \big ). \end{aligned}$$

To estimate the nonlocal term, we write

$$\begin{aligned}&\int _{\vert z\vert>r} \Psi (x_j+z) \, \nu (\hbox {d}z) = \int _{\vert z\vert>1} \Psi (x_j+z) \nu (\hbox {d}z)\\&\qquad + \int _{r<\vert z\vert<1} \Big \{\Psi (x_j) + z\cdot D\Psi (x_j) + \int _0^1 z \cdot \Big [ D \Psi (x_j+tz) - D \Psi (x_j)\Big ]\hbox {d}t \Big \} \, \nu (\hbox {d}z) \\&\quad \le \Big \vert \int _{\vert z\vert >1} \big (\Psi (x_j+z) - \Psi (x_j) \big ) \nu (\hbox {d}z)\Big \vert + \lambda _r \Psi (x_j) + B_r^{\sigma } \cdot D\Psi (x_j) \\&\qquad + \Vert D^2\Psi \Vert _{0} \int _{r<\vert z\vert <1} \vert z\vert ^2 \nu (\hbox {d}z) \\&\quad \le \, \lambda _r \Psi (x_j) + B_r^{\sigma } \cdot D\Psi (x_j) + E_2, \end{aligned}$$

where \(E_2\) is finite and independent of \(\rho , h,\epsilon \) by Proposition 6.1 and \(\int _{\vert z\vert<1} \vert z\vert ^2 \nu (\hbox {d}z)<\infty \). Going back to (37) and using the above estimates then leads to

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \Psi (x) \hbox {d}\,m^{\epsilon }_{\rho ,h}(t_{k+1}) \\&\quad \le \sum _{j \in {\mathbb {Z}}^d} m_{j,k} \bigg [ \frac{\hbox {e}^{-\lambda _r h}}{2d} \sum _{p=1}^d\Big ( 2 \Psi (x_j) - 2 h\,\big [ D_p H \big (x_j, Du^{\epsilon }_{\rho ,h}(t_k,x_j)\big ) + B_r^{\sigma }\big ] \cdot D\Psi (x_j) \\&\qquad + \vert E_1\vert \Big ) + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r} \Big (\lambda _r \Psi (x_j) + B_r^{\sigma } \cdot D\Psi (x_j) + E_2 \Big ) \bigg ] + C \rho ^2 \\&\quad \le \sum _{j\in {\mathbb {Z}}^d} m_{j,k}\, \Psi (x_j) +C \Big ( h^2 \lambda _r \vert B_r^{\sigma }\vert + h^2 \vert B_r^{\sigma }\vert ^2+ h + \rho ^{2}\Big ), \end{aligned}$$

where we used \(\vert -h\hbox {e}^{-\lambda _rh} +\frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r}\vert \le \frac{3}{2} \lambda _r h^2\) and \(\frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r}\le h\) to get the last inequality.

With \(A_{k+1} = \int _{{\mathbb {R}}^d} \Psi (x) d\,m^{\epsilon }_{\rho ,h}(t_{k+1})\), the above estimate becomes \(A_{k+1} \le A_k + E\) where \(E=C (\lambda _r h^2 \vert B_r^{\sigma }\vert + h^2 \vert B_r^{\sigma }\vert ^2 + h + \rho ^{2})\). By iteration, \( \vert B^{\sigma }_r\vert ^2\le \lambda _r \vert B_r^{\sigma }\vert \le C r^{1-2\sigma }\) (by \((\nu \)0), (\(\nu \)1)), and \(k\le N \le \frac{C}{h}\), we find that

$$\begin{aligned} A_{k+1}&\le \, A_0 + (k+1) E \le A_0 + C\Big ( h r^{1-2\sigma } +1 +\frac{\rho ^{2}}{h} \Big ) . \end{aligned}$$
(40)

By assumption \(\frac{\rho ^{2}}{h},h r^{1-2\sigma }={\mathcal {O}}(1)\), and by Proposition 6.1, \(A_0 = \int _{{\mathbb {R}}^d} \Psi (x) d\,m_0<\infty \). Therefore

$$\begin{aligned} \int _{{\mathbb {R}}^d} \Psi (x) \hbox {d}\,m^{\epsilon }_{\rho ,h}(t_k) \le C \qquad \text{ for } \qquad k=0,1,\dots ,N , \end{aligned}$$

for some constant \(C>0\) independent of \(\rho ,h,\epsilon , \mu \), and hence by (26) the result follows for \(t\in [0,T]\). \(\square \)

Theorem 6.5

(Equicontinuity in time) Assume \((\nu \)0), (\(\nu \)1), (L1)–(L2), (F2), (H1), (M), \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\), and let \(m^{\epsilon }_{\rho ,h}[\mu ]\) be defined by (26). Let \(\frac{\rho ^{2}}{h},\frac{h}{r^{\sigma }}={\mathcal {O}}(1)\) if \(\sigma \in (0,1)\), or \(\frac{\rho ^{2}}{h},h r^{1-2\sigma }={\mathcal {O}}(1)\) if \(\sigma \in (1,2)\). Then, there exists a constant \(C_0>0\), independent of \(\rho , h, \epsilon \), and \(\mu \), such that for any \(t_1,t_2 \in [0,T]\),

$$\begin{aligned} d_0 (m^\epsilon _{\rho ,h}[\mu ](t_1),m^\epsilon _{\rho ,h}[\mu ](t_2)) \le C_0 \sqrt{\vert t_1-t_2\vert }. \end{aligned}$$

Proof

We start with the case \(\sigma >1\). For \(\delta >0\), let \(\phi _{\delta }:= \phi *\rho _{\delta }\), where \(\rho _{\delta }\) is defined just before Lemma 3.2. With \(m^\epsilon _{\rho ,h}=m^\epsilon _{\rho ,h}[\mu ]\), we first note that

$$\begin{aligned}&d_0 (m^\epsilon _{\rho ,h}(t_1), m^\epsilon _{\rho ,h}(t_2))= \sup _{\phi \in \text{ Lip}_{1,1}}\int _{{\mathbb {R}}^d} \phi (x) (m^\epsilon _{\rho ,h}(t_1)-m^\epsilon _{\rho ,h}(t_2))\hbox {d}x \nonumber \\&\quad = \sup _{\phi \in \text{ Lip}_{1,1}}\Big \{\int _{{\mathbb {R}}^d}(\phi -\phi _\delta )(m^\epsilon _{\rho ,h}(t_1)-m^\epsilon _{\rho ,h}(t_2))\hbox {d}x \nonumber \\&\qquad + \int _{{\mathbb {R}}^d} \phi _\delta \, (m^\epsilon _{\rho ,h}(t_1)-m^\epsilon _{\rho ,h}(t_2))\hbox {d}x\Big \} \nonumber \\&\quad \le \, 2 \delta \Vert D\phi \Vert _0 + \sup _{\phi \in \text{ Lip}_{1,1}}\int _{{\mathbb {R}}^d} \phi _\delta \, (m^\epsilon _{\rho ,h}(t_1)-m^\epsilon _{\rho ,h}(t_2))\hbox {d}x, \end{aligned}$$
(41)

where Lemma 3.2 and \(\int m^\epsilon _{\rho ,h}\hbox {d}x=1\) were used to estimate the \(\phi -\phi _\delta \) term. Since \(m^\epsilon _{\rho ,h}\) and \(\int _{{\mathbb {R}}^d} \phi _{\delta }(x) m^\epsilon _{\rho ,h}(t,x)\hbox {d}x\) are affine on each interval \([t_k,t_{k+1}]\), we have \(\int _{{\mathbb {R}}^d} \phi _{\delta }(x) \, m^\epsilon _{\rho ,h}(\cdot ,x)\hbox {d}x\in W^{1,\infty }[0,T]\) and

$$\begin{aligned} \Big \Vert \frac{\hbox {d}}{\hbox {d}t} \int _{{\mathbb {R}}^d} \phi _{\delta }(x) \, m^\epsilon _{\rho ,h}(\cdot ,x) \hbox {d}x\Big \Vert _{0} \le \sup _k \vert I_k\vert , \end{aligned}$$

where \(I_k= \int _{{\mathbb {R}}^d} \phi _{\delta }(x) \, \frac{m^\epsilon _{\rho ,h}(t_{k+1},x)-m^\epsilon _{\rho ,h}(t_k,x)}{h}\hbox {d}x\). It follows that

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \phi _{\delta } \, (m^{\epsilon }_{\rho ,h}(t_1,x)- m^{\epsilon }_{\rho ,h}(t_2,x))\hbox {d}x \le \vert t_1-t_2\vert \sup _k \vert I_k\vert . \end{aligned}$$
(42)

Let us estimate \(I_k\). By (26), (24), (25), the midpoint quadrature approximation error bound, and the linear/multi-linear interpolation error bound, we have

$$\begin{aligned} I_k&=\frac{1}{h} \sum _i \frac{1}{\rho ^d} \int _{E_i} \phi _{\delta }(x)\, \hbox {d}x[m_{i,k+1} - m_{i,k}]\\&= \frac{1}{h\rho ^d} \sum _{j,i} \Big (\int _{E_i} \phi _{\delta }(x) \hbox {d}x \Big )\Big [ m_{j,k} \, {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , Du_{\rho ,h}^{\epsilon } ) ] ( i,j,k ) - m_{i,k}\, \delta _{i,j}\Big ] \\&= \frac{1}{h} \sum _{j} m_{j,k} \Big [ \sum _i \phi _{\delta }(x_i){\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , Du_{\rho ,h}^{\epsilon } ) ] ( i,j,k ) - \phi _{\delta }(x_j) + C\Vert D^2 \phi _{\delta }\Vert _{0}\rho ^2 \Big ] \\&= \frac{1}{h} \sum _j m_{j,k}\Big [ \frac{\hbox {e}^{-\lambda _r h}}{2d} \Big ( \sum ^d_{p=1}\phi _{\delta }(\Phi ^{\epsilon ,+}_{j,k,p}) + \phi _{\delta } (\Phi ^{\epsilon ,-}_{j,k,p}) -2\phi _{\delta }(x_j)\Big ) \\&\quad + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r}\int _{\vert z \vert > r} \big (\phi _{\delta }(x_j + z)-\phi _{\delta }(x_j) \big )\nu (\hbox {d}z) + C\Vert D^2 \phi _{\delta }\Vert _{0}\rho ^2 \Big ] . \end{aligned}$$

Since \(\Phi ^{\epsilon ,\pm }_{j,k,p}= x_j + a^{\pm }_{h,j}\) by (38), a second-order Taylor expansion gives

$$\begin{aligned} \big \vert I_k \big \vert&\le \frac{1}{h}\sum _j m_{j,k}\bigg [\hbox {e}^{-\lambda _r h}\Big ( (-h D_p H \big (x_j, Du^{\epsilon }_{\rho ,h}[\mu ](t_k,x_j)\big ) -hB_r^{\sigma })\cdot D\phi _{\delta }(x_j) \\&\quad +\frac{\Vert D^2\phi _{\delta }\Vert _{0}}{2d}\sum ^d_{p=1} \big (\vert a^{+}_{h,j}\vert ^ 2+\vert a^{-}_{h,j}\vert ^2\big ) + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r}\Big (B^\sigma _r \cdot D\phi _{\delta }(x_j) \\&\quad + \Vert D^2\phi _{\delta }\Vert _{0}\int _{r< \vert z \vert <1}\vert z\vert ^2 \nu (\hbox {d}z) + 2\Vert \phi _{\delta }\Vert _{0} \int _{\vert z\vert >1} \nu (\hbox {d}z) + C\Vert D^2 \phi _{\delta }\Vert _{0}\rho ^2 \Big ) \bigg ] \\&\le \, \frac{1}{h} \bigg [\Big (h \Vert D_pH(\cdot ,Du_{\rho ,h}^{\epsilon }) \Vert _0 + h^2 \lambda _r \vert B^\sigma _r\vert \Big )\Vert D\phi _{\delta }\Vert _0 + c_3 h {\Vert \phi _{\delta } \Vert }_0\\&\quad + c_1 \Big (h^2 {\Vert D_pH(\cdot ,Du_{\rho ,h}^{\epsilon }) \Vert }^2 + h^2 {\vert B^\sigma _r \vert }^2 + h{\vert \sigma _r\vert }^2 + h + \rho ^2\Big ) {\Vert D^2 \phi _{\delta } \Vert }_0 \bigg ] \sum _j m_{j,k}. \end{aligned}$$

The above inequality follows since \((\frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r} -h \hbox {e}^{-h\lambda _r}) \le h^2\lambda _r\) (used for the \(B_r^{\sigma }\cdot D\phi _{\delta }\)-terms), and \(\int _{r<\vert z \vert <1} {\vert z \vert }^2 \nu (\hbox {d}z) + \int _{\vert z\vert >1} \nu (\hbox {d}z) \le C\) independently of r by \((\nu \)0) and (\(\nu \)1). By Lemma 5.5 (a) and (H1), \(\Vert D_pH(\cdot , Du_{\rho ,h}^{\epsilon })\Vert _0\le C_R\) with \(R=( L_{L} + L_{F} )T + L_{G}+1\). Since \(\sum m_{j,k} =1\), \(\phi \in Lip _{1,1}\), \(\Vert D^2\phi _{\delta }\Vert _{0} \le \frac{\Vert D\phi \Vert _{0}}{\delta }\), and \( \vert B^{\sigma }_r\vert ^2\le \lambda _r \vert B_r^{\sigma }\vert \le K r^{1-2\sigma }\) (by \((\nu \)0), (\(\nu \)1)), we get that

$$\begin{aligned} \vert I_k\vert \le C(1+h r ^{1-2 \sigma } ) + C\big (1 + h +h r ^{1-2 \sigma } + \frac{\rho ^2}{h}\big ) \frac{1}{\delta }. \end{aligned}$$

To conclude the proof in the case \(\sigma >1\), we go back to (41) and (42). In view of the above estimate on \(I_k\) and the assumption that \(\frac{\rho ^2}{h},h r^{1-2\sigma }={\mathcal {O}}(1)\), we find that

$$\begin{aligned} d_0(m^{\epsilon }_{\rho ,h}(t_1),m^{\epsilon }_{\rho ,h}(t_2))&\le 2\delta + C \vert t_1-t_2\vert \Big (1+\frac{1}{\delta }\Big ). \end{aligned}$$

Finally, taking \( \delta = \sqrt{\vert t_1-t_2\vert }\), we get \(d_0(m^{\epsilon }_{\rho ,h}(t_1),m^{\epsilon }_{\rho ,h}(t_2)) \le C\sqrt{\vert t_1-t_2\vert }.\)
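This choice of \(\delta \) balances the two competing terms in the previous display: for \(\vert t_1-t_2\vert \le T\),

```latex
2\delta + C\vert t_1-t_2\vert\Big(1+\frac{1}{\delta}\Big)\bigg\vert_{\delta=\sqrt{\vert t_1-t_2\vert}}
= (2+C)\sqrt{\vert t_1-t_2\vert} + C\vert t_1-t_2\vert
\le \big(2+C+C\sqrt{T}\big)\sqrt{\vert t_1-t_2\vert}.
```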

When \(\sigma <1\), we find that \(\vert B_r^{\sigma }\vert \le C\) and hence that

$$\begin{aligned} \vert I_k\vert \le C(1+h r ^{-\sigma } ) + C\big ( 1 + h +h r ^{-\sigma } + \frac{\rho ^2}{h}\big ) \frac{1}{\delta }. \end{aligned}$$

By assumption \(h r^{-\sigma }+ \frac{\rho ^2}{h}= {\mathcal {O}}(1)\), so again we find that

$$\begin{aligned} d_0(m^{\epsilon }_{\rho ,h}(t_1),m^{\epsilon }_{\rho ,h}(t_2)) \le 2\delta + C \vert t_1-t_2\vert \Big (1+\frac{1}{\delta }\Big ), \end{aligned}$$

and can conclude as before. \(\square \)

We also need an \(L^1\)-stability result for \(m^{\epsilon }_{\rho ,h}[\mu ]\) with respect to variations in \(\mu \).

Lemma 6.6

(\(L^{1}\)-stability) Assume \((\nu \)0), (H1), and \(m^{\epsilon }_{\rho ,h}[\mu ]\) is defined by (26). Then for \(\mu _1,\mu _2 \in C([0,T],P({\mathbb {R}}^d))\),

$$\begin{aligned}&\sup _{t \in [ 0,T ]} \Vert m^{\epsilon }_{\rho ,h}[\mu _1] ( t, \cdot ) - m^{\epsilon }_{\rho ,h}[\mu _2] ( t,\cdot ) \Vert _{L^{1} ( {\mathbb {R}}^d)} \\&\quad \le \frac{cKT}{\rho } \hbox {e}^{- h \lambda _{r}} \big \Vert D_p H(\cdot , Du^{\epsilon }_{\rho ,h}[\mu _1]) - D_p H(\cdot , Du^{\epsilon }_{\rho ,h}[\mu _2]) \big \Vert _{0} . \end{aligned}$$

Proof

Let \(\alpha =D_p H(\cdot , Du^{\epsilon }_{\rho ,h}[\mu _1])\), \({\tilde{\alpha }}=D_p H(\cdot , Du^{\epsilon }_{\rho ,h}[\mu _2])\), \(m_{j,k}=m_{j,k}[\mu _1]\), and \({{\tilde{m}}}_{j,k}=m_{j,k}[\mu _2]\). By (25) and Lemma 3.3, \({\mathbf {B}}_{\rho ,h,r} [ \alpha ] ( i,j,k ) \ge 0\) and \(m_{j,k} \ge 0 \), so that

$$\begin{aligned}&\sum _{i} \big \vert m_{i,k+1} - {\tilde{m}}_{i,k+1} \big \vert = \sum _{i} \big \vert \sum _{j} ( m_{j,k} \, {\mathbf {B}}_{\rho ,h,r} [ \alpha ] ( i,j,k ) - {{\tilde{m}}}_{j,k} \, {\mathbf {B}}_{\rho ,h,r} [ \tilde{\alpha } ] ( i,j,k ) ) \big \vert \\ {}&\quad \le \sum _{i} \sum _{j} \Big ( m_{j,k} \big \vert {\mathbf {B}}_{\rho ,h,r} [ \alpha ] ( i,j,k ) - {\mathbf {B}}_{\rho ,h,r} [ \tilde{\alpha } ] ( i,j,k ) \big \vert \\ {}&\qquad \ \quad \qquad + \big \vert m_{j,k}- {\tilde{m}}_{j,k} \big \vert {\mathbf {B}}_{\rho ,h,r} [ \tilde{\alpha } ] ( i,j,k ) \Big ) . \end{aligned}$$

Since \(\sum _{i} {\mathbf {B}}_{\rho ,h,r} [ \tilde{\alpha } ] ( i,j,k ) = 1\) (follows from \(\sum _i\beta _i=1\) and (25)),

$$\begin{aligned} \sum _{i} \sum _{j} \big \vert m_{j,k} - {\tilde{m}}_{j,k} \big \vert {\mathbf {B}}_{\rho ,h,r} [ \tilde{\alpha } ] ( i,j,k ) = \sum _{j} \big \vert m_{j,k} - {\tilde{m}}_{j,k} \big \vert . \end{aligned}$$

Moreover, since only a finite number \(K_{d}\) of \(\beta _{i}\)’s are nonzero at any given point, \(\beta _{i}\) is Lipschitz with constant \(\frac{c}{\rho }\), and \(\sum _{j} m_{j,k} = 1\) by Lemma 3.3, by the definitions of \({\mathbf {B}}_{\rho ,h,r}\) (25) and \(\Phi _{j,k,p}^{ \pm }\) (23),

$$\begin{aligned}&\sum _{i} \sum _{j} m_{j,k} \big \vert {\mathbf {B}}_{\rho ,h,r} [ \alpha ] ( i,j,k ) - {\mathbf {B}}_{\rho ,h,r} [ \tilde{\alpha } ] ( i,j,k ) \big \vert \\&\quad \le \sum _{j} m_{j,k} \frac{\hbox {e}^{-h \lambda _{r}}}{2d} \sum _{p=1}^{d} \sum _i \, \big \vert \beta _{i} ( \Phi _{j,k,p}^{+} [\mu _1] ) - \beta _{i} ( \Phi _{j,k,p}^{+} [\mu _2] ) \\&\qquad + \beta _{i} ( \Phi _{j,k,p}^{-} [\mu _1] ) - \beta _{i} ( \Phi _{j,k,p}^{-} [\mu _2] ) \big \vert \le K_{d} \frac{c h \hbox {e}^{-h \lambda _{r}} }{\rho } \Vert \alpha - \tilde{\alpha } \Vert _{0} . \end{aligned}$$

An iteration then shows that

$$\begin{aligned} \sum _{i} \big \vert m_{i,k+1} - {\tilde{m}}_{i,k+1} \big \vert \le \sum _{i} \big \vert m_{i,0} - {\tilde{m}}_{i,0} \big \vert + \frac{cK_{d}T}{\rho } \hbox {e}^{- h \lambda _{r}} \Vert \alpha - \tilde{\alpha } \Vert _{0} . \end{aligned}$$

Since \( m_{i,0} = {\tilde{m}}_{i,0}=\int _{E_i}m_0 \, \hbox {d}x\), the result follows by interpolation. \(\square \)
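The two structural facts used above, \({\mathbf {B}}_{\rho ,h,r}\ge 0\) and \(\sum _i {\mathbf {B}}_{\rho ,h,r}(i,j,k)=1\), are what make the dual update a mass-preserving redistribution. A minimal 1-d sketch of an update of this form, with an illustrative drift in place of \(D_pH(\cdot ,Du)\), hypothetical parameter values, and the jump part omitted:

```python
import numpy as np

# Hypothetical parameters: grid size rho, time step h, diffusion coefficient sigma_r.
rho, h, sigma_r = 0.05, 0.01, 0.3
xs = rho * np.arange(-200, 201)          # grid nodes x_j = j*rho

def beta_weights(y):
    """Distribute a point mass at y onto its two neighboring nodes (hat interpolation)."""
    j = int(np.floor(y / rho))
    theta = y / rho - j
    return {j: 1.0 - theta, j + 1: theta}

alpha = np.sin(xs)                       # placeholder drift (illustrative, not D_pH)
m = np.exp(-xs**2 / 0.1)
m /= m.sum()                             # normalize total mass to 1

m_new = np.zeros_like(m)
for j, xj in enumerate(xs):              # here j is the array index of node x_j
    for shift in (+1.0, -1.0):
        # approximate characteristic x_j - h*alpha_j +/- sqrt(h)*sigma_r
        y = xj - h * alpha[j] + shift * np.sqrt(h) * sigma_r
        for i, w in beta_weights(y).items():
            idx = i + 200                # node index i corresponds to array slot i+200
            if 0 <= idx < m_new.size:
                m_new[idx] += 0.5 * w * m[j]
# Since the hat weights at each foot point sum to 1, total mass is conserved
# (up to boundary truncation, which is negligible here).
```

Each point mass is pushed along an approximate characteristic and split between neighboring nodes with nonnegative weights summing to one, which is exactly the mechanism exploited in the proof.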

We end this section with a uniform \(L^p\)-bound on \(m^{\epsilon }_{\rho ,h}\) in dimension \(d=1\).

Theorem 6.7

(\(L^p\) bounds) Assume \(d=1\), \((\nu \)0), (\(\nu \)1), (L1), (L3), (F3), (H2), (M’), \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\), and let \(m^{\epsilon }_{\rho ,h}[\mu ]\) be defined by (26). Then, \(m_{\rho ,h}^{\epsilon }[\mu ] \in L^p({\mathbb {R}})\), and there exists a constant \(K>0\), independent of \(\epsilon ,h, \rho \) and \(\mu \), such that

$$\begin{aligned} \Vert m^{\epsilon }_{\rho , h}[\mu ](\cdot , t)\Vert _{L^{p}({\mathbb {R}})} \le \hbox {e}^{KT} \Vert m_{0}\Vert _{L^{p}({\mathbb {R}})}. \end{aligned}$$

To prove the theorem, we need a few technical lemmas.

Lemma 6.8

Assume \(d=1\), \((\nu \)0), (\(\nu \)1), (L1), (L3), (F3), and (H2). There exists a constant \(c_0>0\) independent of \(\rho ,h,\epsilon ,\mu \) such that

$$\begin{aligned} \Big (D_pH\big (x_j,Du^{\epsilon }_{\rho ,h}(t_k,x_j)\big )- D_p H \big (x_i, Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big )\Big ) (x_j-x_i) \le c_0 \vert x_j-x_i\vert ^2. \end{aligned}$$

Proof

By (L1) and (H2) for \(R = ( ( L_{F} + L_{L} )T + L_G) +1\), we have

$$\begin{aligned}&\Big (D_pH\big (x_j,Du^{\epsilon }_{\rho ,h}(t_k,x_j)\big )- D_p H \big (x_i, Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big )\Big ) (x_j-x_i) \\ {}&\quad = (x_j-x_i)\int _0^1 \frac{\text{ d }}{\text{ d }t} \Big ( D_pH\big (x_j,t \, Du^{\epsilon }_{\rho ,h}(t_k,x_j) + (1-t)Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big ) \Big ) \text{ d }t \\ {}&\qquad \ + (x_j-x_i) \Big (D_pH\big (x_j,Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big )- D_p H \big (x_i, Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big )\Big ) \\ {}&\quad = (x_j-x_i)\int _0^1 D_{pp} H\Big (x_j,t \, Du^{\epsilon }_{\rho ,h}(t_k,x_j) \\ {}&\qquad \ + (1-t)Du^{\epsilon }_{\rho ,h}(t_k,x_i)\Big ) \big (Du^{\epsilon }_{\rho ,h}(t_k,x_j) - Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big ) \text{ d }t \\ {}&\qquad \ + (x_j-x_i) \Big (D_pH\big (x_j,Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big )- D_p H \big (x_i, Du^{\epsilon }_{\rho ,h}(t_k,x_i)\big )\Big ) \\ {}&\quad \le C_R \, c_2 \vert x_j-x_i\vert ^2 +C_R \vert x_j-x_i\vert ^2, \end{aligned}$$

where the last inequality follows from the convexity of H (since L is convex by (L1)), the semiconcavity of \(u^{\epsilon }_{\rho ,h}\) in Lemma 5.5 (c), and the regularity of H in (H2). \(\square \)

Lemma 6.9

Assume \(d=1\), \((\nu \)0), (\(\nu \)1), (L1), (L3), (F3), (H2), \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\), and let \(\Phi ^{\epsilon ,\pm }_{j,k}[\mu ]\) be defined in (23). There exists a constant \(K_0>0\) independent of \(\epsilon , \rho , h,\mu \), such that for all \(i \in {\mathbb {Z}}\) and \(k={\mathcal {N}}_h\),

$$\begin{aligned} \max \Big \{ \sum _{j\in {\mathbb {Z}}}\beta _i(\Phi ^{\epsilon ,+}_{j,k})[\mu ],\sum _{j\in {\mathbb {Z}}} \beta _i(\Phi ^{\epsilon ,-}_{j,k})[\mu ] \Big \} \le 1+K_0 h. \end{aligned}$$

The proof of this result is similar to the proof of [24, Lemma 3.8]; a slightly expanded proof is given in “Appendix B.” A similar result holds for the integral term:

Lemma 6.10

Assume \(d=1\). Then, we have

$$\begin{aligned} \frac{1}{\lambda _r} \sum _{j\in {\mathbb {Z}}} \int _{\vert z\vert >r} \beta _i(x_j+z) \nu (\hbox {d}z) =1. \end{aligned}$$

Proof

By (11) and properties of the basis functions \(\beta _j\), we have

$$\begin{aligned} \frac{1}{\lambda _r} \sum _{j\in {\mathbb {Z}}} \int _{\vert z\vert>r} \beta _i(x_j+z) \nu (\hbox {d}z) = \frac{1}{\lambda _r} \int _{\vert z\vert>r} \sum _{j\in {\mathbb {Z}}} \beta _{i-j}(z) \nu (\hbox {d}z) =\frac{1}{\lambda _r} \int _{\vert z\vert >r} \nu (\hbox {d}z) =1. \qquad \quad \end{aligned}$$

\(\square \)
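The identity in the last display is just the partition-of-unity property \(\sum _j \beta _{i-j}(z)=1\) of the interpolation basis. A quick numerical check with standard hat functions (the grid size \(\rho \) below is an arbitrary illustrative value):

```python
import numpy as np

rho = 0.1                                 # grid size (arbitrary illustrative value)

def beta(j, z):
    """Hat function centered at x_j = j*rho with support (x_{j-1}, x_{j+1})."""
    return np.maximum(0.0, 1.0 - np.abs(z / rho - j))

z = np.array([-1.234, 0.0, 0.57, 2.31])   # arbitrary evaluation points
s = sum(beta(j, z) for j in range(-30, 31))
# partition of unity: s equals 1 at every z covered by the grid
```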

Proof of Theorem 6.7

By definition of \(m^{\epsilon }_{\rho , h}\) in (26) and the scheme (24),

$$\begin{aligned}&\int _{{\mathbb {R}}} ( m^{\epsilon }_{\rho , h}(x, t_{k+1}))^p \hbox {d}x = \int _{{\mathbb {R}}} \Big ( \frac{1}{\rho } \sum _{i} m_{i,k+1} \mathbbm {1}_{E_i}(x)\Big )^p \hbox {d}x \\&\quad = \frac{1}{\rho ^{p-1}} \sum _{i\in {\mathbb {Z}}} (m_{i,k+1})^p = \frac{1}{\rho ^{p-1}} \sum _{i} \Big ( \sum _{j} m_{j,k} \, {\mathbf {B}}_{\rho ,h,r} ( i,j,k )\Big )^p, \end{aligned}$$

where \({\mathbf {B}}_{\rho ,h,r} ={\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , Du_{\rho ,h}^{\epsilon }[\mu ] ) ]\) is defined in (25). By Jensen’s inequality, we have

$$\begin{aligned}&\sum _{i\in {\mathbb {Z}}} \Big ( \sum _{j} m_{j,k}\, {\mathbf {B}}_{\rho ,h,r} ( i,j,k )\Big )^p \\&\quad \le \sum _{i\in {\mathbb {Z}}} \Big (\sum _{l\in {\mathbb {Z}}}{\mathbf {B}}_{\rho ,h,r} ( i,l,k )\Big )^{p-1} \Big (\sum _{j} \big (m_{j,k}\big )^p \, {\mathbf {B}}_{\rho ,h,r} ( i,j,k ) \Big ), \end{aligned}$$

and by Lemmas 6.9 and 6.10,

$$\begin{aligned} \sum _{l\in {\mathbb {Z}}}{\mathbf {B}}_{\rho ,h,r} ( i,l,k ) \le 1+ K_0h, \end{aligned}$$

where \(K_0\) is independent of \(i, \rho , h, \epsilon \) and \(\mu \). Since \(\sum _{i} {\mathbf {B}}_{\rho ,h,r} ( i,l,k ) =1\) (which follows from \(\sum _{i} \beta _i =1\)), we find that

$$\begin{aligned} \sum _{i\in {\mathbb {Z}}} (m_{i,k+1})^p&\le (1+K_0h)^{p-1} \sum _{j}\big (m_{j,k}\big )^p \sum _{i} {\mathbf {B}}_{\rho ,h,r} ( i,j,k ) \\&\le \rho ^{p-1} \Vert m^{\epsilon }_{\rho ,h}(t_k,\cdot )\Vert ^p_{L^p({\mathbb {R}})} (1+ K_0h)^{p-1}. \end{aligned}$$

By iteration and \(\Vert m^{\epsilon }_{\rho , h}(\cdot , t_0)\Vert _{L^{p}} = \Vert m_0\Vert _{L^{p}}\), we get \(\Vert m^{\epsilon }_{\rho ,h}(t_{k+1},\cdot )\Vert _{L^p} \le \hbox {e}^{K_0T(1-\frac{1}{p})} \Vert m_0\Vert _{L^p}\), and the result follows for \(p\in [1,\infty )\).
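The iteration step spelled out: since \((k+1)h\le T\) implies \((1+K_0h)^{k+1}\le \hbox {e}^{K_0T}\),

```latex
\sum_{i}(m_{i,k+1})^p
\le (1+K_0h)^{(k+1)(p-1)} \sum_{i}(m_{i,0})^p
= (1+K_0h)^{(k+1)(p-1)}\, \rho^{p-1}\,
  \Vert m^{\epsilon}_{\rho,h}(\cdot,t_0)\Vert^p_{L^p({\mathbb{R}})}
\le \hbox{e}^{K_0T(p-1)}\, \rho^{p-1}\, \Vert m_{0}\Vert^p_{L^p({\mathbb{R}})},
```

and taking \(p\)-th roots gives the bound \(\hbox {e}^{K_0T(1-\frac{1}{p})}\Vert m_0\Vert _{L^p}\).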

The proof for \(p=\infty \) is simpler; in view of Lemmas 6.9 and 6.10, it follows as in [25] for the second-order case. \(\square \)

7 Proof of Proposition 3.4, Theorems 4.1 and 4.3

7.1 Proof of Proposition 3.4

The proof is an adaptation of the Schauder fixed-point argument used to prove existence for MFGs. We will use a direct consequence of Theorems 6.4 and 6.5:

Corollary 7.1

Assume \((\nu \)0),(\(\nu \)1), (L1)–(L2), (H1), (F2), (M), \(\Psi \) is given by Proposition 6.1, and \(m^{\epsilon }_{\rho ,h}[\mu ]\) is defined by (26). Then, there is \(C_{\rho ,h,\epsilon }>0\), such that for any \(\mu \in C ( [0,T], P ( {\mathbb {R}}^d))\) and \(t,s\in [0,T]\),

$$\begin{aligned} \int _{{\mathbb {R}}^d} \Psi (x) \, \hbox {d}m^{\epsilon }_{\rho ,h}[\mu ](t) +\frac{d_0(m^\epsilon _{\rho ,h}[\mu ](t),m^\epsilon _{\rho ,h}[\mu ](s))}{\sqrt{\vert t-s\vert }}\le C_{\rho ,h,\epsilon }. \end{aligned}$$

The point is that \(\rho ,h,\epsilon \) are fixed in this result.

Proof of Proposition 3.4

Let

$$\begin{aligned}&{\mathcal {C}} := \Big \{ \mu \in C ( [0,T]; P ( {\mathbb {R}}^d) ) : \mu ( 0 ) = m_{0}, \\&\quad \sup _{t,s \in [ 0,T] }\Big [ \int _{{\mathbb {R}}^d} \Psi ( x ) \hbox {d} \mu ( t,x ) + \frac{d_{0} ( \mu ( t ), \mu ( s ) )}{\sqrt{ \vert t-s \vert }} \Big ]\le C_{\rho ,h, \epsilon } \Big \}, \end{aligned}$$

where \(C_{\rho ,h, \epsilon }\) is defined in Corollary 7.1. For \(\mu \in {\mathcal {C}}\), let \(u_{\rho ,h} [ \mu ]\) be the solution of (18) and \(u_{\rho ,h}^{\epsilon } [ \mu ]\) be defined by (22). Then, \(m_{\rho ,h}^{\epsilon } = S ( \mu )\) is defined to be the corresponding solution of (24). Note that a fixed point of S gives a solution (u, m) of the scheme (27). We now conclude the proof by applying Schauder’s fixed-point theorem since:

  1. 1.

(\({\mathcal {C}}\) is a convex, closed, compact set). It is convex and closed by standard arguments and compact by the Prokhorov and Arzelà–Ascoli theorems.

  2. 2.

      (S is a self-map on \({\mathcal {C}}\)). The map S maps \({\mathcal {C}}\) into itself by Corollary 7.1 (tightness and equicontinuity), and Lemma 3.3 (positivity and mass preservation).

  3. 3.

      (S is continuous). Let \( \mu _{n} \rightarrow \mu \) in \({\mathcal {C}}\). By Theorem 5.2 (comparison) and (F2),

    $$\begin{aligned}&\Vert u_{\rho ,h} [ \mu _{n} ] - u_{\rho ,h} [ \mu ] \Vert _{0} \\&\quad \le T \sup _{ t,x } \vert F ( x, \mu _{n} ( t ) ) - F ( x, \mu ( t ) ) \vert + \sup _{x} \vert G ( x, \mu _{n} ( T ) ) - G ( x, \mu ( T ) ) \vert \\&\quad \le T L_{F} \, \sup _{t} d_{0} ( \mu _{n} ( t ) , \mu ( t ) ) + L_{G} \, d_{0} ( \mu _{n} ( T ) , \mu ( T ) ) \rightarrow 0. \end{aligned}$$

    Then, \( \sup _{i} \big \vert \frac{u_{i,k} [ \mu _{n} ] - u_{i-j,k} [ \mu _{n} ]}{ \rho } - \frac{u_{i,k} [ \mu ] - u_{i-j,k} [ \mu ]}{ \rho } \big \vert \rightarrow 0\) uniformly for \(\vert i-j \vert = 1 \), \(\Vert Du_{\rho ,h}^{\epsilon } [ \mu _{n} ] - Du_{\rho ,h}^{\epsilon } [ \mu ] \Vert _{0} \rightarrow 0 \), and finally by Lemma 6.6,

    $$\begin{aligned}&\sup _{t \in [ 0,T ]} \Vert m_{\rho ,h}^{\epsilon } [ \mu _{n} ] ( t, \cdot ) - m_{\rho ,h}^{\epsilon } [ \mu ] ( t,\cdot ) \Vert _{L^{1} ( {\mathbb {R}}^d)} \\&\quad \le \frac{cKT}{\rho } \hbox {e}^{- h \lambda _{r}} \Vert Du_{\rho ,h}^{\epsilon } [ \mu _{n} ] - Du_{\rho ,h}^{\epsilon } [ \mu ] \Vert _{0} \rightarrow 0. \end{aligned}$$

Hence, S is continuous. \(\square \)

7.2 Proof of the Convergence: Theorems 4.1 and 4.3

The main structure of the two proofs is similar, so we present them together. We proceed in several steps.

Step 1. (Compactness of \(m^{\epsilon _n}_{\rho _n,h_n}\)) In view of Theorems 6.4 and 6.5, the family \(\{m^{\epsilon }_{\rho ,h}\}\) is precompact in \(C([0,T], P({\mathbb {R}}^d))\) by the Prokhorov and Arzelà–Ascoli theorems. Hence, there exist a subsequence \(\{m^{\epsilon _n}_{\rho _n,h_n}\}\) and m in \( C([0,T], P({\mathbb {R}}^d))\) such that

$$\begin{aligned} m^{\epsilon _n}_{\rho _n,h_n} \rightarrow m \quad \text{ in } \quad C([0,T],P({\mathbb {R}}^d)). \end{aligned}$$

This proves Theorem 4.3 (a) (ii) and the first part of Theorem 4.1 (a) (ii).

If (M’) holds with \(p=\infty \), then Theorem 6.7 and Helly’s weak \(*\) compactness theorem imply that \(\{m^{\epsilon }_{\rho ,h}\}\) is weak \(*\) precompact in \(L^{\infty }([0,T]\times {\mathbb {R}})\) and there is a subsequence \(\{m^{\epsilon _n}_{\rho _n,h_n}\}\) and function m such that \(m^{\epsilon _n}_{\rho _n,h_n} \overset{*}{\rightharpoonup } m\) in \(L^{\infty }([0,T]\times {\mathbb {R}})\). If (M’) holds with \(p \in (1,\infty )\), then \(\{m^{\epsilon }_{\rho ,h}\}\) is equi-integrable in \([0,T]\times {\mathbb {R}}\) by Theorems 6.4 and 6.7 and de la Vallée Poussin’s theorem. By Dunford–Pettis’ theorem, it is then weakly precompact in \(L^1([0,T]\times {\mathbb {R}})\) and there exists a subsequence \(\{m^{\epsilon _n}_{\rho _n,h_n}\}\) and function m such that \(m^{\epsilon _n}_{\rho _n,h_n} \rightharpoonup m\) in \(L^{1}([0,T]\times {\mathbb {R}})\). The second part of Theorem 4.1 (a) (ii) follows.

Step 2. (Compactness and limit points for \(u_{\rho _n,h_n}\)) Part (i) and limit points u as viscosity solutions in part (iii) of both Theorems 4.1 and 4.3 follow from step 1 and Theorem 5.6 (i).

Step 3. (Consistency for \(m^{\epsilon _n}_{\rho _n,h_n}\)) Let (u, m) be a limit point of \(\{(u^{\epsilon _n}_{\rho _n,h_n},m^{\epsilon _n}_{\rho _n,h_n})\}_n\). Then by step 2, u is a viscosity solution of the HJB equation in (1). We now show that m is a very weak solution of the FPK equation in (1) with u as the input data, i.e., m satisfies (3) for \(t\in [0,T]\) and \(\phi \in C_c^\infty ({\mathbb {R}}^d)\). In the rest of the proof, we use \(\rho , h, r, \epsilon \) instead of \(\rho _{n}, h_{n}, r_{n}, \epsilon _n\) to simplify the notation. We also let \(\widehat{\widehat{m}} = m^{\epsilon _n}_{\rho _n,h_n}\), \(w=u_{\rho _n,h_n}^{\epsilon _n}[\widehat{\widehat{m}}]\), and take \(t_n=\big [\frac{t}{h_n}\big ]h_n\). Then, we note that

$$\begin{aligned} \int _{{\mathbb {R}}^d}\phi (x)\hbox {d}\widehat{\widehat{m}}(t_n)(x)=\int _{{\mathbb {R}}^d}\phi (x)\hbox {d}m_0(x) + \sum _{k=0}^{n-1}\int _{{\mathbb {R}}^d}\phi (x)\hbox {d}[\widehat{\widehat{m}}(t_{k+1})-\widehat{\widehat{m}}(t_k)], \end{aligned}$$

so to prove (3), we must estimate the sum on the right.

By the midpoint quadrature approximation and (26), then the scheme (24), then (25) combined with linear/multilinear interpolation, and finally the midpoint approximation again, we find that

$$\begin{aligned} \int _{{\mathbb {R}}^d} \phi (x) \hbox {d}\widehat{\widehat{m}}(t_{k+1})&= \frac{1}{\rho ^d} \sum _{i\in {\mathbb {Z}}^d} m_{i,k+1} \int _{E_i} \phi (x)\hbox {d}x = \sum _{i} m_{i,k+1} \phi (x_i) + {\mathcal {O}} (\rho ^2) \\&= \sum _{i} \phi (x_i) \sum _{j} m_{j,k} \, {\mathbf {B}}_{\rho ,h,r} [ H_{p} ( \cdot , Dw ) ] ( i,j,k ) + {\mathcal {O}} (\rho ^2)\\&= \sum _j m_{j,k} \Big ( \frac{\hbox {e}^{-\lambda _r h}}{2d}\sum _{p=1}^d [ \phi (\Phi _{j,k,p}^{\epsilon ,+})+ \phi (\Phi _{j,k,p}^{\epsilon ,-}) ] \\&\quad + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r} \int _{\vert z\vert>r} \phi (x_j+z) \nu (\hbox {d}z) \Big ) + {\mathcal {O}} (\rho ^2) \\&=\sum _j \frac{m_{j,k}}{\rho ^d} \int _{E_j} \Big ( \frac{\hbox {e}^{-\lambda _r h}}{2d}\sum _{p=1}^d [\phi (\Phi _{k,p}^{\epsilon ,+})(x)+ \phi (\Phi _{k,p}^{\epsilon ,-})(x)] \\&\quad + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r} \int _{\vert z\vert >r} \phi (x+z) \nu (\hbox {d}z) \Big ) \hbox {d}x + {\mathcal {O}} (\rho ^2) +E_\Phi +E_\nu , \end{aligned}$$

where \(\Phi _{j,k,p}^{\epsilon ,\pm }\) is defined in (23), \(\Phi ^{\epsilon ,\pm }_{k,p}(x) = x - h\,\big ( H_{p} ( x, D w (t_k,x) ) + B_r^{\sigma } \big ) \pm \sqrt{hd} \sigma _r^p\), and \(E_\Phi +E_\nu \) is the error of the last midpoint approximation.

Since \(\phi \) is smooth, \(u_{\rho ,h}\) is uniformly Lipschitz (Lemma 5.5 (a)), \(\Vert D^2w\Vert _{0}\le \frac{C \Vert Du_{\rho ,h}\Vert _{0}}{\epsilon }\), and by assumption (H2),

$$\begin{aligned}&\Big \vert \phi (\Phi ^{\epsilon ,\pm }_{j,k,p}) - \frac{1}{\rho ^d} \int _{E_j} \phi (\Phi ^{\epsilon ,\pm }_{k,p})(x) \hbox {d}x\Big \vert \\&\quad \le \frac{\Vert D\phi \Vert _0}{\rho ^d} \int _{E_j} \vert x-x_j\vert \hbox {d}x + \frac{h\Vert D\phi \Vert _0}{\rho ^d} \int _{E_j} \big \vert D_p H(x_j, Dw(t_k,x_j)) \\&\qquad - D_p H(x, Dw(t_k,x))\big \vert \hbox {d}x \\&\quad \le K \rho \big (1+ h (\Vert H_{pp}\Vert _{0}\Vert D^2 w\Vert _0 + \Vert H_{px}\Vert _0)\big ) \le K\rho \big (1+ \frac{h}{\epsilon } \Vert D u_{\rho ,h} \Vert _0\big ), \end{aligned}$$

and hence \(E_\Phi ={\mathcal {O}}(\frac{h\rho }{\epsilon })\). Similarly, \(E_\nu ={\mathcal {O}}(h\rho ^2\lambda _r)={\mathcal {O}}(\frac{h\rho ^2}{r^{\sigma }})\).

From the above estimates, we find that

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \phi (x) \hbox {d}\big (\widehat{\widehat{m}}(t_{k+1}) - \widehat{\widehat{m}}(t_k)\big )(x) = \int _{{\mathbb {R}}^d} \Big ( \frac{\hbox {e}^{-\lambda _r h}}{2d}\sum _{p=1}^d [\phi (\Phi _{k,p}^{\epsilon ,+})(x)+ \phi (\Phi _{k,p}^{\epsilon ,-})(x) \\&\quad - 2\phi (x)] + \frac{1-\hbox {e}^{-\lambda _r h}}{\lambda _r} \int _{\vert z\vert >r} \big (\phi (x+z) - \phi (x)\big ) \nu (\hbox {d}z) \Big ) \hbox {d}\widehat{\widehat{m}}(t_k)(x) \\&\quad + {\mathcal {O}} \big (\rho ^2 + \frac{h\rho }{\epsilon } +\frac{h\rho ^2}{r^{\sigma }}\big ). \end{aligned}$$

By a similar argument as in (29) and using Lemma 3.1,

$$\begin{aligned} \phi (\Phi _{k,p}^{\epsilon ,+})(x)&+ \phi (\Phi _{k,p}^{\epsilon ,-})(x)- 2\phi (x) = - 2h \Big (D\phi (x)\cdot D_p H(x, Dw(t_k,x)) \\&\,+ B_r^{\sigma }\cdot D\phi (x)\Big ) + 2h {\mathcal {L}}_r[\phi ](x) + {\mathcal {O}}(h^2r^{2-2\sigma } + hr^{3-\sigma } ). \end{aligned}$$

Hence using (31) and (32), we have

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \phi (x)\hbox {d}(\widehat{\widehat{m}}(t_{k+1}) -\widehat{\widehat{m}}(t_k))(x) \\&\quad = h\int _{{\mathbb {R}}^d} \big [- D\phi (x)\cdot D_p H(x, Dw(t_k,x)) + {\mathcal {L}}_r[\phi ](x)+{\mathcal {L}}^r[\phi ](x)\big ] \hbox {d}\widehat{\widehat{m}}(t_k)(x) \\&\qquad +{\mathcal {O}}(h^2 r^{-\sigma } + h^2 r^{1-2\sigma } + h^2 r^{2-2\sigma } ) + {\mathcal {O}} (\rho ^2 + \frac{h\rho }{\epsilon } + \frac{h\rho ^2}{r^{\sigma }} + h^2r^{2-2\sigma } + hr^{3-\sigma }). \end{aligned}$$

Summing from \(k=0\) to \(k=n-1\) and approximating sums by integrals, we obtain

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \phi (x)\hbox {d}\widehat{\widehat{m}}(t_n)(x)-\int _{{\mathbb {R}}^d}\phi (x) \hbox {d}\widehat{\widehat{m}}(t_0) \nonumber \\&\quad = h\sum _{k=0}^{n-1} \int _{{\mathbb {R}}^d} \big [- D\phi (x)\cdot D_p H(x,Dw(t_k,x)) + {\mathcal {L}}[\phi ](x)\big ] \hbox {d}\widehat{\widehat{m}}(t_k)(x)\nonumber \\&\qquad + n \,{\mathcal {O}} (\rho ^2 + \frac{h\rho }{\epsilon } +\frac{h\rho ^2}{r^{\sigma }} +h^2r^{-\sigma } + hr^{3-\sigma }) \nonumber \\&\quad = \int _{{\mathbb {R}}^d}\int _0^{t_n} \big [- D\phi (x)\cdot D_p H(x, Dw(s, x)) + {\mathcal {L}}[\phi ](x)\big ] \hbox {d}\widehat{\widehat{m}}(s)(x) \, \hbox {d}s \nonumber \nonumber \\&\qquad + {\mathcal {O}} \Big (\frac{\rho ^2}{h} + \frac{\rho }{\epsilon }+ \frac{\rho ^2+h}{r^{\sigma }}+ r^{3-\sigma }\Big ) + E, \end{aligned}$$
(43)

where E is the Riemann sum approximation error. Let \(I_k(x):= - D\phi (x)\cdot D_p H(x, Dw(t_k,x)) + {\mathcal {L}}[\phi ](x)\) and use the time-continuity of \(\widehat{\widehat{m}}\) in the \(d_0\)-metric (Theorem 6.5), the fact that \(w(\cdot ,x)\) is constant on \([t_k, t_{k+1})\), (H1), (H2), and \(\Vert D^2w\Vert _{0}\le \frac{C \Vert Du_{\rho ,h}\Vert _{0}}{\epsilon }\), to conclude that for \(s\in [t_k, t_{k+1})\)

$$\begin{aligned}&\int _{t_k}^{t_{k+1}} \int _{{\mathbb {R}}^d} I_k(x) d\big (\widehat{\widehat{m}}(t_k) - \widehat{\widehat{m}}(s)\big )(x) \hbox {d}s \\&\quad \le h \big (\Vert I_k\Vert _{0} + \Vert DI_k\Vert _{0}\big ) C_0 \sup _{s\in [t_k,t_{k+1})} \sqrt{s-t_k} \\&\quad \le K h \Big ( 1 + \Vert Dw\Vert _{0} + \Vert D^2w\Vert _{0} \Big ) \sqrt{h} \le K h \Big (1+ \frac{1}{\epsilon }\Big ) \sqrt{h}. \end{aligned}$$

Summing over k, we have \(E= \big \vert \sum _{k=0}^{n-1}\int _{t_k}^{t_{k+1}} \int _{{\mathbb {R}}^d} I_k(x) \hbox {d}\big (\widehat{\widehat{m}}(t_k) - \widehat{\widehat{m}}(s)\big )(x) \hbox {d}s\big \vert = {\mathcal {O}}(\frac{\sqrt{h}}{\epsilon })\).

Since \(\widehat{\widehat{m}}\) converges to m in \(C([0,T], P({\mathbb {R}}^d))\) and \(\phi \in C_c^{\infty }({\mathbb {R}}^d)\) implies \({\mathcal {L}}[\phi ] \in C_b({\mathbb {R}}^d)\), we have

$$\begin{aligned} \int _{{\mathbb {R}}^d}\int _0^{t_n} {\mathcal {L}}[\phi ](x) \hbox {d}\widehat{\widehat{m}}(s)(x) \xrightarrow {n\rightarrow \infty }\int _{{\mathbb {R}}^d}\int _0^{t} {\mathcal {L}}[\phi ](x) \hbox {d}{m}(s)(x). \end{aligned}$$
(44)

It now remains to show convergence of the \(D_p H\)-term and pass to the limit in (43) to get that m is a very weak solution satisfying (3).

Step 4. (Proof of Theorem 4.1 (a) (iii)) Now \(d=1\), and part (ii) of Theorem 4.1 (a) implies that \(\widehat{\widehat{m}} \overset{*}{\rightharpoonup } m\) in \(L^{\infty }([0,t]\times {\mathbb {R}})\) if \(m_0 \in L^{\infty }({\mathbb {R}})\), or \(\widehat{\widehat{m}}\rightharpoonup m\) in \(L^{1}([0,t]\times {\mathbb {R}})\) if \(m_0 \in L^{p}({\mathbb {R}})\) for \(p \in (1,\infty )\). We also have \(Dw(t,x)=Du_{\rho ,h}^{\epsilon }(t,x)\rightarrow Du(t,x)\) almost everywhere in \([0,T]\times {\mathbb {R}}\) by Theorem 5.6 (ii). Since \(D\phi \in C_c^{\infty }({\mathbb {R}})\) and \(D_p H(\cdot ,Dw)\) is uniformly bounded, by the triangle inequality and the dominated convergence theorem we find that

$$\begin{aligned}&\int _{{\mathbb {R}}} \int _{0}^{t_n} D\phi (x)\cdot D_p H(x,Dw(s,x)) \, \hbox {d}\widehat{\widehat{m}}(s)(x) \\&\quad \longrightarrow \int _{{\mathbb {R}}} \int _0^t D\phi (x)\cdot D_p H(x,Du(s,x)) \, \hbox {d}{m}(s)(x). \end{aligned}$$

Then by passing to the limit in (43) using the above limit, (44), and the CFL conditions \(\frac{\rho ^2}{h},\frac{h}{r^{\sigma }},\frac{\sqrt{h}}{\epsilon }= o (1)\) (note that \(\rho ^2\le h\) for large n), we see that (3) holds and m is a very weak solution of the FPK equation. This completes the proof of Theorem 4.1 (a) (iii).

Step 5. (Proof of Theorem 4.3 (iii)) Now (U) holds and \(Dw=Du_{\rho ,h}^{\epsilon }\rightarrow Du\) locally uniformly by Theorem 5.6 (iii). Since \(D \phi \in C^{\infty }_c({\mathbb {R}}^d)\) and \(\int _{{\mathbb {R}}^d} \hbox {d}\widehat{\widehat{m}}(s)(x) =1\), by continuity and uniform boundedness of \(D_p H(\cdot , Dw)\), it follows that

$$\begin{aligned} \begin{aligned}&\Big \vert \int _{{\mathbb {R}}^d} \int _{0}^{t_n} D\phi (x)\cdot D_p H(x,Dw(s,x)) \, \hbox {d}\widehat{\widehat{m}}(s)(x) \\&\qquad - \int _{{\mathbb {R}}^d} \int _0^{t_n} D\phi (x)\cdot D_p H(x,Du(s,x)) \, \hbox {d}{\widehat{\widehat{m}}}(s)(x)\Big \vert \\&\quad \le T \Vert D\phi \Vert _0 \Vert D_{pp} H\Vert _0 \Vert Dw - Du \Vert _{L^{\infty }(supp (\phi ))} \int _{{\mathbb {R}}^d} \hbox {d}\widehat{\widehat{m}}(s)(x) \longrightarrow 0. \end{aligned} \end{aligned}$$
(45)

Since \(\widehat{\widehat{m}} \rightarrow m\) in \(C([0,T], P({\mathbb {R}}^d))\) and \(D\phi \cdot D_p H(\cdot ,Du)(t) \in C_b({\mathbb {R}}^d)\) by (U), we get

$$\begin{aligned}&\int _{{\mathbb {R}}^d} \int _{0}^{t_n} D\phi (x)\cdot D_p H(x,Du(s,x)) \, \hbox {d}\widehat{\widehat{m}}(s)(x) \\&\longrightarrow \int _{{\mathbb {R}}^d} \int _0^t D\phi (x)\cdot D_p H(x,Du(s,x)) \, \hbox {d}{m}(s)(x). \end{aligned}$$

Then by passing to the limit in (43) using the above limit, (45), (44), and the CFL conditions \(\frac{\rho ^2}{h},\frac{h}{r^{\sigma }},\frac{\sqrt{h}}{\epsilon }= o (1)\), we see that (3) holds and m is a very weak solution of the FPK equation. This completes the proof of Theorem 4.3(iii).

8 Numerical Examples

For the numerical experiments, we consider

$$\begin{aligned} {\left\{ \begin{array}{ll} -u_t - \sigma ^2 {\mathcal {L}} u + \frac{1}{2} \vert u_x\vert ^{2} = f ( t,x ) + K \ \phi _{ \delta } *m ( t,x ) \, \, &{}\text { in } (0,T)\times [ a,b ], \\ m_t - \sigma ^2 {\mathcal {L}}^{*} m - \text {div} (m u_x ) = 0 \, \, &{}\text { in } (0,T)\times [ a,b ], \\ u (T,x) = G(x,m(T)), \qquad m(x,0) = m_0 (x) \, \, &{}\text { in } [ a,b ], \end{array}\right. } \end{aligned}$$
(46)

where \(a<b\) are real numbers, \({\mathcal {L}}\) is a diffusion operator, \(\phi _{\delta } = \frac{1}{\delta \sqrt{2 \pi }} \hbox {e}^{-\frac{x^2}{2 \delta ^2}} \), \(K\) is a real number, and f is a bounded smooth function. We will specify these quantities in the examples below.
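The smoothing coupling \(K \, \phi _{\delta } *m\) is straightforward to evaluate on a grid by discrete convolution. A minimal sketch (the grid, \(\delta \), \(K\), and the density m below are illustrative choices, not the values used in the experiments):

```python
import numpy as np

# Grid on [a, b]; delta and K are illustrative values.
a, b, n = -2.0, 2.0, 401
x = np.linspace(a, b, n)
dx = x[1] - x[0]
delta, K = 0.2, 1.0

# Gaussian mollifier phi_delta sampled on the grid (peak at x = 0, the middle node).
phi = np.exp(-x**2 / (2 * delta**2)) / (delta * np.sqrt(2 * np.pi))

# An illustrative density m, normalized to unit mass on [a, b].
m = np.exp(-(x - 0.3)**2 / 0.02)
m /= m.sum() * dx

# K * (phi_delta * m)(x): mode="same" keeps the central n samples, which aligns
# the output with the grid because phi is centered at the middle node.
coupling = K * np.convolve(m, phi, mode="same") * dx
```

Since convolution with a unit-mass mollifier preserves mass, \(\int K \, \phi _{\delta }*m \,\hbox {d}x \approx K\) up to boundary truncation, which gives a cheap sanity check on the discretization.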

8.1 Artificial Boundary Conditions

Our schemes (18) and (24) for approximating (46) are posed in all of \({\mathbb {R}}\). To work in a bounded domain, we impose (artificial) exterior conditions:

  1. (U1)

    \(u \equiv \Vert u_{0} \Vert _{0} + T \cdot \Vert f \Vert _{L^{\infty } ( ( 0,T ) \times ( a,b ) )} \) in \(({\mathbb {R}} {\setminus } [ a,b ] ) \times [ 0,T ]\),

  2. (M1)

    \(m \equiv 0\) in \(({\mathbb {R}} {\setminus } [ a,b ] ) \times [ 0,T ]\), and \(m_{0}\) is compactly supported in \([a,b]\).

Condition (U1) penalizes being in \([a,b]^c\), ensuring that optimal controls \(\alpha \) in (18) are such that \(x_{i} - h \alpha \pm \sqrt{h} \sigma _{r} \in [a,b]\). Condition (M1) ensures that the mass of m is contained in \([a,b]\) up to some finite time, and that there is no contribution from \([ a,b ]^c\) when we compute \({{\mathcal {L}}}^*m\). Note that some mass will leak out due to nonlocal effects (and then vanish by the 0 exterior condition), but this leakage is very small: by the decay of the Lévy measure \(\nu \) at infinity, the contributions to the nonlocal operators (of m and u) from \([a,b ]^c\) are small far from the boundary.
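To get a rough feel for how small this leakage is, one can look at the tail mass of the Lévy measure. For the one-dimensional fractional Laplacian, \(\hbox {d}\nu = c_s \vert z\vert ^{-1-s}\,\hbox {d}z\) and \(\int _{\vert z\vert> R} \vert z\vert ^{-1-s}\,\hbox {d}z = 2R^{-s}/s\) (normalizing constant omitted), which decays like \(R^{-s}\). A hypothetical helper, just for illustration:

```python
def frac_tail_mass(R, s):
    """Tail mass nu(|z| > R) for the 1d measure d(nu) = |z|^{-1-s} dz
    (normalizing constant omitted): 2 * integral_R^inf z^{-1-s} dz
    = 2 * R^{-s} / s, by symmetry of the measure."""
    return 2.0 * R ** (-s) / s
```

For \(s=1.5\), the tail mass drops from \(4/3\) at \(R=1\) to about 0.04 at \(R=10\), so grid points far from the boundary see almost no contribution from \([a,b]^c\).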

We will present numerical results from a region of interest that is far away from the boundary of \([a,b]\), where the influence of the (artificial) exterior data is expected to be negligible.

8.2 Evaluating the Integrals

To implement the scheme, we need to evaluate the integral

$$\begin{aligned}&\int _{\vert z\vert \ge r} I [ f ] ( x_{i} + z ) \nu ( \hbox {d}z ) = \sum _{j \in {\mathbb {Z}}} f ( x_{j} ) \omega _{j-i, \nu } , \\ \text {where} \qquad&\omega _{j-i, \nu } = \int _{\vert z\vert \ge r } \beta _{j-i} ( z ) \nu ( \hbox {d}z ), \end{aligned}$$

see (17). In addition, we need to compute the values of \(\sigma _{r}, b_{r}\), and \(\lambda _{r}\) [see (9), (8), and (11)]. To compute the weights \(\omega _{j-i, \nu }\), we use two different methods: for the fractional Laplacians, we use the explicit weights of [47], while for CGMY diffusions, we calculate the weights numerically using the built-in integral function in MATLAB. When tested on the fractional Laplacian, the MATLAB integrator produced an error of less than \(10^{-15}\). Below, the quantities \(\sigma _{r}, b_{r}, \lambda _{r}\) are computed explicitly, except in the CGMY case, where we use numerical integration.
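The paper uses MATLAB's built-in integrator for the CGMY weights; as an illustration of the same route, the following Python sketch computes weights for the truncated fractional measure with a simple composite midpoint rule. The hat functions \(\beta _k\), grid spacing dx, and truncation parameter r mirror the scheme, but the normalizing constant is omitted and the quadrature resolution n is an arbitrary choice of ours:

```python
import numpy as np

def weight(k, r, dx, s, n=20000):
    """Approximate omega_k = int_{|z| >= r} beta_k(z) d(nu)(z) for the 1d
    density nu(dz) = |z|^{-1-s} dz (normalizing constant omitted), where
    beta_k is the piecewise-linear hat function centered at z = k*dx.
    Uses a composite midpoint rule on the support of beta_k."""
    lo, hi = (k - 1) * dx, (k + 1) * dx        # support of beta_k
    z = lo + (np.arange(n) + 0.5) * (hi - lo) / n
    beta = np.maximum(0.0, 1.0 - np.abs(z / dx - k))
    # Mask out the inner region |z| < r removed by the truncation
    integrand = np.where(np.abs(z) >= r, beta * np.abs(z) ** (-1.0 - s), 0.0)
    return integrand.sum() * (hi - lo) / n
```

As expected from the decay of the density, the weights decrease as \(\vert k\vert \) grows.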

8.3 Solving the Coupled System

We use a fixed-point iteration scheme: (i) Let \(\mu = m_0\), and solve for \(u_{\rho ,h}\) in (18)–(20). (ii) With approximate optimal control \(Du_{\rho ,h}^{\epsilon }\) as in (21), we solve for \(m_{\rho ,h}^{\epsilon }\) in (24). (iii) Let \(\mu _{\text {new}} = ( m_{\rho ,h}^{\epsilon } + \mu )/2\), and repeat the process with \(\mu =\mu _{\text {new}}\). We continue until we have converged to a fixed point to within machine accuracy.

Remark 8.1

Instead of \(\mu _{\text {new}} = m_{\rho ,h}^{\epsilon }\), we take \(\mu _{\text {new}} =( m_{\rho ,h}^{\epsilon } + \mu )/2\). That is, we use a fixed-point iteration with some memory. This gives much faster convergence in our examples.
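The iteration of Sect. 8.3 can be sketched as follows. This is a minimal Python illustration (not the paper's MATLAB code); `solve_hjb` and `solve_fpk` are hypothetical stand-ins for the schemes (18)–(20) and (24):

```python
import numpy as np

def solve_mfg(m0, solve_hjb, solve_fpk, tol=1e-12, max_iter=500):
    """Damped fixed-point iteration for the coupled system:
    (i)   solve the HJB scheme for u given the current guess mu,
    (ii)  solve the FPK scheme for m given the control from u,
    (iii) average, mu_new = (m + mu) / 2, and repeat until the update
          falls below the tolerance (machine-accuracy level)."""
    mu = m0
    for _ in range(max_iter):
        u = solve_hjb(mu)            # step (i)
        m = solve_fpk(u)             # step (ii)
        mu_new = 0.5 * (m + mu)      # step (iii): iteration with memory
        if np.max(np.abs(mu_new - mu)) < tol:
            return u, mu_new
        mu = mu_new
    return u, mu
```

The averaging in step (iii) damps oscillations between successive iterates, which is why it converges faster than the plain update \(\mu _{\text {new}} = m_{\rho ,h}^{\epsilon }\) in the examples.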

Example 1

Problem (46) with \( [ 0,T ] \times [ a,b ] = [ 0,2 ] \times [ 0,1 ]\), \(G = 0\), \(f ( t,x ) = 5 ( x - 0.5(1 - \sin (2 \pi t)))^2\), \(m_{0} ( x ) = C \hbox {e}^{-\frac{(x-0.5)^2}{0.1^2}} \), where C is such that \(\int _{a}^b m_{0} = 1\). Furthermore, in accordance with the CFL-conditions of Theorem 4.1, we let \(h = \rho = 0.005\), \(r = h^{\frac{1}{2 s}}\), \(\epsilon = \sqrt{h} \approx 0.0707\), \(\sigma =0.09\), \(\delta = 0.4\), \(K=1\).
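The normalizing constant C can be computed with any quadrature rule. A quick sketch with the trapezoidal rule (the grid resolution is an arbitrary choice of ours):

```python
import numpy as np

# Normalize the initial density of Example 1 so that it integrates to
# one on [a, b] = [0, 1].
x = np.linspace(0.0, 1.0, 2001)
dx = x[1] - x[0]
g = np.exp(-((x - 0.5) ** 2) / 0.1 ** 2)        # unnormalized m0
mass = dx * (g.sum() - 0.5 * (g[0] + g[-1]))    # trapezoidal rule on [0, 1]
C = 1.0 / mass
m0 = C * g
```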

For the diffusions, we consider \({\mathcal {L}} = ( - \Delta )^{\frac{s}{2}}\) for \(s =0.5, 1.5, 1.9\), \({\mathcal {L}} = \Delta \), and \({\mathcal {L}} \equiv 0\). In Fig. 1, we plot the different solutions at time \(t=0.5\) and \(t=1.5\).

Fig. 1

The solutions m in Example 1 for different fractional Laplacians

In Fig. 2, we plot the solution with \(s=1.5\) on the time interval [0, 2].

Fig. 2

Solution m and u in Example 1 with diffusion parameter \(s=1.5\)

Example 2

Problem (46) with the same cost functions as in Example 1, but different diffusions with parameter \(s=1.5\):

  1. (i)

    \({\mathcal {L}} = \sigma ^2 ( - \Delta )^{\frac{s}{2}},\)

  2. (ii)

    \({\mathcal {L}} = \sigma ^2 C_{d,s} \int _{{\mathbb {R}}} [ u ( x+y ) - u ( x ) - Du ( x ) \cdot y \mathbbm {1}_{\vert y\vert <1} ] \, \mathbbm {1}_{ [ 0, + \infty )} ( y ) \, \frac{\hbox {d}y}{\vert y\vert ^{1+s}}\),

  3. (iii)

    \( {\mathcal {L}} = \sigma ^2 C_{d,s} \int _{{\mathbb {R}}} [ u ( x+y ) - u ( x ) - Du ( x ) \cdot y \mathbbm {1}_{\vert y\vert <1} ] \, \mathbbm {1}_{ [ -0.5,0.5 ]^{c}} ( y ) \, \frac{\hbox {d}y}{\vert y\vert ^{1+s}}\),

  4. (iv)

    \({\mathcal {L}} = \sigma ^2 C_{d,s} \int _{{\mathbb {R}}} [ u ( x+y ) - u ( x ) - Du ( x ) \cdot y \mathbbm {1}_{\vert y\vert <1} ] \, \hbox {e}^{-10 y^{-} -y^{+}} \, \frac{\hbox {d}y}{\vert y\vert ^{1+s}} \),

where \(C_{d,s}\) is the normalizing constant for the fractional Laplacian (see [47]). Case (i) is the reference solution, a symmetric and uniformly elliptic operator. Case (ii) is nonsymmetric and nondegenerate, case (iii) is symmetric and degenerate, and case (iv) is a CGMY-diffusion (see, e.g., [36]). We have plotted m at \(t=0.5\) and \(t=1.5\) in Fig. 3.
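For concreteness, the four Lévy densities in (i)–(iv) can be written out as follows. This is an illustrative Python sketch with the constants \(C_{d,s}\) and \(\sigma ^2\) omitted; recall that \(y^{-} = \max (-y,0)\) and \(y^{+} = \max (y,0)\):

```python
import numpy as np

s = 1.5  # diffusion parameter used in Example 2

def nu_i(y):    # (i) symmetric fractional Laplacian
    return np.abs(y) ** (-1.0 - s)

def nu_ii(y):   # (ii) one-sided jumps only: nonsymmetric, nondegenerate
    return np.where(y >= 0, np.abs(y) ** (-1.0 - s), 0.0)

def nu_iii(y):  # (iii) symmetric but degenerate: small jumps removed
    return np.where(np.abs(y) > 0.5, np.abs(y) ** (-1.0 - s), 0.0)

def nu_iv(y):   # (iv) CGMY-type: exponentially tempered tails
    y_minus, y_plus = np.maximum(-y, 0.0), np.maximum(y, 0.0)
    return np.exp(-10.0 * y_minus - y_plus) * np.abs(y) ** (-1.0 - s)
```

The tempering in (iv) makes the negative-jump tail decay much faster (factor 10 in the exponent) than the positive one.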

Fig. 3

The solutions m in Example 2 for different nonlocal operators

Example 3

(Long time behavior). Under certain conditions (see, e.g., [22, 23]), the solution of time-dependent MFG systems will quickly converge to the solution of the corresponding stationary ergodic MFG system, as the time horizon T increases. We check numerically that this is also the case for nonlocal diffusions. In (46), we take \({\mathcal {L}} = ( -\Delta )^{\frac{s}{2}}\), with \(s=1.5\), \( [ 0,T ] \times [ a,b ] = [ 0,10 ] \times [ -1,2 ]\), \(G ( x ) = ( x-2 )^{2}\), \(f ( t,x ) = x^2\), and \(m_{0} ( x ) = \mathbbm {1}_{ [ 1,2 ] } ( x )\). We expect (from the cost functions f and G) that the solution m will approach the line \(x=0\) quite fast, then travel along this line, and only go toward the point \(x=2\) at the very end. Our numerical simulations confirm this behavior. Here, we have considered the cases \(K=0\) (no coupling in the u equation) and \(K = 0.4\) (some coupling). The parameters used in the simulations are \(h = \rho = 0.01\), \(\epsilon = \sqrt{h}\), \(r = h^{\frac{1}{2 s}}\), and the results are shown in Fig. 4.

Fig. 4

Long time behavior and turnpike property. The solution m in Example 3 with different right hand sides

The players want to avoid each other when \(K=0.4\), so the solution is more spread out in the spatial direction than when \(K=0\).

Example 4

We compute the convergence rate when f, G, \(m_{0}\) are as in Example 1, \(s=1.5\), \(\sigma =0.2\), \(\delta =0.4\), and the domain \( [ 0,T ] \times [ a,b ] = [ 0,0.5 ] \times [ 0,1 ]\). We take \(\rho = h\), \(r = h^{\frac{1}{2 s}}\), and for simplicity \(\epsilon = 0.25\).

We calculate solutions for different values of h, and compare with a reference solution computed at \(h=2^{-10}\). We calculate \(L^{\infty }\) and \(L^{1}\) relative errors restricted to the x-interval \([ \frac{1}{3}, \frac{2}{3} ]\) (to avoid boundary effects), and \(t=0\) for u and \(t=T\) for m:

$$\begin{aligned}&ERR _u:= \frac{\Vert u_{\rho ,h} ( 0,\cdot ) - u_{\text {ref}} ( 0,\cdot ) \Vert _{L^{\infty } ( \frac{1}{3}, \frac{2}{3} )}}{\Vert u_{\text {ref}} ( 0,\cdot ) \Vert _{L^{\infty } ( \frac{1}{3} , \frac{2}{3} )}}, \\&ERR _m:=\frac{\Vert m_{\rho ,h}^{\epsilon } ( T,\cdot ) - m_{\text {ref}} ( T,\cdot ) \Vert _{L^{1} ( \frac{1}{3} , \frac{2}{3} )}}{\Vert m_{\text {ref}} ( T,\cdot ) \Vert _{L^{1} ( \frac{1}{3} , \frac{2}{3} )}}. \end{aligned}$$
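On a uniform grid, these relative errors can be computed as follows; this is an illustrative Python sketch (function name and dx-weighted \(L^1\) discretization are our own choices):

```python
import numpy as np

def relative_errors(u, u_ref, m, m_ref, x, lo=1/3, hi=2/3):
    """Relative L^inf error of u(0,.) and relative L^1 error of m(T,.),
    restricted to [1/3, 2/3] to avoid boundary effects, as in Example 4.
    All arrays hold grid values on the uniform grid x."""
    mask = (x >= lo) & (x <= hi)
    dx = x[1] - x[0]
    err_u = np.max(np.abs(u[mask] - u_ref[mask])) / np.max(np.abs(u_ref[mask]))
    # dx cancels in the ratio, but is kept to mirror the L^1 norm
    err_m = (dx * np.sum(np.abs(m[mask] - m_ref[mask]))
             / (dx * np.sum(np.abs(m_ref[mask]))))
    return err_u, err_m
```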

The results are given in the table below.

| h | \(2^{-2}\) | \(2^{-3}\) | \(2^{-4}\) | \(2^{-5}\) | \(2^{-6}\) | \(2^{-7}\) | \(2^{-8}\) | \(2^{-9}\) |
| ERR\(_u\) | 0.3155 | 0.1951 | 0.0920 | 0.0446 | 0.0218 | 0.0097 | 0.0035 | 0.0013 |
| ERR\(_m\) | 0.8055 | 0.4583 | 0.2886 | 0.1869 | 0.1023 | 0.0596 | 0.0300 | 0.0186 |

We see that each time h is halved, the errors are roughly halved as well, i.e., we observe an error of order O(h).
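As a sanity check on the reported errors, one can compute the observed orders \(\log _2 ( ERR (h) / ERR (h/2) )\) directly from the table; a first-order method should give values near 1:

```python
import numpy as np

# Error columns from the table above, h = 2^{-2}, ..., 2^{-9}
err_u = [0.3155, 0.1951, 0.0920, 0.0446, 0.0218, 0.0097, 0.0035, 0.0013]
err_m = [0.8055, 0.4583, 0.2886, 0.1869, 0.1023, 0.0596, 0.0300, 0.0186]

def observed_orders(errs):
    """Observed convergence orders log2(ERR(h) / ERR(h/2)) between
    consecutive halvings of h."""
    return [float(np.log2(a / b)) for a, b in zip(errs[:-1], errs[1:])]

rates_u = observed_orders(err_u)
rates_m = observed_orders(err_m)
```

The individual orders fluctuate somewhat (the last few for u exceed 1, likely because those runs are close to the reference resolution \(h = 2^{-10}\)), but on average they are consistent with first-order convergence.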