1 Introduction

There has been extensive research on mean field games (MFG) since their introduction by Lasry and Lions [21] and Huang et al. [18] as the limit of symmetric nonzero-sum noncooperative N-player dynamic games when the number of players tends to infinity; see Carmona and Delarue [4] and Guéant et al. [16] for excellent introductions to MFG. The main research focus nowadays is in two areas. One is to study the existence and uniqueness of the MFG equilibrium via the partial differential equation (PDE) system that characterizes the equilibrium value function and mean field state, see Lasry and Lions [21]. The other is to analyze the convergence of the stochastic differential game among a large but finite number of players to the MFG limit as the number of players tends to infinity, as well as the numerical approximation of MFG, see Achdou et al. [1] and Achdou and Capuzzo-Dolcetta [2]. The MFG theory has been applied to many modeling problems in economics, politics, social welfare and other areas, see, for example, Guéant [14] and Lasry et al. [22].

In this paper, we focus on MFG with finite time horizon and continuous time state dynamics of each agent taking values in a finite set under fully symmetric payoff and complete information. Gomes et al. [12] first study finite state MFG, prove the existence and uniqueness of the Nash equilibrium via the coupled forward and backward ordinary differential equation (FBODE) system, and show the convergence of the N-player game’s Nash equilibrium to that of the limiting MFG when N tends to infinity and the time horizon is small. [8] analyze the MFG with a probabilistic approach. Carmona and Wang [7] tackle both the mean field of states and that of controls and prove the existence of equilibrium with backward stochastic differential equations and the uniqueness of equilibrium when the Hamiltonian does not depend on the mean field of controls. Carmona and Wang [6] analyze finite state MFG between one major player and an infinite number of minor players. Cardaliaguet et al. [3] make a breakthrough in the convergence analysis for a diffusion model with common noise and characterize the equilibrium with the master equation and its regular solution. Cecchin and Pelino [9] apply the master equation to obtain the convergence of the feedback Nash equilibrium in the finite state space, which extends the convergence result in Gomes et al. [12] without requiring the time horizon to be sufficiently small.

Despite the progress in existence, uniqueness and convergence for the Nash equilibrium of the finite state MFG, there is still a considerable obstacle to approximating the N-player game with a simpler MFG. One main difficulty is that the Nash equilibrium of the finite state MFG is characterized by a FBODE system with both initial and terminal conditions, which in general has no analytical solution and is difficult to solve numerically. One commonly used method for solving a FBODE is the shooting method, but it tends to work better when the dimension is low and the boundary condition is simple; it fails to work in our case. Gomes and Saude [13] propose a numerical scheme to solve finite state MFG under some monotonicity conditions that do not hold in many applications.

There has been active research in recent years on using deep neural networks (DNN) to solve PDEs and ODEs with different boundary conditions, see, e.g., Lagaris et al. [19], Malek and Beidokhti [27], Lee and Kang [24], Lagaris et al. [20]. Given that the structure of a FBODE system is similar to that of a PDE, we are motivated to use a DNN to numerically solve the FBODE system of the finite state MFG problem. Sirignano and Spiliopoulos [30] propose the DGM (deep Galerkin method) to solve high-dimensional PDEs with a mesh-free DNN and show the convergence of approximate solutions to the true solution under some conditions; the method is similar in spirit to the Galerkin method except that the solution is approximated by a neural network instead of a linear combination of basis functions. [5] provide a comprehensive literature review on deep learning methods applied to MFG. Many papers apply the DGM approach to numerically solve high-dimensional PDEs derived from different types of MFG (see, e.g., Han et al. [17], Ruthotto et al. [29]), while others apply DNNs to solve the corresponding BSDEs (see, e.g., Fouque and Zhang [11], Lauriere [23]). Most of these papers only provide numerical results without rigorous proofs for the numerical solutions. There is no guarantee that the neural network approximation converges to the true solution, and the approximation may not be accurate even if the loss function is already small, as no relation is established between the loss function and the error between the approximate and true solutions. Li et al. [25] and Li et al. [26] prove the strong and uniform convergence of the DGM approach. Parallel to this paper, Mishra and Molinaro [28] focus on error estimation of the physics informed neural network (PINN), another name for the DGM. For PDEs satisfying certain conditions, they provide an abstract framework to relate the loss function of the neural network to the error between the true solution and the approximate solution generated by the neural network, and prove error bounds for several specific types of PDEs. Their assumptions on the regularity of the PDEs are strong (see Assumption 2.1 in Mishra and Molinaro [28]) and are not necessarily satisfied by the FBODE system derived in this paper. To the best of our knowledge, there is no existing literature addressing the error between the approximation and the true solution via the loss function for FBODEs derived from continuous time finite state MFG problems. We provide an error bound estimate to fill this gap.

The main result of the paper, Theorem 2.6, states that the error between the true solution \((\theta , p)\) of the FBODE system and the DNN approximate solution \(({\tilde{\theta }}, {\tilde{p}})\) is linear in the square root of the loss function of the DNN method, which provides the magnitude of the error bound for the DNN approximation as well as the convergence result. To bridge \(\theta \) and \({\tilde{\theta }}\), we use the master equation for \(\theta \) in Cecchin and Pelino [9] and prove that \({\tilde{\theta }}\) satisfies a similar equation. Cecchin and Pelino [9] prove that the equilibrium of the finite-player finite state game converges to that of the corresponding MFG, with the former satisfying a backward ODE while the latter satisfies a FBODE, which is equivalent to a backward PDE (the master equation) and can be compared with the backward ODE system. In contrast, we want to estimate the error between the true solution and the DNN approximation to the MFG, with both satisfying FBODE systems and the one for the DNN approximation having extra error terms compared with the one for the true solution. We leverage the master equations to connect the two FBODE systems and carry out the error analysis. Due to the perturbation terms in the FBODE system, we need to address the issue of negative \({\tilde{p}}\), prohibited in [9, Theorem 6], and find a new way to bypass that difficulty.

As an application, we apply the DNN to numerically solve an optimal market making problem with the same framework as that of Guéant [15], except that the terminal reward depends on the trading volume ranking that is determined in a so-called market maker incentive program contract designed by the exchange to encourage market makers to provide more liquidity (i.e., trading volume). El Euch et al. [10] discuss the market maker incentive contract and analyze how the exchange should optimally decide the commission fee schedule for market makers. The trading volume ranking-related reward, commonly seen in market incentive programs of various exchanges, is not considered in El Euch et al. [10]. In this paper, we use a finite state MFG to model the competition between market makers in the presence of the trading volume ranking reward and solve for the Nash equilibrium using the DNN approach. The results may help exchanges design better market incentive programs by better understanding market makers’ behavior in response to the contract.

The rest of the paper is organized as follows. In Sect. 2, we formulate the finite state MFG model and state the main result of the paper, Theorem 2.6, on the error estimation of the DNN approximate solution. In Sect. 3, we discuss an application in optimal market making with rank-based trading volume reward. Section 4 contains the proofs of Theorems 2.4, 2.5, 2.6 and Proposition 3.1. Section 5 concludes.

2 Model and Main Results

We define a finite state MFG in continuous time similar to the one in Cecchin and Pelino [9]. The finite state space is \(\Sigma = \{1, \ldots , K\}\), and the reference game player’s state is denoted by Z, which is a Markov chain. The player at state z can decide the switching intensities through feedback controls \(\lambda : [0, T] \times \Sigma \rightarrow ({\mathbb {R}}^{+})^{K}\). The dynamics of the player are given by

$$\begin{aligned} \mathrm{d} Z_t = \sum _{z \in \Sigma } \mathrm{d} N^{z}_{t}, \end{aligned}$$

where \(N^{z}_{t}\) is a Poisson process with controlled intensity \(\lambda _{z}(t, Z_{t})\).

If there are some states that state z cannot access, then we can simply set the corresponding components of the intensity vector to zero. The probability measure on the mean field of states is a function \(p: [0, T] \rightarrow P(\Sigma )\), where

$$\begin{aligned} P(\Sigma ) = \{ (p_{1}, \ldots , p_{K}): \sum _{z = 1}^{K} p_{z} = 1, \ p_{z} \ge 0 \}. \end{aligned}$$

Starting at time \(t \in [0, T]\), given any probability measure p on the mean field of states, the game player with controlled state process Z that starts at state z wants to optimize

$$\begin{aligned} \theta _{z}(t):= \sup _{\lambda \in {\mathcal {A}}} {\mathbb {E}}_t\left[ \int _{t}^{T} F(Z_{s}, \lambda (s, Z_{s})) \mathrm{d} s + G(Z_{T}, p(T))\right] , \end{aligned}$$
(2.1)

where \({\mathbb {E}}_t[\cdot ]\) is the conditional expectation given the initial state \(Z_{t} = z\) at time t, F the running payoff, G the terminal payoff and \({\mathcal {A}}\) the admissible control set containing all measurable functions \(\lambda : [0, T] \times \Sigma \rightarrow ({\mathbb {R}}^{+})^{K}\). We assume that for any \(z \in \Sigma \), \(F(z, \lambda )\) is bounded from above and does not depend on \(\lambda _{z}\), the zth component of \(\lambda \). Define \(\theta : [0, T] \rightarrow {\mathbb {R}}^{K}\) by \(\theta (t) = (\theta _{1}(t), \ldots , \theta _{K}(t))\). According to Cecchin and Pelino [9], the equilibrium value function \(\theta \) and the mean field probability p satisfy the following FBODE system:

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} \theta _{z}(t)}{\mathrm{d} t}&= - H(z, \Delta ^{z} \theta (t)), \quad \theta _{z}(T) = G(z, p(T)), \\ \frac{\mathrm{d} p_{z}(t)}{\mathrm{d} t}&= \sum _{y} p_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} \theta (t)), \quad p_{z}(t_{0}) = p_{z, 0}, \end{aligned} \end{aligned}$$
(2.2)

where \(\Delta ^{z}\) is the difference operator, defined as

$$\begin{aligned} \Delta ^{z} \theta (t):= (\theta _{1}(t) - \theta _{z}(t), \ldots , \theta _{K}(t) - \theta _{z}(t)) \end{aligned}$$

and \(H: \Sigma \times {\mathbb {R}}^{K} \rightarrow {\mathbb {R}}\) is the Hamiltonian function, defined for any \(\mu \in {\mathbb {R}}^{K}\) satisfying \(\mu _{z} = 0\) as

$$\begin{aligned} H(z, \mu ):= \sup _{\lambda \in ({\mathbb {R}}^{+})^{K}} \{ \lambda \cdot \mu + F(z, \lambda ) \} \end{aligned}$$

and \(\lambda ^{*}(z, \mu ) = (\lambda ^{*}_{1}(z, \mu ), \ldots , \lambda ^{*}_{K}(z, \mu ))\) is the optimizer of \(H(z, \mu )\) except for \(\lambda ^{*}_{z}(z, \mu )\), which can be any value since in the proofs of our main results we always let \(\mu _z = [\Delta ^{z} \theta (t)]_{z} = \theta _z(t) - \theta _z(t) = 0\) and \(F(z, \lambda )\) is independent of \(\lambda _z\). For notational convenience, we define

$$\begin{aligned} \lambda ^{*}_{z}(z, \mu ):= - \sum _{y \ne z} \lambda ^{*}_{y}(z, \mu ). \end{aligned}$$
(2.3)

The backward equation in (2.2) comes from the optimization problem (2.1) given p, and the forward equation comes from the consistency condition for the probability measure p on the mean field of states when everyone follows the equilibrium strategy. According to [12, Proposition 1], if H is differentiable and \(\lambda ^{*}(z, \mu )\) is positive except for the zth element, then for \(y \ne z\), we have

$$\begin{aligned} \lambda ^{*}_{y}(z, \mu ) = [D_{\mu } H(z, \mu )]_y, \end{aligned}$$

where \(\lambda ^{*}_{y}(z, \mu )\) is the intensity from state z to state y and \([D_{\mu } H(z, \mu )]_y\) the yth component of the gradient \(D_{\mu } H(z, \mu )\). In the proofs of the main results, we always have \(\mu _z = 0\) when we use \(H(z, \mu )\), \(D_{\mu } H(z, \mu )\) or \(D^2_{\mu \mu } H(z, \mu )\), the Hessian matrix. For simplicity of the proofs, with a slight abuse of notation, we follow Cecchin and Pelino [9] and artificially define

$$\begin{aligned} {[}D_{\mu } H(z, \mu )]_z = \lambda ^{*}_{z}(z, \mu ). \end{aligned}$$

Then we can conclude that

$$\begin{aligned} \lambda ^{*}(z, \mu ) = D_{\mu } H(z, \mu ), \end{aligned}$$

and the feedback control is \(\lambda (t, z) = \lambda ^{*}(z, \Delta ^{z} \theta (t))\) in equilibrium.
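As a concrete illustration of these definitions only (this example is not used elsewhere in the paper, and we do not claim it satisfies Assumption 2.1 below), consider the quadratic running payoff \(F(z, \lambda ) = - \frac{1}{2} \sum _{y \ne z} \lambda _{y}^{2}\). Maximizing each term \(\lambda _{y} \mu _{y} - \frac{1}{2} \lambda _{y}^{2}\) over \(\lambda _{y} \ge 0\) gives

$$\begin{aligned} \lambda ^{*}_{y}(z, \mu ) = (\mu _{y})^{+}, \quad y \ne z, \qquad H(z, \mu ) = \frac{1}{2} \sum _{y \ne z} \big ( (\mu _{y})^{+} \big )^{2}, \end{aligned}$$

so that \(\lambda ^{*}_{y}(z, \mu ) = [D_{\mu } H(z, \mu )]_{y}\) for \(y \ne z\), consistent with the identity above.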

We next assume H, G and \(\lambda ^{*}\) satisfy the following assumptions.

Assumption 2.1

Assume that, under (2.3), \(H(z, \mu )\) has a unique optimizer \(\lambda ^{*}(z, \mu )\) for every \(\mu \). H is \(C^{2}\) in \(\mu \) on bounded sets; H, \(D_{\mu } H\) and \(D^{2}_{\mu \mu } H\) are locally Lipschitz in \(\mu \); and the second derivative is bounded away from 0 and bounded above on any bounded set, i.e., there exists a constant C such that for any \(\mu \) in that bounded set satisfying \(\mu _z = 0\), we have

$$\begin{aligned} C^{-1} \Vert \mu \Vert ^2\le \mu \cdot D^{2}_{\mu \mu } H(z, \mu ) \cdot \mu \le C \Vert \mu \Vert ^2. \end{aligned}$$
(2.4)

Moreover, assume that G is differentiable, and there exists a constant C such that when p is bounded, its directional derivative in any direction w satisfies

$$\begin{aligned} \left| \frac{\partial G}{\partial w}(z, p + \Delta p) - \frac{\partial G}{\partial w}(z, p) \right| \le C \Vert \Delta p \Vert \Vert w \Vert \end{aligned}$$
(2.5)

and that G is decreasing in p, i.e., for all \(p, {\bar{p}} \in {\mathbb {R}}^{K}\),

$$\begin{aligned} \sum _{z \in \Sigma } (G(z, p) - G(z, {\bar{p}}))(p_{z} - {\bar{p}}_z) \le 0. \end{aligned}$$
(2.6)

Remark 2.2

H satisfies \(H(z, \mu ) \ge H(z, {\bar{\mu }})\) for any \(z \in \Sigma \) if two vectors \(\mu = (\mu _1, \ldots , \mu _K)\) and \({\bar{\mu }} = ({\bar{\mu }}_1, \ldots , {\bar{\mu }}_K)\) satisfy

$$\begin{aligned} \mu _{i} \ge {\bar{\mu }}_{i}, \quad i \in \Sigma , i \ne z. \end{aligned}$$

Then from [12, Proposition 2], the solution to (2.2) has an a priori bound \(C_{GH}\) as long as H satisfies Remark 2.2 and G is bounded for all p(T) in the compact set \([0, 1]^{K}\), where \(C_{GH}\) is defined as

$$\begin{aligned} \Vert \theta \Vert \le C_{GH}:= \max _{z \in \Sigma , p \in [0, 1]^K} \{G(z, p)\} + 2 \max _{z \in \Sigma } H(z, 0) T, \end{aligned}$$
(2.7)

where the norm \(\Vert \cdot \Vert \) is the \(\infty \) norm. G is bounded because it is continuous and defined on a compact set. For given H and G, as \(\theta \) satisfies the ODE system (2.2) and H is locally Lipschitz continuous by Assumption 2.1, \(\frac{\mathrm{d} \theta _{z}(t)}{\mathrm{d} t}\) is also bounded. Similarly, as \(D_{\mu } H\) and \(\frac{\mathrm{d} \theta _{z}(t)}{\mathrm{d} t}\) are bounded, we can further see that \(\frac{\mathrm{d}^2 \theta _{z}(t)}{\mathrm{d} t^2}\) is bounded. By a similar argument for p and \(\lambda ^{*}\), \(\frac{\mathrm{d} p_{z}(t)}{\mathrm{d} t}\) and \(\frac{\mathrm{d}^2 p_{z}(t)}{\mathrm{d} t^2}\) are also bounded. This means that for given H and G, there exist constants \(C_{\theta GH}\) and \(C_{p GH}\) such that

$$\begin{aligned} \begin{aligned} \left\| \frac{\mathrm{d} \theta _{z}(t)}{\mathrm{d} t} \right\|&\le C_{\theta GH}, \quad \left\| \frac{\mathrm{d}^2 \theta _{z}(t)}{\mathrm{d} t^2} \right\| \le C_{\theta GH}, \\ \left\| \frac{\mathrm{d} p_{z}(t)}{\mathrm{d} t} \right\|&\le C_{p GH}, \quad \left\| \frac{\mathrm{d}^2 p_{z}(t)}{\mathrm{d} t^2} \right\| \le C_{p GH}. \end{aligned} \end{aligned}$$
(2.8)

We summarize [9, Theorem 1] and [12, Theorem 2], and state the following theorem without proof.

Theorem 2.3

Under Assumption 2.1, ODE system (2.2) has a unique solution \((\theta , p)\) for any initial value \(p(t_{0}) \in P(\Sigma )\) and the MFG has a unique Nash equilibrium point.

We assume in the rest of the paper that Assumption 2.1 holds, which guarantees the existence, uniqueness and convergence of the finite state MFG. However, to find the equilibrium, we need to solve (2.2), which generally does not have an analytical solution. As (2.2) is a FBODE system, we cannot solve it numerically by simple discretization. We apply the deep neural network approach of Sirignano and Spiliopoulos [30] to numerically solve (2.2).

Define two sets of neural network functions as

$$\begin{aligned} \begin{aligned} {\varvec{\Theta }}^{n}(\nu _1, \nu )&:= \left\{ {\tilde{\theta }}: [0, T] \rightarrow {\mathbb {R}}^{K}; \quad {\tilde{\theta }}(t) \right. \\&\left. = \left( \nu _1\left( \sum _{i = 1}^{n} \beta _{1, i} \nu ( \alpha _{i} t + c_{i}) \right) , \ldots , \nu _1\left( \sum _{i = 1}^{n} \beta _{K, i} \nu ( \alpha _{i} t + c_{i}) \right) \right) \right\} , \\ {\mathbf {P}}^{n}(\nu _2, \nu )&:= \left\{ {\tilde{p}}: [0, T] \rightarrow {\mathbb {R}}^{K-1}; \quad {\tilde{p}}(t) \right. \\&\left. = \left( \nu _2\left( \sum _{i = n+1}^{2 n} \beta _{1, i} \nu ( \alpha _{i} t + c_{i}) \right) , \ldots , \nu _2\left( \sum _{i = n+1}^{2 n} \beta _{K - 1, i} \nu ( \alpha _{i} t + c_{i}) \right) \right) \right\} , \end{aligned} \end{aligned}$$

where \(\nu : {\mathbb {R}} \rightarrow {\mathbb {R}}\) is a three times continuously differentiable activation function, and the two strictly increasing, three times continuously differentiable activation functions \(\nu _1, \nu _2: {\mathbb {R}} \rightarrow {\mathbb {R}}\) have twice continuously differentiable inverse functions \(\nu _1^{-1}\) and \(\nu _2^{-1}\). They satisfy

$$\begin{aligned} \sup | \nu _1 | = C_{GH} + e, \quad \inf \nu _2 = -e, \quad \sup \nu _2 = 1 + e, \end{aligned}$$
(2.9)

where e is an arbitrary small positive constant.

In the numerical tests of this paper, we use hyperbolic tangent functions \(\tanh \) as activation functions; in particular, \(\nu _1(x)=a\tanh x +b\) for some constants a, b and \(\nu _2(x)=(\tanh x +1)/2\). We approximate the solution \((\theta , p)\) to (2.2) numerically by \(({\tilde{\theta }}^{(N)}, {\tilde{p}}^{(N)})\), which satisfies

$$\begin{aligned} ({\tilde{\theta }}_{1}^{(N)}, \ldots , {\tilde{\theta }}^{(N)}_{K}) \in {\varvec{\Theta }}^{N}(\nu _1, \nu ), \quad ({\tilde{p}}^{(N)}_{1}, \cdots , {\tilde{p}}^{(N)}_{K-1}) \in {\mathbf {P}}^{N}(\nu _2, \nu ), \quad {\tilde{p}}^{(N)}_{K} = 1 - \sum _{i \ne K} {\tilde{p}}^{(N)}_{i}.\nonumber \\ \end{aligned}$$
(2.10)
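To make the parameterization concrete, the following minimal Python/NumPy sketch implements one member of \({\varvec{\Theta }}^{n}(\nu _1, \nu )\) together with its time derivative, using the \(\tanh \)-based choices \(\nu (x) = \tanh x\) and \(\nu _1(x) = a \tanh x + b\) mentioned above; the array names, sizes and random initialization are our own illustrative choices, not part of the paper's method.

```python
import numpy as np

K, n = 3, 16          # number of states and hidden units (illustrative sizes)
a, b = 1.0, 0.0       # constants in nu_1(x) = a*tanh(x) + b, to be chosen so that sup|nu_1| = C_GH + e

rng = np.random.default_rng(0)
alpha = rng.normal(size=n)        # shared inner weights alpha_i
c = rng.normal(size=n)            # shared inner biases c_i
beta = rng.normal(size=(K, n))    # outer weights beta_{z,i}

def theta_tilde(t):
    """One element of Theta^n(nu_1, nu): returns the K-vector tilde{theta}(t)."""
    hidden = np.tanh(alpha * t + c)           # nu(alpha_i t + c_i), shape (n,)
    return a * np.tanh(beta @ hidden) + b     # nu_1 applied componentwise, shape (K,)

def dtheta_tilde_dt(t):
    """Analytic time derivative of theta_tilde, needed in the loss (2.11)."""
    hidden = np.tanh(alpha * t + c)
    s = beta @ hidden                         # inner sums, shape (K,)
    dhidden = (1.0 - hidden**2) * alpha       # d/dt nu(alpha_i t + c_i)
    return a * (1.0 - np.tanh(s)**2) * (beta @ dhidden)
```

The set \({\mathbf {P}}^{n}(\nu _2, \nu )\) is parameterized analogously, with \(\nu _2\) in place of \(\nu _1\) and \(K-1\) output components.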

By considering both the differential operators and boundary conditions in (2.2), we define the loss function \(\Psi \) for any approximate solution \(({\tilde{\theta }}, {\tilde{p}})\) as

$$\begin{aligned} \begin{aligned} \Psi ({\tilde{\theta }}, {\tilde{p}})&:= \sum _{z \in \Sigma }\left\{ \int _{t_{0}}^{T} \left( \frac{\mathrm{d} {\tilde{\theta }}_{z}(t)}{\mathrm{d} t} + H(z, \Delta ^{z} {\tilde{\theta }}(t))\right) ^2 \mathrm{d} t \right. \\&\quad + \int _{t_{0}}^{T} \left( \frac{\mathrm{d} {\tilde{p}}_{z}(t)}{\mathrm{d} t} - \sum _{y} {\tilde{p}}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t))\right) ^2 \mathrm{d} t + \int _{t_{0}}^{T} \left( \sum _{z} ({\tilde{p}}_{z}(t))^{-}\right) ^{2} \mathrm{d} t \\&\quad + ({\tilde{p}}_{z}(t_{0}) - {\tilde{p}}_{z, 0})^2 + ({\tilde{\theta }}_{z}(T) - G(z, {\tilde{p}}(T)))^2 \\&\left. \quad + \sum _{z \in \Sigma } \left( B_{\theta } - \max _{t \in [0, T]} \bigg | \frac{\mathrm{d}^2 {\tilde{\theta }}_{z}(t)}{\mathrm{d} t^2} \bigg | \right) ^{-} + \sum _{z \in \Sigma } \left( B_{p} - \max _{t \in [0, T]} \bigg |\frac{\mathrm{d}^2 {\tilde{p}}_{z}(t)}{\mathrm{d} t^2} \bigg | \right) ^{-} \right\} , \end{aligned}\qquad \quad \end{aligned}$$
(2.11)

where \(({\tilde{p}}_{K}(t))^{-}:= - {\tilde{p}}_{K}(t) \mathbb {1}_{ \{ {\tilde{p}}_{K}(t) \le 0 \} }\) and \(B_{\theta }\), \(B_{p}\) can be any constants that satisfy

$$\begin{aligned} \begin{aligned} B_{\theta }> C_{\theta GH}&\ge \max _{t \in [0, T]} \bigg | \frac{\mathrm{d}^2 \theta _{z}(t)}{\mathrm{d} t^2} \bigg |, \\ B_{p} > C_{p GH}&\ge \max _{t \in [0, T]} \bigg | \frac{\mathrm{d}^2 p_{z}(t)}{\mathrm{d} t^2} \bigg |, \end{aligned} \end{aligned}$$

where the constants \(C_{\theta GH}\) and \(C_{p GH}\) are from (2.8). Then it follows that

$$\begin{aligned} \sum _{z \in \Sigma } \left( B_{\theta } - \max _{t \in [0, T]} \bigg | \frac{\mathrm{d}^2 \theta _{z}(t)}{\mathrm{d} t^2} \bigg | \right) ^{-} + \sum _{z \in \Sigma } \left( B_{p} - \max _{t \in [0, T]} \bigg |\frac{\mathrm{d}^2 p_{z}(t)}{\mathrm{d} t^2} \bigg | \right) ^{-} = 0. \end{aligned}$$

Given N, the structure of the neural network is determined. We train the network by finding the optimal values of \(\{ \beta _{j, i} \}^{2 K - 1, 2 n}_{i, j = 1}\), \(\{ \alpha _{i} \}^{2 n}_{i = 1}\) and \(\{ c_{i} \}^{2 n}_{i = 1}\) that determine \(({\tilde{\theta }}^{(N)}, {\tilde{p}}^{(N)})\) so as to minimize \(\Psi \). For the true solution \((\theta , p)\), \(\Psi (\theta , p) = 0\); since \((\theta , p)\) exists and is unique, \((\theta , p)\) is the unique minimizer of \(\Psi \). Theorem 2.4 below provides a convergence result similar to Theorem 7.1 in Sirignano and Spiliopoulos [30]. Both the integral terms and the maximum terms in (2.11) can be calculated via Monte Carlo simulation; in practice, we use a similar approach as in Sirignano and Spiliopoulos [30] to calculate these terms in order to increase the robustness of training.
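As an illustration of how the integral terms of (2.11) can be estimated by Monte Carlo sampling, the following Python/NumPy sketch evaluates the two residual integrals, the negativity penalty and the boundary terms at uniformly sampled times; the callables `theta_tilde`, `dtheta_dt`, `p_tilde`, `dp_dt`, `H`, `lam_star` and `G` are assumed to be supplied by the user (e.g., the network above together with the model's Hamiltonian), and the curvature penalty terms of (2.11) are omitted for brevity.

```python
import numpy as np

def mc_loss(theta_tilde, dtheta_dt, p_tilde, dp_dt, H, lam_star, G,
            p0, t0, T, K, n_samples=512, rng=None):
    """Monte Carlo estimate of the main terms of the loss Psi in (2.11).

    theta_tilde(t), p_tilde(t): approximate solutions, arrays of shape (K,);
    dtheta_dt(t), dp_dt(t): their time derivatives;
    H(z, mu): Hamiltonian; lam_star(y, mu): vector lambda^*(y, mu), shape (K,);
    G(z, p): terminal payoff; p0: initial distribution, shape (K,).
    """
    rng = rng or np.random.default_rng()
    ts = rng.uniform(t0, T, size=n_samples)   # mesh-free sampling of [t0, T]

    hjb_res, kfe_res, neg_pen = 0.0, 0.0, 0.0
    for t in ts:
        th, dth = theta_tilde(t), dtheta_dt(t)
        p, dp = p_tilde(t), dp_dt(t)
        for z in range(K):
            mu_z = th - th[z]                                        # Delta^z theta(t)
            hjb_res += (dth[z] + H(z, mu_z)) ** 2                    # backward-equation residual
            drift = sum(p[y] * lam_star(y, th - th[y])[z] for y in range(K))
            kfe_res += (dp[z] - drift) ** 2                          # forward-equation residual
        neg_pen += sum(max(-p[z], 0.0) for z in range(K)) ** 2       # penalty for negative components of p
    scale = (T - t0) / n_samples                                     # Monte Carlo weight of dt

    boundary = sum((p_tilde(t0)[z] - p0[z]) ** 2 +
                   (theta_tilde(T)[z] - G(z, p_tilde(T))) ** 2 for z in range(K))
    return scale * (hjb_res + kfe_res + neg_pen) + boundary
```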

Theorem 2.4

There exists a sequence of \(({\tilde{\theta }}^{(N)}, {\tilde{p}}^{(N)})\) defined in (2.10) such that

$$\begin{aligned} \lim _{N \rightarrow + \infty } \Psi ({\tilde{\theta }}^{(N)}, {\tilde{p}}^{(N)}) = 0. \end{aligned}$$

Proof

See Sect. 4.

When the loss function \(\Psi \) is smaller than a certain value, the uniform bounds on the first and second derivatives of the approximating functions imply that the maximum residual error on the time interval is also smaller than a certain value.

Theorem 2.5

There exists a constant \(\varepsilon _{0}\) such that for any \(\varepsilon < \varepsilon _{0}\), if \(\Psi < \varepsilon \), then there exists a constant C such that for all \(t \in [t_{0}, T]\) and \(z \in \Sigma \), we have

$$\begin{aligned} \begin{aligned}&\bigg \Vert \frac{\mathrm{d} {\tilde{\theta }}_{z}(t)}{\mathrm{d} t} + H(z, \Delta ^{z} {\tilde{\theta }}(t)) \bigg \Vert< C \varepsilon ^{\frac{1}{3}}, \quad \bigg \Vert {\tilde{\theta }}_{z}(T) - G(z, {\tilde{p}}(T))\bigg \Vert< C \varepsilon ^{\frac{1}{2}}, \\&\quad \bigg \Vert \frac{\mathrm{d} {\tilde{p}}_{z}(t)}{\mathrm{d} t} - \sum _{y} {\tilde{p}}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \bigg \Vert< C \varepsilon ^{\frac{1}{3}}, \quad \bigg \Vert {\tilde{p}}_{z}(t_{0}) - p_{z, 0} \bigg \Vert < C \varepsilon ^{\frac{1}{2}}, \end{aligned} \end{aligned}$$
(2.12)

where \(\Vert \cdot \Vert \) is the infinity norm.

Proof

See Sect. 4.

Note that the constant C in Theorem 2.5 depends on the FBODE system and the bound of its true solution, but is independent of the DNN structure; Theorem 2.5 is thus an algorithm-independent result. We now state our main result on the error estimation for the numerical solution to the finite state mean field game.

Theorem 2.6

For every \(t \in [t_{0}, T]\) and \(z \in \Sigma \), assume \({\tilde{\theta }}(t)\) and \({\tilde{p}}(t)\) satisfy:

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} {\tilde{\theta }}_{z}(t)}{\mathrm{d} t}&= - H(z, \Delta ^{z} {\tilde{\theta }}(t)) + \epsilon _{1}(t, z), \quad {\tilde{\theta }}_{z}(T) = G(z, {\tilde{p}}(T)) + \epsilon _{3}(z), \\ \frac{\mathrm{d} {\tilde{p}}_{z}(t)}{\mathrm{d} t}&= \sum _{y} {\tilde{p}}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) + \epsilon _{2}(t, z), \quad {\tilde{p}}_{z}(t_{0}) = p_{z, 0} + \epsilon _{4}(z), \end{aligned} \end{aligned}$$
(2.13)

where \(p_{0} \in P(\Sigma )\), \({\tilde{p}}_{K}(t) = 1 - \sum _{z \ne K} {\tilde{p}}_{z}(t)\) and \({\tilde{p}}_{z}(t) \in [0, 1]\) for \(z < K\). Then there exist uniform constants B and \(N_{0}\) such that when \(N > N_{0}\) and

$$\begin{aligned} \sum _{i=1}^{2} |\epsilon _{i}(t, z)| + \sum _{i=3}^{4} |\epsilon _{i}(z)| + ({\tilde{p}}_{K}(t))^{-} \le \frac{1}{N}, \quad \forall (t, z) \in [t_{0}, T] \times \Sigma , \end{aligned}$$

we have for all \(t \in [t_{0}, T]\) and \(z \in \Sigma \),

$$\begin{aligned} |\theta _{z}(t) - {\tilde{\theta }}_{z}(t)| + |p_{z}(t) - {\tilde{p}}_{z}(t)| \le \frac{B}{N}. \end{aligned}$$

Proof

See Sect. 4.

Note that the constant B depends on the FBODE system and the bound of its true solution, but is independent of the DNN structure, which implies that Theorem 2.6 is an algorithm-independent result.

Combining Theorems 2.4 and 2.6, we immediately have the following result.

Theorem 2.7

The sequence \(({\tilde{\theta }}^{(N)}, {\tilde{p}}^{(N)})\) defined in (2.10) converges uniformly to the true solution \((\theta , p)\) of the FBODE (2.2), that is, for \(t \in [0, T]\),

$$\begin{aligned} \lim _{N \rightarrow + \infty } {\tilde{\theta }}^{(N)}(t) = \theta (t), \quad \lim _{N \rightarrow + \infty } {\tilde{p}}^{(N)}(t) = p(t). \end{aligned}$$

Remark 2.8

Although we only prove the convergence and error estimation results for a two-layer neural network structure characterized by \({\varvec{\Theta }}^{n}(\nu _1, \nu )\) and \({\mathbf {P}}^{n}(\nu _2, \nu )\), the idea and the proof can be easily adapted to more sophisticated neural network models (more layers, LSTM, etc.) as they share similar structures.

3 Application: Optimal Market Making with Rank-Based Reward

The model setting is similar to Guéant [15], except that the exchange provides an incentive reward for market making. The terminal payoffs of market makers depend on their trading volumes and rankings in the market. The optimization problems of different market makers are coupled. It is in general difficult to solve a finite-player game due to high dimensionality, but MFG provides a good approximation, see [8, 9] and [12]. We therefore use MFG as a proxy to solve the optimal market making problem.

Consider a continuum family of market makers \(\Omega _{mm}\) who keep quoting bid/ask limit orders. Select one of them as the reference market maker. Assume the asset price \(S_t\) follows a Brownian motion with initial value S,

$$\begin{aligned} dS_{t} = \sigma dW_{t}, \end{aligned}$$

where \(W_{t}\) is a standard Brownian motion adapted to the natural filtration \(\{ {\mathcal {F}}^{W}_{t} \}_{t \in {\mathbb {R}}_{+} }\) and \(\sigma \) the volatility of the stock. Assume \(\delta ^{a}_{t}\) and \(\delta ^{b}_{t}\) are the ask/bid spreads, which are controls determined by the reference market maker. Denote by \(N_{t}^{a}\) and \(N_{t}^{b}\) the jump processes for buy/sell market order arrivals to the reference market maker, with intensities \(\Lambda (\delta ^{a}_{t})\) and \(\Lambda (\delta ^{b}_{t})\), respectively. Assume \(\Lambda : {\mathbb {R}} \rightarrow {\mathbb {R}}\) has a continuous inverse function, is decreasing and continuously differentiable, and satisfies:

$$\begin{aligned} \frac{\partial ^2 \Lambda }{\partial \delta ^2}(\delta ) \Lambda (\delta ) < 2 \left( \frac{\partial \Lambda }{\partial \delta }(\delta )\right) ^2. \end{aligned}$$
(3.1)
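For example, the exponential intensity \(\Lambda (\delta ) = A e^{-k \delta }\) with \(A, k > 0\), which is used in the numerical tests below, satisfies (3.1): since \(\frac{\partial \Lambda }{\partial \delta }(\delta ) = -k A e^{-k \delta }\) and \(\frac{\partial ^2 \Lambda }{\partial \delta ^2}(\delta ) = k^{2} A e^{-k \delta }\), we have \(\frac{\partial ^2 \Lambda }{\partial \delta ^2}(\delta ) \Lambda (\delta ) = k^{2} A^{2} e^{-2 k \delta } < 2 k^{2} A^{2} e^{-2 k \delta } = 2 \big ( \frac{\partial \Lambda }{\partial \delta }(\delta )\big )^{2}\).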

The reference market maker has state variables \((X_{t}, q_{t}, v_{t})\), in which \(X_t\) is the cash account, \(q_t\) the inventory and \(v_t\) the accumulated trading volume, with initial value (x, q, v). We assume \(q_{t}\) can only take values in a finite set \({\mathbf {Q}} = \{-Q, \cdots , Q\}\) and \(v_{t}\) can only take values in a finite set \({\mathbf {V}}:= \{0, \ldots , v_{\max }\}\), and any trading volume above \(v_{\max }\) is not counted in the reward calculation. Denote by \(I^{b}(q):= \mathbb {1}_{q + 1 \in {\mathbf {Q}}}\) and \(I^{a}(q):= \mathbb {1}_{q - 1 \in {\mathbf {Q}}}\) the indicators of the market maker’s buying and selling capabilities.

The dynamics of \(X_{t}\) are given by

$$\begin{aligned} \mathrm{d} X_{t} = (S_{t} + \delta ^{a}_{t}) I^{a}(q_{t}) \mathrm{d} N_{t}^{a} - (S_{t} - \delta ^{b}_{t}) I^{b}(q_{t}) \mathrm{d} N_{t}^{b}, \end{aligned}$$

that of \(q_{t}\) by

$$\begin{aligned} d q_{t} = I^{b}(q_{t}) \mathrm{d} N_{t}^{b} - I^{a}(q_{t}) \mathrm{d} N_{t}^{a}, \end{aligned}$$

and that of \(v_t\) by

$$\begin{aligned} d v_{t} = (I^{b}(q_{t}) \mathrm{d} N_{t}^{b} + I^{a}(q_{t}) \mathrm{d} N_{t}^{a}) \mathbb {1}_{\{ v_{t} < v_{\max } \}}. \end{aligned}$$

Denote by \(p(t, q_{t}, v_{t})\) the probability measure on the mean field of discrete states \((q_{t}, v_{t})\) for all market makers. The reference market maker wants to solve the following optimization problem:

$$\begin{aligned} \sup _{\delta ^{a}, \delta ^{b}} {\mathbb {E}}_0\left[ X_{T}+ q_{T} S_{T} - l (|q_{T}|) + R(v_{T}) - \frac{1}{2} \gamma \sigma ^{2} \int _{0}^{T} q_{t}^{2} \mathrm{d} t\right] , \end{aligned}$$
(3.2)

where \(X_T+q_TS_T\) is the cash value at time T, \(l (|q_{T}|)\) the terminal inventory holding penalty with l an increasing convex function on \({\mathbb {R}}_+\) with \(l(0)=0\), \(\frac{1}{2} \gamma \sigma ^{2} \int _{0}^{T} q_{t}^{2} \mathrm{d} t\) the accumulated running inventory holding penalty with \(\gamma \) a positive constant representing the risk aversion level, and \(R(v_{T})\) the cash reward given by the exchange as an incentive for market makers to increase the trading volume \(v_T\). We consider the rank-based trading volume reward, defined by

$$\begin{aligned} R(v_{T}):= R \left( 1 - \sum _{j = v_{T}}^{v_{\max }} \sum _{i = -Q}^{Q} p(T, i, j)\right) , \end{aligned}$$
(3.3)

where \(1 - \sum _{j = v_{T}}^{v_{\max }} \sum _{i = -Q}^{Q} p(T, i, j)\) is the fraction of market makers whose trading volume the reference market maker exceeds and R a positive constant representing the maximum reward set by the exchange.

Using the martingale property, (3.2) can be reduced to \(x + q S + \theta (0, q, v)\), where \(\theta \) is the value function defined by

$$\begin{aligned} \theta (t, q, v):= & {} \sup _{\delta ^{a}, \delta ^{b}} {\mathbb {E}}_t\left[ \int _{t}^{T} \left[ \delta ^{a}_{s} \Lambda (\delta ^{a}_{s}) + \delta ^{b}_{s} \Lambda (\delta ^{b}_{s}) - \frac{1}{2} \gamma \sigma ^{2} q_{s}^{2} \right] \mathrm{d} s \right. \\&\left. - l (|q_{T}|) + R \left( 1 - \sum _{j = v_{T}}^{v_{\max }} \sum _{i = -Q}^{Q} p(T, i, j)\right) \right] \end{aligned}$$

and \({\mathbb {E}}_t[\cdot ]\) is the conditional expectation given \(q_{t} = q, v_{t} = v\).

We assume the market maker takes closed-loop feedback controls, i.e., when the market maker is in state (q, v),

$$\begin{aligned} \delta ^{a}_{t} = \delta ^{a}(t, q, v), \quad \delta ^{b}_{t} = \delta ^{b}(t, q, v). \end{aligned}$$

Since the only relevant states are \(q_{t}\) and \(v_{t}\), which both take values in finite sets, the problem can be reduced to the continuous time finite state MFG discussed in Cecchin and Pelino [9] by reformulating the notation as follows. Define \(K:= (2 Q + 1) (v_{\max } + 1)\) and \(\Sigma := \{1, \cdots , K\}\). There is a one-to-one mapping \(Z: {\mathbf {Q}} \times {\mathbf {V}} \rightarrow \Sigma \): for every \((q, v) \in {\mathbf {Q}} \times {\mathbf {V}}\), there exists \(z \in \Sigma \) such that

$$\begin{aligned} z = Z(q, v) \end{aligned}$$

and for every \(z \in \Sigma \), there exists \((q, v) \in {\mathbf {Q}} \times {\mathbf {V}}\) such that

$$\begin{aligned} (q, v) = Z^{-1}(z):=(Z^{-1}_{1}(z), Z^{-1}_{2}(z)). \end{aligned}$$

The state (q, v) is then reformulated as the state z. The value function \(\theta \) and the probability measure on the mean field of states p are reformulated as \(\theta , p: [0, T] \rightarrow {\mathbb {R}}^{K}\), where

$$\begin{aligned} \begin{aligned} \theta (t)&:= (\theta _{1}(t), \ldots , \theta _{K}(t)), \quad \theta _{z}(t) = \theta (t, Z^{-1}_{1}(z), Z^{-1}_{2}(z)) \\ p(t)&:= (p_{1}(t), \ldots , p_{K}(t)), \quad p_{z}(t) = p(t, Z^{-1}_{1}(z), Z^{-1}_{2}(z)). \end{aligned} \end{aligned}$$
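For instance, one possible enumeration (our illustrative choice; the paper does not fix a specific one) lists the states in lexicographic order of (q, v), as in the following Python sketch.

```python
Q, v_max = 1, 10                      # inventory capacity and volume cap (benchmark values below)
K = (2 * Q + 1) * (v_max + 1)         # size of the reformulated state space Sigma = {1, ..., K}

def Z(q, v):
    """Map (q, v) in Q x V to a state index z in {1, ..., K} (lexicographic order)."""
    return (q + Q) * (v_max + 1) + v + 1

def Z_inv(z):
    """Inverse map: recover (q, v) from the state index z."""
    q, v = divmod(z - 1, v_max + 1)
    return q - Q, v

assert all(Z_inv(Z(q, v)) == (q, v)
           for q in range(-Q, Q + 1) for v in range(v_max + 1))
```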

Define \(\lambda \) as

$$\begin{aligned} \lambda (t, z):= (\lambda _{1}(t, z), \ldots , \lambda _{K}(t, z)), \end{aligned}$$

where \(\lambda \) satisfies

$$\begin{aligned} \begin{aligned} \lambda _{\beta ^{a}(z)}(t, z)&:= \Lambda (\delta ^{a}_{t})> 0; \quad \lambda _{\beta ^{b}(z)}(t, z):= \Lambda (\delta ^{b}_{t}) > 0; \\ \lambda _{z}(t, z)&:= - \sum _{y \ne z} \lambda _{y}(t, z); \quad \lambda _{y}(t, z):= 0 \quad y \ne \beta ^{a}(z), \beta ^{b}(z), z. \end{aligned} \end{aligned}$$
(3.4)

\(\beta ^{a}(z)\) and \(\beta ^{b}(z)\) are defined as the two accessible states from state z,

$$\begin{aligned} \begin{aligned} \beta ^{a}(z)&= \left\{ \begin{array}{lr} Z(Z^{-1}_{1}(z) -1, Z^{-1}_{2}(z) +1) &{} {Z^{-1}_{1}(z)> -Q, Z^{-1}_{2}(z)< v_{\max }}\\ Z(Z^{-1}_{1}(z)-1, v_{\max }) &{} {Z^{-1}_{1}(z) > -Q, Z^{-1}_{2}(z) = v_{\max }}\\ z &{} {Z^{-1}_{1}(z) = -Q} \end{array} \right. \\ \beta ^{b}(z)&= \left\{ \begin{array}{lr} Z(Z^{-1}_{1}(z)+1, Z^{-1}_{2}(z)+1) &{} {Z^{-1}_{1}(z)< Q, Z^{-1}_{2}(z)< v_{\max }} \\ Z(Z^{-1}_{1}(z)+1, v_{\max }) &{} {Z^{-1}_{1}(z) < Q, Z^{-1}_{2}(z) = v_{\max }} \\ z &{} {Z^{-1}_{1}(z) = Q} \end{array} \right. \end{aligned} \end{aligned}$$

Define F and G as

$$\begin{aligned} \begin{aligned} F(t, z, \lambda (t, z))&:= \Lambda ^{-1}(\lambda _{\beta ^{a}(z)}(t, z)) \lambda _{\beta ^{a}(z)}(t, z) \\&\quad + \Lambda ^{-1}(\lambda _{\beta ^{b}(z)}(t, z)) \lambda _{\beta ^{b}(z)}(t, z) - \frac{1}{2} \gamma \sigma ^{2} Z^{-1}_{1}(z)^2 \\ G(z, p)&:= - l (|Z^{-1}_{1}(z)|) + R \left( 1 - \sum _{j = v}^{v_{\max }} \sum _{i = -Q}^{Q} p_{Z(i, j)}\right) . \end{aligned} \end{aligned}$$
(3.5)

The optimal market making problem is now reduced to the continuous time finite state MFG discussed in Sect. 2 of this paper. Denote this game by \(G_{R}\). We have the following result.

Proposition 3.1

\(G_{R}\) satisfies Assumption 2.1.

Proof

See Sect. 4.

According to Cecchin and Pelino [9], the Nash equilibrium of the MFG \(G_{R}\) and that of the finite-player game exist and are unique, and the game with N players converges to the limiting MFG at the rate \(O(\frac{1}{N})\). We can numerically solve the FBODE system corresponding to the finite state MFG in (3.5) and find the equilibrium value function and the probability of the mean field by the DNN technique.

We next present some numerical tests. We use an LSTM (long short-term memory) neural network to approximate the solution \((\theta , p)\). Denote the function constructed by the LSTM neural network as \(({\tilde{\theta }}(t, \beta ), {\tilde{p}}(t, \beta ))\), where \(\beta \) is the parameter set of the neural network, constructed by the following steps: layer 0 is the input \(t \in [0, T]\), and layer k with output \(h_k\) is given by

$$\begin{aligned} \begin{aligned} f_k&= \sigma _g(W_{f} t + U_{f} h_{k-1} + b_f) \\ i_k&= \sigma _g(W_{i} t + U_{i} h_{k-1} + b_i) \\ o_k&= \sigma _g(W_{o} t + U_{o} h_{k-1} + b_o) \\ {\tilde{c}}_k&= \sigma _c(W_{c} t + U_{c} h_{k-1} + b_c) \\ c_k&= f_k \circ c_{k-1} + i_k \circ {\tilde{c}}_k \\ h_k&= o_k \circ \sigma _h(c_k) \end{aligned} \end{aligned}$$

with the initial values \(c_{0} = h_{0} = 0\), where the operator \(\circ \) is the element-wise product, the functions \(\sigma \) are scaled \(\tanh \) activation functions (hence satisfying all assumptions in our main results), \(t \in [0, T]\) is the input to the LSTM network, \(f_k \in {\mathbb {R}}^{h}\) the forget gate’s activation vector, \(i_k \in {\mathbb {R}}^{h}\) the input/update gate’s activation vector, \(o_k \in {\mathbb {R}}^{h}\) the output gate’s activation vector, \(h_k \in {\mathbb {R}}^{h}\) the hidden state vector, \({\tilde{c}}_k \in {\mathbb {R}}^{h}\) the cell input activation vector, \(c_k \in {\mathbb {R}}^{h}\) the cell state vector, \(W \in {\mathbb {R}}^{h \times 1}, U \in {\mathbb {R}}^{h \times h}, b \in {\mathbb {R}}^{h}\) the weight matrices and bias vectors to be learned during training, and h the number of hidden units. The LSTM network is an advanced version of a traditional feedforward neural network and provides more accurate approximations for complicated functions. For our model, this specific structure performs better than a traditional feedforward network.

We use an LSTM network with 3 layers and 32 nodes per layer. The network is trained by a stochastic gradient approach with mesh-free, randomly sampled points in [0, T]. This randomness adds to the robustness of the network. The detailed training procedure is similar to that in Sirignano and Spiliopoulos [30].
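A minimal PyTorch sketch of such an approximator is given below; the module and variable names are ours, the output layers use \(\tanh \)-based scalings in the spirit of \(\nu _1\) and \(\nu _2\), and the training loop (sampling times in [0, T] and minimizing the loss (2.11) by stochastic gradient descent) is omitted.

```python
import torch
import torch.nn as nn

class MFGNet(nn.Module):
    """LSTM approximator of (theta(t), p(t)) for a K-state MFG (illustrative sketch)."""

    def __init__(self, K, hidden=32, layers=3, c_gh=10.0):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=layers, batch_first=True)
        self.theta_head = nn.Linear(hidden, K)       # outputs tilde{theta}(t), K components
        self.p_head = nn.Linear(hidden, K - 1)       # outputs tilde{p}_1 .. tilde{p}_{K-1}
        self.c_gh = c_gh                             # bound C_GH used to scale nu_1 (placeholder value)

    def forward(self, t):
        # t: tensor of shape (batch, 1); treat each time as a length-1 sequence
        out, _ = self.lstm(t.unsqueeze(1))           # (batch, 1, hidden)
        h = out[:, -1, :]
        theta = self.c_gh * torch.tanh(self.theta_head(h))        # nu_1-type scaling
        p_head = 0.5 * (torch.tanh(self.p_head(h)) + 1.0)         # nu_2(x) = (tanh x + 1)/2
        p_last = 1.0 - p_head.sum(dim=1, keepdim=True)            # tilde{p}_K = 1 - sum of others
        return theta, torch.cat([p_head, p_last], dim=1)

# Time derivatives for the loss (2.11) can be obtained with torch.autograd.grad
# on t with requires_grad_(True), as in the DGM approach of [30].
```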

The market order arrival intensity function is given by \( \Lambda (\delta ):= A e^{-k \delta }\) and the liquidity penalty function \(l(q):= a q^2\). We assume all market makers start at 0 inventory and 0 trading volume. The benchmark data used are \(S=20\), hourly volatility \(\sigma =0.01\), \(\gamma =1\), \(T=10\) hours, capacity \(Q=1\), \(v_{\max }=10\), \(k=2\), \(a=2\), \(A=0.5\), and \(R=2\).

There are two typical schemes of trading volume rewards in most exchanges’ incentive programs. One is the rank-based trading volume reward as in (3.3), and the other is the linear trading volume reward, defined by

$$\begin{aligned} R_L(v_{T}):= R \frac{v_{T}}{v_{\max }}. \end{aligned}$$
(3.6)

Since \(R_L(v_{T})\) is independent of the mean field of states, the FBODE system for the value function and the probability of the mean field of states is decoupled and can be solved numerically with a standard Euler scheme. We next perform numerical tests and compare the value functions, optimal bid/ask spreads and probability distributions of trading volumes under three different trading volume reward schemes: 1. no trading volume reward (\(R = 0\), benchmark case), 2. linear trading volume reward (R in (3.6)), and 3. rank-based trading volume reward (R in (3.3)). The rank-based reward introduces competition between market makers, whereas the linear reward does not. The training result is satisfactory, and the average loss is less than 0.003.
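For the linear reward, a minimal sketch of such a decoupled Euler scheme is as follows (Python/NumPy, with `H`, `lam_star` and the terminal payoff supplied by the user as in Sect. 2; in the linear case G does not depend on the terminal mean field, so the backward and forward passes separate).

```python
import numpy as np

def solve_decoupled(H, lam_star, G_terminal, p0, T, K, n_steps=1000):
    """Backward Euler for theta, then forward Euler for p, when G does not depend on p.

    H(z, mu): Hamiltonian; lam_star(y, mu): vector lambda^*(y, mu) of shape (K,);
    G_terminal: array of shape (K,) with G(z); p0: initial distribution, shape (K,).
    """
    dt = T / n_steps
    theta = np.zeros((n_steps + 1, K))
    p = np.zeros((n_steps + 1, K))

    # backward pass: d theta_z/dt = -H(z, Delta^z theta), theta(T) = G
    theta[n_steps] = G_terminal
    for i in range(n_steps, 0, -1):
        th = theta[i]
        theta[i - 1] = th + dt * np.array([H(z, th - th[z]) for z in range(K)])

    # forward pass: d p_z/dt = sum_y p_y lambda^*_z(y, Delta^y theta), p(0) = p0
    p[0] = p0
    for i in range(n_steps):
        th = theta[i]
        rates = np.stack([lam_star(y, th - th[y]) for y in range(K)])  # rates[y, z]
        p[i + 1] = p[i] + dt * p[i] @ rates
    return theta, p
```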

Fig. 1: \(\theta (t, 0, v)\) path for two schemes

Fig. 2: \(\theta (t, 1, v)\) path for two schemes

Figures 1 and 2 show the value functions \(\theta (t, q, v)\) for fixed \(q=0,1\) and varying v, with ‘Benchmark’ representing no trading volume reward, the ‘\(v = 0\) lin’ path the linear reward with initial trading volume \(v = 0\), and ‘\(v = 0\)’ the rank-based reward. It is clear that the introduction of the market incentive R increases the value functions for market makers, and the higher the initial trading volume v, the higher the value function \(\theta \). Even for market makers with initial trading volume \(v = 0\), the value functions are still higher than the benchmark one as they benefit from their potential market incentive gains, which explains the convergence of the curves for \(v=0\) to the benchmark one as t tends to T. The value functions for the linear and rank-based rewards are largely the same.

4 Proofs

4.1 Proof of Theorem 2.4

Proof

According to Theorem 2.3, there exists a unique solution \((\theta , p)\) to the ODE system (2.2), which is also the unique minimizer of \(\Psi \) with

$$\begin{aligned} \Psi (\theta , p) = 0. \end{aligned}$$

We use \((\nu _i^{-1})'\) to denote the first-order derivative of \(\nu _i^{-1}\) for \(i = 1, 2\). From (2.7) we know \(\theta \) is bounded by \(C_{GH}\). Hence, \(\frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t)\) is also bounded uniformly in t and z. Moreover, \(p(t) \in P(\Sigma )\) for any \(t \in [0, T]\) and, hence, is also bounded. From the assumptions on \(\nu _1\) and \(\nu _2\), we know

$$\begin{aligned} \begin{aligned} \theta _{z}(t)&< \sup \Vert \nu _{1} \Vert , \\ \inf \nu _{2}&< p_{z}(t) < \sup \nu _{2}. \end{aligned} \end{aligned}$$

This means \(\theta _{z}\)’s image is bounded and a strict subset of \(\nu _{1}^{-1}\)’s domain; the same holds for \(p_{z}\) and \(\nu _{2}^{-1}\). Combining this with the continuous differentiability of \(\nu _{1}^{-1}\) and \(\nu _{2}^{-1}\), we know \(\nu _{1}^{-1}(\theta _{z}(t))\), \((\nu _{1}^{-1})'(\theta _{z}(t))\), \(\nu _{2}^{-1}(p_{z}(t))\) and \((\nu _{2}^{-1})'(p_{z}(t))\) are bounded by some constant C uniformly in t and z. \((\nu _{i})'\) and \((\nu _{i})''\) are Lipschitz continuous on \([-2 C, 2 C]\) with coefficient L for \(i = 1, 2\). Define \( C^{N}(\nu ):= \{ \zeta : [0, T] \rightarrow {\mathbb {R}}; \quad \zeta (t) = \sum _{i = 1}^{N} \beta _{i} \nu ( \alpha _{i} t + c_{i}) \}\). According to the proof of Theorem 7.1 in Sirignano and Spiliopoulos [30], for any \(0< \varepsilon < C\), there exist \(N > 0\) and \(y_{z} \in C^{N}(\nu )\) such that

$$\begin{aligned}&\left\| y_{z}(t) - \nu _{1}^{-1}(\theta _{z}(t)) \right\| + \left\| \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t) - \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) \right\| \nonumber \\&\quad + \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} y_{z}(t) - \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t)) \right\| \le \varepsilon . \end{aligned}$$
(4.1)

Hence, we have

$$\begin{aligned} \Vert \nu _{1}(y_{z}(t)) - \theta _{z}(t) \Vert \le C \Vert y_{z}(t) - \nu _{1}^{-1}(\theta _{z}(t)) \Vert \le C \varepsilon . \end{aligned}$$

On the other hand,

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t)&= \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}(\nu ^{-1}_{1}(\theta _{z}(t))) \\&= (\nu _{1})'(y_{z}(t)) \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t) - (\nu _{1})'(\nu ^{-1}_{1}(\theta _{z}(t))) \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) \\&= (\nu _{1})'(y_{z}(t)) \left[ \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t) - \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t))\right] \\&\quad + \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) [(\nu _{1})'(y_{z}(t)) - (\nu _{1})'(\nu ^{-1}_{1}(\theta _{z}(t)))]. \end{aligned} \end{aligned}$$

As \(y_{z}(t) \in [-2 C, 2 C]\), there exists a constant \(C_1\) bounding \((\nu _{1})'(y_{z}(t))\) uniformly. Moreover, we have

$$\begin{aligned} \left\| \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) \right\| \le \left\| (\nu _{1}^{-1})'(\theta _{z}(t)) \right\| \left\| \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t) \right\| \le C^{2}. \end{aligned}$$

Hence, we have

$$\begin{aligned} \begin{aligned} \left\| \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t) \right\|&\le \left\| \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t) - \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) \right\| \Vert \nu _1'(y_{z}(t)) \Vert \\&\quad + \left\| \nu _1'(y_{z}(t)) - \nu _1'(\nu _{1}^{-1}(\theta _{z}(t)) ) \right\| \left\| (\nu _{1}^{-1})'(\theta _{z}(t)) \right\| \left\| \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t) \right\| \\&\le C_1 \left\| \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t) - \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) \right\| \\&\quad + C^{2} L \Vert y_{z}(t) - \nu ^{-1}_{1}(\theta _{z}(t)) \Vert \le (C_1 + C^{2} L) \varepsilon . \end{aligned} \end{aligned}$$

The first inequality above comes from the boundedness and Lipschitz continuity of \(\nu _{1}'\), as well as the boundedness of \((\nu _{1}^{-1})'(\theta _{z}(t))\). Moreover, for the second-order derivatives, we have

$$\begin{aligned} \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t)) \right\| = \left\| (\nu _{1}^{-1})''(\theta _{z}(t)) \left( \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t)\right) ^2 + (\nu _{1}^{-1})'(\theta _{z}(t)) \frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t) \right\| . \end{aligned}$$

As \(\theta _{z}(t)\), \(\frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t)\) and \(\frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t)\) are bounded and \(\nu _{1}^{-1}\) is twice continuously differentiable, \(\frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t))\) is bounded. To estimate the difference of the second-order derivatives between the approximating function and the true function, we have

$$\begin{aligned} \begin{aligned}&\frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t) = \left( \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t)\right) ^2 \nu _1''(y_{z}(t)) + \nu _1'(y_{z}(t)) \frac{\mathrm{d}^2}{\mathrm{d} t^2} y_{z}(t) \\&\qquad \quad - \left( \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t))\right) ^2 \nu _1''(\nu _{1}^{-1}(\theta _{z}(t))) - \nu _1'(\nu _{1}^{-1}(\theta _{z}(t))) \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t)). \end{aligned} \end{aligned}$$

Define

$$\begin{aligned} \begin{aligned} a&:= \left\| \left( \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t)\right) ^2 - \left( \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t))\right) ^2 \right\| \left\| \nu _1''(y_{z}(t)) \right\| \\ b&:= \left\| \nu _1''(y_{z}(t)) - \nu _1''(\nu _{1}^{-1}(\theta _{z}(t))) \right\| \left\| \left( \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t))\right) ^2 \right\| \\ c&:= \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} y_{z}(t) - \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t)) \right\| \left\| \nu _1'(y_{z}(t)) \right\| \\ d&:= \left\| \nu _1'(y_{z}(t)) - \nu _1'(\nu _{1}^{-1}(\theta _{z}(t))) \right\| \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t)) \right\| , \end{aligned} \end{aligned}$$

and we have

$$\begin{aligned} \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t) \right\| \le a + b + c + d. \end{aligned}$$

As \(y_{z}(t)\) and \(\frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t))\) are bounded from the previous argument, and \(\nu _1\) is three times continuously differentiable by definition, \(\nu _1''(y_{z}(t))\), \(\nu _1'(y_{z}(t))\), \((\frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)))^2\) and \(\frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}^{-1}(\theta _{z}(t))\) are also bounded. Moreover, \(\Vert \frac{\mathrm{d}}{\mathrm{d} t} y_{z}(t) \Vert \le \Vert \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}^{-1}(\theta _{z}(t)) \Vert + \varepsilon \), and hence bounded. According to the Lipschitz continuity of \(\nu _1'\) and \(\nu _1''\), as well as (4.1), we know there exists a constant \(C_2\) such that

$$\begin{aligned} \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t) \right\| \le a + b + c + d \le C_2 \varepsilon . \end{aligned}$$

By rescaling \(\varepsilon \) in the above argument, we know that for any \(0< \varepsilon < C\), there exist \(N > 0\) and \(y_{z} \in C^{N}(\nu )\) such that

$$\begin{aligned} \left\| \nu _{1}(y_{z}(t)) - \theta _{z}(t) \right\| + \left\| \frac{\mathrm{d}}{\mathrm{d} t} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t) \right\| + \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}(y_{z}(t)) - \frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t) \right\| \le \varepsilon . \end{aligned}$$

Hence, there exist \(N > 0\) and \(y_{z} \in C^{N}(\nu )\) such that

$$\begin{aligned} \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{1}(y_{z}(t)) \right\| \le \varepsilon + \left\| \frac{\mathrm{d}^2}{\mathrm{d} t^2} \theta _{z}(t) \right\| \le \varepsilon + C_{\theta GH} < B_{\theta }. \end{aligned}$$

Similarly, we know \(\Vert \frac{\mathrm{d}^2}{\mathrm{d} t^2} \nu _{2}(y_{z}(t)) \Vert \le \varepsilon + C_{p GH} < B_{p}\). Then we get

$$\begin{aligned} \left( B_{\theta } - \max _{t \in [0, T]} \bigg | \frac{\mathrm{d}^2 {\tilde{\theta }}_{z}(t)}{\mathrm{d} t^2} \bigg | \right) ^{-} = \left( B_{p} - \max _{t \in [0, T]} \bigg | \frac{\mathrm{d}^2 {\tilde{p}}_{z}(t)}{\mathrm{d} t^2} \bigg | \right) ^{-} = 0. \end{aligned}$$

If we define

$$\begin{aligned} \begin{aligned} {\hat{\varvec{\Theta }}}^{N}(\nu _1, \nu )&:= \bigg \{ \zeta : [0, T] \rightarrow {\mathbb {R}}^{K}; \quad \zeta (t) \\&= \left( \nu _1\left( \sum _{i = 1}^{N} \beta _{1, i} \nu ( \alpha _{1, i} t + c_{1, i}) \right) , \ldots , \nu _1\left( \sum _{i = 1}^{N} \beta _{K, i} \nu ( \alpha _{K, i} t + c_{K, i}) \right) \right) \bigg \}, \end{aligned} \end{aligned}$$

then from the proof above we know that for any \(0< \varepsilon < C\), there exist \(N > 0\) and \({\tilde{\theta }}^{(N)} \in {\hat{\varvec{\Theta }}}^{N}(\nu _1, \nu )\) such that

$$\begin{aligned} \left\| {\tilde{\theta }}_{z}^{(N)}(t) - \theta _{z}(t) \right\| + \left\| \frac{\mathrm{d}}{\mathrm{d} t} {\tilde{\theta }}_{z}^{(N)}(t) - \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t) \right\| \le \varepsilon . \end{aligned}$$

On the other hand, notice that for any function \(f_{N} \in {\hat{\varvec{\Theta }}}^{N}(\nu _1, \nu )\), there exists \(f_{K N} \in {\varvec{\Theta }}^{K N}(\nu _1, \nu )\) such that \(f_{K N} = f_{N}\), by letting some \(\beta _{j, i} = 0\). This means \({\hat{\varvec{\Theta }}}^{N}(\nu _1, \nu ) \subset {\varvec{\Theta }}^{K N}(\nu _1, \nu )\), and \({\tilde{\theta }}^{(N)} \in {\varvec{\Theta }}^{K N}(\nu _1, \nu )\). For p and \({\mathbf {P}}^{n}(\nu _2, \nu )\), we have a similar argument. Hence, we conclude that for any \(0< \varepsilon < C\), there exist \(N > 0\) and \({\tilde{\theta }}^{(N)} \in {\varvec{\Theta }}^{N}(\nu _1, \nu )\), \({\tilde{p}}^{(N)} \in {\mathbf {P}}^{N}(\nu _2, \nu )\) such that

$$\begin{aligned} \begin{aligned}&\left\| {\tilde{\theta }}_{z}^{(N)}(t) - \theta _{z}(t) \right\| + \left\| \frac{\mathrm{d}}{\mathrm{d} t} {\tilde{\theta }}_{z}^{(N)}(t) - \frac{\mathrm{d}}{\mathrm{d} t} \theta _{z}(t) \right\| \le \varepsilon \\&\quad \left\| {\tilde{p}}_{z}^{(N)}(t) - p_{z}(t) \right\| + \left\| \frac{\mathrm{d}}{\mathrm{d} t} {\tilde{p}}_{z}^{(N)}(t) - \frac{\mathrm{d}}{\mathrm{d} t} p_{z}(t) \right\| \le \varepsilon . \end{aligned} \end{aligned}$$

Then, similar to the proof of Theorem 7.1 in Sirignano and Spiliopoulos [30], there exists a uniform constant M, which only depends on the bounds of \(\theta \) and \(\lambda ^{*}\) and the Lipschitz coefficients of \(\lambda ^{*}\) and H, such that

$$\begin{aligned} \Psi ({\tilde{\theta }}^{(N)}, {\tilde{p}}^{(N)}) \le M \varepsilon . \end{aligned}$$

This concludes the proof. \(\square \)

4.2 Proof of Theorem 2.5

Proof

Since the following proof is the same for different \(t_{0}\), we assume \(t_{0} = 0\) for ease of notation. We first focus on proving the first inequality in (2.12). Define

$$\begin{aligned} e(t, z):= \frac{\mathrm{d} {\tilde{\theta }}_{z}(t)}{\mathrm{d} t} + H(z, \Delta ^{z} {\tilde{\theta }}(t)). \end{aligned}$$

As \(\Psi < \varepsilon \), from the definition of \(\Psi \) we know that \(\frac{\mathrm{d}^2 {\tilde{\theta }}_{z}(t)}{\mathrm{d} t^2}\) is uniformly bounded on [0, T]. Furthermore, H is Lipschitz continuous, and \({\tilde{\theta }}\) has a bounded first-order derivative in t. Hence, e(t, z) is a Lipschitz continuous function on [0, T]. Denote its Lipschitz coefficient by L. There exists \({\hat{t}} \in [0, T]\) such that

$$\begin{aligned} | e({\hat{t}}, z) | = \sup _{t \in [0, T]} | e(t, z) |. \end{aligned}$$

Consider any \(\Delta t\) satisfying

$$\begin{aligned} \frac{ | e({\hat{t}}, z) | }{16 L}< \Delta t < \frac{ | e({\hat{t}}, z) | }{2 L}. \end{aligned}$$
(4.2)

From Lipschitz continuity of e, we have

$$\begin{aligned} | |e({\hat{t}} \pm \Delta t, z)| - |e({\hat{t}}, z)| | \le |e({\hat{t}} \pm \Delta t, z) - e({\hat{t}}, z)| \le L \Delta t. \end{aligned}$$

This means

$$\begin{aligned} |e({\hat{t}}, z)| - L \Delta t \le |e({\hat{t}} \pm \Delta t, z) | \le |e({\hat{t}}, z)| + L \Delta t. \end{aligned}$$

From (4.2), we know \( L \Delta t \le \frac{ | e({\hat{t}}, z) | }{2}\); therefore,

$$\begin{aligned} |e({\hat{t}}, z)| - L \Delta t \ge \frac{1}{2} |e({\hat{t}}, z)| \ge 0. \end{aligned}$$

Combining the above two inequalities, we know for any \(t \in [{\hat{t}} - \Delta t, {\hat{t}} + \Delta t]\),

$$\begin{aligned} e^2(t, z) \ge (|e({\hat{t}}, z)| - L \Delta t)^2 > \frac{1}{4} |e({\hat{t}}, z)|^2. \end{aligned}$$

We have the following estimate. Without loss of generality, we can assume \(T - {\hat{t}} \ge \frac{T}{2}\) (otherwise we can use the other side \( [{\hat{t}} - \Delta t, {\hat{t}}]\) as the interval of integration in the estimate below):

$$\begin{aligned} \begin{aligned} \Psi&\ge \int _{{\hat{t}}}^{{\hat{t}} + \Delta t} e^2(t, z) dt \ge \min (\Delta t (|e({\hat{t}}, z)| - L \Delta t)^2, (T - {\hat{t}}) (|e({\hat{t}}, z)| - L \Delta t)^2) \\&> \min \left( \frac{ |e({\hat{t}}, z)|^3}{64 L}, \frac{|e({\hat{t}}, z)|^2}{8} T\right) , \end{aligned} \end{aligned}$$

which implies that for any \(t \in [0, T]\),

$$\begin{aligned} | e(t, z) | < \max \left( 4 L^{\frac{1}{3}} \Psi ^{\frac{1}{3}}, \frac{2 \sqrt{2 T}}{T} \Psi ^{\frac{1}{2}}\right) . \end{aligned}$$

There exists \(\Psi _0\) such that for all \(\Psi < \Psi _0\), \(4 L^{\frac{1}{3}} \Psi ^{\frac{1}{3}} \ge \frac{2 \sqrt{2 T}}{T} \Psi ^{\frac{1}{2}}\). Hence, there exists a constant C such that

$$\begin{aligned} | e(t, z) |< 4 L^{\frac{1}{3}} \Psi ^{\frac{1}{3}} < C \varepsilon ^{\frac{1}{3}}. \end{aligned}$$

Using similar arguments, the third inequality in (2.12) can be proved. The second and fourth inequalities are trivial. \(\square \)

4.3 Proof of Theorem 2.6

The general structure of the proof is similar to that in Cecchin and Pelino [9], with one key difference: in Cecchin and Pelino [9], p satisfies a non-perturbed Kolmogorov forward equation and has initial value in \(P(\Sigma )\) with nonnegative components, whereas \({\tilde{p}}\) satisfies the perturbed Kolmogorov forward equation (2.13) and its initial value is not necessarily in \(P(\Sigma )\) and may be negative, which makes some a priori estimates in Cecchin and Pelino [9] not applicable to our case. We need to provide extra modifications by adding and subtracting an extra term \(M_1\) such that \({\tilde{p}}(t) + M_1\) is nonnegative, and we need to modify every step of the proof to estimate the extra terms introduced by \(M_1\). For completeness, we give the whole proof.

Note that the solution pair \(({\tilde{\theta }}, {\tilde{p}})\) to (2.13) is determined only by the initial time \(t_{0}\) and the initial value \({\tilde{p}}(t_{0})\). We first prove that \({\tilde{\theta }}\) is well defined (Proposition 4.5), continuous in \({\tilde{p}}(t_{0})\) (Proposition 4.2), and continuously differentiable in \({\tilde{p}}(t_{0})\) (Theorem 4.7) by discussing the linearized system (4.11); we then prove that \({\tilde{\theta }}\) satisfies a PDE similar to the master equation in Cecchin and Pelino [9] (Theorem 4.8) and that the master equation on some discrete grids of \(P(\Sigma )\) can be approximated by a backward ODE with extra error terms (Proposition 4.9); and we finally estimate the difference between \(\theta \) and \({\tilde{\theta }}\) by comparing the two backward ODE systems, which concludes the proof.

Denote by \(\Vert x \Vert := \max _{1 \le z \le K} | x_{z} | \) the norm of x in \({\mathbb {R}}^{K}\) and by \(\Vert f \Vert := \max _{t \in [0, T]} \max _{1 \le z \le K} | f_{z}(t) |\) the norm of f in \({\mathcal {C}}^{0}([0, T]; {\mathbb {R}}^{K})\). Due to the introduction of the perturbation terms in the ODE system (2.13), the existence and uniqueness of its solution can no longer be guaranteed for every initial value \({\tilde{p}}(t_{0})\). However, under certain conditions on (2.13), we have the existence of, and an a priori bound on, the solution to (2.13).

Proposition 4.1

Given constant \(M > 0\), define \(I_{p, M}:= [-M, 1+M]^{K}\) and

$$\begin{aligned} \begin{aligned} C_{G}(M)&:= \max _{z \in \Sigma , p \in I_{p, M}} | G(z, p) | + \Vert \epsilon _{3} \Vert + 2 \Vert \epsilon _{1} \Vert T + 2 \max _{z \in \Sigma } H(z, 0) T, \\ A_{G}(M)&:= [-2 C_{G}(M), 2 C_{G}(M)]^K, \quad \Lambda (M):= \max _{y, z \in \Sigma , \mu \in A_{G}(M)} | \lambda ^{*}_{y}(z, \mu ) |. \end{aligned} \end{aligned}$$

If the functions \(\epsilon _i\), \(i = 2, 4\), satisfy

$$\begin{aligned} \Vert \epsilon _{2} \Vert + \Vert \epsilon _{4} \Vert < \frac{1}{N_{0}}, \end{aligned}$$
(4.3)

where \(\frac{1}{N_{0}}:= \frac{1}{3} M e^{- \Lambda (M) T}\), then for any initial time \(t_{0} \in [0, T]\) and \({\tilde{p}}(t_{0}) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), the ODE system (2.13) has a solution \(({\tilde{\theta }}, {\tilde{p}})\), where

$$\begin{aligned} {\bar{B}}\left( P(\Sigma ), \frac{1}{N_{0}}\right) = \left\{ {\tilde{p}} \in {\mathbb {R}}^{K}, \quad s.t \quad \min _{p \in P(\Sigma )} \Vert {\tilde{p}} - p \Vert \le \frac{1}{N_{0}} \right\} . \end{aligned}$$

Moreover, \(({\tilde{\theta }}, {\tilde{p}})\) satisfies the following on \([t_{0}, T]\), uniformly in the initial time \(t_{0} \in [0, T]\) and the initial value \({\tilde{p}}(t_{0})\):

$$\begin{aligned} {\tilde{\theta }}_{z}(t) \in [-C_{G}(M), C_{G}(M)], \quad {\tilde{p}}_{z}(t) \in [-M, 1 + M]. \end{aligned}$$

Proof

Given an a priori function \({\bar{p}}\) such that \({\bar{p}}(t) \in [-M, 1+M]^{K}\) for all \(t \in [t_{0}, T]\), which is Lipschitz continuous with Lipschitz coefficient bounded by L(M), i.e.,

$$\begin{aligned} \bigg \Vert \frac{\mathrm{d} {\bar{p}}}{\mathrm{d} t} \bigg \Vert \le L(M) := K (2 M + 1) \Lambda (M) + \frac{1}{N_{0}}, \end{aligned}$$

and which starts from the same initial value \({\bar{p}}(t_{0}) = {\tilde{p}}(t_{0}) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), we solve the backward ODE in (2.13):

$$\begin{aligned} \frac{\mathrm{d} {\tilde{\theta }}_{z}(t)}{\mathrm{d} t} = - H(z, \Delta ^{z} {\tilde{\theta }}(t)) + \epsilon _{1}(t, z), \quad {\tilde{\theta }}_{z}(T) = G(z, {\bar{p}}(T)) + \epsilon _{3}(z). \end{aligned}$$

Following a proof similar to that of [12, Proposition 2], the function \({\tilde{\theta }}(t)\) is bounded by the constant \(C_{G}(M)\). Note that \(C_{G}(M)\) is monotonically non-decreasing in M, hence \(\Lambda (M)\) is also non-decreasing in M. Since \({\bar{p}}(t_{0}) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), there exists \(p_{0} \in P(\Sigma )\) such that \({\bar{p}}(t_{0}) - p_{0} = \epsilon _{4}\) with \(\Vert \epsilon _{4} \Vert \le \frac{1}{N_{0}}\). Consider two functions \({\tilde{p}}\) and p satisfying

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} {\tilde{p}}_{z}(t)}{\mathrm{d} t}&= \sum _{y} {\tilde{p}}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) + \epsilon _{2}(t, z), \quad {\tilde{p}}_{z}(t_{0}) = p_{z, 0} + \epsilon _{4}(z), \\ \frac{\mathrm{d} p_{z}(t)}{\mathrm{d} t}&= \sum _{y} p_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)), \quad p_{z}(t_{0}) = p_{z, 0} \end{aligned} \end{aligned}$$

Integrating both equations and taking the difference of \({\tilde{p}}\) and p, we get

$$\begin{aligned} \Vert {\tilde{p}}(t) - p(t) \Vert \le \Lambda (M) \int _{t_{0}}^{t} \Vert {\tilde{p}}(s) - p(s) \Vert \mathrm{d} s + \Vert \epsilon _{2} \Vert + \Vert \epsilon _{4} \Vert . \end{aligned}$$

By Gronwall's inequality and condition (4.3), which gives \(\Vert \epsilon _{2} \Vert + \Vert \epsilon _{4} \Vert < \frac{1}{N_{0}} = \frac{1}{3} M e^{- \Lambda (M) T}\), we have

$$\begin{aligned} \Vert {\tilde{p}}(t) - p(t) \Vert \le (\Vert \epsilon _{2} \Vert + \Vert \epsilon _{4} \Vert ) e^{\Lambda (M) T} < M. \end{aligned}$$

As p is the solution to a Kolmogorov forward equation, \(p(t) \in P(\Sigma )\) for all \(t \in [t_{0}, T]\). Hence, the solution satisfies \({\tilde{p}}(t) \in [-M, 1+M]^{K}\) for all \(t \in [t_{0}, T]\); moreover, \({\tilde{p}}\) is Lipschitz continuous with Lipschitz coefficient bounded by L(M), since \(\Vert \frac{\mathrm{d} {\tilde{p}}}{\mathrm{d} t} \Vert \le L(M)\).

Let \({\mathcal {F}}([t_{0}, T])\) be the set of Lipschitz continuous functions defined on \([t_{0}, T]\), with Lipschitz coefficient bounded by L(M), taking values in \([-M, 1+M]^{K}\) and starting from the same initial value \({\tilde{p}}(t_{0})\) at \(t_{0}\). We can define a mapping \(\xi : {\mathcal {F}}([t_{0}, T]) \rightarrow {\mathcal {F}}([t_{0}, T])\) in the following way: given \({\bar{p}} \in {\mathcal {F}}([t_{0}, T])\), let \({\tilde{\theta }}\) be the solution of the terminal value problem in (2.13); then \({\tilde{\theta }}(t)\) is bounded by \(C_{G}(M)\). Let \(\xi ({\bar{p}})\) be the solution to the initial value problem in (2.13). By the above argument, \(\xi ({\bar{p}}) \in {\mathcal {F}}([t_{0}, T])\). Following the proof of [12, Proposition 4], \({\mathcal {F}}([t_{0}, T])\) is a set of uniformly bounded and equicontinuous functions. Thus, by the Arzelà–Ascoli theorem, it is relatively compact. It is also clearly convex. Hence, by the Brouwer fixed point theorem, there exists a fixed point of \(\xi \), which proves the existence of a solution to (2.13). \(\square \)
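
To make the structure of the mapping \(\xi \) in the proof above concrete, the following Python sketch implements one application of \(\xi \) on a time grid: given a candidate path p_bar for \({\bar{p}}\), it first solves the backward equation for \({\tilde{\theta }}\) and then the forward equation for the new path. The callables H, G, lam (standing for \(\lambda ^{*}\)), eps1, eps2, eps3 and the explicit Euler stepping are hypothetical placeholders chosen for illustration only; the proof itself works with the exact ODE flows.

```python
import numpy as np

def delta(theta_t, z):
    """Delta^z theta(t) = (theta_1 - theta_z, ..., theta_K - theta_z)."""
    return theta_t - theta_t[z]

def xi_map(p_bar, t_grid, H, G, lam, eps1, eps2, eps3, K):
    """One application of the map xi: given a candidate path p_bar
    (array of shape [len(t_grid), K]), solve the backward ODE for theta,
    then the forward ODE for the new path p. Explicit Euler stepping is
    used purely for illustration."""
    n = len(t_grid)
    dt = np.diff(t_grid)
    theta = np.zeros((n, K))
    # terminal condition: theta_z(T) = G(z, p_bar(T)) + eps3(z)
    theta[-1] = np.array([G(z, p_bar[-1]) + eps3(z) for z in range(K)])
    # backward sweep: d theta_z / dt = -H(z, Delta^z theta) + eps1(t, z)
    for i in range(n - 2, -1, -1):
        rhs = np.array([-H(z, delta(theta[i + 1], z)) + eps1(t_grid[i + 1], z)
                        for z in range(K)])
        theta[i] = theta[i + 1] - dt[i] * rhs
    # forward sweep: d p_z / dt = sum_y p_y * lambda*_z(y, Delta^y theta) + eps2(t, z)
    p = np.zeros((n, K))
    p[0] = p_bar[0]  # same initial value p_tilde(t0)
    for i in range(n - 1):
        rhs = np.array([sum(p[i, y] * lam(z, y, delta(theta[i], y)) for y in range(K))
                        + eps2(t_grid[i], z) for z in range(K)])
        p[i + 1] = p[i] + dt[i] * rhs
    return theta, p
```

Iterating this map until the returned path coincides with the input path is a Picard-type search for the fixed point whose existence is asserted by the fixed point argument above.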

We next prove that, under certain conditions, \(({\tilde{\theta }}, {\tilde{p}})\) is unique and continuous with respect to the initial condition.

Proposition 4.2

There exist positive constants \(N_0\) and C such that, under condition (4.3), for any \(t_0 \in [0, T]\) and initial condition \({\tilde{p}}(t_{0}) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), the solution to (2.13) is unique. Moreover, if \(({\tilde{\theta }}, {\tilde{p}})\) and \(({\hat{\theta }}, {\hat{p}})\) are two solutions to ODE system (2.13) with different initial conditions \({\tilde{p}}(t_{0}), {\hat{p}}(t_{0}) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), then

$$\begin{aligned} \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert + \Vert {\tilde{p}} - {\hat{p}} \Vert \le C \Vert {\tilde{p}}(t_0) - {\hat{p}}(t_0) \Vert \end{aligned}$$
(4.4)

Proof

Start with any M and the corresponding \(N_0\) defined in Proposition 4.1. Then both \({\tilde{\theta }}\) and \({\hat{\theta }}\) are uniformly bounded by \(C_{G}(M)\). Let us first assume \({\tilde{p}}_{z}(t), {\hat{p}}_{z}(t) \ge - M_1\) uniformly; we will later choose the value of \(M_1\) and verify this condition. Similar to the proof of [9, Proposition 5], we first obtain an estimate on the LHS of (4.7) given later. Define \(\phi := {\tilde{\theta }} - {\hat{\theta }}\) and \(\pi := {\tilde{p}} - {\hat{p}}\). Then the couple \((\phi , \pi )\) solves

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} \phi _{z}(t)}{\mathrm{d} t}&= - H(z, \Delta ^{z} {\tilde{\theta }}(t)) + H(z, \Delta ^{z} {\hat{\theta }}(t)), \quad \phi _{z}(T) = G(z, {\tilde{p}}(T)) - G(z, {\hat{p}}(T)), \\ \frac{\mathrm{d} \pi _{z}(t)}{\mathrm{d} t}&= \sum _{y} \{ {\tilde{p}}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) - {\hat{p}}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\hat{\theta }}(t)) \}, \quad \pi _{z}(t_0) = {\tilde{p}}_{z}(t_0) - {\hat{p}}_{z}(t_0), \end{aligned} \end{aligned}$$
(4.5)

Integrating \(\frac{\mathrm{d}}{\mathrm{d} t} \sum _{z \in \Sigma } \phi _{z}(t) \pi _{z}(t)\) over the interval \([t_0, T]\), using the product rule and (4.5), and noting that \(\sum _{z} \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \phi _{y}(t) = 0\), after some straightforward calculation we have

$$\begin{aligned} \begin{aligned} \sum _{z \in \Sigma } \phi _{z}(t_0) \pi _{z}(t_0)&= \int _{t_0}^{T} \sum _{z \in \Sigma } [H(z, \Delta ^{z} {\tilde{\theta }}(t)) - H(z, \Delta ^{z} {\hat{\theta }}(t)) \\&\quad - \Delta ^{z} \phi (t) \cdot \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t))] ({\tilde{p}}_{z}(t) + M_1) \mathrm{d} t \\&\quad + \int _{t_0}^{T} \sum _{z \in \Sigma } [H(z, \Delta ^{z} {\hat{\theta }}(t)) - H(z, \Delta ^{z} {\tilde{\theta }}(t)) \\&\quad + \Delta ^{z} \phi (t) \cdot \lambda ^{*}(z, \Delta ^{z} {\hat{\theta }}(t))] ({\hat{p}}_{z}(t) + M_1) \mathrm{d} t + \sum _{z \in \Sigma } \phi _{z}(T) \pi _{z}(T) \\&\quad + M_1 \int _{t_0}^{T} \sum _{z \in \Sigma } \Delta ^{z} \phi (t) \cdot [\lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) - \lambda ^{*}(z, \Delta ^{z} {\hat{\theta }}(t))] \mathrm{d} t. \end{aligned} \end{aligned}$$
(4.6)

As \(\lambda ^{*}(z, \mu ) = D_{\mu } H(z, \mu )\), by Taylor's theorem, there exists a point a on the segment between \(\Delta ^{z} {\tilde{\theta }}(t)\) and \(\Delta ^{z} {\hat{\theta }}(t)\) such that

$$\begin{aligned}&H(z, \Delta ^{z} {\tilde{\theta }}(t)) - H(z, \Delta ^{z} {\hat{\theta }}(t)) - \Delta ^{z} \phi (t) \cdot \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \\&\quad = - \Delta ^{z} \phi (t) \cdot D^{2}_{\mu \mu } H(z, a) \cdot \Delta ^{z} \phi (t). \end{aligned}$$

Then, from assumption (2.4), and applying the same argument with the roles of \({\tilde{\theta }}\) and \({\hat{\theta }}\) exchanged, we have the following estimates:

$$\begin{aligned} H(z, \Delta ^{z} {\tilde{\theta }}(t)) - H(z, \Delta ^{z} {\hat{\theta }}(t)) - \Delta ^{z} \phi (t) \cdot \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \le - C^{-1} \Vert \Delta ^{z} \phi (t) \Vert ^2. \end{aligned}$$

Similarly, we have

$$\begin{aligned} H(z, \Delta ^{z} {\hat{\theta }}(t)) - H(z, \Delta ^{z} {\tilde{\theta }}(t)) + \Delta ^{z} \phi (t) \cdot \lambda ^{*}(z, \Delta ^{z} {\hat{\theta }}(t)) \le - C^{-1} \Vert \Delta ^{z} \phi (t) \Vert ^2. \end{aligned}$$

Since \(\sum _{z \in \Sigma } \phi _{z}(T) \pi _{z}(T) \le 0\) by (2.6) and \({\tilde{p}}_{z}(t), {\hat{p}}_{z}(t) > -M_1\) by the choice of \(M_1\), using (4.6) and the inequalities above, we have

$$\begin{aligned}&\phi (t_0) \cdot \pi (t_0) + C^{-1} \int _{t_0}^{T} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (t) \Vert ^2 ({\tilde{p}}_{z}(t) + {\hat{p}}_{z}(t) + 2 M_1) \mathrm{d} t \\&\quad \le M_1 \int _{t_0}^{T} \sum _{z \in \Sigma } \Delta ^{z} \phi (t) \cdot [\lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) - \lambda ^{*}(z, \Delta ^{z} {\hat{\theta }}(t))] \mathrm{d} t. \end{aligned}$$

By the Lipschitz continuity of \(\lambda ^{*}\), there exists C such that

$$\begin{aligned} | \int _{t_0}^{T} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (t) \Vert ^2 ({\tilde{p}}_{z}(t) + {\hat{p}}_{z}(t) + 2 M_1) \mathrm{d} t | \le C (\Vert \pi (t_0) \Vert \Vert \phi \Vert + M_1 \Vert \phi \Vert ^2 ). \end{aligned}$$
(4.7)

Note that, unlike the proof of [9, Proposition 5], both \({\tilde{p}}_{z}\) and \({\hat{p}}_{z}\) can be negative in our setting. If they were nonnegative, we could choose \(M_1=0\) and the same technique as in [9, Proposition 5] would work. In our case, we have to introduce \(M_1\), which requires many additional a priori estimates.

We next derive the bound for \(\pi \). Integrating the second equation in (4.5) over \([t_{0}, t]\), we have

$$\begin{aligned} \pi _{z}(t) = \pi _{z}(t_{0}) + \int _{t_0}^{t} \sum _{y} \{ {\tilde{p}}_{y}(s) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(s)) - {\hat{p}}_{y}(s) \lambda ^{*}_{z}(y, \Delta ^{y} {\hat{\theta }}(s)) \} \mathrm{d} s. \end{aligned}$$

As \(\lambda ^{*}\) is both bounded and Lipschitz continuous, there exists C such that

$$\begin{aligned} \begin{aligned} \max _{z \in \Sigma } | \pi _{z}(t) |&\le \max _{z \in \Sigma } | \pi _{z}(t_0) | + C \int _{t_0}^{t} \max _{z \in \Sigma } | \pi _{z}(s) | \mathrm{d} s + C \int _{t_0}^{t} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (s) \Vert |{\tilde{p}}_{z}(s)| \mathrm{d} s \\&\le \max _{z \in \Sigma } | \pi _{z}(t_0) | + C \int _{t_0}^{t} \max _{z \in \Sigma } | \pi _{z}(s) | \mathrm{d} s \\&\quad + C \int _{t_0}^{t} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (s) \Vert ({\tilde{p}}_{z}(s) + M_1) \mathrm{d} s + M_1 C \int _{t_0}^{t} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (s) \Vert \mathrm{d} s, \end{aligned} \end{aligned}$$

where the second line holds because \({\tilde{p}}_{z}(s) + M_1 > 0\). Moreover, we have

$$\begin{aligned}&\int _{t_0}^{t} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (s) \Vert ({\tilde{p}}_{z}(s) + M_1) \mathrm{d} s \\&\quad \le \sqrt{\int _{t_0}^{t} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (s) \Vert ^2 ({\tilde{p}}_{z}(s) + M_1) \mathrm{d} s} \sqrt{\int _{t_0}^{t} \sum _{z \in \Sigma } ({\tilde{p}}_{z}(s) + M_1) \mathrm{d} s}. \end{aligned}$$

Applying Gronwall's inequality and using \({\tilde{p}}_{z}(s) \in [-M, 1 + M]\), there exists C such that

$$\begin{aligned} \begin{aligned} \Vert \pi \Vert&\le C \Vert \pi (t_0) \Vert + C \sqrt{ \int _{t_0}^{T} \sum _{z \in \Sigma } \Vert \Delta ^{z} \phi (t) \Vert ^2 ({\tilde{p}}_{z}(t) + M_1) \mathrm{d} t} + M_1 C \Vert \phi \Vert \\&\le C \Vert \pi (t_0) \Vert + C \sqrt{\Vert \pi (t_0) \Vert \Vert \phi \Vert + M_1 \Vert \phi \Vert ^2} + C M_1 \Vert \phi \Vert \\&\le C \Vert \pi (t_0) \Vert + C \Vert \pi (t_0) \Vert ^{\frac{1}{2}} \Vert \phi \Vert ^{\frac{1}{2}} + C (M_1 + \sqrt{M_1}) \Vert \phi \Vert , \end{aligned} \end{aligned}$$
(4.8)

where C again depends only on the M in Proposition 4.1.

We next derive the bound for \(\phi \). Integrating the first equation in (4.5) over \([t, T]\), from the Lipschitz continuity of G and H, there exists C such that

$$\begin{aligned} \max _{z \in \Sigma } | \phi _{z}(t) | \le C \max _{z \in \Sigma } |\pi _{z}(T)| + C \int _{t}^{T} \max _{z \in \Sigma } |\phi _{z}(s)| \mathrm{d} s. \end{aligned}$$

Applying Gronwall's inequality, there exists a constant C such that

$$\begin{aligned} \Vert \phi \Vert \le C \Vert \pi \Vert \end{aligned}$$
(4.9)

By combining (4.8) and (4.9), using \(A B \le \varepsilon A^2 + \frac{1}{\varepsilon } B^2\) for \(A, B > 0\), there exists C such that

$$\begin{aligned} \Vert \pi \Vert \le C \Vert \pi (t_0) \Vert + \left[ \frac{1}{4} + C^2 (M_1 + \sqrt{M_1}) \right] \Vert \pi \Vert . \end{aligned}$$

Note that C depends only on the bounds and Lipschitz coefficients of H, G and \(\lambda ^{*}\), the bound of \(D^2_{\mu \mu } H\), and the bounds of \({\tilde{\theta }}\) and \({\hat{\theta }}\), all of which depend on the M in Proposition 4.1. We only need to select \(M_1\) such that

$$\begin{aligned} C^2 (M_1 + \sqrt{M_1}) < \frac{1}{4}, \end{aligned}$$

and then (4.4) follows. It remains to choose the new \(N_0\) such that \({\tilde{p}}_{z}(t), {\hat{p}}_{z}(t) > -M_1\) uniformly, as we assumed. From Proposition 4.1, set \(N_1:= \frac{3 e^{\Lambda (M_1) T}}{M_1}\), and we can simply define the new \(N_0\) as \(\max \{N_0, N_1\}\). Finally, the uniqueness of the solution comes directly from (4.4). \(\square \)

According to Propositions 4.1 and 4.2, for any \(t \in [t_0, T]\) and \({\bar{p}}_{0} \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\) taken as the initial value for ODE system (2.13), there exists a unique solution \(({\bar{\theta }}(s), {\bar{p}}(s))\) on [t, T]. Note that \({\bar{\theta }}(s)\) might not equal the \({\tilde{\theta }}(s)\) stated as the solution to (2.13) in Theorem 2.6, since \({\bar{\theta }}\) depends on the initial time t and the initial condition \({\bar{p}}_{0}\) chosen above. We can therefore define a function \({\tilde{U}}\) for \(t \in [t_{0}, T]\) and \({\tilde{p}}_{0} \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\) by the corresponding \({\bar{\theta }}(t)\) obtained with initial time t and initial value \({\bar{p}}_{0} = {\tilde{p}}_{0}\):

$$\begin{aligned} {\tilde{U}}(t, z, {\tilde{p}}_0):= {\bar{\theta }}(t, z). \end{aligned}$$
(4.10)

According to Propositions 4.1 and 4.2, \({\tilde{U}}\) is well defined and continuous w.r.t \({\tilde{p}}_{0}\). Moreover, for \(({\tilde{\theta }}, {\tilde{p}})\), the solution to (2.13) in Theorem 2.6 on \([t_{0}, T]\), which is the approximate solution obtained from the DNN whose error we want to estimate, we have for all \(t \in [t_{0}, T]\) that:

$$\begin{aligned} {\tilde{U}}(t, z, {\tilde{p}}(t)) = {\tilde{\theta }}(t, z). \end{aligned}$$

This shows that \({\tilde{U}}\) contains all the information of \({\tilde{\theta }}\). If we can compare \({\tilde{U}}\) with the function U defined similarly in Cecchin and Pelino [9], which corresponds to the true solution to (2.2), we can estimate the error of \({\tilde{\theta }}\). To compare \({\tilde{U}}\) with U, we need to prove that \({\tilde{U}}\) also satisfies a master equation similar to that of U in Cecchin and Pelino [9]. To achieve this goal, we prove the continuous differentiability of \({\tilde{U}}\) in the following steps. We first define the derivative of \({\tilde{U}}\) w.r.t the vector \({\tilde{p}}_0\) in a way similar to Cecchin and Pelino [9]; define the operator \(D^{y}\) as follows.

Definition 4.3

For \(y \in \Sigma \), define the operator \(D^{y}\) acting on a function \(U: {\mathbb {R}}^{K} \rightarrow {\mathbb {R}}\), with \(D^{y} U: {\mathbb {R}}^{K} \rightarrow {\mathbb {R}}^K\), by

$$\begin{aligned}{}[D^{y} U(p)]_{z}:= \lim _{s \rightarrow 0} \frac{U(p + s (\delta _{z} - \delta _{y})) - U(p)}{s}, \end{aligned}$$

where \(D^{y} U(p) = ([D^{y} U(p)]_{1}, \ldots , [D^{y} U(p)]_{K})\), and \(\delta _{z} \in {\mathbb {R}}^{K}\) is the vector whose elements are all 0 except that the z-th element is 1.
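
As a numerical illustration of Definition 4.3 (not used in the proofs), \([D^{y} U(p)]_{z}\) can be approximated by a one-sided difference quotient along the direction \(\delta _{z} - \delta _{y}\). The function U and the step size s below are hypothetical placeholders; indices are 0-based in the code while the text uses 1-based indices.

```python
import numpy as np

def D_y(U, p, y, s=1e-6):
    """Finite-difference approximation of D^y U(p) from Definition 4.3.
    U maps R^K -> R; p is a vector in R^K; y is an index in {0, ..., K-1}."""
    K = len(p)
    out = np.zeros(K)
    for z in range(K):
        direction = np.zeros(K)
        direction[z] += 1.0   # delta_z
        direction[y] -= 1.0   # -delta_y (zero vector when z == y)
        out[z] = (U(p + s * direction) - U(p)) / s
    return out

# usage sketch: U(p) = sum_z p_z^2, whose derivative along delta_z - delta_y is 2(p_z - p_y)
U = lambda p: float(np.sum(p ** 2))
p = np.array([0.2, 0.3, 0.5])
print(D_y(U, p, y=0))   # approximately [0.0, 0.2, 0.6]
```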

Noticing that \(\mu = \sum _{z \ne 1} \mu _{z} (\delta _{z} - \delta _{1}) + (\sum _{z = 1}^{K} \mu _{z}) \delta _{1}\), if \({\tilde{U}}\) is differentiable, we obtain the following lemma from the linearity of the directional derivative.

Lemma 4.4

Define the derivative of function U(p) along the direction \(\mu \in {\mathbb {R}}^{K}\) as a map \(\frac{\partial }{\partial \mu } U: {\mathbb {R}}^{K} \rightarrow {\mathbb {R}}\),

$$\begin{aligned} \frac{\partial }{\partial \mu } U(p):= \lim _{s \rightarrow 0} \frac{U(p + s \mu ) - U(p)}{s}. \end{aligned}$$

It satisfies

$$\begin{aligned} \frac{\partial }{\partial \mu } U(p) = D^{1} U(p) \cdot \mu + \frac{\partial }{\partial \delta _{1}} U(p) \left( \sum _{z = 1}^{K} \mu _{z}\right) , \end{aligned}$$

where \(\frac{\partial }{\partial \delta _{1}}\) is in fact the first component of the gradient of U, and we write \(D U(p):= D^{1} U(p)\) for notational simplicity. When \(\sum _{z = 1}^{K} \mu _{z} = 0\), for any \(y \in \Sigma \), we have

$$\begin{aligned} D^{y} U(p) \cdot \mu = D U(p) \cdot \mu = \frac{\partial }{\partial \mu } U(p). \end{aligned}$$
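
For the reader's convenience, here is a short verification of the identity in Lemma 4.4 (valid whenever U is differentiable): applying the directional derivative to the decomposition of \(\mu \) stated above and using its linearity,

$$\begin{aligned} \frac{\partial }{\partial \mu } U(p) = \sum _{z \ne 1} \mu _{z} [D^{1} U(p)]_{z} + \Big (\sum _{z = 1}^{K} \mu _{z}\Big ) \frac{\partial }{\partial \delta _{1}} U(p) = D^{1} U(p) \cdot \mu + \frac{\partial }{\partial \delta _{1}} U(p) \Big (\sum _{z = 1}^{K} \mu _{z}\Big ), \end{aligned}$$

where the last equality uses \([D^{1} U(p)]_{1} = 0\), since the difference quotient in Definition 4.3 vanishes when \(z = y\). When \(\sum _{z = 1}^{K} \mu _{z} = 0\), the second term drops out, which gives the last display of the lemma.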

In order to characterize the directional derivative of \({\tilde{U}}\) w.r.t \({\tilde{p}}_{0}\), given \({\tilde{\theta }}\) and \({\tilde{p}}\), let us define a linear ODE system for \((u, \rho )\) similar to [9, Equation (80)], which will be used several times in what follows.

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} u_{z}(t)}{\mathrm{d} t}&= - \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \cdot \Delta ^{z} u(t) - b(t, z) \\ \frac{\mathrm{d} \rho _{z}(t)}{\mathrm{d} t}&= \sum _{y} \rho _{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) + \sum _{y} {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot \Delta ^{y} u(t) + c(t, z) \\ u_{z}(T)&= \frac{\partial G}{\partial \rho (T)}(z, {\tilde{p}}(T)) + u_{T, z} = \nabla G(z, {\tilde{p}}(T)) \cdot \rho (T) + u_{T, z} \\ \rho _{z}(t_{0})&= \rho _{z, 0}. \end{aligned} \end{aligned}$$
(4.11)

As in [9, Equation (80)], \(D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t))\) is the gradient of \(\lambda ^{*}_{z}\) w.r.t its second variable in \({\mathbb {R}}^{K}\). The unknowns are u and \(\rho \), while b and c are given measurable functions and \(u_{T}\), \(\rho _{0}\) are given vectors, with c satisfying \(\sum _{z = 1}^{K} c(t, z) = 0\). In fact, (4.11) is a generalization of [9, Equation (80)]: in (4.11), the terminal condition for \(u_{z}(T)\) involves the directional derivative along an arbitrary direction, whereas in [9, Equation (80)] it involves directional derivatives along specific directions.

We first prove in Proposition 4.5 that the linear system (4.11) has a unique solution, which is bounded linearly in terms of its initial and terminal data.

Proposition 4.5

There exist positive constants \(N_0\) and C such that if (4.3) holds and \({\tilde{p}}(t_0) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), then for any measurable functions b, c and any vector \(u_{T}\), the linear system (4.11) has a unique solution \((u, \rho )\). Moreover, it satisfies

$$\begin{aligned} \Vert u \Vert + \Vert \rho \Vert \le C [ \Vert u_{T} \Vert + \Vert \rho _{0} \Vert + \Vert b \Vert + \Vert c \Vert ]. \end{aligned}$$
(4.12)

Proof

We only discuss the case when \(t_{0} = 0\), as it can be extended to any \(t_{0} \in [0, T]\) by the same argument.

We first take \(N_0\) larger than the one in Proposition 4.2. Similar to the proof of Proposition 4.2, to cope with possible negativity, we first assume \({\tilde{p}}_{z}(t) \ge - M_1\) uniformly with \(M_1 \le M\); we will later choose \(M_1\) small enough and find an \(N_0\) such that this assumption holds. As \(\sum _{z \in \Sigma } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) = 0\), we have \(\sum _{z, y \in \Sigma } {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot \Delta ^{y} u(t) = 0\) and \(\sum _{z \in \Sigma } \frac{\mathrm{d} \rho _{z}(t)}{\mathrm{d} t} = 0\). Hence, for any \(t \in [0, T]\), we have

$$\begin{aligned} \eta := \sum _{z \in \Sigma } \rho _{z}(t) = \sum _{z \in \Sigma } \rho _{z, 0}. \end{aligned}$$
(4.13)

Define set \(P_{\eta }(\Sigma )\) as

$$\begin{aligned} P_{\eta }(\Sigma ):= \left\{ p \in {\mathbb {R}}^{K}, \quad s.t \quad \sum _{z = 1}^{K} p_{z} = \eta \right\} . \end{aligned}$$

We define map \(\xi \) from \({\mathcal {C}}^{0}([0, T]; P_{\eta }(\Sigma ))\) to itself as follows: for a fixed \(\rho \in {\mathcal {C}}^{0}([0, T]; P_{\eta }(\Sigma ))\), we consider the solution \(u = u(\rho )\) to the backward ODE for u in (4.11), and define \(\xi (\rho )\) to be the solution to the forward ODE for \(\rho \) in (4.11) with \(u = u(\rho )\). From (4.13), \(\xi (\rho )\) is well defined as \(\xi (\rho )(t) \in P_{\eta }(\Sigma )\) for any t.

Similar to the proof of [9, Proposition 6], the solution to (4.11) is a fixed point of the mapping \(\xi \), and we prove its existence by Schaefer's fixed point theorem, which asserts that a continuous and compact mapping \(\xi \) of a Banach space X into itself has a fixed point if the set \(\{\rho \in X: \rho = \omega \xi (\rho ), \omega \in [0, 1]\}\) is bounded. Firstly, \(\xi \) is continuous as the system (4.11) is linear in u and \(\rho \). \({\mathcal {C}}^{0}([0, T]; P_{\eta }(\Sigma ))\) is a convex subset of the Banach space \({\mathcal {C}}^{0}([0, T]; {\mathbb {R}}^{K})\). Moreover, from the linearity and the bounded coefficients of system (4.11), \(\xi \) maps any bounded set of \({\mathcal {C}}^{0}([0, T]; P_{\eta }(\Sigma ))\) into a set of bounded and Lipschitz continuous functions with a uniform Lipschitz coefficient in \({\mathcal {C}}^{1}([0, T]; P_{\eta }(\Sigma ))\), which, by the Arzelà–Ascoli theorem, is relatively compact. By the definition of a compact map, \(\xi \) is compact. Hence, to apply Schaefer's fixed point theorem, it remains to prove that the set \(\{ \rho : \rho = \omega \xi (\rho ) \}\) is uniformly bounded for all \(\omega \in [0, 1]\). We can restrict to \(\omega > 0\), since otherwise \(\rho = 0\). Fix a \(\rho \) such that \(\rho = \omega \xi (\rho )\), which means the couple \((u(\rho ), \xi (\rho ))\) solves (for notational simplicity we suppress their dependence on \(\rho \))

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} u_{z}(t)}{\mathrm{d} t}&= - \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \cdot \Delta ^{z} u(t) - b(t, z) \\ \frac{\mathrm{d} \xi _{z}(t)}{\mathrm{d} t}&= \sum _{y} \xi _{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \\&\quad + \sum _{y} {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot \Delta ^{y} u(t) + c(t, z) \\ u_{z}(T)&= \nabla G(z, {\tilde{p}}(T)) \cdot \omega \xi (T) + u_{T, z} \\ \xi _{z}(t_{0})&= \rho _{z, 0}. \end{aligned} \end{aligned}$$
(4.14)

We need to prove that the solution \((u, \xi )\), if it exists, is bounded uniformly in \(\omega \in (0, 1]\). For notational simplicity, we omit the dependence of \(\lambda ^{*}\) on its second variable. From (4.14),

$$\begin{aligned} \begin{aligned}&\sum _{z \in \Sigma } \frac{\mathrm{d}}{\mathrm{d} t} (u_{z}(t) \xi _{z}(t)) = - \sum _{z, y \in \Sigma } \xi _{z}(t) \lambda ^{*}_{y}(z) (u_{y}(t) - u_{z}(t)) + \sum _{z, y \in \Sigma } \xi _{y}(t) \lambda ^{*}_{z}(y) u_{z}(t) \\&\quad + \sum _{z, y \in \Sigma } u_{z}(t) {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y) \cdot \Delta ^{y} u(t) + \sum _{z \in \Sigma } c(t, z) u_{z}(t) - \sum _{z \in \Sigma } \xi _{z}(t) b(t, z). \end{aligned} \end{aligned}$$

The first line vanishes after exchanging z and y in the second double sum and using (2.3). Integrating over [0, T] and using the expression of \(u_{z}(T)\), we have

$$\begin{aligned} \begin{aligned}&\sum _{z \in \Sigma } \xi _{z}(T) [ \nabla G(z, {\tilde{p}}(T)) \cdot \omega \xi (T) + u_{T, z}] - u(0) \cdot \rho _{0} \\&\quad = \int _{0}^{T} \sum _{z \in \Sigma } c(t, z) u_{z}(t) \mathrm{d} t - \int _{0}^{T} \sum _{z \in \Sigma } \xi _{z}(t) b(t, z) \mathrm{d} t \\&\qquad + \int _{0}^{T} \sum _{z, y \in \Sigma } {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y) \cdot \Delta ^{y} u(t) (u_{z}(t) - u_{y}(t)) \mathrm{d} t. \end{aligned} \end{aligned}$$

Reorganizing the terms, we get

$$\begin{aligned} \begin{aligned}&\int _{0}^{T} \sum _{z, y \in \Sigma } {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y) \cdot \Delta ^{y} u(t) (u_{z}(t) - u_{y}(t)) \mathrm{d} t - \omega \sum _{z \in \Sigma } \xi _{z}(T) \nabla G(z, {\tilde{p}}(T)) \cdot \xi (T) \\&\quad = \int _{0}^{T} \sum _{z \in \Sigma } \xi _{z}(t) b(t, z) \mathrm{d} t - \int _{0}^{T} \sum _{z \in \Sigma } c(t, z) u_{z}(t) \mathrm{d} t + \sum _{z \in \Sigma } \xi _{z}(T) u_{T, z} - u(0) \cdot \rho _{0}. \end{aligned} \end{aligned}$$

From the assumption on G in (2.6) and the definition of the directional derivative, we have

$$\begin{aligned} - \omega \sum _{z \in \Sigma } \xi _{z}(T) \nabla G(z, {\tilde{p}}(T)) \cdot \xi (T) = - \omega \sum _{z \in \Sigma } \xi _{z}(T) \frac{\partial G}{\partial \xi (T)}(z, {\tilde{p}}(T)) \ge 0. \end{aligned}$$

Moreover, as \(\lambda ^{*}(y) = D_{\mu } H(y)\) (we also omit the dependence of H on its second variable),

$$\begin{aligned}&\int _{0}^{T} \sum _{z, y \in \Sigma } {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y) \cdot \Delta ^{y} u(t) (u_{z}(t) - u_{y}(t)) \mathrm{d} t \\&\quad = \int _{0}^{T} \sum _{y \in \Sigma } {\tilde{p}}_{y}(t) \Delta ^{y} u(t) \cdot D^{2}_{\mu \mu } H(y) \cdot \Delta ^{y} u(t) \mathrm{d} t. \end{aligned}$$

Since \({\tilde{p}}\) can be negative, the step used in [9, Proposition 6] to estimate the RHS above is not applicable. However, as \({\tilde{p}}_{y}(t) + M_1 \ge 0\) for all \(y \in \Sigma \), from (2.4), we can rewrite the RHS and obtain the following estimate instead.

$$\begin{aligned} \begin{aligned}&\int _{0}^{T} \sum _{y \in \Sigma } {\tilde{p}}_{y}(t) \Delta ^{y} u(t) \cdot D^{2}_{\mu \mu } H(y) \cdot \Delta ^{y} u(t) \mathrm{d} t \\&\quad = \int _{0}^{T} \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1 ) \Delta ^{y} u(t) \cdot D^{2}_{\mu \mu } H(y) \cdot \Delta ^{y} u(t) \mathrm{d} t\\&\qquad - M_1 \int _{0}^{T} \sum _{y \in \Sigma } \Delta ^{y} u(t) \cdot D^{2}_{\mu \mu } H(y) \cdot \Delta ^{y} u(t) \mathrm{d} t \\&\quad \ge C^{-1} \int _{0}^{T} \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert ^2 \mathrm{d} t \\&\qquad - M_1 C \int _{0}^{T} \sum _{y \in \Sigma } \Vert \Delta ^{y} u(t) \Vert ^2 \mathrm{d} t. \end{aligned} \end{aligned}$$

So there exist constants C and \(C_1\) (where \(C_1\) depends only on the dimension of u) such that

$$\begin{aligned} \begin{aligned} \int _{0}^{T} \sum _{z \in \Sigma } ({\tilde{p}}_{z}(t) + M_1) \Vert \Delta ^{z} u(t) \Vert ^2 \mathrm{d} t&\le C \bigg (\int _{0}^{T} | c(t) \cdot u(t) | \mathrm{d} t + \int _{0}^{T} | \xi (t) \cdot b(t) | \mathrm{d} t \\&\quad + \Vert \xi (T) \Vert \Vert u_{T} \Vert + \Vert u(0) \Vert \Vert \rho _{0} \Vert + M_1 C_1 \Vert u \Vert ^2\bigg ), \end{aligned} \end{aligned}$$
(4.15)

where \(b(t):= (b(t, 1), \ldots , b(t, K))\), and c(t) is defined similarly. As \(\lambda ^{*}\) and \(D_{\mu } \lambda ^{*}\) are bounded by a constant C, from the ODE for \(\xi \) in (4.14) we have

$$\begin{aligned} \begin{aligned} | \xi _{z}(t) |&\le |\rho _{0, z}| + C \int _{0}^{t} \sum _{y \in \Sigma } | \xi _{y}(s) | \mathrm{d} s \\&\quad + C \int _{0}^{t} \left[ \sum _{y \in \Sigma } ( {\tilde{p}}_{y}(s) + M_1) \Vert \Delta ^{y} u(s) \Vert + |c(s, z)| \right] \mathrm{d} s \\&\quad + C M_1 \int _{0}^{t} \sum _{y \in \Sigma } \Vert \Delta ^{y} u(s) \Vert \mathrm{d} s. \end{aligned} \end{aligned}$$

Thus, by Gronwall's inequality, there exists a constant C such that

$$\begin{aligned} \Vert \xi \Vert \le C (\Vert \rho _{0} \Vert + \Vert c \Vert ) + C \int _{0}^{T} \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert \mathrm{d} t + C M_1 \int _{0}^{T} \sum _{y \in \Sigma } \Vert \Delta ^{y} u(t) \Vert \mathrm{d} t, \end{aligned}$$

where there exists C such that \(\sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \le C^2 \) and

$$\begin{aligned} \begin{aligned}&\int _{0}^{T} \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert \mathrm{d} t = \int _{0}^{T} \sum _{y \in \Sigma } \sqrt{{\tilde{p}}_{y}(t) + M_1} \sqrt{{\tilde{p}}_{y}(t) + M_1} \Vert \Delta ^{y} u(t) \Vert \mathrm{d} t \\&\quad \le \int _{0}^{T} \sqrt{ \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1)} \sqrt{\sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert ^{2}} \mathrm{d} t \\&\quad \le C \int _{0}^{T} \sqrt{\sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert ^{2}} \mathrm{d} t \\&\quad \le C \sqrt{\int _{0}^{T} \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert ^{2} \mathrm{d} t}. \end{aligned} \end{aligned}$$

From (4.15), there exist constants C (possibly different from line to line) such that

$$\begin{aligned} \begin{aligned} \Vert \xi \Vert&\le C (\Vert \rho _{0} \Vert + \Vert c \Vert ) + C \int _{0}^{T} \sum _{y \in \Sigma } ({\tilde{p}}_{y}(t) + M_1) \Vert \Delta ^{y} u(t) \Vert \mathrm{d} t + C M_1 \int _{0}^{T} \sum _{y \in \Sigma } \Vert \Delta ^{y} u(t) \Vert \mathrm{d} t \\&\le C (\Vert \rho _{0} \Vert + \Vert c \Vert ) + C (M_1 + \sqrt{M_1}) \Vert u \Vert \\&\qquad + C( \Vert c \Vert ^{\frac{1}{2}} \Vert u \Vert ^{\frac{1}{2}} + \Vert \xi \Vert ^{\frac{1}{2}} \Vert b \Vert ^{\frac{1}{2}} + \Vert \xi (T) \Vert ^{\frac{1}{2}} \Vert u_{T} \Vert ^{\frac{1}{2}} + \Vert u(0) \Vert ^{\frac{1}{2}} \Vert \rho _{0} \Vert ^{\frac{1}{2}} ). \end{aligned} \end{aligned}$$

Moreover, applying Gronwall's inequality to the backward ODE for u in (4.14), there exists C such that

$$\begin{aligned} \Vert u \Vert \le C [\Vert u_{T} \Vert + \Vert \xi (T) \Vert + \Vert b \Vert ]. \end{aligned}$$

Then there exists C such that

$$\begin{aligned} \begin{aligned} \Vert \xi \Vert&\le C (\Vert \rho _{0} \Vert + \Vert c \Vert ) + C (M_1 + \sqrt{M_1}) ( \Vert u_{T} \Vert + \Vert \xi (T) \Vert + \Vert b \Vert ) + C \Vert c \Vert ^{\frac{1}{2}} ( \Vert u_{T} \Vert + \Vert \xi (T) \Vert + \Vert b \Vert )^{\frac{1}{2}} \\&\quad + C( \Vert \xi \Vert ^{\frac{1}{2}} \Vert b \Vert ^{\frac{1}{2}} + \Vert \xi (T) \Vert ^{\frac{1}{2}} \Vert u_{T} \Vert ^{\frac{1}{2}} + (\Vert u_{T} \Vert ^{\frac{1}{2}} + \Vert \xi (T) \Vert ^{\frac{1}{2}} + \Vert b \Vert ^{\frac{1}{2}}) \Vert \rho _{0} \Vert ^{\frac{1}{2}} ) \end{aligned} \end{aligned}$$

As \(\Vert \xi (T) \Vert \le \Vert \xi \Vert \), using the inequality \(A B \le \varepsilon A^2 + \frac{1}{4 \varepsilon } B^2\) for \(A, B \ge 0\), there exists C such that

$$\begin{aligned} \Vert \xi \Vert \le C ( \Vert c \Vert + \Vert b \Vert + \Vert \rho _{0} \Vert + \Vert u_{T} \Vert ) + \left( C (M_1 + \sqrt{M_1}) + \frac{1}{4}\right) \Vert \xi \Vert . \end{aligned}$$

Note that the constant C only depends on the boundedness of \({\tilde{\theta }}\), which depends on M in Proposition 4.1. If

$$\begin{aligned} C (M_1 + \sqrt{M_1}) \le \frac{1}{4}, \end{aligned}$$

then we have

$$\begin{aligned} \Vert \xi \Vert \le 2 C ( \Vert c \Vert + \Vert b \Vert + \Vert \rho _{0} \Vert + \Vert u_{T} \Vert ). \end{aligned}$$

Hence, the solution pair \((u, \xi )\) is bounded uniformly for all \(\omega \in [0, 1]\), which means \(\rho = \omega \xi (\rho )\) is also uniformly bounded; this proves the existence of a solution to (4.11). Meanwhile, setting \(\omega = 1\) leads to the uniform bound estimate for the solution \((u, \rho )\) to (4.11), and its uniqueness comes directly from (4.12). If \(N_0 > \frac{3 e^{\Lambda (M_1) T}}{M_1}\), then from Proposition 4.1 we have \({\tilde{p}}_{y}(t) > -M_1\) uniformly, which concludes our proof; hence, we simply update the \(N_{0}\) chosen before so that this inequality is satisfied. \(\square \)

Then we can prove the differentiability of \({\tilde{U}}\) w.r.t \({\tilde{p}}_{0}\) in Proposition 4.6.

Proposition 4.6

Let \(({\tilde{\theta }}, {\tilde{p}})\) and \(({\hat{\theta }}, {\hat{p}})\) be the solutions to ODE system (2.13) starting from \((t_0, {\tilde{p}}(t_0))\) and \((t_0, {\hat{p}}(t_0))\), respectively, and let \((v, \zeta )\) be the solution to (4.11) starting from \(\rho _0:= {\hat{p}}(t_0) - {\tilde{p}}(t_0)\). There exist positive constants \(N_0\) and C such that if (4.3) holds, then for any \(t_0 \in [0, T]\) and \({\tilde{p}}(t_0), {\hat{p}}(t_0) \in {\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\), we have

$$\begin{aligned} \Vert {\hat{\theta }} - {\tilde{\theta }} - v \Vert + \Vert {\hat{p}} - {\tilde{p}} - \zeta \Vert \le C \Vert {\hat{p}}(t_0) - {\tilde{p}}(t_0) \Vert ^2. \end{aligned}$$

Proof

Without loss of generality, we assume \(t_{0} = 0\). Similar to the proof of [9, Theorem 7], we can use the results of Proposition 4.5 to prove our conclusion. Define \(N_0\) as the one in Proposition 4.5. Then \({\tilde{p}}_{y}(t), {\hat{p}}_{y}(t) > - M_1\) uniformly for \((t, y) \in [0, T] \times \Sigma \). Define the linearized system with \(w:= {\hat{p}}(0) - {\tilde{p}}(0) \):

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} v_{z}(t)}{\mathrm{d} t}&= - \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \cdot \Delta ^{z} v(t) \\ \frac{\mathrm{d} \zeta _{z}(t)}{\mathrm{d} t}&= \sum _{y} \zeta _{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) + \sum _{y} {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot \Delta ^{y} v(t) \\ v_{z}(T)&= \frac{\partial G}{\partial \zeta (T)} (z, {\tilde{p}}(T)) = D^{1} G(z, {\tilde{p}}(T)) \cdot \zeta (T) + \frac{\partial G}{\partial \delta _{1}} (z, {\tilde{p}}(T)) \sum _{y = 1}^{K} w_{y} \\ \zeta _{z}(0)&= w_{z}. \end{aligned} \end{aligned}$$
(4.16)

From the condition in Theorem 2.6, the components of \({\tilde{p}}\) sum to 1 for all \(t \in [0, T]\). Hence, we know \(\sum _{z \in \Sigma } \epsilon _{2}(t,z) = 0\), and we define

$$\begin{aligned} S({\hat{p}}, {\tilde{p}}):= \sum _{z \in \Sigma } ({\hat{p}}_{z}(0) - {\tilde{p}}_{z}(0)) = \sum _{z \in \Sigma } ({\hat{p}}_{z}(T) - {\tilde{p}}_{z}(T)). \end{aligned}$$

We know there exists C such that \(|S({\hat{p}}, {\tilde{p}})| \le C \Vert {\hat{p}}(T) - {\tilde{p}}(T) \Vert \). Set \(u:= {\hat{\theta }} - {\tilde{\theta }} - v\) and \(\rho := {\hat{p}} - {\tilde{p}} - \zeta \); they solve (4.11) with

$$\begin{aligned} \begin{aligned} b(t, z)&:= H(z, \Delta ^{z} {\hat{\theta }}(t)) - H(z, \Delta ^{z} {\tilde{\theta }}(t)) - \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \cdot (\Delta ^{z} {\hat{\theta }}(t) - \Delta ^{z} {\tilde{\theta }}(t)) \\ c(t, z)&:= \sum _{y} {\hat{p}}_{y}(t) [\lambda ^{*}_{z}(y, \Delta ^{y} {\hat{\theta }}(t)) - \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t))] \\&\quad - \sum _{y} {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot (\Delta ^{y} {\hat{\theta }}(t) - \Delta ^{y} {\tilde{\theta }}(t)) \\ u_{T, z}&:= G(z, {\hat{p}}(T)) - G(z, {\tilde{p}}(T)) - D^{1} G(z, {\tilde{p}}(T)) \cdot ({\hat{p}}(T) - {\tilde{p}}(T)) \\&\quad - \frac{\partial G}{\partial \delta _{1}} (z, {\tilde{p}}(T)) S({\hat{p}}, {\tilde{p}}). \end{aligned} \end{aligned}$$

From (2.3), \(\sum _{z \in \Sigma } c(t, z) = 0\). The existence and uniqueness of the solution to (4.16) are guaranteed by Proposition 4.5. We can rewrite b and c as

$$\begin{aligned} \begin{aligned} b(t, z)&= \int _{0}^{1} [D_{\mu } H(z, \Delta ^{z} {\tilde{\theta }}(t) + s (\Delta ^{z} {\hat{\theta }}(t) - \Delta ^{z} {\tilde{\theta }}(t))) - D_{\mu } H(z, \Delta ^{z} {\tilde{\theta }}(t))] \\&\quad \cdot (\Delta ^{z} {\hat{\theta }}(t) - \Delta ^{z} {\tilde{\theta }}(t)) \mathrm{d} s \\ c(t, z)&= \sum _{y} {\hat{p}}_{y}(t) \int _{0}^{1} [D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t) + s (\Delta ^{y} {\hat{\theta }}(t) - \Delta ^{y} {\tilde{\theta }}(t)))\\&\quad - D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t))] \cdot (\Delta ^{y} {\hat{\theta }}(t) - \Delta ^{y} {\tilde{\theta }}(t)) \mathrm{d} s \\&\quad + \sum _{y} ({\hat{p}}_{y}(t) - {\tilde{p}}_{y}(t)) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot (\Delta ^{y} {\hat{\theta }}(t) - \Delta ^{y} {\tilde{\theta }}(t)). \end{aligned} \end{aligned}$$

Moreover, since

$$\begin{aligned} \begin{aligned} G(z, {\hat{p}}(T)) - G(z, {\tilde{p}}(T))&= \int _{0}^{1} \frac{\partial G}{\partial ({\hat{p}}(T) - {\tilde{p}}(T))} (z, {\tilde{p}}(T) + s ({\hat{p}}(T) - {\tilde{p}}(T))) \mathrm{d} s \\&= \int _{0}^{1} D^{1} G(z, {\tilde{p}}(T) + s ({\hat{p}}(T) - {\tilde{p}}(T))) \cdot ({\hat{p}}(T) - {\tilde{p}}(T)) \mathrm{d} s \\&\quad + \int _{0}^{1} \frac{\partial G}{\partial \delta _{1}} (z, {\tilde{p}}(T) + s ({\hat{p}}(T) - {\tilde{p}}(T))) S({\hat{p}}, {\tilde{p}}) \mathrm{d} s \end{aligned} \end{aligned}$$

we have

$$\begin{aligned} \begin{aligned} u_{T, z}&= \int _{0}^{1} ( D^{1} G(z, {\tilde{p}}(T) + s ({\hat{p}}(T) - {\tilde{p}}(T))) - D^{1} G(z, {\tilde{p}}(T))) \cdot ({\hat{p}}(T) - {\tilde{p}}(T)) \mathrm{d} s \\&\quad + \int _{0}^{1} \left( \frac{\partial G}{\partial \delta _{1}} (z, {\tilde{p}}(T) + s ({\hat{p}}(T) - {\tilde{p}}(T))) - \frac{\partial G}{\partial \delta _{1}} (z, {\tilde{p}}(T)) \right) S({\hat{p}}, {\tilde{p}}) \mathrm{d} s. \end{aligned} \end{aligned}$$

From Proposition 4.1, \({\tilde{\theta }}\), \({\tilde{p}}\), \({\hat{\theta }}\) and \({\hat{p}}\) are bounded. From Assumption 2.1, namely the Lipschitz continuity of \(D_{\mu } H\), \(D_{\mu } \lambda ^{*}\), \(\frac{\partial G}{\partial \delta _{1}}\) and \(D^{1} G\) in their second variable, there exists a constant C such that

$$\begin{aligned} \begin{aligned} \Vert b \Vert&\le C \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert ^2 \\ \Vert u_{T} \Vert&\le C \Vert {\tilde{p}}(T) - {\hat{p}}(T) \Vert ^2 \\ \Vert c \Vert&\le C ( \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert ^2 + \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert \cdot \Vert {\tilde{p}} - {\hat{p}} \Vert ). \end{aligned} \end{aligned}$$

Applying Proposition 4.5 and then Proposition 4.2, there exists C such that

$$\begin{aligned} \Vert u \Vert + \Vert \rho \Vert \le C \Vert {\hat{p}}(0) - {\tilde{p}}(0) \Vert ^2, \end{aligned}$$

which concludes the proof. \(\square \)

As (4.16) is a linear system, v and \(\zeta \) in (4.16) can be viewed as linear maps of w. Hence, Proposition 4.6 shows that \({\tilde{U}}\) is differentiable w.r.t \({\tilde{p}}_{0}\) and that the directional derivative \(\frac{\partial }{\partial w} {\tilde{U}}(t, z, {\tilde{p}})\) is characterized by the solution to ODE system (4.16), with \({\tilde{\theta }}_{z}(t) = {\tilde{U}}(t, z, {\tilde{p}}(t))\).

Theorem 4.7

There exist positive constants \(N_0\) and C such that, under (4.3), \({\tilde{U}}\) is differentiable on \(B(P(\Sigma ), \frac{1}{N_{0}})\) and, for any vector w, \(\frac{\partial }{\partial w} {\tilde{U}}(t, z, {\tilde{p}}(t))\) exists and is Lipschitz continuous w.r.t \({\tilde{p}}\), uniformly in t and z. Moreover, \(\frac{\partial }{\partial w} {\tilde{U}}(t, z, {\tilde{p}}(t))\) is continuous w.r.t t.

Proof

Define \(N_0\) as the one in Proposition 4.5. Let \(({\tilde{\theta }}, {\tilde{p}})\) and \(({\hat{\theta }}, {\hat{p}})\) be two solutions to (2.13), with initial conditions \({\tilde{p}}(t_0), {\hat{p}}(t_0) \in B(P(\Sigma ), \frac{1}{N_{0}})\). Let also \(({\tilde{v}}, {\tilde{\zeta }})\) and \(({\hat{v}}, {\hat{\zeta }})\) characterize \(\frac{\partial }{\partial w} {\tilde{U}}(t_{0}, z, {\tilde{p}}(t_{0}))\) and \(\frac{\partial }{\partial w} {\tilde{U}}(t_{0}, z, {\hat{p}}(t_{0}))\), respectively. Then \(({\tilde{v}}, {\tilde{\zeta }})\) satisfies the following.

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d} {\tilde{v}}_{z}(t)}{\mathrm{d} t}&= - \lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) \cdot \Delta ^{z} {\tilde{v}}(t) \\ \frac{\mathrm{d} {\tilde{\zeta }}_{z}(t)}{\mathrm{d} t}&= \sum _{y} {\tilde{\zeta }}_{y}(t) \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) + \sum _{y} {\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) \cdot \Delta ^{y} {\tilde{v}}(t) \\ {\tilde{v}}_{z}(T)&= \frac{\partial G}{\partial {\tilde{\zeta }}(T)} (z, {\tilde{p}}(T)) \\ {\tilde{\zeta }}_{z}(t_{0})&= w_{z}. \end{aligned} \end{aligned}$$
(4.17)

From Proposition 4.5, the uniform bounds of both \({\tilde{v}}\) and \({\tilde{\zeta }}\) depend linearly on the norm of w. The same holds for \(({\hat{v}}, {\hat{\zeta }})\), with \(({\tilde{\theta }}, {\tilde{p}})\) replaced by \(({\hat{\theta }}, {\hat{p}})\). Set \(u:= {\tilde{v}} - {\hat{v}}\) and \(\rho := {\tilde{\zeta }} - {\hat{\zeta }}\). They solve the linear system (4.11) with \(\rho (t_{0}) = 0\) and

$$\begin{aligned} \begin{aligned} b(t, z)&:= (\lambda ^{*}(z, \Delta ^{z} {\tilde{\theta }}(t)) - \lambda ^{*}(z, \Delta ^{z} {\hat{\theta }}(t))) \cdot \Delta ^{z} {\hat{v}}(t) \\ c(t, z)&:= \sum _{y \in \Sigma } {\hat{\zeta }}_{y}(t) (\lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) - \lambda ^{*}_{z}(y, \Delta ^{y} {\hat{\theta }}(t))) \\&\quad + \sum _{y \in \Sigma } [{\tilde{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{\theta }}(t)) - {\hat{p}}_{y}(t) D_{\mu } \lambda ^{*}_{z}(y, \Delta ^{y} {\hat{\theta }}(t))] \cdot \Delta ^{y} {\hat{v}}(t) \\ u_{T, z}&:= \frac{\partial G}{\partial {\hat{\zeta }}(T)} (z, {\tilde{p}}(T)) - \frac{\partial G}{\partial {\hat{\zeta }}(T)} (z, {\hat{p}}(T)). \end{aligned} \end{aligned}$$

Using the Lipschitz continuity of \(\lambda ^{*}\), \(D_{\mu } \lambda ^{*}\) and the directional derivatives of G, applying the bounds (4.12) to \({\hat{v}}\) and \({\hat{\zeta }}\), and using the estimates on \(\Vert {\tilde{\theta }} - {\hat{\theta }} \Vert \) and \(\Vert {\tilde{p}} - {\hat{p}} \Vert \) from Proposition 4.2, there exists C such that

$$\begin{aligned} \begin{aligned} \Vert b \Vert&\le C \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert \Vert {\hat{v}} \Vert \le C \Vert {\tilde{p}}(t_{0}) - {\hat{p}}(t_{0}) \Vert \Vert w \Vert \\ \Vert c \Vert&\le C \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert \Vert {\hat{\zeta }} \Vert + C \Vert {\tilde{\theta }} - {\hat{\theta }} \Vert \Vert {\hat{v}} \Vert + C \Vert {\tilde{p}} - {\hat{p}} \Vert \Vert {\hat{v}} \Vert \\&\le C \Vert {\tilde{p}}(t_{0}) - {\hat{p}}(t_{0}) \Vert \Vert w \Vert \\ \Vert u_{T} \Vert&\le C \Vert {\tilde{p}} - {\hat{p}} \Vert \Vert {\hat{\zeta }} \Vert \le C \Vert {\tilde{p}}(t_{0}) - {\hat{p}}(t_{0}) \Vert \Vert w \Vert \end{aligned} \end{aligned}$$

From Proposition 4.5, we have

$$\begin{aligned} \Vert u \Vert \le C (\Vert b \Vert + \Vert c \Vert + \Vert u_{T} \Vert ) \le C \Vert {\tilde{p}}(t_{0}) - {\hat{p}}(t_{0}) \Vert \Vert w \Vert . \end{aligned}$$

From Proposition 4.6, we have

$$\begin{aligned} {\tilde{v}}_{z}(t_{0}) = \frac{\partial {\tilde{U}}}{\partial w}(t_{0}, z, {\tilde{p}}(t_{0})), \quad {\hat{v}}_{z}(t_{0}) = \frac{\partial {\tilde{U}}}{\partial w}(t_{0}, z, {\hat{p}}(t_{0})). \end{aligned}$$

Therefore, \(\frac{\partial {\tilde{U}}}{\partial w}\) is Lipschitz continuous, uniformly in t and z.

On the other hand, for another initial time \(t_1 > t_0\), we first compare \(\frac{\partial }{\partial w} {\tilde{U}}(t_{0}, z, {\tilde{p}}(t_{0}))\) and \(\frac{\partial }{\partial w} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1}))\), where \((t_{1}, {\tilde{p}}(t_{1}))\) lies on the path \((t, {\tilde{p}}(t))\) running from \(t_{0}\) to T. Both are characterized by systems of the form (4.17), where for \(\frac{\partial }{\partial w} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1}))\) we replace \(t_0\) with \(t_1\). Let \(({\tilde{v}}, {\tilde{\zeta }})\) satisfy (4.17). Then we know

$$\begin{aligned} {\tilde{v}}(t_0) = \frac{\partial }{\partial w} {\tilde{U}}(t_{0}, z, {\tilde{p}}(t_{0})), \quad {\tilde{v}}(t_1) = \frac{\partial }{\partial {\tilde{\zeta }}(t_1)} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1})). \end{aligned}$$

\(\frac{\partial }{\partial {\tilde{\zeta }}(t_1)} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1}))\) is also characterized by (4.17), except that \(t_0\) and the initial value are replaced by \(t_1\) and \({\tilde{\zeta }}(t_1)\). It follows that \(\frac{\partial }{\partial {\tilde{\zeta }}(t_1)} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1})) - \frac{\partial }{\partial w} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1}))\) is also characterized by (4.17), except that \(t_0\) and the initial value are replaced by \(t_1\) and \({{\tilde{\zeta }}}(t_1) - w\). From Proposition 4.5, there exists a constant C such that

$$\begin{aligned} \left| \frac{\partial }{\partial {\tilde{\zeta }}(t_1)} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1})) - \frac{\partial }{\partial w} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1})) \right| \le C | {\tilde{\zeta }}_z(t_1) - w_z |. \end{aligned}$$

As \(\lambda ^{*}\), \(D_{\mu } \lambda ^{*}\) and the directional derivatives of G are Lipschitz continuous and uniformly bounded, and both \({\tilde{v}}\) and \({\tilde{\zeta }}\) are uniformly bounded, both \(\frac{\mathrm{d} {\tilde{v}}_{z}(t)}{\mathrm{d} t}\) and \(\frac{\mathrm{d} {\tilde{\zeta }}_{z}(t)}{\mathrm{d} t}\) are also uniformly bounded by some constant C. We have

$$\begin{aligned} \begin{aligned}&\Vert {\tilde{\zeta }}(t_1) - w \Vert = \Vert {\tilde{\zeta }}(t_1) - {\tilde{\zeta }}(t_0) \Vert \le C | t_1 - t_0 |, \\&\quad \left| \frac{\partial }{\partial w} {\tilde{U}}(t_{0}, z, {\tilde{p}}(t_{0})) - \frac{\partial }{\partial {\tilde{\zeta }}(t_1)} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1})) \right| = | {\tilde{v}}_{z}(t_{0}) - {\tilde{v}}_{z}(t_{1}) | \le C | t_1 - t_0 |. \end{aligned} \end{aligned}$$

Combining the above, there exists a constant C such that

$$\begin{aligned} \left| \frac{\partial }{\partial w} {\tilde{U}}(t_{0}, z, {\tilde{p}}(t_{0})) - \frac{\partial }{\partial w} {\tilde{U}}(t_{1}, z, {\tilde{p}}(t_{1})) \right| \le C | t_1 - t_0 |. \end{aligned}$$

Then by the continuity of \(\frac{\partial }{\partial w} {\tilde{U}}\) w.r.t its third argument, as well as the continuity of \({\tilde{p}}\), we can also conclude that \(\frac{\partial }{\partial w} {\tilde{U}}\) is continuous w.r.t t, its first argument. \(\square \)

From Proposition 4.6 and Theorem 4.7, \({\tilde{U}}\) is \({\mathcal {C}}^{1}\) on the compact set \({\bar{B}}(P(\Sigma ), \frac{1}{N_{0}})\). Hence, both \(D {\tilde{U}}\) and the directional derivative of \({\tilde{U}}\) along any direction are well defined, bounded, and Lipschitz continuous, uniformly for \(t \in [0, T]\). Theorem 4.7 also shows that the directional derivative of \({\tilde{U}}\) along any direction is continuous w.r.t t. Thanks to these properties, we can use an idea similar to the proof of existence of a solution to the master equation in [9, Section 5.3.1] to show that \({\tilde{U}}\) also satisfies the master equation with some extra error terms.

Theorem 4.8

Let \(({\tilde{\theta }}, {\tilde{p}})\) be the solution to ODE system (2.13). Define \({\tilde{U}}\) as (4.10). There exist positive constants \(N_0\) and C, such that if we have condition (4.3) in Theorem 2.6, then \({\tilde{U}}\) satisfies the following master equation along the path \((t, {\tilde{p}}(t))\) on \([t_{0}, T]\), as long as \({\tilde{p}}(t) \in B(P(\Sigma ), \frac{1}{N_{0}})\).

$$\begin{aligned} \begin{aligned}&\frac{\partial {\tilde{U}}(t, z, {\tilde{p}}(t))}{\partial t} + H(z, \Delta ^{z} {\tilde{U}}) + \sum _{y \in \Sigma } {\tilde{p}}_{y}(t) \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}) \cdot D {\tilde{U}}(t, z, {\tilde{p}}(t)) = \epsilon (t, z), \\&\quad {\tilde{U}}(T, z, {\tilde{p}}(T)) = G(z, {\tilde{p}}(T)) + \epsilon _{3}(z), \end{aligned} \end{aligned}$$
(4.18)

where \(\Delta ^{z} {\tilde{U}}:= ( {\tilde{U}}(t, 1, {\tilde{p}}(t)) - {\tilde{U}}(t, z, {\tilde{p}}(t)), \ldots , {\tilde{U}}(t, K, {\tilde{p}}(t)) - {\tilde{U}}(t, z, {\tilde{p}}(t)) )\) and \(\Vert \epsilon \Vert < \frac{C + 1}{N}\), where \(N > N_{0}\) and C comes from the uniform bound coefficient in Proposition 4.2.

Proof

From the condition in Theorem 2.6, \({\tilde{p}}(t) \in B(P(\Sigma ), \frac{1}{N_{0}})\) for every \(t \in [t_{0}, T]\), where \(B(P(\Sigma ), \frac{1}{N_{0}})\) is the open neighborhood of \(P(\Sigma )\). Hence, from Propositions 4.1 and 4.2 and Theorem 4.7, \({\tilde{U}}\), \(D {\tilde{U}}\) and \(\frac{\partial }{\partial \delta _{1}} {\tilde{U}}\) are well defined at \((t, {\tilde{p}}(t))\). Taking t as the initial time and \({\tilde{p}}(t)\) as the initial value, there exists a unique solution to (2.13), and we can always choose h small enough such that the value of this solution at \(t + h\) satisfies \({\tilde{p}}(t + h) \in B(P(\Sigma ), \frac{1}{N_{0}})\). Note that as \(\sum _{z \in \Sigma } \epsilon _{2}(t, z) = 0\) for all \(t \in [t_{0}, T]\), we have

$$\begin{aligned} \sum _{z \in \Sigma } {\tilde{p}}_{z}(t) = \sum _{z \in \Sigma } {\tilde{p}}_{z}(t + h). \end{aligned}$$

Let us first compute the limit of the following quantity as h tends to 0.

$$\begin{aligned} \begin{aligned}&\frac{{\tilde{U}}(t + h, z, {\tilde{p}}(t)) - {\tilde{U}}(t, z, {\tilde{p}}(t))}{h} \\&\quad = \frac{{\tilde{U}}(t + h, z, {\tilde{p}}(t)) - {\tilde{U}}(t + h, z, {\tilde{p}}(t + h))}{h} \\&\qquad + \frac{{\tilde{U}}(t + h, z, {\tilde{p}}(t + h)) - {\tilde{U}}(t, z, {\tilde{p}}(t))}{h} \end{aligned} \end{aligned}$$
(4.19)

For the first term in (4.19), we first define

$$\begin{aligned} W(s):= {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t))). \end{aligned}$$

By definition, we derive the derivative of W as

$$\begin{aligned} W'(s) = \frac{\partial }{\partial ({\tilde{p}}(t + h) - {\tilde{p}}(t))} {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t))). \end{aligned}$$

Then the first term in (4.19) can be reformulated as

$$\begin{aligned} \frac{{\tilde{U}}(t + h, z, {\tilde{p}}(t)) - {\tilde{U}}(t + h, z, {\tilde{p}}(t + h))}{h} = \frac{W(0) - W(1)}{h} = - \frac{1}{h} \int _{0}^{1} W'(s) \mathrm{d} s. \end{aligned}$$

From Lemma 4.4 and \(\sum _{z = 1}^{K} ({\tilde{p}}_{z}(t + h) - {\tilde{p}}_{z}(t)) = 0\), we know

$$\begin{aligned} W'(s) = D {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t))) \cdot ({\tilde{p}}(t + h) - {\tilde{p}}(t)) \end{aligned}$$

Substituting the above into the first term in (4.19), we get

$$\begin{aligned} \begin{aligned}&\frac{{\tilde{U}}(t + h, z, {\tilde{p}}(t)) - {\tilde{U}}(t + h, z, {\tilde{p}}(t + h))}{h} \\&\quad = - \frac{1}{h} \int _{0}^{1} D {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t))) \cdot ({\tilde{p}}(t + h) - {\tilde{p}}(t)) \mathrm{d} s \\&\quad = - \frac{1}{h} \int _{0}^{1} D {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t))) \mathrm{d} s \\&\quad \qquad \cdot \int _{t}^{t + h} \left( \sum _{y} {\tilde{p}}_{y}(u) \lambda ^{*}(y, \Delta ^{y} {\tilde{\theta }}(u)) + \epsilon _{2}(u) \right) \mathrm{d} u, \end{aligned} \end{aligned}$$

where \(\epsilon _{2}(t):= (\epsilon _{2}(t, 1), \ldots , \epsilon _{2}(t, K))\). From Theorem 4.7, we know for any \(y \in \Sigma \),

$$\begin{aligned} \lim _{h \rightarrow 0} [D {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t)))]_{y} = [D {\tilde{U}}(t, z, {\tilde{p}}(t))]_{y}. \end{aligned}$$

As \(D {\tilde{U}}\) is uniformly bounded, we have the following by the dominated convergence theorem:

$$\begin{aligned} \lim _{h \rightarrow 0} \int _{0}^{1} D {\tilde{U}}(t + h, z, {\tilde{p}}(t) + s ({\tilde{p}}(t + h) - {\tilde{p}}(t))) \mathrm{d} s = D {\tilde{U}}(t, z, {\tilde{p}}(t)). \end{aligned}$$

On the other hand, dividing by h and letting \(h \rightarrow 0\), we have the following:

$$\begin{aligned} \begin{aligned}&\lim _{h \rightarrow 0} \frac{\int _{t}^{t + h} ( \sum _{y} {\tilde{p}}_{y}(u) \lambda ^{*}(y, \Delta ^{y} {\tilde{\theta }}(u)) + \epsilon _{2}(u) ) d u}{h} = \sum _{y} {\tilde{p}}_{y}(t) \lambda ^{*}(y, \Delta ^{y} {\tilde{\theta }}(t)) + \epsilon _{2}(t) \\&\quad = \sum _{y} {\tilde{p}}_{y}(t) \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}) + \epsilon _{2}(t). \end{aligned} \end{aligned}$$

The last equality comes from the definition of \({\tilde{U}}\), which gives \(\Delta ^{y} {\tilde{U}} = \Delta ^{y} {\tilde{\theta }}(t)\).

For the second term in (4.19), from the definition of \({\tilde{U}}\), we know

$$\begin{aligned} {\tilde{U}}(t + h, z, {\tilde{p}}(t + h)) - {\tilde{U}}(t, z, {\tilde{p}}(t)) = \frac{\mathrm{d} {\tilde{\theta }}_{z}(t)}{\mathrm{d} t} h + o(h), \end{aligned}$$

and hence,

$$\begin{aligned} \lim _{h \rightarrow 0} \frac{{\tilde{U}}(t + h, z, {\tilde{p}}(t + h)) - {\tilde{U}}(t, z, {\tilde{p}}(t))}{h} = \frac{d {\tilde{\theta }}_{z}(t)}{\mathrm{d} t} = - H(z, \Delta ^{z} {\tilde{U}}) + \epsilon _{1}(t, z). \end{aligned}$$

Combining the results for the first and second terms in (4.19) and taking \(h \rightarrow 0\), we have

$$\begin{aligned} \frac{\partial {\tilde{U}}(t, z, {\tilde{p}}(t))}{\partial t}= & {} - H(z, \Delta ^{z} {\tilde{U}}) - D {\tilde{U}}(t, z, {\tilde{p}}(t))\\&\cdot \left( \sum _{y \in \Sigma } {\tilde{p}}_{y}(t) \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}) + \epsilon _2(t)\right) + \epsilon _1(t, z), \end{aligned}$$

As \(\Vert D {\tilde{U}}(t, z, {\tilde{p}}(t)) \Vert \le C\) uniformly and \(\Vert \epsilon _2(t) \Vert \le \frac{1}{N}\), we know

$$\begin{aligned} | D {\tilde{U}}(t, z, {\tilde{p}}(t)) \cdot \epsilon _2(t) | \le \frac{C}{N}. \end{aligned}$$

Hence, defining \(\epsilon (t, z):= \epsilon _1(t, z) - D {\tilde{U}}(t, z, {\tilde{p}}(t)) \cdot \epsilon _2(t)\) concludes the proof. \(\square \)

Then the DNN approximation \(({\tilde{\theta }}, {\tilde{p}})\) is characterized by (4.18), while the true solution \((\theta , p)\) of the MFG is characterized by a similar equation with \(\epsilon \) and \(\epsilon _{3}\) equal to 0. Although the two master equations are backward PDEs, it is still difficult to compare their solutions directly. Hence, we approximate the two PDEs by two ODE systems on a discrete grid of \(P(\Sigma )\).

Define \(P^{N}(\Sigma ) = \{ (\frac{n_{1}}{N}, \ldots , \frac{n_{K}}{N}), \quad \sum _{z = 1}^{K} n_{z} = N, n_{z} \in {\mathbb {Z}}^{+}\}\). Then \(P^{N}(\Sigma )\) is a discrete grid of \(P(\Sigma )\). For any \(p^{N} \in P^{N}(\Sigma )\), define the operators:

$$\begin{aligned} \begin{aligned} \alpha ^{N, i, j}(p^{N})&:= \left\{ \begin{array}{ll} p^{N} + \frac{1}{N}(\delta _{j} - \delta _{i}) &{}\quad p^{N}_{i}> 0, \ p^{N}_{j} < 1, \\ p^{N} &{}\quad \text {otherwise}, \end{array} \right. \\ \Delta ^{N, y} {\tilde{U}}(t, z, p^{N})&:= ({\tilde{U}}(t, z, \alpha ^{N, y, 1}(p^{N})) - {\tilde{U}}(t, z, p^{N}), \ldots , {\tilde{U}}(t, z, \alpha ^{N, y, K}(p^{N})) - {\tilde{U}}(t, z, p^{N})) \\ \Delta ^{N, z, z} {\tilde{U}}(t, z, p^{N})&:= ({\tilde{U}}(t, 1, \alpha ^{N, z, 1}(p^{N})) - {\tilde{U}}(t, z, p^{N}), \ldots , {\tilde{U}}(t, K, \alpha ^{N, z, K}(p^{N})) - {\tilde{U}}(t, z, p^{N})). \end{aligned} \end{aligned}$$
(4.20)
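
As a concrete illustration of the grid \(P^{N}(\Sigma )\) and the shift operators in (4.20), the following is a minimal Python sketch; the function U_tilde is a hypothetical placeholder for \({\tilde{U}}(t, \cdot , \cdot )\), and state indices are 0-based in the code.

```python
import itertools
import numpy as np

def grid_PN(K, N):
    """Enumerate P^N(Sigma): tuples (n_1/N, ..., n_K/N) with nonnegative integers
    summing to N (boundary points included)."""
    return [np.array(n, dtype=float) / N
            for n in itertools.product(range(N + 1), repeat=K) if sum(n) == N]

def alpha(pN, i, j, N):
    """alpha^{N,i,j}: move mass 1/N from state i to state j, if admissible."""
    if pN[i] > 0 and pN[j] < 1:
        e = np.zeros(len(pN))
        e[j] += 1.0 / N
        e[i] -= 1.0 / N
        return pN + e
    return pN

def Delta_Ny(U_tilde, t, z, pN, y, N):
    """Discrete operator Delta^{N,y} U(t, z, p^N) from (4.20)."""
    K = len(pN)
    return np.array([U_tilde(t, z, alpha(pN, y, k, N)) - U_tilde(t, z, pN)
                     for k in range(K)])
```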

With the discrete grid and discrete operators defined above, we next show in Proposition 4.9 that the master equation can be approximated by a backward ODE system.

Proposition 4.9

There exists \(N_0\) such that for every \(N > N_{0}\), \(p^{N} \in P^{N}(\Sigma )\) and \(z \in \Sigma \), \({\tilde{U}}\) solves

$$\begin{aligned} \begin{aligned}&\frac{\partial {\tilde{U}}}{\partial t}(t, z, p^{N}) = {\tilde{\epsilon }}^{N}(t, z, p^{N}) - H(z, \Delta ^{N, z, z} {\tilde{U}}(t, z, p^{N})) \\&\quad - \sum _{y \in \Sigma } \left( p^{N}_{y} - \frac{\mathbb {1}_{y = z}}{N}\right) \lambda ^{*}(y, \Delta ^{N, y, y} {\tilde{U}}(t, y, p^{N})) \cdot \Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) \\ {\tilde{U}}(T, z, p^{N})&= G(z, p^{N}) + \epsilon _{3}(z), \end{aligned}\qquad \end{aligned}$$
(4.21)

where \({\tilde{\epsilon }}^{N} \in {\mathcal {C}}^{0}([0, T] \times \Sigma \times P^{N}(\Sigma ))\), \(\Vert {\tilde{\epsilon }}^{N} \Vert \le \frac{C}{N}\).

Proof

From Theorem 4.8, there exists a constant \(N_0\) such that, when \(N > N_0\) and (4.3) holds, \({\tilde{U}}\) satisfies (4.18) evaluated at the point \((t, z, p^{N})\):

$$\begin{aligned} \frac{\partial {\tilde{U}}(t, z, p^{N})}{\partial t} = - H(z, \Delta ^{z} {\tilde{U}}) - \sum _{y \in \Sigma } p^{N}_{y} \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}) \cdot D {\tilde{U}}(t, z, p^{N}) + \epsilon (t, z). \end{aligned}$$

This is similar to (4.21), except that it involves the differential operators \(D\) and \(\Delta ^{z}\) instead of the discrete operators \(\Delta ^{N, y}\) and \(\Delta ^{N, z, z}\). Hence, we next compare the operators, similarly to [9, Proposition 3]. We first discuss the first component, along the direction \(\delta _{1} - \delta _{y}\), of \(\Delta ^{N, y} {\tilde{U}}(t, z, p^{N})\) defined in (4.20):

$$\begin{aligned} \begin{aligned}&{\tilde{U}}(t, z, p^{N} + \frac{1}{N}(\delta _{1} - \delta _{y})) - {\tilde{U}}(t, z, p^{N}) = \int _{0}^{\frac{1}{N}} [D^{y} {\tilde{U}}(t, z, p^{N} + s(\delta _{1} - \delta _{y}))]_{1} \mathrm{d} s \\&\quad = [D^{y} {\tilde{U}}(t, z, p^{N})]_{1} + \int _{0}^{\frac{1}{N}} ([D^{y} {\tilde{U}}(t, z, p^{N} + s(\delta _{1} - \delta _{y}))]_{1} - [D^{y} {\tilde{U}}(t, z, p^{N})]_{1}) \mathrm{d} s \\&\quad = [D^{y} {\tilde{U}}(t, z, p^{N})]_{1} + O(\frac{1}{N^2}), \end{aligned} \end{aligned}$$

where the last equality follows from the Lipschitz continuity of \(D^{y} {\tilde{U}}\) in \(p^{N} \in P(\Sigma )\). As the above applies to every component of \(\Delta ^{N, y} {\tilde{U}}(t, z, p^{N})\), we conclude that there exists \(N_0\) such that for \(N > N_{0}\),

$$\begin{aligned} \Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) = D^{y} {\tilde{U}}(t, z, p^{N}) + \epsilon ^{N, y}(t, z, p^{N}), \end{aligned}$$

where \(\epsilon ^{N, y} \in {\mathcal {C}}^{0}([0, T] \times \Sigma \times P^{N}(\Sigma ); {\mathbb {R}}^{K})\), \(\Vert \epsilon ^{N, y} \Vert \le \frac{C}{N^2}\).

Hence, we have

$$\begin{aligned} \begin{aligned} \frac{\partial {\tilde{U}}}{\partial t}(t, z, p^{N})&= - \sum _{y \in \Sigma } \left( p^{N}_{y} - \frac{\mathbb {1}_{y = z}}{N}\right) \lambda ^{*}(y, \Delta ^{N, y, y} {\tilde{U}}(t, y, p^{N})) \cdot \Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) \\&\quad - H(z, \Delta ^{N, z, z} {\tilde{U}}(t, z, p^{N})) + \sum _{i = 1}^{4} e_{i}(t, z), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} e_{1}(t, z)&:= H(z, \Delta ^{N, z, z} {\tilde{U}}(t, z, p^{N})) - H(z, \Delta ^{z} {\tilde{U}}) \\ e_{2}(t, z)&:= \sum _{y \in \Sigma } p^{N}_{y} \Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) \cdot (\lambda ^{*}(y, \Delta ^{N, y, y} {\tilde{U}}(t, y, p^{N})) - \lambda ^{*}(y, \Delta ^{y} {\tilde{U}})) \\ e_{3}(t, z)&:= \sum _{y \in \Sigma } p^{N}_{y} (\Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) - D {\tilde{U}}(t, z, p^{N})) \cdot \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}) \\ e_{4}(t, z)&:= - \sum _{y \in \Sigma } \frac{\mathbb {1}_{y = z}}{N} \lambda ^{*}(y, \Delta ^{N, y, y} {\tilde{U}}(t, y, p^{N})) \cdot \Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) + \epsilon (t, z). \end{aligned} \end{aligned}$$

From the Lipschitz continuity of H and \(\lambda ^{*}\) and the boundedness of \({\tilde{U}}\), there exists a constant C such that

$$\begin{aligned} \begin{aligned} |e_{1}(t, z)|&\le C \Vert \Delta ^{N, z, z} {\tilde{U}}(t, z, p^{N}) - \Delta ^{z} {\tilde{U}} \Vert \\ |e_{2}(t, z)|&\le C \max _{y \in \Sigma } \Vert \Delta ^{N, y, y} {\tilde{U}}(t, y, p^{N}) - \Delta ^{y} {\tilde{U}} \Vert . \end{aligned} \end{aligned}$$

From Proposition 4.2, we know there exists constant C such that

$$\begin{aligned} |e_{1}(t, z)| + |e_{2}(t, z)| \le \frac{C}{2 N}. \end{aligned}$$

From Lemma 4.4 and \(\sum _{z \in \Sigma } \lambda ^{*}_{z}(y, \Delta ^{y} {\tilde{U}}) = 0\) for every \(y \in \Sigma \), we have

$$\begin{aligned} D^{y} {\tilde{U}}(t, z, p^{N}) \cdot \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}) = D {\tilde{U}}(t, z, p^{N}) \cdot \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}). \end{aligned}$$

It follows that

$$\begin{aligned} e_{3}(t, z) = \sum _{y \in \Sigma } p^{N}_{y} (\Delta ^{N, y} {\tilde{U}}(t, z, p^{N}) - D^{y} {\tilde{U}}(t, z, p^{N})) \cdot \lambda ^{*}(y, \Delta ^{y} {\tilde{U}}). \end{aligned}$$

From the boundedness of \(\lambda ^{*}\) and \(\epsilon \), there is constant C such that

$$\begin{aligned} \begin{aligned} |e_{3}(t, z)|&\le \sum _{y \in \Sigma } p^{N}_{y} (\sum _{i \in \Sigma } \frac{C}{N^2}) \le \frac{C}{4 N} \\ |e_{4}(t, z)|&\le \frac{C}{4 N}. \end{aligned} \end{aligned}$$

We conclude the proof by defining

$$\begin{aligned} {\tilde{\epsilon }}^{N}(t, z, p^{N}):= \sum _{i = 1}^{4} e_{i}(t, z), \end{aligned}$$

which satisfies \(\Vert {\tilde{\epsilon }}^{N} \Vert \le \frac{C}{N}\) by the estimates above.

\(\square \)
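For intuition, a backward ODE system such as (4.21) can be integrated on the grid by marching backward in time from the terminal condition. The sketch below is illustrative only and is not the numerical scheme of the paper: the Hamiltonian H, the optimal transition rates \(\lambda ^{*}\) and the terminal cost G are placeholder functions rather than the specifications of Section 3, the perturbation terms \({\tilde{\epsilon }}^{N}\) and \(\epsilon _{3}\) are dropped, and states are indexed from 0.

```python
import itertools
import numpy as np

K, N, T, n_steps = 3, 10, 1.0, 100   # illustrative sizes
dt = T / n_steps

# Placeholder model ingredients -- assumptions for this sketch, not the paper's specification.
def H(z, mu):             # Hamiltonian H(z, mu), mu a vector in R^K
    return 0.5 * float(np.sum(np.maximum(mu, 0.0) ** 2))

def lambda_star(z, mu):   # optimal transition rates lambda^*(z, mu), a vector in R^K
    rates = np.maximum(mu, 0.0)
    rates[z] = -np.sum(np.delete(rates, z))   # the row sums to zero, as used in the proof
    return rates

def G(z, p):              # terminal cost G(z, p)
    return float((z + 1) * np.dot(p, p))

grid = [n for n in itertools.product(range(N + 1), repeat=K) if sum(n) == N]

def shift(n, i, j):       # integer-count version of alpha^{N,i,j}
    if n[i] > 0 and n[j] < N:
        m = list(n); m[i] -= 1; m[j] += 1
        return tuple(m)
    return n

# Terminal condition U(T, z, p^N) = G(z, p^N).
U = {(z, n): G(z, np.array(n) / N) for z in range(K) for n in grid}

for _ in range(n_steps):              # march backward from T to 0
    U_new = {}
    for z in range(K):
        for n in grid:
            p = np.array(n) / N
            d_zz = np.array([U[(j, shift(n, z, j))] - U[(z, n)] for j in range(K)])
            drift = 0.0
            for y in range(K):
                d_yy = np.array([U[(j, shift(n, y, j))] - U[(y, n)] for j in range(K)])
                d_y = np.array([U[(z, shift(n, y, j))] - U[(z, n)] for j in range(K)])
                drift += (p[y] - (1.0 / N) * (y == z)) * float(np.dot(lambda_star(y, d_yy), d_y))
            # (4.21) with the epsilon terms dropped reads dU/dt = -H - drift,
            # so stepping from t + dt down to t adds dt * (H + drift).
            U_new[(z, n)] = U[(z, n)] + dt * (H(z, d_zz) + drift)
    U = U_new
```

Running the same loop with the terminal condition of (4.22) would produce the grid values of the second system, and the difference of the two runs corresponds to the quantity \(d(t)\) estimated by the Gronwall argument below.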

Finally, we can proceed to the proof of our main result. The main idea of the proof is to characterize both the DNN approximation \(({\tilde{\theta }}, {\tilde{p}})\) and the true solution \((\theta , p)\) by their corresponding master equations, which are further approximated by two backward ODE systems on certain discrete grid points. The error between the two can then be estimated directly on these grid points using the Gronwall inequality. As both \(({\tilde{\theta }}, {\tilde{p}})\) and \((\theta , p)\) are uniformly Lipschitz continuous with respect to their initial conditions, the error at points off the grid can also be estimated.

Completion of proof of Theorem 2.6

Since the ODE system (2.2) admits a solution for any initial value \(p_{0} \in P(\Sigma )\), we can define

$$\begin{aligned} U(t, z, p):= \theta (t, z). \end{aligned}$$

From Cecchin and Pelino [9], U satisfies the master equation for any \(p \in P(\Sigma )\):

$$\begin{aligned} \begin{aligned}&\frac{\partial U(t, z, p)}{\partial t} + H(z, \Delta ^{z} U) + \sum _{y \in \Sigma } p_{y} D U(t, z, p) \cdot \lambda ^{*}(y, \Delta ^{y} U) = 0, \\&\quad U(T, z, p) = G(z, p). \end{aligned} \end{aligned}$$

Similarly to the proof of Proposition 4.9, \(U(t, z, p^{N})\) satisfies the ODE system:

$$\begin{aligned} \begin{aligned} \frac{\partial U}{\partial t}(t, z, p^{N})&= \epsilon ^{N}(t, z, p^{N}) - H(z, \Delta ^{N, z, z} U(t, z, p^{N})) \\&\quad - \sum _{y \in \Sigma } \left( p^{N}_{y} - \frac{\mathbb {1}_{y = z}}{N}\right) \lambda ^{*}(y, \Delta ^{N, y, y} U(t, y, p^{N})) \cdot \Delta ^{N, y} U(t, z, p^{N}), \\ U(T, z, p^{N})&= G(z, p^{N}), \end{aligned}\nonumber \\ \end{aligned}$$
(4.22)

where \(\epsilon ^{N} \in {\mathcal {C}}^{0}([0, T] \times \Sigma \times P^{N}(\Sigma ))\), \(\Vert \epsilon ^{N} \Vert \le \frac{C}{N}\). From (4.21) and (4.22), there exists \(N_0\) such that when \(N > N_0\) and (4.3) holds, we have

$$\begin{aligned} \begin{aligned} {\tilde{U}}(t, z, p^{N}) - U(t, z, p^{N})&= \epsilon _{3}(z) - e + A + \sum _{y \in \Sigma } \left( p^{N}_{y} - \frac{\mathbb {1}_{y = z}}{N}\right) ( B_{y} + C_{y}), \\ e&:= \int _{t}^{T} ({\tilde{\epsilon }}^{N}(s, z, p^{N}) - \epsilon ^{N}(s, z, p^{N})) \mathrm{d} s, \\ A&:= \int _{t}^{T} (H(z, \Delta ^{N, z, z} {\tilde{U}}(s, z, p^{N})) {-} H(z, \Delta ^{N, z, z} U(s, z, p^{N}))) \mathrm{d} s, \\ B_{y}&:= \int _{t}^{T} [\lambda ^{*}(y, \Delta ^{N, y, y} {\tilde{U}}(s, y, p^{N}))\\&\quad - \lambda ^{*}(y, \Delta ^{N, y, y} U(s, y, p^{N}))] \cdot \Delta ^{N, y} {\tilde{U}}(s, z, p^{N}) \mathrm{d} s, \\ C_{y}&:= \int _{t}^{T} \lambda ^{*}(y, \Delta ^{N, y, y} U(s, y, p^{N}))\\&\quad \cdot [\Delta ^{N, y} {\tilde{U}}(s, z, p^{N}) - \Delta ^{N, y} U(s, z, p^{N})] \mathrm{d} s. \end{aligned} \end{aligned}$$

From Proposition 4.1, both U and \({\tilde{U}}\) are bounded. Hence, H and \(\lambda ^{*}\) are Lipschitz continuous with respect to their second variables. Define

$$\begin{aligned} d(t):= \max _{z \in \Sigma , p^{N} \in P^{N}(\Sigma )} | {\tilde{U}}(t, z, p^{N}) - U(t, z, p^{N}) |. \end{aligned}$$

There exists a constant C such that

$$\begin{aligned} |A| + |B_{y}| + |C_{y}| \le C \int _{t}^{T} d(s) \mathrm{d} s. \end{aligned}$$

As \(p^{N} \in P^{N}(\Sigma )\), there exists a constant C such that

$$\begin{aligned} d(t)\le & {} \max _{z \in \Sigma , p^{N} \in P^{N}(\Sigma )} \left\{ \int _{t}^{T} | {\tilde{\epsilon }}^{N}(s, z, p^{N}) - \epsilon ^{N}(s, z, p^{N}) | \mathrm{d} s + \epsilon _{3}(z) \right\} \\&+ C \int _{t}^{T} d(s) \mathrm{d} s \le \frac{C}{N} + C \int _{t}^{T} d(s) \mathrm{d} s. \end{aligned}$$
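The last estimate is precisely the integral form required by the backward Gronwall inequality: writing \(a := \frac{C}{N}\), it yields

$$\begin{aligned} d(t) \le a + C \int _{t}^{T} d(s) \mathrm{d} s \quad \Longrightarrow \quad d(t) \le a \, e^{C (T - t)} \le \frac{C e^{C T}}{N}, \quad t \in [0, T]. \end{aligned}$$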

Hence, there is a constant C such that for every \(t \in [0, T]\), \(z \in \Sigma \) and \(p^{N} \in P^{N}(\Sigma )\) we have

$$\begin{aligned} | {\tilde{U}}(t, z, p^{N}) - U(t, z, p^{N}) | \le \frac{C}{N}. \end{aligned}$$
(4.23)

For \(N > 2 N_{0}\), where \(N_{0}\) is defined in Proposition 4.9 above, if \({\tilde{p}} \in B(P(\Sigma ), \frac{1}{N})\), then there is \(p \in P(\Sigma )\) such that \({\tilde{p}} = p + \epsilon _{4}\) with \(\Vert \epsilon _{4} \Vert < \frac{1}{N}\). Moreover, there exists \(p^{N} \in P^{N}(\Sigma )\) such that

$$\begin{aligned} \begin{aligned} \Vert p - p^{N} \Vert&< \frac{1}{N} \\ \Vert {\tilde{p}} - p^{N} \Vert&\le \Vert {\tilde{p}} - p \Vert + \Vert p - p^{N} \Vert< \frac{2}{N} < \frac{1}{N_{0}}. \end{aligned} \end{aligned}$$

From Proposition 4.1, \({\tilde{U}}(t, z, {\tilde{p}})\) is well defined, and from Proposition 4.2, there exists constant C independent of N and p, such that for every \(t \in [0, T]\) and \(z \in \Sigma \),

$$\begin{aligned} | U(t, z, p) - U(t, z, p^{N}) | \le \frac{C}{N}, \quad | {\tilde{U}}(t, z, {\tilde{p}}) - {\tilde{U}}(t, z, p^{N}) | \le \frac{2 C}{N}. \end{aligned}$$

Combining the above inequalities with (4.23), we have \(| {\tilde{U}}(t, z, {\tilde{p}}) - U(t, z, p) | \le \frac{C}{N}\) for some constant C independent of N and p, which is equivalent to

$$\begin{aligned} \Vert {\tilde{\theta }} - \theta \Vert \le \frac{C}{N}. \end{aligned}$$

By using the uniform boundedness and Lipschitz continuity of \(\lambda ^{*}\), we can prove that p and \({\tilde{p}}\) are Lipschitz continuous with respect to \(\theta \) and \({\tilde{\theta }}\), respectively, with the help of the Gronwall inequality and a technique similar to the proof of Proposition 4.2. Note also that the Lipschitz coefficient only depends on the uniform bound and the Lipschitz coefficient of \(\lambda ^{*}\), which in turn only depend on the a priori bound M given in Proposition 4.1. Hence, there exists a uniform constant C independent of N such that

$$\begin{aligned} \Vert {\tilde{p}} - p \Vert \le \frac{C}{N}. \end{aligned}$$

This concludes the proof. \(\square \)

4.4 Proof of Proposition 3.1

Proof

The proof is divided into several steps, which verify the conditions of Assumption 2.1 on \(\lambda ^{*}\), H, and G, respectively.

Step 1: proof that \(\lambda ^{*}\) satisfies Assumption 2.1.

Let us first write out the Hamiltonian H for \(G_{c, R}\). Define \({\mathcal {A}}\) as the set of admissible controls \(\lambda \) that satisfy (3.4), and define \(\delta ^{a}:= \Lambda ^{-1}(\lambda _{\beta ^{a}(z)}(t, z))\) and \(\delta ^{b}:= \Lambda ^{-1}(\lambda _{\beta ^{b}(z)}(t, z))\). Then we have

$$\begin{aligned} \begin{aligned} H(z, \mu )&= \sup _{\lambda \in {\mathcal {A}}} \left\{ g(\Lambda ^{-1}(\lambda _{\beta ^{a}(z)}(t, z)), \mu _{\beta ^{a}(z)}) + g(\Lambda ^{-1}(\lambda _{\beta ^{b}(z)}(t, z)), \mu _{\beta ^{b}(z)}) - \frac{1}{2} \gamma \sigma ^{2} Z^{-1}_{1}(z)^2 \right\} \\&= \sup _{\delta ^{a} \in {\mathbb {R}}} \left\{ g(\delta ^{a}, \mu _{\beta ^{a}(z)}) \right\} + \sup _{\delta ^{b} \in {\mathbb {R}}} \left\{ g(\delta ^{b}, \mu _{\beta ^{b}(z)}) \right\} - \frac{1}{2} \gamma \sigma ^{2} Z^{-1}_{1}(z)^2, \end{aligned} \end{aligned}$$

where \( g(\delta , \mu ):= \Lambda (\delta ) (\delta - c + \mu )\). From (3.1) and the proof of Lemma 3.1 in Guéant [15], \(\zeta (\mu ):= \sup _{\delta } \{ g(\delta , \mu ) \}\) is increasing in \(\mu \). Moreover, the optimal \(\delta ^{*}\) exists, is unique, and is a continuously differentiable function of \(\mu \).

Step 2: proof that H satisfies Assumption 2.1.

We only need to prove that the second-order derivative \(\zeta ''(\mu )\) is positive. From the proof of Lemma 3.1 in Guéant [15], \(\zeta \) is \({\mathcal {C}}^{2}\), \(\zeta '(\mu ) = \Lambda (\delta ^{*})\), and \(\delta ^{*}\) is strictly decreasing in \(\mu \). Hence, \(\Lambda (\delta ^{*})\) is strictly increasing in \(\mu \), which implies \(\zeta ''(\mu ) > 0\). Then there exists a constant \(C > 0\) such that \(\zeta ''(\mu ) > C\) when \(\mu \) is bounded.
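As a numerical sanity check of Steps 1 and 2, the sketch below evaluates \(\zeta (\mu ) = \sup _{\delta } \Lambda (\delta ) (\delta - c + \mu )\) for the exponential intensity \(\Lambda (\delta ) = A e^{-k \delta }\); this choice of \(\Lambda \) and the constants A, k, c are assumptions made here for illustration and are not taken from (3.1). For this \(\Lambda \) the maximizer is \(\delta ^{*}(\mu ) = c - \mu + \frac{1}{k}\), so one can check numerically that \(\zeta '(\mu ) = \Lambda (\delta ^{*}(\mu ))\) and that \(\zeta ''(\mu ) > 0\).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Assumed for illustration only: exponential intensity Lambda(delta) = A * exp(-k * delta).
A, k, c = 1.5, 0.3, 0.1
Lam = lambda d: A * np.exp(-k * d)

def zeta(mu):
    """zeta(mu) = sup_delta Lambda(delta) * (delta - c + mu), computed numerically."""
    res = minimize_scalar(lambda d: -Lam(d) * (d - c + mu), bounds=(-20.0, 20.0), method="bounded")
    return -res.fun, res.x           # optimal value and maximizer delta*(mu)

mus = np.linspace(-1.0, 1.0, 201)
vals = np.array([zeta(m)[0] for m in mus])
dstar = np.array([zeta(m)[1] for m in mus])

h = mus[1] - mus[0]
zeta1 = np.gradient(vals, h)         # numerical zeta'
zeta2 = np.gradient(zeta1, h)        # numerical zeta''

print(np.allclose(dstar, c - mus + 1.0 / k, atol=1e-3))        # closed-form maximizer for this Lambda
print(np.allclose(zeta1[1:-1], Lam(dstar)[1:-1], atol=1e-3))   # zeta'(mu) = Lambda(delta*(mu))
print(np.all(zeta2[2:-2] > 0))                                 # zeta is strictly convex
```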

Step 3: proof that G satisfies Assumption 2.1.

From (3.5), the differentiability of G and condition (2.5) are straightforward to verify. It then remains to prove (2.6). Note that

$$\begin{aligned} \begin{aligned}&\sum _{z \in \Sigma } (G(z, p) - G(z, {\bar{p}}))(p_{z} - {\bar{p}}_z)\\&\quad = \sum _{v = 0}^{v_{\max }} \sum _{q = -Q}^{Q} \sum _{i = v}^{v_{\max }} ({\bar{p}}(T, i) - p(T, i))(p(T, q, v) - {\bar{p}}(T, q, v)) R \\&\quad =\sum _{v = 0}^{v_{\max }} \sum _{i = v}^{v_{\max }} ({\bar{p}}(T, i) - p(T, i))(p(T, v) - {\bar{p}}(T, v)) R \\&\quad = -{R\over 2}\left( \bigg (\sum _{v = 0}^{v_{\max }} \big (p(T, v) - {\bar{p}}(T, v)\big )\bigg )^2 + \sum _{v = 0}^{v_{\max }} \big (p(T, v) - {\bar{p}}(T, v)\big )^2 \right) , \end{aligned} \end{aligned}$$

which is non-positive. This concludes the proof. \(\square \)
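The last equality in the display above follows from the elementary identity \(\sum _{0 \le v \le i \le v_{\max }} a_{v} a_{i} = \frac{1}{2} \big ( (\sum _{v} a_{v})^{2} + \sum _{v} a_{v}^{2} \big )\) applied to \(a_{v}:= p(T, v) - {\bar{p}}(T, v)\). The short Python check below confirms the computation on randomly drawn distributions; the values of \(v_{\max }\) and R are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
v_max, R = 5, 2.0                          # illustrative sizes only

p = rng.dirichlet(np.ones(v_max + 1))      # a terminal distribution p(T, .)
p_bar = rng.dirichlet(np.ones(v_max + 1))  # another terminal distribution
a = p - p_bar

# Left-hand side: sum_v sum_{i >= v} (p_bar_i - p_i) (p_v - p_bar_v) * R
lhs = R * sum((p_bar[i] - p[i]) * (p[v] - p_bar[v])
              for v in range(v_max + 1) for i in range(v, v_max + 1))

# Right-hand side: -(R / 2) * [ (sum_v a_v)^2 + sum_v a_v^2 ]
rhs = -(R / 2) * (a.sum() ** 2 + np.sum(a ** 2))

print(np.isclose(lhs, rhs), lhs <= 0)      # the identity holds and the expression is non-positive
```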

5 Conclusions

In this paper, we have solved the finite state mean field game problem by the deep neural network method. By transforming the fully coupled FBODE system into the master equation, we have proved that the error between the true solution and the approximate solution scales linearly with the square root of the DNN loss function. We have also applied the DNN method to solve the optimal market making problem with a terminal rank-based trading volume reward, which is shown to perform better in liquidity provision and trading cost reduction than the linear trading volume reward. There remain many open questions, such as general heterogeneous interaction structures and infinite state MFG. We leave these and other questions for future research.