Deep Neural Network Solution for Finite State Mean Field Game with Error Estimation

We discuss the numerical solution of a class of continuous time finite state mean field games. We apply the deep neural network (DNN) approach to solving the fully coupled forward and backward ordinary differential equation system that characterizes the equilibrium value function and probability measure of the finite state mean field game. We prove that the error between the true solution and the approximate solution is linear in the square root of the DNN loss function. We give an example of applying the DNN method to solve an optimal market making problem with a terminal rank-based trading volume reward.


Introduction
There has been extensive research in mean field games (MFG) since their introduction by Lasry and Lions [21] and Huang et al. [18] as the limit of symmetric nonzero-sum noncooperative N-player dynamic games when the number of players tends to infinity; see Carmona and Delarue [4] and Guéant et al. [16] for excellent introductions to MFG. The main research focus nowadays is in two areas. One is to study the existence and uniqueness of the MFG equilibrium via the partial differential equation (PDE) system that characterizes the equilibrium value function and mean field state, see Lasry and Lions [21]. The other is to analyze the convergence of the stochastic differential game among a large but finite number of players to the MFG limit as the number of players tends to infinity, as well as the numerical approximation of MFG, see Achdou et al. [1] and Achdou and Capuzzo-Dolcetta [2]. The MFG theory has been applied to many modeling problems in economics, politics, social welfare and other areas, see, for example, Guéant [14] and Lasry et al. [22]. In this paper, we focus on MFG with a finite time horizon in which the continuous time state dynamics of each agent take values in a finite set, under fully symmetric payoffs and complete information. Gomes et al. [12] first study finite state MFG, prove the existence and uniqueness of the Nash equilibrium via the coupled forward and backward ordinary differential equation (FBODE) system, and show the convergence of the N-player game's Nash equilibrium to that of the limiting MFG when N tends to infinity and the time horizon is small. [8] analyze the MFG with a probabilistic approach. Carmona and Wang [7] tackle both the mean field of states and that of controls and prove the existence of equilibrium with backward stochastic differential equations and the uniqueness of equilibrium when the Hamiltonian does not depend on mean field controls. Carmona and Wang [6] analyze a finite state MFG between one major player and an infinite number of minor players.
Cardaliaguet et al. [3] make a breakthrough in convergence analysis for a diffusion model with common noise and characterize the equilibrium via the master equation and its regular solution. Cecchin and Pelino [9] apply the master equation to obtain the convergence of the feedback Nash equilibrium in the finite state space, which extends the convergence result in Gomes et al. [12] without requiring the time horizon to be sufficiently small.
Despite the progress on existence, uniqueness and convergence for the Nash equilibrium of the finite state MFG, there is still a considerable obstacle to approximating the N-player game with a simpler MFG. One main difficulty is that the Nash equilibrium of the finite state MFG is characterized by a FBODE system with both initial and terminal conditions, which in general has no analytical solution and is difficult to solve numerically. One commonly used method for solving a FBODE is the shooting method, but it tends to work well only when the dimension is low and the boundary condition is simple; the shooting method fails to work in our case. Gomes and Saude [13] propose a numerical scheme to solve finite state MFG under monotonicity conditions that do not hold in many applications.
There has been active research in recent years on using deep neural networks (DNN) to solve PDEs and ODEs with different boundary conditions, see, e.g., Lagaris et al. [19], Malek and Beidokhti [27], Lee and Kang [24], Lagaris et al. [20]. Given that a FBODE system shares many features with a PDE, we are motivated to use a DNN to numerically solve the FBODE system in the finite state MFG problem. Sirignano and Spiliopoulos [30] propose the DGM (deep Galerkin method) to solve high-dimensional PDEs with a mesh-free DNN and show the convergence of the approximate solutions to the true solution under some conditions; the method is similar in spirit to the Galerkin method except that the solution is approximated by a neural network instead of a linear combination of basis functions. [5] provide a comprehensive literature review on deep learning methods applied to MFG. Many papers apply the DGM approach to numerically solve high-dimensional PDEs derived from different types of MFG (see, e.g., Han et al. [17], Ruthotto et al. [29]), while others apply DNNs to solve the corresponding BSDEs (see, e.g., Fouque and Zhang [11], Lauriere [23]). Most of these papers only provide numerical results without rigorous proofs for the numerical solutions. There is no guarantee that the neural network approximation converges to the true solution, and the approximation may not be accurate enough even when the loss function is small, as no relation is established between the loss function and the error between the approximate and true solutions. Li et al. [25] and Li et al. [26] prove the strong and uniform convergence of the DGM approach. Parallel to this paper, Mishra and Molinaro [28] focus on the error estimation of physics-informed neural networks (PINN), another name for the DGM.
For PDEs satisfying certain conditions, they provide an abstract framework relating the loss function of the neural network to the error between the true solution and the approximate solution generated by the neural network, and they prove error bounds for several specific types of PDEs. Their assumptions on the regularity of the PDEs are strong (see Assumption 2.1 in Mishra and Molinaro [28]) and are not necessarily satisfied by the FBODE system derived in this paper. To the best of our knowledge, there is no existing literature addressing the error between the approximation and the true solution via the loss function for FBODEs derived from continuous time finite state MFG problems. We provide the error bound estimation to fill this gap.
The main result of the paper, Theorem 2.6, states that the error between the true solution (θ, p) of the FBODE system and the DNN approximate solution (θ̂, p̂) is linear in the square root of the loss function in the DNN method, which provides the magnitude of the error bound for the DNN approximation as well as the convergence result. To bridge θ and θ̂, we use the master equation for θ in Cecchin and Pelino [9] and prove that θ̂ satisfies a similar equation. Cecchin and Pelino [9] prove that the equilibrium of the finite-player finite state game converges to that of the corresponding MFG, with the former satisfying a backward ODE while the latter satisfies a FBODE that is equivalent to a backward PDE (the master equation) and can be compared with the backward ODE system. In contrast, we want to estimate the error between the true solution and the DNN approximation to the MFG, with both satisfying FBODE systems, the one for the DNN approximation having extra error terms compared with the one for the true solution. We leverage the master equations to connect the two FBODE systems and carry out the error analysis. Due to the perturbation terms in the FBODE system, we need to address the issue of a negative p̂, prohibited in [9, Theorem 6], and find a new way to bypass that difficulty.
As an application, we apply the DNN to numerically solve an optimal market making problem with the same framework as that of Guéant [15], except that the terminal reward depends on the trading volume ranking, which is determined in a so-called market maker incentive program contract designed by the exchange to encourage market makers to provide more liquidity (i.e., trading volume). El Euch et al. [10] discuss the market maker incentive contract and analyze how the exchange should optimally decide the commission fee schedule for market makers. The trading volume ranking-related reward, commonly seen in market incentive programs at various exchanges, is not considered in El Euch et al. [10]. In this paper, we use a finite state MFG to model the competition between market makers in the presence of the trading volume ranking reward and solve for the Nash equilibrium using the DNN approach. The results may help exchanges design better market incentive programs by better understanding market makers' behavior in response to the contract.
The rest of the paper is organized as follows. In Sect. 2, we formulate the finite state MFG model and state the main result of the paper, Theorem 2.6, on the error estimation of the DNN approximate solution. In Sect. 3, we discuss an application to optimal market making with a rank-based trading volume reward. Section 4 contains the proofs of Theorems 2.4, 2.5 and 2.6 and Proposition 3.1. Section 5 concludes.

Model and Main Results
Define a finite state MFG in continuous time similar to the one in Cecchin and Pelino [9]. The finite state space is Σ = {1, . . . , K}, and the reference game player's state is denoted by Z, which is a Markov chain. The player at state z can decide the switching intensities with feedback controls λ : [0, T] × Σ → (R_+)^K. The dynamics of the player are driven by Poisson processes N^z_t with controlled intensities λ_z(t, Z_t). If there are some states that state z cannot access, then we can simply set the corresponding components of the intensity vector to zero. The probability measure on the mean field of states is a function p : [0, T] → P(Σ). Starting at time t ∈ [0, T], given any probability measure p on the mean field of states, a game player with controlled state process Z_t starting at state z wants to optimize the expected payoff in (2.1), where the expectation is conditional on the initial state Z_t = z at time t, F is the running payoff, G the terminal payoff and A the admissible control set containing all measurable functions λ : [0, T] × Σ → (R_+)^K. We assume for any z ∈ Σ, F(z, λ) is an upper bounded function which does not depend on λ_z, the zth component of λ.
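For intuition, the controlled dynamics above can be simulated with a Gillespie-style scheme. The sketch below uses illustrative names and approximates time-dependent intensities as constant between jumps (exact for intensities that do not vary between jump times):

```python
import numpy as np

def simulate_chain(lam, z0, T, rng):
    """Simulate a finite-state Markov chain on {0, ..., K-1} with
    feedback switching intensities lam(t, z) -> length-K rate vector.
    Returns the jump path as a list of (time, state) pairs."""
    t, z, path = 0.0, z0, [(0.0, z0)]
    while True:
        rates = np.asarray(lam(t, z), dtype=float).copy()
        rates[z] = 0.0                       # no self-jumps
        total = rates.sum()
        if total <= 0:
            break
        dt = rng.exponential(1.0 / total)    # waiting time to next jump
        if t + dt > T:
            break
        t += dt
        z = int(rng.choice(len(rates), p=rates / total))  # jump target
        path.append((t, z))
    return path

# usage: 3 states, unit intensity to each of the other states
rng = np.random.default_rng(0)
lam = lambda t, z: [0.0 if y == z else 1.0 for y in range(3)]
path = simulate_chain(lam, 0, 5.0, rng)
```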
According to Cecchin and Pelino [9], the equilibrium value function θ and the mean field probability p satisfy the FBODE system (2.2), where Δ_z is the difference operator, defined by [Δ_z θ]_y = θ_y − θ_z, and H : Σ × R^K → R is the Hamiltonian function, defined for any μ ∈ R^K satisfying μ_z = 0; the value of μ_z can in fact be arbitrary, since in the proof of our main result we always let μ_z = [Δ_z θ(t)]_z = θ_z(t) − θ_z(t) = 0 and F(z, λ) is independent of λ_z. For notational convenience, we define (2.3). The backward equation in (2.2) comes from the optimization problem (2.1) given p, and the forward equation comes from the consistency condition for the probability measure p on the mean field of states when everyone follows the equilibrium strategy. According to [12, Proposition 1], if H is differentiable and λ*(z, μ) is positive except for the zth element, then for y ≠ z we have λ*_y(z, μ) = [D_μ H(z, μ)]_y, where λ*_y(z, μ) is the intensity from state z to state y and [D_μ H(z, μ)]_y is the yth component of the gradient D_μ H(z, μ). In the proofs of the main results, we always have μ_z = 0 when we use H(z, μ), D_μ H(z, μ) or D²_μμ H(z, μ), the Hessian matrix. For simplicity of the proofs, with a slight abuse of notation, we follow Cecchin and Pelino [9] and define the remaining values artificially; the feedback control under the equilibrium is λ(t, z) = λ*(z, Δ_z θ(t)).
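As a concrete illustration of H, λ* = D_μ H and the difference operator, consider a hypothetical quadratic running cost F(z, λ) = −(1/2) Σ_{y≠z} λ_y² (an assumption for illustration only, not the paper's general F). Maximizing Σ_{y≠z} (λ_y μ_y − λ_y²/2) over λ ≥ 0 gives H(z, μ) = (1/2) Σ_{y≠z} max(μ_y, 0)² and λ*_y(z, μ) = max(μ_y, 0):

```python
import numpy as np

def delta(theta, z):
    """Difference operator: (Delta_z theta)_y = theta_y - theta_z."""
    return theta - theta[z]

def H(z, mu):
    """Hamiltonian for the illustrative quadratic switching cost:
    H(z, mu) = 0.5 * sum_{y != z} max(mu_y, 0)^2."""
    m = np.maximum(np.asarray(mu, dtype=float), 0.0)
    m[z] = 0.0                        # the zth component never enters
    return 0.5 * np.sum(m ** 2)

def lam_star(z, mu):
    """Optimal intensities lambda* = D_mu H for this Hamiltonian."""
    lam = np.maximum(np.asarray(mu, dtype=float), 0.0)
    lam[z] = 0.0
    return lam

theta = np.array([1.0, 2.0, 3.0])
mu = delta(theta, 0)                  # [0, 1, 2]; note mu_z = 0
```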
We next assume H, G and λ* satisfy the following assumptions. H is locally Lipschitz in μ, and its second derivative is bounded away from 0 on bounded sets, i.e., there exists a constant C such that (2.4) holds for any μ in the bounded set satisfying μ_z = 0. Moreover, assume that G is differentiable, that there exists a constant C such that when p is bounded, its directional derivative in any direction w satisfies the bound (2.5), and that G is decreasing in p, i.e., (2.6) holds for all p, p̄ ∈ R^K. Then from [12, Proposition 2], the solution to (2.2) has an a priori bound C_{G,H} as long as H satisfies Remark 2.2 and G is bounded for all p(T) in the compact set [0, 1]^K; C_{G,H} is defined in (2.8). We summarize [9, Theorem 1] and [12, Theorem 2], and state the following theorem without proof.

Theorem 2.3
Under Assumption 2.1, the ODE system (2.2) has a unique solution (θ, p) for any initial value p(t_0) ∈ P(Σ), and the MFG has a unique Nash equilibrium point.
We assume in the rest of the paper that Assumption 2.1 holds, which guarantees the existence, uniqueness and convergence of the finite state MFG. However, to find the equilibrium, we need to solve (2.2), which generally does not have an analytical solution. As (2.2) is a FBODE system, we cannot solve it numerically by simple discretization. We apply the deep neural network approach of Sirignano and Spiliopoulos [30] to numerically solve (2.2).
Define two sets of neural network functions, Θ_N(ν_1, ν) and P_N(ν_2, ν), where ν : R → R is a three-times continuously differentiable activation function, and the two strictly increasing, three-times continuously differentiable activation functions ν_1, ν_2 : R → R have twice continuously differentiable inverse functions ν_1^{-1} and ν_2^{-1}. They satisfy sup |ν_1| = C_{θ,G,H} + e, inf ν_2 = −e, sup ν_2 = 1 + e, (2.9) where e is an arbitrarily small positive constant. In the numerical tests of this paper, we use hyperbolic tangent functions as activation functions, in particular, ν_1(x) = a tanh x + b for some constants a, b and ν_2(x) = (tanh x + 1)/2. We approximate the solution (θ, p) to (2.2) numerically by (θ̂^(N), p̂^(N)), which satisfies (2.10). By considering both the differential operators and the boundary conditions in (2.2), we define the loss function for any approximate solution (θ̂, p̂) as in (2.11), where B_θ and B_p can be any constants satisfying the bounds involving the constants C_{θ,G,H} and C_{p,G,H} from (2.8). Both the integral term and the maximum term in (2.11) can be calculated via Monte Carlo simulation. In practice, we use an approach similar to that in Sirignano and Spiliopoulos [30] to calculate these two terms to increase the robustness of the training. Given N, the structure of the neural network is determined. We train the network by finding the parameter values that minimize the loss function. The loss vanishes at the true solution (θ, p); since the solution (θ, p) exists and is unique, the loss function has a unique minimum point at (θ, p). We provide the convergence result Theorem 2.4, similar to Theorem 7.1 in Sirignano and Spiliopoulos [30].
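Schematically, the Monte Carlo evaluation of a loss of the form (2.11) can be organized as below. This is a sketch only: `residual` and `boundary` are hypothetical callables supplying the squared-residual ingredients, and in a real implementation the time derivatives inside `residual` would come from automatic differentiation.

```python
import numpy as np

def mc_loss(residual, boundary, t0, T, n_samples, rng):
    """Monte Carlo estimate of an interior-plus-boundary loss:
    (T - t0) * average squared ODE residual at uniformly sampled
    times, plus boundary-condition penalty terms."""
    ts = rng.uniform(t0, T, size=n_samples)      # mesh-free time samples
    interior = np.mean([np.sum(residual(t) ** 2) for t in ts])
    return (T - t0) * interior + boundary()

# usage with trivial placeholder residual/boundary terms
rng = np.random.default_rng(1)
loss = mc_loss(lambda t: np.array([0.0]), lambda: 1.0, 0.0, 1.0, 100, rng)
```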

Theorem 2.4
There exists a sequence of (θ̂^(N), p̂^(N)) defined in (2.10) whose loss converges to zero. When the loss function is smaller than a certain value, because of the uniform bounds on the first and second derivatives of the approximation function, the maximum error on the time interval is also correspondingly small.

Theorem 2.5
There exists a constant ε_0 such that for any ε < ε_0, if the loss is less than ε, then there exists a constant C such that the inequalities in (2.12) hold for all t ∈ [t_0, T] and z ∈ Σ, where ‖·‖ is the infinity norm.
Note that the constant C in Theorem 2.5 depends on the FBODE system and the bound of its true solution, but is independent of the DNN structure; Theorem 2.5 is thus an algorithm-independent result. We now state our main result on the error estimation for the numerical solution to the finite state mean field game.
Note that the constant B depends on the FBODE system and the bound of its true solution, but is independent of the DNN structure, which implies that Theorem 2.6 is also an algorithm-independent result.
Combining Theorems 2.4 and 2.6, we immediately have the following result.

Remark 2.8
Although we only prove the convergence and error estimation results for a two-layer neural network structure characterized by Θ_n(ν_1, ν) and P_n(ν_2, ν), the idea and the proof can be easily adapted to more sophisticated neural network models (more layers, LSTM, etc.) as they share similar structures.

Application: Optimal Market Making with Rank-Based Reward
The model setting is similar to Guéant [15], except that the exchange provides an incentive reward for market making. The terminal payoffs of market makers depend on their trading volumes and rankings in the market. The optimization problems of different market makers are therefore coupled.
It is in general difficult to solve a finite-player game due to high dimensionality, but the MFG provides a good approximation, see [8], [9] and [12]. We therefore use the MFG as a proxy to solve the optimal market making problem. Consider a continuum family of market makers who keep quoting bid/ask limit orders, and select one of them as the reference market maker. Assume the asset price S_t follows a Brownian motion with initial value S, where W_t is a standard Brownian motion adapted to the natural filtration {F^W_t}_{t ∈ R_+} and σ is the volatility of the stock. Assume δ^a_t and δ^b_t are the ask/bid spreads, which are controls determined by the reference market maker. Denote by N^a_t and N^b_t the jump processes for buy/sell market order arrivals to the reference market maker, with intensities Λ(δ^a_t) and Λ(δ^b_t), respectively, where the intensity function Λ : R → R is decreasing and continuously differentiable with a continuous inverse function. The reference market maker has state variables (X_t, q_t, v_t), in which X_t is the cash account, q_t the inventory and v_t the accumulated trading volume, with initial value (x, q, v). We assume q_t can only take values in a finite set Q = {−Q, . . . , Q} and v_t can only take values in a finite set V := {0, . . . , v_max}; any trading volume above v_max is not counted in the reward calculation. Denote by I_b := 1_{q+1∈Q} and I_a := 1_{q−1∈Q} the indicators of the market maker's buying and selling capabilities.
The dynamics of X_t, q_t and v_t are driven by the order arrival processes N^a_t and N^b_t. Denote by p(t, q_t, v_t) the probability measure on the mean field of the discrete states (q_t, v_t) over all market makers. The reference market maker solves the optimization problem (3.2), where X_T + q_T S_T is the cash value at time T, l(|q_T|) the terminal inventory holding penalty with l an increasing convex function on R_+ with l(0) = 0, γσ² ∫_0^T q_t² dt the accumulated running inventory holding penalty with γ a positive constant representing the risk-aversion level, and R(v_T) the cash reward given by the exchange as an incentive for market makers to increase the trading volume v_T. We consider the rank-based trading volume reward, defined in (3.3) through the percentage of market makers that the reference market maker exceeds in trading volume, with R a positive constant representing the maximum reward set by the exchange.
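The state updates follow the standard Guéant-style bookkeeping: a filled ask order sells one unit at S + δ^a, a filled bid order buys one unit at S − δ^b, and both fills count toward the capped volume. A minimal sketch (the sign conventions and function name are our assumptions):

```python
def step_state(X, q, v, S, da, db, dNa, dNb, Q, v_max):
    """One update of the market maker state (X, q, v) given executed
    sell/buy fills dNa, dNb over a small interval."""
    Ia = 1 if q - 1 >= -Q else 0   # can sell (ask side) only if q-1 stays in Q
    Ib = 1 if q + 1 <= Q else 0    # can buy (bid side) only if q+1 stays in Q
    dNa *= Ia
    dNb *= Ib
    X = X + (S + da) * dNa - (S - db) * dNb   # cash: sell high, buy low
    q = q - dNa + dNb                          # inventory
    v = min(v + dNa + dNb, v_max)              # volume, capped at v_max
    return X, q, v

# usage: one ask-side fill at mid-price 100 with spread 1
state = step_state(0.0, 0, 0, 100.0, 1.0, 1.0, 1, 0, 3, 5)
```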
Using the martingale property, (3.2) can be reduced to a simpler form. We assume the market maker takes a closed-loop feedback control, i.e., the control depends on the current state (q, v). Since the only relevant states are q_t and v_t, which both take values in finite sets, the problem can be reduced to a continuous time finite state MFG discussed in Cecchin and Pelino [9] by reformulating the notation as follows. Define K := (2Q + 1)(v_max + 1) and Σ := {1, . . . , K}. There is a one-to-one mapping Z : Q × V → Σ: for every (q, v) ∈ Q × V, there exists a unique associated z ∈ Σ. The state (q, v) is then reformulated as the state z, and the value function θ and the probability measure on the mean field of states p are reformulated accordingly as functions of (t, z); β^a(z) and β^b(z) are defined as the two states accessible from state z. The optimal market making problem is now reduced to a continuous time finite state MFG as discussed in Sect. 2 of this paper. Denote the game by G_R. We have the following result. According to Cecchin and Pelino [9], the Nash equilibrium of the MFG G_R and that of the finite-player game exist and are unique, and the game with N players converges to the limiting MFG at the order O(1/N). We can numerically solve the FBODE system corresponding to the finite state MFG in (3.5) and find the equilibrium value function and the probability of the mean field by the DNN technique.
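A convenient concrete choice for the bijection Z (any bijection works; the enumeration below is ours) is row-major indexing, sketched with 0-based indices and illustrative parameter values:

```python
Q, v_max = 3, 5                        # example inventory bound and volume cap
q_vals = range(-Q, Q + 1)              # Q = {-Q, ..., Q}
K = (2 * Q + 1) * (v_max + 1)

def to_z(q, v):
    """One-to-one map Z : Q x V -> {0, ..., K-1} (row-major in q, then v)."""
    return (q + Q) * (v_max + 1) + v

def from_z(z):
    """Inverse map: recover (q, v) from the flat state index z."""
    i, v = divmod(z, v_max + 1)
    return i - Q, v
```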
We next perform some numerical tests. We use an LSTM (long short-term memory) neural network to approximate the solution (θ, p). Denote the function constructed by the LSTM neural network as (θ̂(t, β), p̂(t, β)), where β is the parameter set of the neural network, designed by the following steps: layer 0 is the input t ∈ [0, T], and layer k with output h_k follows the LSTM recursion with initial values c_0 = h_0 = 0, where the operator ∘ is the element-wise product, the functions σ are scaled tanh activation functions (hence satisfying all assumptions in our main results), the weight matrices and bias vectors are parameters to be learned during training, and h is the number of hidden units. The LSTM network is an advanced version of the traditional neural network and provides a more accurate approximation for complicated functions. For our model, this specific structure performs better than a traditional neural network.
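A single step of the LSTM layer recursion can be sketched as follows. Standard sigmoid gates are shown for concreteness (the paper instead uses scaled tanh activations satisfying its smoothness assumptions), and the packing of the four gates into stacked W, U, b is our convention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM layer step with element-wise (Hadamard) products:
    gates i, f, o and candidate g are sliced from a single stacked
    affine map; W has shape (4n, m), U has shape (4n, n), b has (4n,)."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:n])            # input gate
    f = sigmoid(z[n:2 * n])       # forget gate
    o = sigmoid(z[2 * n:3 * n])   # output gate
    g = np.tanh(z[3 * n:])        # candidate cell update
    c = f * c_prev + i * g        # cell state:  c_k = f ∘ c_{k-1} + i ∘ g
    h = o * np.tanh(c)            # hidden output: h_k = o ∘ tanh(c_k)
    return h, c

# usage: scalar input, 2 hidden units, zero parameters
h, c = lstm_cell(np.array([1.0]), np.zeros(2), np.ones(2),
                 np.zeros((8, 1)), np.zeros((8, 2)), np.zeros(8))
```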
We use an LSTM network with 3 layers and 32 nodes per layer. The network is trained by a stochastic gradient approach with mesh-free, randomly sampled points in [0, T]. This randomness adds to the robustness of the network. The detailed training procedure is similar to that in Sirignano and Spiliopoulos [30].
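The training loop is, schematically, stochastic gradient descent on the sampled loss. The toy sketch below uses a finite-difference gradient purely for self-containedness (a real implementation would use backpropagation), and `loss_fn` is a hypothetical stand-in for the sampled loss:

```python
import numpy as np

def train(loss_fn, params, lr, steps, rng, h=1e-5):
    """Gradient descent on loss_fn(params, rng); rng lets loss_fn draw
    fresh random time samples each step (the mesh-free scheme)."""
    for _ in range(steps):
        grad = np.zeros_like(params)
        base = loss_fn(params, rng)
        for j in range(params.size):
            bumped = params.copy()
            bumped[j] += h
            grad[j] = (loss_fn(bumped, rng) - base) / h   # forward difference
        params = params - lr * grad
    return params

# usage: deterministic toy loss with known minimizer params = 1
toy = lambda p, rng: float(np.sum((p - 1.0) ** 2))
fitted = train(toy, np.zeros(2), 0.1, 200, np.random.default_rng(0))
```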
There are two typical schemes of trading volume rewards in most exchanges' incentive programs. One is the rank-based trading volume reward as in (3.3), and the other is the linear trading volume reward, defined in (3.6). Since R_L(v_T) is independent of the mean field of states, the FBODE system for the value function and the probability of the mean field of states is decoupled and can be solved numerically with a standard Euler scheme. We next run numerical tests and compare the value functions, optimal bid/ask spreads and probability distributions of trading volumes under three different trading volume reward schemes: 1. no trading volume reward (R = 0, benchmark case), 2. linear trading volume reward (R_L in (3.6)), and 3. rank-based trading volume reward (R in (3.3)). The rank-based reward introduces competition between market makers, whereas the linear reward does not. The training result is satisfactory, with an average loss of less than 0.003. Figures 1 and 2 show the value functions θ(t, q, v) for fixed q = 0, 1 and varied v, with 'Benchmark' representing no trading volume reward, 'v = 0 lin' the linear reward with initial trading volume v = 0, and 'v = 0' the rank-based reward. It is clear that the introduction of the market incentive R increases the value functions for market makers, and the higher the initial trading volume v, the higher the value function θ. Even for market makers with initial trading volume v = 0, the value functions are still higher than the benchmark one, as they benefit from their potential market incentive gains; this also explains the convergence of the curves for v = 0 to the benchmark one as t tends to T. The value functions for the linear and rank-based rewards are largely the same.
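For the decoupled linear-reward case, the backward equation can be stepped with a standard explicit Euler scheme from the terminal condition. A generic sketch, where `rhs` is a placeholder for the right-hand side of the backward ODE:

```python
import numpy as np

def solve_backward(rhs, theta_T, t0, T, n):
    """Explicit Euler, backward in time, for d theta/dt = rhs(t, theta)
    with terminal condition theta(T) = theta_T (decoupled case only).
    Returns the grid values at t0, t0 + dt, ..., T."""
    dt = (T - t0) / n
    theta = np.asarray(theta_T, dtype=float)
    out = [theta]
    for k in range(n, 0, -1):
        t = t0 + k * dt
        theta = theta - dt * rhs(t, theta)   # step from t down to t - dt
        out.append(theta)
    return out[::-1]

# usage: d theta/dt = theta with theta(1) = e has exact solution theta(0) = 1
out = solve_backward(lambda t, th: th, np.array([np.e]), 0.0, 1.0, 2000)
```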

Proof of Theorem 2.4
Proof According to Theorem 2.3, there exists a unique solution (θ, p) to the ODE system (2.2), which is also the unique minimum point of the loss function, and its derivative is bounded uniformly in t and z. Moreover, p(t) ∈ P(Σ) for any t ∈ [0, T], and hence p is also bounded. From the assumptions on ν_1 and ν_2, we know that the image of θ_z is bounded and is a strict subset of the domain of ν_1^{-1}; the same holds for p_z and ν_2^{-1}. We combine this with the continuous differentiability of ν_1^{-1} and ν_2^{-1}. According to the proof of Theorem 7.1 in Sirignano and Spiliopoulos [30], for any 0 < ε < C, there exists N > 0 and y_z ∈ C_N(ν) such that (4.1) holds. As y_z(t) ∈ [−2C, 2C], there exists a constant C_1 bounding (ν_1)'(y_z(t)) uniformly. The first inequality above comes from the boundedness and Lipschitz continuity of ν_1, as well as the boundedness of (ν_1^{-1})'(θ_z(t)). We next estimate the difference of the second-order derivatives between the approximation function and the true function. As y_z(t) and (d/dt)ν_1^{-1}(θ_z(t)) are bounded from the previous step, and ν_1 is three-times continuously differentiable by definition, ν_1'(y_z(t)), ν_1''(y_z(t)), ((d/dt)ν_1^{-1}(θ_z(t)))² and the corresponding second derivative are also bounded. Moreover, |(d/dt)y_z(t)| ≤ |(d/dt)ν_1^{-1}(θ_z(t))| + ε, and hence bounded. From the Lipschitz continuity of ν_1' and ν_1'', together with (4.1), there exists a constant C_2 such that the stated bound holds. By rescaling ε in the above argument, we know that for any 0 < ε < C, there exists N > 0 and y_z ∈ C_N(ν) such that the desired estimates hold. Similarly, we know |(d/dt)ν_2(y_z(t))| ≤ C_{p,G,H} < B_p. Then from the proof above, for any 0 < ε < C, there exists N > 0 and θ̂^(N) ∈ Θ_N(ν_1, ν) such that the stated approximation bounds hold. For p and P_N(ν_2, ν), we have a similar argument.
Hence, we conclude that for any 0 < ε < C, there exists N > 0 and θ̂^(N) ∈ Θ_N(ν_1, ν) satisfying the above estimates. Then, similar to the proof of Theorem 7.1 in Sirignano and Spiliopoulos [30], there exists a uniform constant M, which only depends on the bounds of θ and λ* and the Lipschitz coefficients of λ* and H, such that the loss at (θ̂^(N), p̂^(N)) is at most Mε.
This concludes the proof.

Proof of Theorem 2.5
Proof Since the argument is the same for any t_0, we assume t_0 = 0 for ease of notation. We first focus on proving the first inequality in (2.12). Define the residual function e(t, z). Since the loss is less than ε, from the definition of the loss we know d²θ̂_z(t)/dt² is uniformly bounded on [0, T]. Furthermore, H is Lipschitz continuous, and θ̂ has a bounded first-order derivative in t. Hence, e(t, z) is a Lipschitz continuous function on [0, T]; denote its Lipschitz coefficient by L. There exists t̄ ∈ [0, T] attaining the maximum of |e(·, z)|. From the Lipschitz continuity of e, for any t close enough to t̄, |e(t, z)| is at least half of |e(t̄, z)|. From (4.2), we know LΔt ≤ |e(t̄, z)|/2; combining the above two inequalities, for any t ∈ [t̄ − Δt, t̄ + Δt] we obtain the stated lower bound and can make the following estimation. Without loss of generality, we can assume T − t̄ ≥ T/2 (otherwise we use the other side [t̄ − Δt, t̄] as the limits of integration in the second inequality below), which implies the bound for any t ∈ [0, T]. There exists ε_0 such that the claimed inequality holds for all ε < ε_0. Using similar arguments, the third inequality in (2.12) can be proved. The second and fourth inequalities are trivial.

Proof of Theorem 2.6
The general structure of the proof is similar to those in Cecchin and Pelino [9], with one key difference: in Cecchin and Pelino [9], p satisfies a non-perturbed Kolmogorov forward equation and has an initial value in P(Σ) with nonnegative components, whereas p̂ satisfies a perturbed Kolmogorov forward equation (2.13) and its initial value is not necessarily in P(Σ) and may have negative components, which makes some a priori estimates in Cecchin and Pelino [9] inapplicable to our case. We need to provide extra modifications by adding and subtracting an extra term M_1 such that p̂(t) + M_1 is nonnegative, and we need to modify every step in the proof to estimate the extra terms introduced by M_1. For completeness, we give the whole proof.
Note that the solution pair (θ̂, p̂) to (2.13) is determined only by the initial time t_0 and the initial value p̂(t_0). We first prove that θ̂ is well defined (Proposition 4.5), continuous in p̂(t_0) (Proposition 4.2), and continuously differentiable in p̂(t_0) (Theorem 4.7) by discussing the linearized system (4.11); we then prove that θ̂ satisfies a PDE similar to the master equation in Cecchin and Pelino [9] (Theorem 4.8) and that the master equation on some discrete grids of P(Σ) can be approximated by a backward ODE with extra error terms (Proposition 4.9); and we finally estimate the difference between θ and θ̂ by comparing the two backward ODE systems, which concludes the proof.
Denote by ‖x‖ := max_{1≤z≤K} |x_z| the norm of x in R^K, with the analogous supremum norm for functions f. Due to the introduction of the perturbation terms in the ODE system (2.13), the existence and uniqueness of its solution can no longer be guaranteed for every initial value p̂(t_0). However, under certain conditions on (2.13), we can obtain the existence of, and an a priori bound for, the solution to (2.13).
We know the function θ̂(t) is bounded by a constant C_G(M), following a proof similar to [12, Proposition 2]. Consider two functions p̂ and p̄ satisfying the corresponding forward equations; integrating both sides and subtracting, we obtain the stated comparison. We next prove that, under certain conditions, (θ̂, p̂) is unique and continuous w.r.t. the initial condition. Similarly to the proof of [9, Proposition 5], we first obtain an estimate on the LHS of (4.7) given later. Define φ := θ̂ − θ̄ and π := p̂ − p̄. Then the couple (φ, π) solves (4.5). Integrating (d/dt) Σ_{z∈Σ} φ_z(t)π_z(t) over the interval [t_0, T], using the product rule and (4.5), and noting that Σ_z λ*_z(y, Δ_y θ̄(t))φ_y(t) = 0, after some simple calculus we obtain (4.6). As λ*(z, μ) = D_μ H(z, μ), by Taylor's theorem there exists a point a on the line segment between Δ_z θ̂(t) and Δ_z θ̄(t) such that the stated expansion holds. Then from assumption (2.4), and arguing similarly in the other direction, we obtain the following estimates. Since Σ_{z∈Σ} φ_z(T)π_z(T) ≤ 0 by (2.6), and p̂_z(t), p̄_z(t) > −M_1 by the choice of M_1, using (4.6) and the inequalities above, we obtain the stated bound. By the Lipschitz continuity of λ*, there exists C such that the corresponding estimate holds. Note that unlike the proof of [9, Proposition 5], both p̂_z and p̄_z can be negative in our setting. If they were nonnegative, then we could choose M_1 = 0 and the same technique as in [9, Proposition 5] would work. In our case, we have to introduce M_1, which requires many more a priori estimates.
We next derive the bound for π. Integrate the second equation in (4.5) over [t_0, t]. As λ* is both bounded and Lipschitz continuous, there exists C such that the stated estimate holds, where the second line holds because p̂_z(s) + M_1 > 0. Applying Gronwall's inequality, there exists a constant C such that ‖φ‖ ≤ C‖π‖. (4.9) By combining (4.8) and (4.9) and using AB ≤ εA² + (1/ε)B² for A, B > 0, there exists C such that the stated bound holds. As C only depends on the boundedness and Lipschitz coefficients of H, G and λ*, and the bounds of D²_μμ H, θ̂ and θ̄, which depend on the M in Proposition 4.1, we only need to select M_1 such that (4.10) holds. According to Propositions 4.1 and 4.2, Ũ is well defined and continuous w.r.t. p̂_0. Moreover, for (θ̂, p̂), the solution to (2.13) in Theorem 2.6 on [t_0, T], which is the approximate solution obtained from the DNN whose error we want to estimate, the stated identity holds for all t ∈ [t_0, T]. This suggests that Ũ carries all the information of θ̂. If we can compare Ũ with the U defined similarly in Cecchin and Pelino [9] corresponding to the true solution to (2.2), we can estimate the error of θ̂. To compare Ũ with U, we need to prove that Ũ also satisfies a master equation similar to that for U in Cecchin and Pelino [9]. To achieve this goal, we prove the continuous differentiability of Ũ in the following steps. We first define the derivative of Ũ w.r.t. the vector p_0 in a way similar to Cecchin and Pelino [9], via the operator D^y_p defined as follows.

Definition 4.3 Define the operator D^y_p of a function as follows, where δ_z ∈ R^K denotes the vector with all elements 0 except the zth element, which is 1.

It satisfies
where ∂/∂δ_1 is in fact the first component of the gradient of Ũ, and we write DŨ(p) := D^1Ũ(p) for notational simplicity. When Σ_{z=1}^K μ_z = 0, for any y ∈ Σ, the stated identity holds. In order to characterize the directional derivative of Ũ w.r.t. p̂_0, given θ̂ and p̂, let us define a linear system of ODEs for (u, ρ) similar to [9, Equation (80)], which will be used several times in the following.
(4.11) As in [9, Equation (80)], D_μ λ*_z(y, Δ_y θ̂(t)) is the gradient of λ*_z w.r.t. its second variable in R^K. The unknowns are u and ρ, while b, c, u_T, ρ_0 are given measurable functions, with c satisfying Σ_{z=1}^K c(t, z) = 0. In fact, (4.11) is a generalization of [9, Equation (80)]: in (4.11), the terminal condition for u_z(T) involves a directional derivative in an arbitrary direction, while in [9, Equation (80)] it involves directional derivatives in specific directions only.
We first prove in Proposition 4.5 that the linear system (4.11) has a unique solution (u, ρ), bounded linearly by its initial and boundary conditions as stated in (4.12). Proof We only discuss the case t_0 = 0, as the argument extends to any t_0 ∈ [0, T]. We first let N_0 be larger than the one in Proposition 4.2. Similar to the proof of Proposition 4.2, to cope with the potential negativity, we first assume p̂_z(t) ≥ −M_1 uniformly and M_1 ≤ M, and we will later choose M_1 small enough and find the N_0 for which this holds. As Σ_{z∈Σ} λ*_z(y, Δ_y θ̂(t)) = 0, we have Σ_{z,y∈Σ} p̂_y(t) D_μ λ*_z(y, Δ_y θ̂(t)) · Δ_y u(t) = 0. Define the set P_η(Σ) as below. We define a map ξ from C^0([0, T]; P_η(Σ)) to itself as follows: for a fixed ρ ∈ C^0([0, T]; P_η(Σ)), we consider the solution u = u(ρ) to the backward ODE for u in (4.11), and define ξ(ρ) to be the solution to the forward ODE for ρ in (4.11) with u = u(ρ). From (4.13), ξ(ρ) is well defined, as ξ(ρ)(t) ∈ P_η(Σ) for any t. Similar to the proof of [9, Proposition 6], the solution to (4.11) is a fixed point of the mapping ξ, and we prove its existence by Schaefer's fixed point theorem, which asserts that a continuous and compact mapping ξ of a Banach space X into itself has a fixed point if the set {ρ ∈ X : ρ = ωξ(ρ), ω ∈ [0, 1]} is bounded. Firstly, ξ is continuous, as the system (4.11) is linear in u and ρ. C^0([0, T]; P_η(Σ)) is a convex subset of the Banach space C^0([0, T]; R^K). Moreover, from the linearity and bounded coefficients of the system (4.11), ξ maps any bounded set of C^0([0, T]; P_η(Σ)) into a set of bounded and Lipschitz continuous functions with a uniform Lipschitz coefficient in C^1([0, T]; P_η(Σ)), which, by the Arzelà–Ascoli theorem, is relatively compact. By the definition of a compact map, ξ is a compact map. Hence, to apply Schaefer's fixed point theorem, it remains to prove that the set {ρ : ρ = ωξ(ρ)} is uniformly bounded for all ω ∈ [0, 1].
We can restrict to $\omega > 0$, since otherwise $\rho = 0$. Fix a $\rho$ such that $\rho = \omega \xi(\rho)$, which means the couple $(u(\rho), \xi(\rho))$ solves (4.14) (for notational simplicity we suppress the dependence on $\rho$). We need to prove that the solution $(u, \xi)$, if it exists, is bounded uniformly for any $\omega \in (0, 1]$. For notational simplicity, we also omit the dependence of $\lambda^*$ on its second variable. From (4.14), the first line is $0$ by exchanging $z$ and $y$ in the second double sum and using (2.3). Integrating over $[0, T]$, using the expression for $u_z(T)$, and reorganizing the terms, then applying the assumption on $G$ in (2.6) together with the definition of the directional derivative, we obtain the required estimate. Moreover, $\lambda^*(y) = D_\mu H(y)$ (where we also suppress the dependence of $H$ on its second variable). Since $\bar p$ and $\hat p$ can be negative, the step in [9, Proposition 6] used to estimate the right-hand side above is not applicable. However, as $\bar p_y(t) + M_1 \ge 0$ for all $y \in \Sigma$, from (2.4) we can rewrite the right-hand side and obtain the following estimation instead.
So there exist constants $C$ and $C_1$ ($C_1$ depending only on the dimension of $u$) such that the stated bound holds, where $b(t) := (b(t, 1), \ldots, b(t, K))$ and $c(t)$ is defined similarly. As $\lambda^*$ and $D_\mu\lambda^*$ are bounded by a constant $C$, from the ODE for $\xi$ in (4.14) we can bound $\int |\xi_y(s)|\,ds$, so that by Gronwall's inequality there exists a constant $C$ such that the corresponding estimate holds, where there exists $C$ such that $\sum_{y \in \Sigma} (\bar p_y(t) + M_1) \le C_2$ and the time-integrated analogue over $[0, T]$ is bounded as well. From (4.15), there exist (possibly different) constants $C$ at each line such that the chain of inequalities holds. Moreover, applying Gronwall's inequality to the backward ODE in (4.14) for the function $u$, there exists $C$ bounding $u$, and hence $C$ bounding the pair. As $\|\xi(T)\| \le \|\xi\|$, using the inequality $AB \le \varepsilon A^2 + \frac{1}{4\varepsilon} B^2$ for $A, B \ge 0$, there exists $C$ such that the final estimate holds. Note that the constant $C$ depends only on the boundedness of $\bar\theta$, which depends on $M$ in Proposition 4.1. Hence, the solution pair $(u, \xi)$ is bounded for all $\omega \in [0, 1]$, which means the fixed points $\rho = \omega\xi(\rho)$ are also uniformly bounded; this proves the existence of a solution to (4.11). Meanwhile, taking $\omega = 1$ leads to the uniform bound estimate (4.12) for the solution $(u, \rho)$ to (4.11), and its uniqueness follows directly from (4.12). If $N_0 > 3e(M_1)M_1$, from Proposition 4.1 we have $\bar p_y(t) > -M_1$ uniformly, which concludes the proof; hence, we simply update the $N_0$ chosen before so that this inequality holds.
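The two elementary estimates used repeatedly above are, for concreteness, Gronwall's inequality in integral form and a Young-type inequality:

```latex
% Gronwall's inequality (integral form): if, for all t \in [0,T],
\[
  f(t) \;\le\; a + C \int_0^t f(s)\,ds, \qquad a \ge 0,\ C \ge 0,
\]
% then
\[
  f(t) \;\le\; a\, e^{C t}, \qquad t \in [0,T].
\]
% Young-type inequality: for A, B \ge 0 and any \varepsilon > 0,
\[
  A B \;\le\; \varepsilon A^2 + \tfrac{1}{4\varepsilon}\, B^2 .
\]
```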

Proposition 4.6 Let
Proof Without loss of generality, we assume $t_0 = 0$. Similar to the proof of [9, Theorem 7], we use the results of Proposition 4.5 to prove the conclusion. Let $N_0$ be as in Proposition 4.5; then $\bar p_y(t), \hat p_y(t) > -M_1$ uniformly on $(t, y) \in [0, T] \times \Sigma$. Define the linearized system with $w := \bar p(0) - \hat p(0)$. We know there exists $C$ such that $|S(\bar p, \hat p)| \le C\, \|\bar p(T) - \hat p(T)\|$. Setting $u := \bar\theta - \hat\theta - v$ and $\rho := \bar p - \hat p - \zeta$, they solve (4.11). Applying Proposition 4.5 and then Proposition 4.2, there exists $C$ such that the stated bound holds, which concludes the proof.
As (4.16) is a linear system, $v$ and $\zeta$ in (4.16) can be viewed as linear maps of $w$. Hence, Proposition 4.6 suggests that $\tilde U$ is differentiable with respect to $\bar p_0$ and that the directional derivative $\frac{\partial}{\partial w}\tilde U(t, z, \bar p)$ is the solution to the ODE system (4.16), with $\bar\theta_z(t) = \tilde U(t, z, \bar p(t))$.
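For clarity, the directional derivative referred to here is the standard Gateaux-type one (our notation, matching the statement above):

```latex
\[
  \frac{\partial}{\partial w}\tilde U(t, z, \bar p)
  \;=\; \lim_{h \to 0}
  \frac{\tilde U\bigl(t, z, \bar p + h\,w\bigr) - \tilde U\bigl(t, z, \bar p\bigr)}{h},
\]
% which, by Proposition 4.6, is given by v_z(t), where (v, \zeta) solves the
% linearized system (4.16) with initial perturbation w.
```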

Theorem 4.7
There exist positive constants $N_0$ and $C$ such that, if (4.3) holds, $\tilde U$ is differentiable on $B(P(\Sigma), \frac{1}{N_0})$, and for any vector $w$, $\frac{\partial}{\partial w}\tilde U(t, z, \bar p(t))$ exists and is Lipschitz continuous with respect to $\bar p$, uniformly in $t$ and $z$. Moreover, $\frac{\partial}{\partial w}\tilde U(t, z, \bar p(t))$ is continuous with respect to $t$.
Using the Lipschitz continuity of $\lambda^*$, $D_\mu\lambda^*$ and the directional derivatives of $G$, applying the bounds (4.12) to $\bar v$ and $\bar\zeta$, and the estimates on $\theta - \bar\theta$, $p - \bar p$ from Proposition 4.2, there exists a constant such that the stated bound holds. Therefore, $\frac{\partial \tilde U}{\partial w}$ is Lipschitz continuous, uniformly with respect to $t$ and $z$. On the other hand, for another initial time $t_1 > t_0$, we first compare $\frac{\partial}{\partial w}\tilde U(t_0, z, \bar p(t_0))$ and $\frac{\partial}{\partial w}\tilde U(t_1, z, \bar p(t_1))$, where $(t_1, \bar p(t_1))$ lies on the path $(t, \bar p(t))$ starting from $t_0$. Both are characterized by a system like (4.17), with $t_0$ replaced by $t_1$ for $\frac{\partial}{\partial w}\tilde U(t_1, z, \bar p(t_1))$. Let $(\tilde v, \tilde\zeta)$ satisfy (4.17). Then $\frac{\partial}{\partial \tilde\zeta(t_1)}\tilde U(t_1, z, \bar p(t_1))$ is also characterized by (4.17), except that $t_0$ and the initial value are replaced by $t_1$ and $\tilde\zeta(t_1)$. It follows that $\frac{\partial}{\partial \tilde\zeta(t_1)}\tilde U(t_1, z, \bar p(t_1)) - \frac{\partial}{\partial w}\tilde U(t_1, z, \bar p(t_1))$ can also be characterized by (4.17), except that $t_0$ and the initial value are replaced by $t_1$ and $\tilde\zeta(t_1) - w$. From Proposition 4.5, there exists a constant $C$ bounding this difference. As $\lambda^*$, $D_\mu\lambda^*$ and the directional derivative of $G$ are Lipschitz continuous and uniformly bounded, and both $\tilde v$ and $\tilde\zeta$ are uniformly bounded, both $\frac{d\tilde v_z(t)}{dt}$ and $\frac{d\tilde\zeta_z(t)}{dt}$ are also uniformly bounded by some constant $C$. Combining the above, there exists a constant $C$ such that the stated estimate holds. Then, by the continuity of $\frac{\partial}{\partial w}\tilde U$ with respect to its third argument, together with the continuity of $\bar p$, we conclude that $\frac{\partial}{\partial w}\tilde U$ is also continuous with respect to $t$, its first argument.
From Proposition 4.6 and Theorem 4.7, $\tilde U$ is $C^1$ on the compact set $\bar B(P(\Sigma), \frac{1}{N_0})$. Hence, both $D\tilde U$ and the directional derivative of $\tilde U$ along any direction are well defined, bounded, and Lipschitz continuous, uniformly for $t \in [0, T]$. Theorem 4.7 also shows that the directional derivative of $\tilde U$ along any direction is continuous with respect to $t$. Thanks to these properties, we can use an idea similar to the proof of existence of a solution to the master equation in [9, Section 5.3.1] to show that $\tilde U$ also satisfies the master equation with some extra error terms. Let $(\bar\theta, \bar p)$ be the solution to the ODE system (2.13) and define $\tilde U$ as in (4.10). There exist positive constants $N_0$ and $C$ such that, if condition (4.3) in Theorem 2.6 holds, then $\tilde U$ satisfies the following master equation along the path $(t, \bar p(t))$ on $[t_0, T]$, as long as
Let us first compute the limit of the following expression as $h$ tends to $0$.
For the first term in (4.19), we first define $W$; by definition, we derive the derivative of $W$, and the first term in (4.19) can then be reformulated accordingly. From Lemma 4.4 and $c(h) = \sum_{z=1}^{K} (\bar p_z(t+h) - \bar p_z(t)) = 0$, we obtain the corresponding identity. Substituting this into the first term in (4.19), we get the stated expression, with $\epsilon_2(t) := (\epsilon_2(t, 1), \ldots, \epsilon_2(t, K))$. From Theorem 4.7, we know that for any $y \in \Sigma$ the corresponding convergence holds. As $D\tilde U$ is uniformly bounded, the dominated convergence theorem yields the stated limit. On the other hand, dividing by $h$ and letting $h \to 0$, we obtain the second limit; the last equation follows from the definition of $\tilde U$, which gives $\Delta^y \tilde U = \Delta^y \bar\theta(t)$.
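In essence, the limit computed above is the chain rule for $\tilde U$ along the path $(t, \bar p(t))$; schematically (our paraphrase, with the error terms $\epsilon_2$, $\epsilon_3$ omitted):

```latex
\[
  \lim_{h \to 0}
  \frac{\tilde U\bigl(t+h, z, \bar p(t+h)\bigr) - \tilde U\bigl(t, z, \bar p(t)\bigr)}{h}
  \;=\; \partial_t \tilde U\bigl(t, z, \bar p(t)\bigr)
  \;+\; D\tilde U\bigl(t, z, \bar p(t)\bigr) \cdot \frac{d\bar p}{dt}(t).
\]
```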
Then the DNN approximation $(\bar\theta, \bar p)$ is characterized by (4.18), while the true solution $(\theta, p)$ of the MFG is characterized by a similar equation, except that $\epsilon_2$ and $\epsilon_3$ are $0$. Although the two master equations are now backward PDEs, it is still difficult to compare their solutions directly. Hence, we approximate the two PDEs by two ODE systems on a discrete grid of $P(\Sigma)$.
Define $P_N(\Sigma) = \{(\frac{n_1}{N}, \ldots, \frac{n_K}{N}) : \sum_{z=1}^{K} n_z = N,\ n_z \in \mathbb{Z}_+\}$. Then $P_N(\Sigma)$ is a discrete grid of $P(\Sigma)$. For any $p^N \in P_N(\Sigma)$, define the operators in (4.20). With the discrete grid and discrete operators defined above, we next show in Proposition 4.9 that the master equation can be approximated by a backward ODE system.
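As an illustration (our own sketch, not part of the paper's numerics), the grid $P_N(\Sigma)$ can be enumerated by a stars-and-bars routine; the function name is ours:

```python
from itertools import combinations

def simplex_grid(K, N):
    """Enumerate P_N: all (n_1/N, ..., n_K/N) with nonnegative integers
    n_z summing to N, via stars-and-bars over N + K - 1 slots."""
    points = []
    # Choose K-1 "bar" positions among N + K - 1 slots.
    for bars in combinations(range(N + K - 1), K - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        counts.append(N + K - 1 - prev - 1)  # stars after the last bar
        points.append(tuple(c / N for c in counts))
    return points

grid = simplex_grid(K=3, N=4)
# The grid has C(N + K - 1, K - 1) = C(6, 2) = 15 points,
# each summing to 1 with coordinates that are multiples of 1/N.
```

The grid size grows polynomially in $N$ for fixed $K$, which is what makes the ODE approximation on $P_N(\Sigma)$ computationally tractable.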

Proposition 4.9
There exists $N_0$ such that for $N > N_0$, every $p^N \in P_N(\Sigma)$ and $z \in \Sigma$, $\tilde U$ solves the stated system. It looks similar to (4.21), except for the discrete operator $\Delta^{N,y}$ in place of the differential operator $D^y$. Hence, we next compare the two operators, similar to [9, Proposition 3]. We first discuss the first component $\delta_1 - \delta_y$ of $\Delta^{N,y}\tilde U(t, z, p^N)$ defined in (4.20), where the last equality follows from the Lipschitz continuity of $D^y\tilde U$ in $p^N \in P(\Sigma)$. As the same argument applies to every component of $\Delta^{N,y}\tilde U(t, z, p^N)$, we conclude that there exists $N_0$ such that for $N > N_0$ the stated bound holds. From the Lipschitz continuity of $H$ and $\lambda^*$, together with the boundedness of $\tilde U$, there exists a constant $C$ such that the corresponding estimate holds, and from Proposition 4.2 there exists a constant $C$ such that the next estimate holds. From Lemma 4.4 and $\sum_{z \in \Sigma} \lambda^*_z(y, \Delta^y\tilde U) = 0$ for every $y \in \Sigma$, we have $D^y\tilde U(t, z, p^N) \cdot \lambda^*(y, \Delta^y\tilde U) = D\tilde U(t, z, p^N) \cdot \lambda^*(y, \Delta^y\tilde U)$.
From the boundedness of $\lambda^*$ and of the terms above, there is a constant $C$ such that the stated bound holds, and we conclude the proof by defining the error terms accordingly. Finally, we can proceed to the proof of our main result. The main idea of the proof is to characterize both the DNN approximation $(\bar\theta, \bar p)$ and the true solution $(\theta, p)$ by their corresponding master equations, which are in turn approximated by two backward ODE systems on certain discrete grid points. The error between the two can then be estimated directly on these grid points using Gronwall's inequality. As both $(\bar\theta, \bar p)$ and $(\theta, p)$ are uniformly Lipschitz continuous with respect to their initial conditions, the error between grid points can also be estimated.
Completion of proof of Theorem 2.6 Since the ODE system (2.2) admits a solution for any initial value $p_0 \in P(\Sigma)$, we can define $U(t, z, p) := \theta(t, z)$.
From Cecchin and Pelino [9], $U$ satisfies the master equation for any $p \in P(\Sigma)$. There exists a constant $C$ such that $|A| + |B_y| + |C_y| \le C \int_t^T d(s)\,ds$.
As $p^N \in P_N(\Sigma)$, there exists a constant $C$ such that the stated bound holds. By applying Gronwall's inequality, there is a constant $C$ such that for every $t \in [0, T]$, $z \in \Sigma$ and $p^N \in P_N(\Sigma)$ we have (4.23). For $N > 2N_0$, where $N_0$ is defined in Proposition 4.9 above, if $\bar p \in B(P(\Sigma), \frac{1}{N})$, there is $p \in P(\Sigma)$ such that $\bar p = p + \epsilon_4$ with $\|\epsilon_4\| < \frac{1}{N}$, and there exists $p^N \in P_N(\Sigma)$ close to $p$. From Proposition 4.1, $\tilde U(t, z, \bar p)$ is well defined, and from Proposition 4.2 there exists a constant $C$, independent of $N$ and $p$, such that the corresponding bound holds for every $t \in [0, T]$ and $z \in \Sigma$. Combining the above inequalities with (4.23), we have $|\tilde U(t, z, \bar p) - U(t, z, p)| \le \frac{C}{N}$ for some constant $C$ independent of $N$ and $p$, which is equivalent to the stated estimate. By using the uniform boundedness and Lipschitz continuity of $\lambda^*$, we can prove that $p$ and $\bar p$ are Lipschitz continuous with respect to $\theta$ and $\bar\theta$, respectively, with the help of Gronwall's inequality and a technique similar to the proof of Proposition 4.2. Note also that the Lipschitz coefficient depends only on the uniform bound and Lipschitz coefficient of $\lambda^*$, which in turn depend only on the preliminary $M$ given in Proposition 4.1. Hence, there exists a uniform constant $C$, independent of $N$, such that the final estimate holds. This concludes the proof.
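To illustrate the grid-projection step used above (our own sketch, not the paper's code): any probability vector $p$ has a point of $P_N(\Sigma)$ within distance of order $1/N$, obtainable for example by rounding with a largest-remainder correction:

```python
def project_to_grid(p, N):
    """Round a probability vector p onto the grid P_N: components n_z/N
    with nonnegative integers n_z summing to N. A largest-remainder
    correction keeps the total mass exactly 1."""
    scaled = [x * N for x in p]
    floors = [int(x) for x in scaled]      # componentwise floor (p >= 0)
    deficit = N - sum(floors)              # mass still to distribute
    # Give one extra unit to the components with the largest fractional parts.
    order = sorted(range(len(p)), key=lambda z: scaled[z] - floors[z],
                   reverse=True)
    for z in order[:deficit]:
        floors[z] += 1
    return [n / N for n in floors]

p = [0.21, 0.33, 0.46]
pN = project_to_grid(p, N=10)
# pN lies on P_10, sums to 1, and each |pN[z] - p[z]| <= 1/10.
```

Each component moves by at most $1/N$ (a floor moves down by less than $1/N$; an incremented floor overshoots by at most $1/N$), matching the $C/N$ error contributed by the grid approximation in the proof.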

Proof of Proposition 3.1
Proof The proof is divided into several steps, proving the conditions for H and G, respectively.
Step 2: proof that H satisfies Assumption 2.1.
Step 3: proof that G satisfies Assumption 2.1. From (3.5), the differentiability of G and property (2.5) are immediate, so we only need to prove (2.6). Note that the expression above is non-positive, which concludes the proof.

Conclusions
In this paper, we have solved the finite state mean field game problem by the deep neural network method. By transforming the fully coupled FBODE system into the master equation, we have proved that the error between the true solution and the approximate solution is linear in the square root of the DNN loss function. We have also applied the DNN method to solve the optimal market making problem with a terminal rank-based trading volume reward, which is shown to perform better in liquidity provision and trading cost reduction than the linear trading volume reward. Many open questions remain, such as general heterogeneous interaction structures and infinite state MFGs. We leave these and other questions for future research.