In this chapter, we introduce the MFG framework for discrete state spaces, stressing the points that are most relevant for the models studied in the next chapters. As was already mentioned, the difference between the present MFG setting and the modeling of Part I is that now the small agents themselves become rational optimizers and are not supposed just to follow some prescribed deterministic or stochastic strategies (such as myopic behavior). For simplicity, in treating MFG we will exclusively use finite-state models and will not touch the extensions to countable state spaces, though the results of Chapter 4 allow one to extend large portions of the theory more or less directly to this more general framework.

1 Preliminaries: Controlled Markov Chains

Let us recall briefly the theory of controlled Markov chains (see details, e.g., in [113]). Let \(Q=\{Q_{ij}(t, u)\}\) be a family of Q-matrices, \(i, j\in \{1, \cdots , d\}\), depending continuously on a parameter \(u\in U\subset \mathbf {R}^n\) and piecewise continuously on time t. By a (Markov) strategy we mean any collection of piecewise continuous functions \(\hat{u}_j(t)\) with values in U. A controlled Markov chain specified by such a strategy is the Markov chain \(X^{\hat{u}}_{s, i}(t)\) (started in state i at time s) with the Q-matrices \(Q_{ij} (t, \hat{u}_i(t))\). Suppose one receives a payoff \(J(t,j,u)\) per unit of time for staying in state j with control u around time t, and the terminal payoff \(S_T(j)\), paid at time T if the process terminates in state j at time T. The corresponding Markov control problem is to find the maximal total payoff

$$ S(t,j)=\max _{\hat{u}} \mathbf {E}[\int _t^T J(s,j(s), \hat{u}_{j(s)}(s))\, ds +S_T(j(T))], $$

where \(j(s)=X^{\hat{u}}_{t, j}(s)\), and a strategy, called an optimal strategy, at which this maximum is attained.

It is known that \(S(t,j)\) solves the backward (evolving from the terminal time T backward) Bellman equation:

$$\begin{aligned} \frac{\partial S(t,j)}{\partial t} +\max _u [J(t,j, u) +\sum _{k=1}^d Q_{jk} (t,u) S(t, k)]=0, \end{aligned}$$
(5.1)

with the terminal condition \(S(T, j)=S_T(j)\).

The standard heuristic derivation of this equation goes as follows. Assuming that S is smooth and taking into account that our Markov chain can have at most one jump during a small period of time \(\tau \) (up to events of probability of order \(\tau ^2\); see (2.3) for the approximations by discrete-time Markov chains), we can write approximately

$$ S(t,j)=\max _u [J(t,j,u)\tau +\tau |Q_{jj}| \sum _{k\ne j}\frac{Q_{jk}}{|Q_{jj}|}S(t+\tau , k)+(1-\tau |Q_{jj}|)S(t+\tau , j)]. $$

This way of expressing the optimal payoff at time t via its values at future times is often referred to as the principle of optimality. Expanding S in a Taylor series and keeping only the terms linear in \(\tau \) (the zero-order terms \(S(t,j)\) cancel) yields (5.1).

A rigorous justification is usually performed by first proving the well-posedness of the Bellman equation and then showing that its solution supplies the required maximum (the so-called verification theorem arguments).
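To make the backward recursion concrete, here is a minimal numerical sketch (not from the text) that solves the Bellman equation (5.1) by an explicit Euler scheme on a time grid. The two-state chain, the control set, the running payoff J, and the terminal payoff are all illustrative choices, not data from any model in this book.

```python
import numpy as np

# Hypothetical two-state example: a control u in [0, 1] scales the jump rate out of state 0.
d = 2
U = np.linspace(0.0, 1.0, 21)        # discretized control set
T, n_steps = 1.0, 200
tau = T / n_steps

def Q(u):
    """Q-matrix for a given control value u (rows sum to zero)."""
    return np.array([[-u, u],
                     [0.3, -0.3]])

def J(t, j, u):
    """Illustrative running payoff: a reward for sitting in state 1 minus a quadratic effort cost."""
    return (1.0 if j == 1 else 0.0) - 0.5 * u ** 2

S_T = np.array([0.0, 1.0])           # terminal payoff S_T(j)

# Backward sweep: S(t, j) = S(t + tau, j) + tau * max_u [J(t, j, u) + sum_k Q_jk(u) S(t + tau, k)],
# which is the discrete-time principle of optimality behind (5.1).
S = S_T.copy()
policy = np.zeros((n_steps, d))      # stores the maximizing control u_j(t) on the grid
for n in reversed(range(n_steps)):
    t = n * tau
    S_new = np.empty(d)
    for j in range(d):
        values = [J(t, j, u) + Q(u)[j] @ S for u in U]
        best = int(np.argmax(values))
        policy[n, j] = U[best]
        S_new[j] = S[j] + tau * values[best]
    S = S_new

print("approximate optimal payoff S(0, j):", S)
```

The maximizing control recorded in `policy` is a (grid) Markov strategy; checking that running the chain with this feedback indeed achieves the computed payoff corresponds to the verification argument mentioned above.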

2 MFG Forward–Backward Systems and the Master Equation

Suppose we have a family \(Q=\{Q_{ij}(x, u)\}\) of Q-matrices, \(i, j\in \{1, \cdots , d\}\), depending continuously on a parameter \(u\in U\) and Lipschitz continuously on \(x\in \Sigma _d\). Suppose there are N players, each moving according to Q and aiming at maximizing the payoff

$$\begin{aligned} \mathbf {E}[\int _t^T J(s, j(s), x(s), u(s)) \, ds +S_T(j(T))], \end{aligned}$$
(5.2)

where j(s) is the position of the player at time s. Here the motions of all players are coupled, since all transitions depend on the total distribution of players \(x=(n_1, \cdots , n_d)/N\), where \(n_j\) denotes the number of players in state j at any given time.
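To fix ideas, the following sketch (not from the text) simulates such a coupled system of N players under a fixed symmetric strategy, using a crude discrete-time approximation of the jumps; the Q-matrix and the strategy are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, T, n_steps = 1000, 2, 1.0, 200
tau = T / n_steps

def Q(x, u):
    """Illustrative mean-field Q-matrix: the rate of jumping to state 1 grows with the mass x_1."""
    return np.array([[-u * (0.5 + x[1]), u * (0.5 + x[1])],
                     [0.3, -0.3]])

def u_hat(t, j):
    """A fixed symmetric strategy used by every player (a placeholder, not an optimal one)."""
    return 0.5

states = rng.integers(0, d, size=N)            # positions of the N players
for n in range(n_steps):
    t = n * tau
    x = np.bincount(states, minlength=d) / N   # empirical distribution x = (n_1, ..., n_d)/N
    new_states = states.copy()
    for i, j in enumerate(states):
        rates = Q(x, u_hat(t, j))[j].copy()
        rates[j] = 0.0
        # within a step of length tau, a jump occurs with probability approximately tau * |Q_jj|
        if rng.random() < tau * rates.sum():
            new_states[i] = rng.choice(d, p=rates / rates.sum())
    states = new_states

print("empirical distribution at time T:", np.bincount(states, minlength=d) / N)
```

As N grows, the empirical distribution of such a system concentrates around the deterministic kinetic flow discussed below.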

The MFG methodology suggests the following solution concept for this problem for large N. Suppose the evolution of the distributions x(t), \(t\in [0,T]\), is a known continuous curve. Then every player should search for the maximal payoff

$$ S(t,j)=\max _{\hat{u}} \mathbf {E}[\int _t^T J(s, j(s), x(s),\hat{u}_{j(s)}(s))\, ds +S_T(j(T))], $$

where \(j(s)=X^{\hat{u}}_{t, j}(s)\) is the Markov chain with the Q-matrices \(Q_{ij} (x(t),\hat{u}_i(t))\). As follows from (5.1), \(S(t,j)\) should satisfy the backward Bellman equation

$$\begin{aligned} \frac{\partial S(t,j)}{\partial t} +\max _u [J(t,j,x(t),u) +\sum _k Q_{jk} (x(t),u) S(t, k)]=0, \end{aligned}$$
(5.3)

with the terminal condition \(S(T, j)=S_T(j)\). Assume that we can find a solution \(S(t,j)\) and the corresponding optimal strategy \(\hat{u}_j(t)=\hat{u}_j(t, x(t))\) providing the \(\max \) in this equation at any time t. Now, if all players are using this optimal strategy, the mean-field interacting particle systems of N players given by the Q-matrices

$$ \hat{Q}_{ij}(t, x)= Q_{ij}(x,\hat{u}_i(t)) $$

converge, according to Theorem 2.5.1, to the solutions \(X_{0,x(0)}(t)\) of the system of (forward) kinetic equations (2.40):

$$\begin{aligned} \dot{x}_k =\sum _{i \ne k} [x_i Q_{ik}(x,\hat{u}_i(t))-x_k Q_{ki} (x,\hat{u}_k(t))] =\sum _{i=1}^d x_i Q_{ik}(x, \hat{u}_i(t)), \quad k=1,..., d. \end{aligned}$$
(5.4)

Let \(\hat{x}(t)\) be the solution of this system with the initial condition \(\hat{x}(0)=x(0)\). The consistency between the controlled dynamics and the mean-field evolution can be naturally described by the requirement that \(\hat{x}(t)=x(t)\). This is exactly the forward–backward MFG consistency problem, or MFG consistency condition, also referred to by some authors as a Nash–MFG equilibrium. Equivalently, starting with a control \(u_j(t)\), we can solve the corresponding kinetic equation to find the distribution x(t) and then find the corresponding optimal control \(\hat{u}_j(t)\) fitting the solution of the HJB (5.3). The MFG consistency condition can then be expressed by the equation \(u(t)=\hat{u}(t)\). In any case, it can be expressed by saying that the pair \((\hat{x}(t), \hat{u}_j(t))\) provides a solution to the coupled forward–backward system (5.3)–(5.4), more precisely, to its initial–terminal value problem (initial \(x_0\) is given for (5.4) and terminal \(S_T\) for (5.3)).
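The consistency condition suggests a natural fixed-point (Picard-type) iteration, sketched below for a hypothetical two-state model: guess a flow x(t), solve the HJB equation (5.3) backward along this flow, propagate the kinetic equation (5.4) forward with the resulting feedback, and repeat until the flow reproduces itself. All model data here are illustrative, and such an iteration is not guaranteed to converge in general; it is shown only to make the forward–backward structure explicit.

```python
import numpy as np

d, T, n_steps = 2, 1.0, 200
tau = T / n_steps
U = np.linspace(0.0, 1.0, 21)

def Q(x, u):
    """Mean-field dependent Q-matrix (illustrative): jumps to state 1 are encouraged by the mass x_1."""
    return np.array([[-u * (0.5 + x[1]), u * (0.5 + x[1])],
                     [0.3, -0.3]])

def J(t, j, x, u):
    """Illustrative running payoff: a congestion-discounted reward in state 1 minus an effort cost."""
    return (1.0 - x[1] if j == 1 else 0.0) - 0.5 * u ** 2

S_T = np.array([0.0, 0.5])
x0 = np.array([0.8, 0.2])

x_flow = np.tile(x0, (n_steps + 1, 1))          # initial guess: the distribution stays frozen
for it in range(50):
    # 1) backward HJB sweep (5.3) along the current guess x(t)
    S = S_T.copy()
    policy = np.zeros((n_steps, d))
    for n in reversed(range(n_steps)):
        t, x = n * tau, x_flow[n]
        S_new = np.empty(d)
        for j in range(d):
            vals = [J(t, j, x, u) + Q(x, u)[j] @ S for u in U]
            k = int(np.argmax(vals))
            policy[n, j] = U[k]
            S_new[j] = S[j] + tau * vals[k]
        S = S_new
    # 2) forward kinetic equation (5.4) driven by the computed feedback
    x_new = np.empty_like(x_flow)
    x_new[0] = x0
    for n in range(n_steps):
        x = x_new[n]
        drift = sum(x[i] * Q(x, policy[n, i])[i] for i in range(d))
        x_new[n + 1] = x + tau * drift
    # 3) consistency check: does the forward flow reproduce the guess?
    gap = np.max(np.abs(x_new - x_flow))
    x_flow = x_new
    if gap < 1e-6:
        break

print("consistent flow at t = T:", x_flow[-1])
```

If the iteration stabilizes, the pair formed by `x_flow` and `policy` is a numerical approximation to a solution of the forward–backward system (5.3)–(5.4).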

One can expect that solutions to the MFG consistency problem should provide some approximations to the solutions of games with N players in which each player is trying to maximize (5.2). Thus we are led to two basic problems of MFG theory: describe the solutions to the MFG consistency problem (say, prove an existence and/or uniqueness theorem) and provide an exact link between these solutions and the corresponding original game with a finite number of players. The latter task can be addressed by two approaches (often requiring different techniques): showing that Nash equilibria of games with N players converge to a solution of the MFG consistency problem, or showing that the solutions of the MFG consistency problem yield approximate Nash equilibria for finite-player games. In this book we will work with models in which the solutions can be calculated explicitly, and thus we shall not touch here upon the first problem of MFG theory mentioned above. The second problem will be addressed in the next section.

Though we will work with this forward–backward formulation, let us mention, for completeness, an alternative approach to the MFG consistency problem that arises from looking directly at the limiting evolution of the pair (j(t), x(t)), where x(t) evolves according to the kinetic equations,

$$ \dot{x}_k =\sum _{i=1}^d x_i Q_{ik}(x (t), \hat{u}_i(t)), \quad k=1,..., d, $$

and \(j(t)=X^{\hat{u}}_{s, j}(t)\) is the Markov chain with the Q-matrices \(Q_{ij} (x(t), \hat{u}_i(t))\), so that the pair (j(t), x(t)) is a controlled Markov process on the continuous state space \(\{1,\cdots , d\} \times \Sigma _d\). Strictly speaking, this is no longer a chain, since it evolves by jumps and continuous displacements in a continuous state space. Nevertheless, controlling the process with the objective of maximizing (5.2), we find for the optimal payoff the Bellman equation in the form

$$\begin{aligned} \frac{\partial S(t,j, x)}{\partial t} +\max _{u_1, \cdots , u_d} \left[ J(t,j,x,u_j) +\sum _k Q_{jk} (x,u_j) S(t,k,x) +\sum _{k,i}\frac{\partial S(t,j,x)}{\partial x_k}x_i Q_{ik}(x, u_i)\right] =0. \end{aligned}$$
(5.5)

This is obtained analogously to (5.1) from the approximate equation

$$ S(t,j, x)=\max _{u_1, \cdots , u_d} [\tau J(t,j,x,u_j) +\tau |Q_{jj}| \sum _{k\ne j}\frac{Q_{jk}(x, u_j)}{|Q_{jj}|}S(t+\tau ,k,X_{x, t}(t+\tau )) $$
$$ +(1-\tau |Q_{jj}|)S(t+\tau , j,X_{x, t}(t+\tau ))], $$

or, discarding the higher-order terms in \(\tau \),

$$ S(t,j, x)=\max _{u_1, \cdots , u_d} [J(t,j,x,u_j)\tau +\tau \sum _{k\ne j} Q_{jk}(x, u_j) S(t,k, x) +(1-\tau |Q_{jj}|)S(t+\tau , j,X_{x, t}(t+\tau ))], $$

using the first-order Taylor approximation

$$ S(t+\tau , j,X_{x,t}(t+\tau ))=S(t, j,x)+\tau \frac{\partial S}{\partial t}(t, j,x) +\tau \sum _{k,i}\frac{\partial S}{\partial x_k}(t, j,x) x_iQ_{ik}(x, u_i). $$
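Spelling out the final step: substituting this expansion into the previous relation, the zero-order terms \(S(t,j,x)\) cancel (the term \(-|Q_{jj}|S(t,j,x)=Q_{jj}S(t,j,x)\) completes the sum over k), and after dividing by \(\tau \) one is left with

$$ 0=\max _{u_1, \cdots , u_d} \left[ J(t,j,x,u_j) +\sum _k Q_{jk} (x,u_j) S(t,k,x) +\frac{\partial S(t,j,x)}{\partial t} +\sum _{k,i}\frac{\partial S(t,j,x)}{\partial x_k}\, x_i Q_{ik}(x, u_i)\right] +O(\tau ), $$

which yields (5.5) as \(\tau \rightarrow 0\).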

Equation (5.5) is called the master equation (in backward form). The next statement shows that this equation provides (at least when it is reasonably well posed) an alternative approach to the analysis of the MFG consistency problem: it selects the most effective solutions of the forward–backward systems, thus forming an envelope of the various solutions of the MFG consistency problem.

Proposition 5.2.1.

Let \(S(t,j,x)\) be a smooth function solving equation (5.5) with the terminal condition \(S(T,j,x)=S_T(j)\) and giving the optimal payoff in the Markov decision problem on \(\{1,\cdots , d\} \times \Sigma _d\) used to derive this equation. Let it be possible to choose a Lipschitz continuous (in x) selector \(\tilde{u}_j(t, x)\) giving a maximum in (5.5) (for this solution S) and hence to build the corresponding trajectories \(X_{x_0}(t)\) solving the kinetic equations

$$ \dot{x}_j=\sum _ix_iQ_{ij}(x, \tilde{u}_i(t, x)) $$

with any initial \(x_0\).

Then the pair \((X_{x_0}(t), \tilde{u}_j(t, X_{x_0}(t)))\) is a solution to the MFG consistency problem, and

$$\begin{aligned} \tilde{S}(t,j)\le S(t,j, x_0), \end{aligned}$$
(5.6)

for the payoff \(\tilde{S}(t, j)\) on every other solution \((\hat{x}(t), \hat{u}_j(t))\) to the forward–backward MFG consistency problem with \(\hat{x}(t)=X_{x_0}(t)\).

Proof.

By the definition of \(S(t,j,x)\) as the solution to a Markov decision problem on \(\{1,\cdots , d\} \times \Sigma _d\), \(S(t,j, x_0)\) is not less than the payoff that can be obtained by any player using any symmetric strategy (the same as all other players) given the dynamics \(X_{x_0}(t)\) of the total distributions, implying (5.6). It follows that the payoff \(S(t,j, x_0)\) cannot be improved by changing strategies within the class of symmetric strategies. Consequently, \(\tilde{u}_i(t, x(t))\) provides the maximal payoff in this class of strategies and hence yields a solution to the forward–backward MFG consistency problem.\(\square \)

Some comments are in order here. MFG problems were introduced by Lasry–Lions [159] and Huang–Malhame–Caines [118, 119]. The deep theory developed so far is reflected in several books and surveys; we give a very brief bibliographical review of its development in the Appendix. We have been intentionally brief in our presentation, aiming at the most elementary exposition needed to grasp the simplest finite-state models dealt with in the present book. For instance, as we have already mentioned, we will not use the master equation; we introduced it only to give an idea of a major direction of research in MFG. In fact, the exploitation of the master equation is the only way to include in the theory a common noise and/or a major player whose coupling with the small players involves such noise (see [160]). The master equation is the limiting equation for the system of Bellman equations describing N-player games. In this sense, it is a decoupling equation encoding the entire mean-field game in a single evolution. In particular, having a solution to the master equation leads to a solution of the convergence problem. All these aspects are fully discussed in [56] and in the fundamental two-volume monograph [60]. In [56, 60, 160], one can also find many deep results on the existence and/or uniqueness of solutions to the master equation, including some cases in which the small players have a finite state space.

3 MFG Solutions and Nash Equilibria for a Finite Number of Players

In the next chapters we shall construct the solutions to the MFG consistency problem for several models and discuss the properties of these solutions, without paying attention to their links with the related games of finitely many players. The justification for this program is provided here: we show that these solutions represent approximate Nash equilibria for the corresponding N-player games. The results of this section will not be used explicitly in our further analysis.

Let us recall that a Nash equilibrium in a game of N players is a collection of strategies of these players, often referred to as a profile of strategies, such that a unilateral deviation of any particular player from this profile cannot improve the payoff of this agent. Therefore, as Nash himself stressed, this is a no-regret outcome for each player. An \(\epsilon \)-Nash equilibrium is a profile of strategies such that a unilateral deviation of any particular player from this profile can improve the payoff of this agent by an amount not exceeding \(\epsilon \).

Recall that our functional norms always refer to the dependence on x, uniformly with respect to the other variables, so that, for instance,

$$ \Vert Q\Vert _{C^1}=\Vert Q\Vert +\sup _i \sum _j \sup _{k,x, u} \left| \frac{\partial Q_{ij}}{\partial x_k}\right| , \quad \Vert Q\Vert _{C^2}=\Vert Q\Vert _{C^1}+ \sup _i \sum _j \sup _{k,l,x, u} \left| \frac{\partial ^2 Q_{ij}}{\partial x_k \partial x_l}\right| . $$

The Nash equilibria and the \(\epsilon \)-Nash equilibria of dynamic N-player games can be understood in several ways, which are traditionally distinguished in the literature on optimization theory. In general, one speaks about open loop control and the related open loop equilibria if players choose their control strategies \(u_j(t)\) from the beginning, irrespective of the dynamics of the game (though the strategies may depend on a common source of uncertainty). One speaks about closed loop control and the related closed loop equilibria if players choose feedback controls \(u_j(t, z)\), which at any time t depend also on the position z of the process. In the MFG setting, new possibilities arise, since from the point of view of each player, the position includes the player's own position, say j, and the overall distribution x. Let us speak about partially open loop control and the related partially open loop equilibria if each player chooses among strategies \(u_j(t)\), piecewise continuous in t, that depend on the player's own position at time t, but not on the overall distribution x. The use of such strategies, sometimes referred to as distributed strategies, is reasonable in many situations in which the overall distribution is not easily observable by each individual player. By a closed loop control we mean, as usual, a control \(u_j(t, x)\) applied by a player at time t when its position is j and the overall distribution is x.
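As a purely schematic illustration (not from the text), the three classes of controls differ only in what a strategy is allowed to observe; the following hypothetical type signatures summarize the distinction.

```python
from typing import Callable, Sequence

State = int                      # the player's own position j in {1, ..., d}
Distribution = Sequence[float]   # the overall distribution x in the simplex Sigma_d
Control = float                  # a point of the control set U

# Open loop: the control trajectory u(t) is fixed in advance.
OpenLoopStrategy = Callable[[float], Control]

# Partially open loop (distributed): u_j(t) may depend on time and on the player's
# own position, but not on the overall distribution x.
PartiallyOpenLoopStrategy = Callable[[float, State], Control]

# Closed loop: u_j(t, x) may depend on time, own position, and the overall distribution.
ClosedLoopStrategy = Callable[[float, State, Distribution], Control]
```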

For the sake of transparency, we shall concentrate on the model of Section 5.2 with the running payoff J in (5.2) not depending on x explicitly, that is, with the payoff

$$\begin{aligned} \mathbf {E}[\int _t^T J(s, j(s), u_{j(s)}(s)) \, ds +S_T(j(T))], \end{aligned}$$
(5.7)

with a continuous function J, and with partially open loop equilibria. Thus we assume that we have a family \(Q=\{Q_{ij}(x, u)\}\) of Q-matrices, \(i, j\in \{1, \cdots , d\}\), depending continuously on a parameter \(u\in U\) and Lipschitz continuously on \(x\in \Sigma _d\). Suppose there are N players, each moving according to Q and aiming at maximizing the payoff (5.7).

Theorem 5.3.1.

Let \((\hat{x}(t), \hat{u}_j(t))\) be a solution to the forward–backward MFG consistency problem. Then for the initial distribution x(0), the symmetric profile of strategies \(\hat{u}_j(t)\) is an \(\epsilon \)-Nash equilibrium in the partially open loop setting, with \(\epsilon \) of order \(1/\sqrt{N}\). If Q(x, u) is twice continuously differentiable in x uniformly in u, then the order of \(\epsilon \) improves to \(1/N\).

Proof.

We have to show that if all players use the strategy \(\hat{u}_j(t)\), then any particular player unilaterally deviating from this strategy cannot increase the payoff by an amount exceeding \(\epsilon \). Thus let us assume that one tagged player is using some deviating strategy \(u_j^{dev}(t)\), while other players stick to \(\hat{u}_j(t)\). The natural state space for such a Markov chain will be \(\{1, \cdots , d \}\times \Sigma _d\), the first discrete coordinate j denoting the position of the tagged player.

We are exactly in the setting of Section 2.7. The operator \(L_t^{N, dev}\) given by (2.79) and the limiting operator (2.80) take the form

$$ L_t^{N,dev}f(j,x)= \sum _k Q^{dev}_{jk} (x, u_j^{dev}(t)) (f(k,x)-f(j, x)) $$
$$\begin{aligned} +\sum _i (x_i-\delta ^j_i/N) \sum _{k\ne i} Q_{ik} (x, \hat{u}_j(t)) \left[ f(j, x-e_i/N +e_k/N)-f(j, x)\right] , \end{aligned}$$
(5.8)
$$ \Lambda _t^{dev}f(j,x)= \sum _k Q^{dev}_{jk} (x, u_j^{dev}(t)) (f(k,x)-f(j, x)) $$
$$\begin{aligned} +\sum _i x_i \sum _{k\ne i} Q_{ik} (x, \hat{u}_j(t)) \left[ \frac{\partial f}{\partial x_k}-\frac{\partial f}{\partial x_i}\right] (j, x). \end{aligned}$$
(5.9)

Applying Theorem 2.7.1, we conclude that the payoff of the tagged player in the N-player game differs from that player's payoff in the limiting evolution by an amount not exceeding \(\epsilon \). Notice that in order to take into account the running payoff J, we apply this theorem not only to \(S_T(j)\) with terminal time T, but also to each \(J(s, j(s), u_{j(s)}(s))\) with terminal time s. Since \(\hat{u}_j(t)\) is optimal in the limiting game, it is therefore \(\epsilon \)-optimal for the N-player game.\(\square \)

Remark 20.

To extend the theorem to closed loop controls and to J depending on x, one just has to apply Theorem 2.7.2, which extends Theorem 2.7.1 to functions f depending explicitly on x.

4 MFG with a Major Player

We complete here our brief review of the foundations of MFGs on finite state spaces by explaining a popular extension of the basic model, the inclusion of a major player. Namely, let us assume, as in Sections 2.6 and 3.3, that the transition matrices \(Q=Q(x,u, b)\) and the payoffs \(J(t,j,x,u,b)\) depend additionally on a parameter b controlled by the major player (a principal). If, as in Section 2.6, the principal is playing just the best response \(b^*(x)\), we are directly back in the original problem with \(Q=Q(x,u, b^*(x))\). However, if the major player chooses b strategically, aiming at maximizing some payoff of the general type

$$ \int _t^T B(x(s), b(s)) \, ds +V_T(x(T)), $$

the situation becomes different.

In this case, the MFG methodology works as follows. If the evolution of the distributions x(t), \(t\in [0,T]\), is a known continuous curve, then the major player finds the optimal strategy \(\hat{b}(t)\) and, based on this strategy, any given player should search for the maximal payoff

$$ S(t,j)=\max _{\hat{u}} \mathbf {E}[\int _t^T J(s, j(s), x(s),\hat{u}_{j(s)}(s), \hat{b}(s))\, ds +S_T(j(T))], $$

where \(j(s)=X^{\hat{u}}_{t, j}(s)\) is the Markov chain with Q-matrices \(Q_{ij} (x(t),\hat{u}_i(t), \hat{b}(t))\). The Bellman equation for the optimal payoff \(S(t,j)\) of each small player takes the form

$$\begin{aligned} \frac{\partial S(t,j)}{\partial t} +\max _u [J(t,j,x(t),u, \hat{b}(t)) +\sum _k Q_{jk} (x(t),u, \hat{b}(t)) S(t, k)]=0. \end{aligned}$$
(5.10)

After finding a solution \(S(t,j)\) and the corresponding optimal strategy \(\hat{u}_j(t)=\hat{u}_j(t, x(t))\) providing the \(\max \) in this equation at any time t, we can solve the corresponding kinetic equations

$$\begin{aligned} \dot{x}_k =\sum _{i=1}^d x_i Q_{ik}(x, \hat{u}_i(t), \hat{b}(t)), \quad k=1,..., d. \end{aligned}$$
(5.11)

Let \(\hat{x}(t)\) be the solution of this system with the initial condition \(\hat{x}(0)=x(0)\). The MFG consistency problem, or MFG consistency condition with major player, can be expressed by the equation \(\hat{x}(t)=x(t)\).
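The consistency problem with a major player again lends itself to a fixed-point iteration. The sketch below (not from the text) extends the earlier two-state illustration: with the flow x(t) frozen, the principal's criterion is maximized pointwise in time over a control grid, the small players' HJB (5.10) is then solved backward, and the kinetic equation (5.11) is propagated forward. The functions Q, J, B and all parameters are illustrative assumptions.

```python
import numpy as np

d, T, n_steps = 2, 1.0, 200
tau = T / n_steps
U = np.linspace(0.0, 1.0, 21)        # small players' control grid
B_GRID = np.linspace(0.0, 1.0, 11)   # principal's control grid

def Q(x, u, b):
    """Illustrative Q-matrix: the principal's control b dampens the jump rate to state 1."""
    return np.array([[-u * (0.5 + x[1]) * (1.0 - 0.5 * b), u * (0.5 + x[1]) * (1.0 - 0.5 * b)],
                     [0.3, -0.3]])

def J(t, j, x, u, b):
    """Illustrative running payoff of a small player."""
    return (1.0 - x[1] if j == 1 else 0.0) - 0.5 * u ** 2 - 0.1 * b

def B(x, b):
    """Principal's running payoff: a benefit proportional to the mass in state 1 times the effort b,
    minus a quadratic cost of that effort."""
    return b * x[1] - 0.5 * b ** 2

S_T = np.array([0.0, 0.5])
x0 = np.array([0.8, 0.2])
x_flow = np.tile(x0, (n_steps + 1, 1))

for it in range(50):
    # 1) with x(.) frozen, the principal's criterion is maximized pointwise in time
    b_hat = np.array([B_GRID[int(np.argmax([B(x_flow[n], b) for b in B_GRID]))]
                      for n in range(n_steps)])
    # 2) backward HJB (5.10) for the small players, given x(.) and b_hat(.)
    S = S_T.copy()
    policy = np.zeros((n_steps, d))
    for n in reversed(range(n_steps)):
        t, x, b = n * tau, x_flow[n], b_hat[n]
        S_new = np.empty(d)
        for j in range(d):
            vals = [J(t, j, x, u, b) + Q(x, u, b)[j] @ S for u in U]
            k = int(np.argmax(vals))
            policy[n, j], S_new[j] = U[k], S[j] + tau * vals[k]
        S = S_new
    # 3) forward kinetic equation (5.11)
    x_new = np.empty_like(x_flow)
    x_new[0] = x0
    for n in range(n_steps):
        x = x_new[n]
        drift = sum(x[i] * Q(x, policy[n, i], b_hat[n])[i] for i in range(d))
        x_new[n + 1] = x + tau * drift
    # 4) consistency check x_hat(t) = x(t)
    gap = np.max(np.abs(x_new - x_flow))
    x_flow = x_new
    if gap < 1e-6:
        break

print("consistent flow at t = T:", x_flow[-1])
```

Note that freezing x(t) makes the terminal term \(V_T(x(T))\) independent of b, which is why the principal's optimization reduces in this sketch to a pointwise maximization of B.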