Discrete-Time Ergodic Mean-Field Games with Average Reward on Compact Spaces
Abstract
We present a model of discrete-time mean-field game with compact state and action spaces and average reward. Under some strong ergodicity assumption, we show it possesses a stationary mean-field equilibrium. We present an example showing that in general an equilibrium for this game may not be a good approximation of Nash equilibria of the n-person stochastic game counterparts of the mean-field game for large n. Finally, we identify two cases when the approximation is good.
Keywords
Mean-field game · Anonymous game · Stochastic game · Average reward · Ergodic reward · Stationary equilibrium · Geometric ergodicity
1 Introduction
Mean-field game theory has been developed independently by Lasry and Lions [39] and by Huang et al. [37] to study noncooperative differential games with a large number of identical players. The main idea behind their models was that by approximating the game with a limit where the number of players is infinite, we can reduce the game problem, which becomes intractable for a large finite number of players, to a much simpler single-agent decision problem. The idea has been widely accepted by the differential game community, which has resulted in a huge number of publications on the topic over the last decade. The reader interested in the differential-type mean-field game models discussed so far is referred to the books [8, 21] or the survey [32].
Our focus in this paper is, however, on similar discrete-time models, which, surprisingly, appeared in the game-theoretic literature long before the pioneering works on mean-field games. In the seminal paper by Jovanovic and Rosenthal [38], each player controls an individual discrete-time Markov chain, while the global state of the game, defined as the probability distribution over the individual states of all the players, becomes deterministic. While the tools used there were significantly different from those considered in the differential mean-field game literature, the general principle, which was to simplify the original large game problem by considering an approximation with one-agent optimization models, stayed the same. Some generalizations of the model of Jovanovic and Rosenthal were given in [2, 9, 10, 22, 27, 45]. All of these papers considered games with discounted rewards (costs). Discounted discrete-time mean-field games were also studied in a number of economic applications; see the references in [2].
Our paper deals with a different reward criterion: the long-run average reward (sometimes also called the ergodic reward), often used in Markov decision process and dynamic game problems, yet hardly present in the discrete-time mean-field game literature. To the best of our knowledge, there are only three papers dealing with this kind of problem in a discrete-time setting, discussed in more detail below. The literature on differential-type mean-field games with this payoff criterion is a lot more extensive. In [28, 39], results about the relation between games with a large finite number of players and mean-field games of this type are proved. The papers [18, 19, 20] discuss the relation between the solutions of ergodic mean-field games and mean-field games with a large fixed time horizon. Existence and uniqueness of solutions to average-reward mean-field games are addressed in many articles including [5, 6, 7, 23, 24, 25, 30, 31, 39, 40, 42] and a number of preprints. Finally, [1, 4, 15] provide some numerical methods for solving this type of games. The first model of a discrete-time mean-field game with average reward was introduced in [48], where the existence of a stationary mean-field equilibrium was proved under an ergodicity assumption in the case when the state and action spaces of the players are finite. Under the additional assumption that the individual transitions of the players do not depend on the empirical distribution of the states or actions of all the players, it also shows that the mean-field model approximates well the n-person models for n large enough. A similar assumption has also been made in [12], where average-reward games with \(\sigma \)-compact Polish individual state spaces were studied. The problem is that apart from this assumption, the results in [12] used some strong regularity conditions stated in terms of a specific metric topology on the set of stationary policies, which seem too strong to be satisfied under any reasonable assumptions.
In the last paper we need to mention here, [16], average-reward discrete-time mean-field games were used to study a dynamic routing model. The main contribution of that paper was a linear-programming formulation of the problem of finding a stationary equilibrium in games of this type.
In our paper, we do not consider a setting as general as that in [12], limiting ourselves to games with compact state and action spaces. In return, within this framework we make assumptions that are satisfied by a large class of models. Moreover, we state them in terms of the basic primitives of the model, making them rather easy to verify. Finally, in general we do not require the independence of the individual transitions from the empirical distribution of the states and actions of the players. In our article, we give results of two types. First, under the assumptions given in Sect. 3, we show that the mean-field game has a stationary equilibrium. Then, we provide several results, both positive and negative, linking equilibria in the model with a continuum of players with \(\varepsilon \)-equilibria in its n-person stochastic counterparts when n is large.
The organization of the paper is as follows: In Sect. 2, we present the general framework we are going to work with and define what kind of solutions we will be looking for. In Sect. 3, we present our assumptions. Sections 4 and 5 provide our main results: in Sect. 4 we prove the existence of a stationary equilibrium in the mean-field game model, while in Sect. 5 we give results linking equilibria in the mean-field game with approximate equilibria in games with a large finite number of players. We end the paper with conclusions in Sect. 6.
2 The Model
2.1 Discrete-Time Mean-Field Games

We assume that the game is played in discrete time, that is, \(t\in \{ 1,2,\ldots \}\).

The game is played by an infinite number (continuum) of players. Each player has a private state \(s\in S\), changing over time. We assume that the set of individual states S is the same for each player and that it is a nonempty compact metric space. The private state of player i at time t is denoted by \(s^i_t\). If we refer to an arbitrary player, we skip the superscript i.

A probability distribution \(\mu \) over Borel sets^{1} of S is called a global state of the game. It describes the proportion of the population in each of the individual states. The global state at time t will be denoted by \(\mu _t\). We assume that at every stage of the game, each player knows both his private state and the global state, and that his knowledge about the individual states of his opponents is limited to the global state.

The set of actions available to any player in state \((s,\mu )\) is given by \(A(s,\mu )\), with \(A:=\bigcup _{(s,\mu )\in S\times \Delta (S)}A(s,\mu )\) a compact metric space. \(A(\cdot ,\cdot )\) is a nonempty-valued correspondence.

The global distribution of the state–action pairs is denoted by \(\tau \in \Delta (S\times A)\). If we refer to the global state–action distribution at a specific time t, we write \(\tau ^t\).

An individual’s immediate reward is given by a bounded measurable function \(r:S\times A\times \Delta (S\times A)\rightarrow \mathbb {R}\). \(r(s,a,\tau )\) gives the reward of a player at any stage of the game when his private state is s, his action is a, and the distribution of state–action pairs among the entire player population is \(\tau \).

Transitions are defined for each individual separately with a transition kernel \(Q:S\times A\times \Delta (S\times A)\rightarrow \Delta (S)\). \(Q(B\mid \cdot ,\cdot ,\tau )\) is product measurable for any \(B\in \mathcal {B}(S)\) and any \(\tau \in \Delta (S\times A)\).
 The global state at time \(t+1\) is given by the aggregation of the individual transitions of the players,$$\begin{aligned} \mu _{t+1}=\Phi \big (\cdot \mid \tau ^t\big )=\int _{S\times A}Q\big (\cdot \mid s,a,\tau ^t\big )\tau ^t\big (\mathrm{{d}}s\times \mathrm{{d}}a\big ). \end{aligned}$$As can be clearly seen from the above formula, the transition of the global state is deterministic.
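The aggregation step above can be illustrated numerically. The following is a minimal sketch for a finite approximation of S and A: with the state–action distribution stored as a matrix, the next global state is a deterministic linear image of \(\tau ^t\). All names (`nS`, `nA`, `Q`, `tau`) and the kernel data are illustrative, not part of the paper's model.

```python
import numpy as np

# Hedged sketch: deterministic global-state update
# mu_{t+1} = Phi(. | tau^t) = \int Q(. | s, a, tau^t) tau^t(ds x da)
# on a finite state/action grid with made-up transition data.

rng = np.random.default_rng(0)
nS, nA = 4, 3  # sizes of the finite approximations of S and A

# Individual transition kernel Q[s, a, s']; each row over s' sums to 1.
# (Its dependence on tau is frozen here for simplicity.)
Q = rng.random((nS, nA, nS))
Q /= Q.sum(axis=2, keepdims=True)

# Current global state-action distribution tau^t on S x A.
tau = rng.random((nS, nA))
tau /= tau.sum()

# Aggregation of individual transitions:
# mu_next[s'] = sum_{s,a} Q[s, a, s'] * tau[s, a]  -- a deterministic map.
mu_next = np.einsum('sap,sa->p', Q, tau)
print(mu_next)
```

Because the update is an integral against \(\tau ^t\), no randomness survives at the population level: two runs from the same \(\tau ^t\) produce the same \(\mu _{t+1}\).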
Next, we define the solution we will be looking for:
Definition 1
2.2 n-Person Stochastic Games

The state space is \(S^n\) and the action space for each player is A. Similarly as in the case of the mean-field game, the set of actions available to player i in state \(\overline{s}=(s_1,\ldots ,s_n)\) is given by \(A^i_n(\overline{s}):=A\left( s_i,\frac{1}{n}\sum _{j=1}^n\delta _{s_j}\right) \).
 The individual immediate reward of player i, \(r^i_n:S^n\times A^n\rightarrow \mathbb {R}\), \(i=1,\ldots ,n\), is defined for any profile of players’ states \(\overline{s}=(s_1,\ldots ,s_n)\) and any profile of players’ actions \(\overline{a}=(a_1,\ldots ,a_n)\) by$$\begin{aligned} r^i_n(\overline{s},\overline{a}):=r\left( s_i,a_i,\frac{1}{n}\sum _{j=1}^n\delta _{(s_j,a_j)}\right) . \end{aligned}$$
 The transition probability \(Q_n:S^n\times A^n\rightarrow \Delta (S^n)\) can be defined for any \(\overline{s}\in S^n\) and \(\overline{a}\in A^n\) by the formula (for the clarity of exposition we write it only for Borel rectangles, which obviously defines the product measure):$$\begin{aligned}&Q_n(B_1\times \ldots \times B_n\mid \overline{s},\overline{a})\\&\quad :=Q\left( B_1\mid s_1,a_1,\frac{1}{n}\sum _{j=1}^n\delta _{(s_j,a_j)}\right) \cdot \ldots \cdot Q\left( B_n\mid s_n,a_n,\frac{1}{n}\sum _{j=1}^n\delta _{(s_j,a_j)}\right) . \end{aligned}$$
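Since \(Q_n\) is a product of the individual kernels evaluated at the empirical state–action measure, one step of the n-person dynamics can be simulated by drawing each player's next state independently. The sketch below does this for finite S and A with a made-up kernel; the names (`Q`, `step_n_person`, `tau_n`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Hedged sketch: one step of the n-person chain. Each player's next state is
# drawn independently from Q(. | s_i, a_i, tau_n), where tau_n is the
# empirical state-action measure (1/n) sum_j delta_{(s_j, a_j)}.

rng = np.random.default_rng(1)
nS, nA, n = 3, 2, 5  # illustrative sizes

def Q(s, a, tau):
    """Made-up individual kernel: a next-state distribution that depends on
    the player's own state and on the empirical measure tau."""
    base = np.ones(nS)
    base[s] += 1.0
    base += tau.sum(axis=1)  # dependence on the empirical state marginal
    return base / base.sum()

def step_n_person(states, actions):
    # empirical distribution (1/n) sum_j delta_{(s_j, a_j)} on S x A
    tau_n = np.zeros((nS, nA))
    for s, a in zip(states, actions):
        tau_n[s, a] += 1.0 / n
    # Q_n is the product of individual kernels, all fed the same tau_n
    return [int(rng.choice(nS, p=Q(s, a, tau_n)))
            for s, a in zip(states, actions)]

states, actions = [0, 1, 2, 0, 1], [0, 1, 0, 1, 0]
print(step_n_person(states, actions))
```

The product structure is what makes the players' transitions conditionally independent given the current profile; their coupling enters only through \(\tau _n\).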

In the n-person game, we consider stationary strategies \(f:S^n\rightarrow \Delta (A)\) (satisfying, for each player i, two standard conditions: \(f(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\) and \(f(A^i_n(\overline{s})\mid \overline{s})=1\) for every \(\overline{s}\in S^n\)). The set of all stationary strategies for player i is denoted by \(\mathcal {F}_n^i\).
 The functional maximized by each player is his average reward, defined for any initial state \(\overline{s_0}\in S^n\) and any profile of stationary strategies \(\overline{f}=(f_1,\ldots ,f_n)\) by the formula$$\begin{aligned} J_n^i\big (\overline{s_0},\overline{f}\big ):=\liminf _{T\rightarrow \infty } \frac{1}{T+1}\mathbb {E}^{\overline{s_0},Q_n,\overline{f}} \sum _{t=0}^Tr^i_n(\overline{s_t},\overline{a_t}) \end{aligned}$$with \(\mathbb {P}^{\overline{s_0},Q_n,\overline{f}}\) denoting the measure on the set of all infinite histories of the game corresponding to \(\overline{s_0}\), \(Q_n\) and \(\overline{f}\), defined with the help of the Ionescu-Tulcea theorem, similarly as in the case of the mean-field game.
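The Cesàro average inside the liminf is easy to estimate by simulation. The following sketch computes a long-run average reward for a single controlled chain under a fixed stationary strategy; all model data (`Q`, `r`, `policy`) are made up for illustration only.

```python
import numpy as np

# Hedged sketch: Monte Carlo estimate of the average reward
# (1/(T+1)) * sum_{t=0}^{T} r(s_t, a_t) along one trajectory of a single
# player following a fixed stationary strategy, with made-up model data.

rng = np.random.default_rng(2)
nS, nA = 3, 2
Q = rng.random((nS, nA, nS)); Q /= Q.sum(axis=2, keepdims=True)
r = rng.random((nS, nA))                       # immediate rewards
policy = rng.random((nS, nA))                  # stationary randomized strategy
policy /= policy.sum(axis=1, keepdims=True)

def cesaro_average(T, s0=0):
    s, total = s0, 0.0
    for _ in range(T + 1):
        a = rng.choice(nA, p=policy[s])        # sample action from f(.|s)
        total += r[s, a]
        s = rng.choice(nS, p=Q[s, a])          # sample next state from Q(.|s,a)
    return total / (T + 1)

print(cesaro_average(50_000))  # converges to the ergodic average under (A3)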

Finally, the solution we will be looking for in the n-person stochastic counterparts of the mean-field game is that of Nash equilibrium, which is the standard solution concept considered in the stochastic game literature:
Definition 2
The notation \([\overline{f}_{-i},g]\) denotes here and in the sequel the profile of strategies \(\overline{f}\) with its ith component replaced by g. If we only show that the above inequality is true for strategies g from some subclasses \(\mathcal {F}_n^i(0)\subset \mathcal {F}_n^i\), we say that \(\overline{f}\) is a Nash equilibrium in the class \(\mathcal {F}_n^1(0)\times \ldots \times \mathcal {F}_n^n(0)\). If (3) is true up to some \(\varepsilon >0\), we say that \(\overline{f}\in \mathcal {F}_n^1\times \ldots \times \mathcal {F}_n^n\) is an \(\varepsilon \)-Nash equilibrium.
Remark 1
Note that for any n and any \(i\in \{ 1,\ldots ,n\}\), \(\mathcal {F}\) can be viewed as a subset of \(\mathcal {F}_n^i\). Moreover, it can easily be seen that in the case where all the players except some player i in an n-person counterpart of the mean-field game use strategies from \(\mathcal {F}\), the best response of i is also to use a strategy from \(\mathcal {F}\). This immediately implies that a Nash equilibrium in the class \((\mathcal {F})^n\) is in fact a Nash equilibrium in \(\mathcal {F}_n^1\times \ldots \times \mathcal {F}_n^n\). For that reason, in the sequel we will no longer use general strategies from \(\mathcal {F}_n^i\) when we talk about n-person games, concentrating on strategies from \(\mathcal {F}\) or from some subsets of this set.
2.3 Notation
As we have written, we assume that state and action spaces S and A are compact metric. The metric on S will be denoted by \(d_S\) while that on A by \(d_A\). Whenever we relate to a metric on a product space, we mean the sum of the metrics on its coordinates.
Whenever we speak about continuity of correspondences, we refer to the following definitions:
Let X and Y be two metric spaces and \(F:X\rightarrow Y\) a correspondence. Let \(F^{-1}(G)=\{ x\in X: F(x)\cap G\ne \emptyset \}\). We say that F is upper semicontinuous iff \(F^{-1}(G)\) is closed for any closed \(G\subset Y\). F is lower semicontinuous iff \(F^{-1}(G)\) is open for any open \(G\subset Y\). F is said to be continuous iff it is both upper and lower semicontinuous. For more on (semi)continuity of correspondences, see [35], “Appendix D” or [3], Chapter 17.2.
3 Assumptions
 (A1)

Function r is continuous on \(S\times A\times \Delta (S\times A)\).
 (A2)

For any sequence \(\{ s_n,a_n,\tau _n\}\subset S\times A\times \Delta (S\times A)\) such that \(s_n\rightarrow s^*\), \(a_n\rightarrow a^*\) and \(\tau _n\Rightarrow \tau ^*\), \(Q(\cdot \mid s_n,a_n,\tau _n)\Rightarrow Q(\cdot \mid s^*,a^*,\tau ^*)\). Moreover, for any fixed s and any sequence \(\{ a_n,\tau _n\}\subset A\times \Delta (S\times A)\) such that \(a_n\rightarrow a^*\) and \(\tau _n\Rightarrow \tau ^*\), \(Q(\cdot \mid s,a_n,\tau _n)\rightarrow Q(\cdot \mid s,a^*,\tau ^*)\).
 (A3)
 (minorization property) There exist a constant \(\gamma >0\) and a probability measure \(P\in \Delta (S)\) such that$$\begin{aligned} Q(D\mid s,a,\tau )\ge \gamma P(D) \end{aligned}$$for every \(s\in S\), \(a\in A\), \(\tau \in \Delta (S\times A)\) and any Borel set \(D\subset S\).
 (A4)

The correspondence A is continuous.^{5}
 (A2’)

For any sequence \(\{ s_n,a_n,\tau _n\}\subset S\times A\times \Delta (S\times A)\) such that \(s_n\rightarrow s^*\), \(a_n\rightarrow a^*\) and \(\tau _n\Rightarrow \tau ^*\), \(Q(\cdot \mid s_n,a_n,\tau _n)\Rightarrow Q(\cdot \mid s^*,a^*,\tau ^*)\).
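For a finite state space, the best minorization constant in (A3) can be computed directly: taking the pointwise minimum of the kernel over all state–action pairs, \(m(s')=\min _{s,a}Q(s'\mid s,a)\), and setting \(\gamma =\sum _{s'}m(s')\), \(P=m/\gamma \), gives the largest \(\gamma \) for which \(Q(D\mid s,a)\ge \gamma P(D)\) holds. A sketch with made-up kernel data (the names `Q`, `m`, `gamma`, `P` are illustrative):

```python
import numpy as np

# Hedged sketch: the best minorization pair (gamma, P) of a finite kernel.
# With m[s'] = min_{s,a} Q[s, a, s'], gamma = sum(m) and P = m / gamma,
# Q(D|s,a) >= gamma * P(D) holds for every s, a and every set D.

rng = np.random.default_rng(3)
nS, nA = 4, 3
# Mix every row with the uniform distribution so the minorization is nontrivial.
Q = 0.5 * (rng.dirichlet(np.ones(nS), size=(nS, nA)) + 1.0 / nS)

m = Q.min(axis=(0, 1))   # pointwise minimum over all state-action pairs
gamma = m.sum()          # minorization constant
P = m / gamma            # minorizing probability measure

# For finite S it suffices to check the inequality on singletons.
assert np.all(Q >= gamma * P - 1e-12)
print(gamma)
```

Mixing with the uniform distribution guarantees every entry of `Q` is at least \(1/(2\,nS)\), so \(\gamma \ge 1/2\) in this toy example; a kernel with a zero column would give \(\gamma =0\) and (A3) would fail.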
Remark 2
It is a standard result in dynamic programming [43] that, for a time-invariant Markov decision process, the minorization property is equivalent to the property of uniform geometric ergodicity. In the following, we present a lemma that adapts this result to our case, linking the constants appearing in both assumptions. It also summarizes some other useful properties implied by (A3).
Lemma 1
 (a)for any \(f\in \mathcal {F}\) and any fixed state–action distribution of the other players \(\tau \in \Delta (S\times A)\), there exists a unique measure \(p_{f,\tau }\in \Delta (S)\) such that$$\begin{aligned} \left\| Q^k(\cdot \mid s,f,\tau )-p_{f,\tau }\right\| _v\le 2\left( 1-\frac{\gamma }{2}\right) ^{k} \quad \text{ for } k\ge 1,\ s\in S. \end{aligned}$$(6)
 (b)for any \(n\in \mathbb {N}\) and \(f_1,\ldots ,f_n\in \mathcal {F}\), there exists a unique measure \(p^n_{f_1,\ldots ,f_n}\in \Delta (S^n)\) such that$$\begin{aligned} \left\| Q_n^k(\cdot \mid \overline{s},f_1,\ldots ,f_n)-p^n_{f_1,\ldots ,f_n}\right\| _v \le 2\left( 1-\frac{\gamma ^n}{2}\right) ^{k} \quad \text{ for } k\ge 1,\ \overline{s}\in S^n, \end{aligned}$$(7)with^{6} \(p^n_{\overline{f}}=p^{(n)}_{f_1,\overline{f}}\cdot \ldots \cdot p^{(n)}_{f_n,\overline{f}}\), where \(p^{(n)}_{f_i,\overline{f}}\in \Delta (S)\), \(i=1,\ldots ,n\), depend only on the individual strategy of the player and the profile \(\overline{f}\); in particular, they are equal for any two players using the same strategy.
The proof of this lemma is given in “Appendix”.
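The geometric decay in (6) can be checked numerically on a finite chain: with \(\gamma \) the minorization constant of the kernel and p its invariant measure, the variation-norm distance of the k-step kernel from p stays below \(2(1-\gamma /2)^k\). A sketch under made-up data (all names illustrative); note the bound tested here is the weaker one stated in the lemma, since minorization in fact yields contraction at rate \(1-\gamma \).

```python
import numpy as np

# Hedged sketch: verifying the geometric ergodicity bound (6) for a finite
# Markov chain, ||Q^k(.|s) - p||_v <= 2 (1 - gamma/2)^k, with gamma the
# minorization constant and p the invariant measure. Made-up data.

rng = np.random.default_rng(4)
nS = 5
Q = 0.5 * (rng.dirichlet(np.ones(nS), size=nS) + 1.0 / nS)  # rows = Q(.|s)
gamma = Q.min(axis=0).sum()      # minorization constant of this kernel

# invariant measure p: left eigenvector of Q for eigenvalue 1
w, v = np.linalg.eig(Q.T)
p = np.real(v[:, np.argmax(np.real(w))])
p /= p.sum()

Qk = np.eye(nS)
for k in range(1, 20):
    Qk = Qk @ Q
    # worst-case variation norm over initial states s
    tv = np.abs(Qk - p).sum(axis=1).max()
    assert tv <= 2 * (1 - gamma / 2) ** k + 1e-9
print("bound (6) verified for k = 1..19, gamma =", gamma)
```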
Remark 3
Example 1
The fact that, unlike in the n-person games considered in case (b) of the lemma, the limit distribution of the individual states of a player may depend on the initial global state of the mean-field game suggests that in general the stationary behaviour of the mean-field game will not approximate well the limit behaviour of its n-person counterparts for large n.
4 The Existence of a Stationary Mean-Field Equilibrium
In this section, we address the problem of the existence of an equilibrium of discrete-time mean-field games with long-run average payoff. Its main result is given as follows.
Theorem 1
Any discrete-time mean-field game with long-run average payoff satisfying assumptions (A1)–(A4) has a stationary mean-field equilibrium.
Remark 4
Some ergodicity assumption is necessary for the existence of an equilibrium in a discrete-time average-payoff mean-field game; see Example 3.1 in [48]. It is a matter of discussion, though, whether we can assume less than (A3).
We precede the proof of the theorem with three lemmas.
Lemma 2
Suppose assumption (A4) holds. Then for any \(\mu \in \Delta (S)\) and \(\varepsilon >0\) there exist \(K_\varepsilon ^\mu \in \mathbb {N}\) and Borel-measurable functions \(\alpha _i^\mu :S\rightarrow A\), \(i=1,\ldots ,K_\varepsilon ^\mu \), such that for any \(a\in A(s,\mu )\), \(\min _{i\le K_\varepsilon ^\mu } d_A(a,\alpha _i^\mu (s))<\varepsilon \).
Proof
In the previous lemma, we have proved the existence of a finite set of measurable functions \(\alpha _i^\mu \) such that for any \(s\in S\) and \(\mu \in \Delta (S)\) the set of values of these functions at s is an \(\varepsilon \)-net of \(A(s,\mu )\). In the next one, for any sequence of state–action distributions \(\eta _n\Rightarrow \eta \) and any strategy \(f\in \mathcal {F}(\eta _S)\), we construct strategies \(f_n\in \mathcal {F}((\eta _n)_S)\), using at any point \((s,\mu )\) only actions from the set \(\{\alpha ^\mu _i(s), i=1,\ldots , K^\mu _{\frac{1}{n}}\}\), which in some sense approximate the strategy f well. This will be used to prove that the graph of the best-response correspondence is closed in the weak convergence topology.
Lemma 3
Proof
Now, using the above fact about the sequence of \(\varepsilon _n\), we prove that \(\Pi (f_n,p_{f_n,\eta _n})\Rightarrow \Pi (f,p_{f,\eta })\). We do it in three steps. In step 1, we prove by induction that for any fixed values of \(k\in \mathbb {N}\) and \(s\in S\), \(Q^k(\cdot \mid s,f_n,\eta _n)\rightarrow Q^k(\cdot \mid s,f,\eta )\).
In the next lemma, we show that any state–action distribution satisfying certain invariance property can be disintegrated into a stationary strategy and an invariant measure [as introduced in part (a) of Lemma 1] corresponding to this strategy. This will allow us to construct the best response correspondence used in the proof of Theorem 1 as a correspondence on the set of state–action measures rather than on a set of strategies.
Lemma 4
Proof
Proof of Theorem 1
Since \(u^\tau (\eta ):=\int _{S\times A}r(s,a,\tau )\eta (\mathrm{{d}}s\times \mathrm{{d}}a)\) is clearly a continuous function, as by (A1) r is continuous, it attains a maximum on \(\Theta (\tau )\), which implies that for any \(\tau \in \Delta (S\times A)\), \(\Psi (\tau )\ne \emptyset \). From the linearity of the integral, it is also clear that for each \(\tau \in \Delta (S\times A)\), \(\Psi (\tau )\) is convex.
The existence of a fixed point of \(\Psi \) now follows from Glicksberg’s fixed point theorem [29].
Remark 5
Note that the strong continuity part of assumption (A2) was only used in the proof of Lemma 3, which, in turn, was used to prove that the graph of \(\Psi \) is closed. If we assume that the feasible action correspondence \(A(s,\mu )\) does not depend on \(\mu \), then we do not need Lemma 3 for that (\(f_\sigma ^n=f_\sigma \in \mathcal {F}((\tau _n)_S)\) for any n, as \(\mathcal {F}(\mu )\equiv \mathcal {F}\) in that case). Hence, in that case the thesis of Theorem 1 is true under assumptions (A1), (A2’), (A3) and (A4).
5 Approximate Equilibria of n-Person Stochastic Games
In this section, we present two results showing that under some additional assumptions, stationary equilibria of the mean-field games considered in the previous section approximate well the stationary-strategy Nash equilibria of their n-person stochastic counterparts when n is large enough. The main problem with making such an approximation is that stationary mean-field equilibria only specify the behaviour of the players for one value of the global state of the game. This may be enough for the mean-field game, as there we can guarantee that this initial global state does not change over the course of the game, but it is certainly not enough in the case of its n-person counterparts. What we can do there, whenever the game is in a global state different from the one specified by the mean-field equilibrium, is to approximate it in some sense using the values of the equilibrium strategy specified for the mean-field equilibrium stationary global state. It turns out that, in general, this is not enough to obtain a good approximation of equilibrium for the n-person stochastic counterparts of the mean-field game, as shown by the following example. It is worth mentioning here that we know of only one other result of this kind appearing in the mean-field game literature [17]. In that paper, however, the failure of the usual n-player game approximation by its mean-field counterpart is a result of absorbing states in the model, whereas in the present paper this phenomenon seems to come from the ergodic cost structure.
Example 2
Now suppose all the players in the n-person counterpart of this game use strategy \(f^*\). Note that the situation when all the individual states are zeros is clearly an absorbing state of the Markov chain of states of the n-person game. Also, regardless of the initial state of the game, the probability of not reaching it after t stages of the game is no more than \(\left( 1-\frac{1}{3^n}\right) ^t\), which goes to zero as t goes to infinity. This clearly implies that after a finite number of stages all private states become zeros with probability 1. Hence, the average reward corresponding to the profile consisting of strategies \(f^*\) in the n-person counterpart of the mean-field game is 0. Now suppose that one of the players changes his strategy to \(g(\cdot \mid s,\mu )=\delta _1(\cdot )\). Then the game is still absorbed at all private states equal to 0, but the ergodic reward of the player using strategy g is 1, so the profile of \(f^*\) is not an \(\varepsilon \)-Nash equilibrium in the n-person game for any \(\varepsilon < 1\).
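The absorption argument above is elementary to verify numerically: if each stage absorbs with probability at least \(1/3^n\), the chance of surviving t stages is at most \((1-1/3^n)^t\), which vanishes for every fixed n, even though the time scale on which absorption occurs grows like \(3^n\). A small illustration (the horizon choice `10 * 3**n` is ours, picked to be of the order of the expected hitting time):

```python
# Illustrating the survival bound in Example 2: with per-stage absorption
# probability at least 1/3^n, the probability of avoiding the all-zeros
# state for t stages is at most (1 - 1/3^n)^t -> 0 as t -> infinity,
# so under f* the game is absorbed with probability 1 for every fixed n.

for n in (2, 5, 10):
    q = 1.0 - 1.0 / 3**n      # per-stage survival probability bound
    t = 10 * 3**n             # horizon of the order of the hitting time
    print(n, q**t)            # roughly exp(-10), tiny for every n

# Hence the average reward under the profile of f* strategies is 0,
# while the absorption time itself blows up as n grows.
```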
In the following, we present two results showing that under some additional assumptions the mean-field approximation of n-person anonymous stochastic games is good. In the first one, we consider the case where the individual transitions are independent of the global state of the game. This kind of assumption often appears in the mean-field game literature. Notably, it is considered in both existing papers on discrete-time mean-field games with average rewards [12, 48].
Theorem 2
Suppose that \((f^*,\mu ^*)\) is a mean-field equilibrium in a discrete-time mean-field game with long-run average payoff satisfying assumptions (A1), (A2’), (A3) and (A4). Assume further that the individual transitions of the players satisfy \(Q(\cdot \mid s,a,\tau )=\widetilde{Q}(\cdot \mid s,a)\) for any \(s\in S\), \(a\in A\) and \(\tau \in \Delta (S\times A)\) and that the feasible action correspondence \(A(s,\mu )\) does not depend on \(\mu \). Then for any \(\varepsilon >0\) there exists an \(n_0\) such that for any \(n\ge n_0\) the profile of strategies where each player uses strategy \(f(\cdot \mid s,\mu )\equiv f^*(\cdot \mid s)\) is an \(\varepsilon \)-Nash equilibrium in the n-person counterpart of the mean-field game.
The proof of this theorem is preceded by a lemma.
Lemma 5
Proof
Proof of Theorem 2
Theorem 3
 (a)
The stationary strategy f defined by the formula \(f(\cdot \mid s,\mu )=f^*(\cdot \mid s)\) for any \(s\in S\) and \(\mu \in \Delta (S)\) is an element of \(\mathcal {F}\). Moreover, it is weakly Lipschitz continuous with constant \(\beta _f\) as a function of s.
 (b)The transition kernel Q satisfies for any \(s\in S\), \(a_1,a_2\in A\) and \(\tau _1,\tau _2\in \Delta (S\times A)\)$$\begin{aligned} \Vert Q(\cdot \mid s,a_1,\tau _1)-Q(\cdot \mid s,a_2,\tau _2)\Vert _v\le \beta _Q\max \{ d_A(a_1,a_2),\rho _{S\times A}(\tau _1,\tau _2)\}. \end{aligned}$$(35)
 (c)
The constants \(\beta _f,\beta _Q\) satisfy \(\beta _Q(1+\beta _f)<\frac{\gamma }{2}\).
The proof of the theorem is preceded by three lemmas. In the first one, we prove that under the assumptions of Theorem 3 the invariant measures of the process of individual states of any given player in the mean-field game are uniquely determined given a strategy of this player and that of his opponents, which, as shown in Example 1, is not true in general.
Lemma 6
Proof
To prove the lemma for \(g\ne f\), note that by (8), \(p_{g,\Pi (f,\mu _{ff})}\) is an invariant measure corresponding to the Markov chain of individual states of a player when the behaviour of other players is distributed according to the distribution \(\Pi (f,\mu _{ff})\), so \(\mu _{gf}=p_{g,\Pi (f,\mu _{ff})}\) satisfies (36). As by Lemma 1, the chain is geometrically ergodic, the invariant measure is unique, so \(\mu _{gf}=p_{g,\Pi (f,\mu _{ff})}\). \(\square \)
The next lemma provides a strong technical result which will be used repeatedly to prove the convergence of the utilities in the n-person counterparts of the mean-field game to those in the mean-field game as n goes to infinity.
Lemma 7
 (a)Suppose f is as given in Theorem 3 and let \(g_1,h_1,g_2,h_2,\ldots \in \mathcal {F}_L\). Let further \(\mu ^n_f,\mu _g^n,\mu _h^n\in \Delta (S)\), \(n=1,2,\ldots \), and \(\tau ^n_g=\Pi (g_n(\cdot \mid \cdot ,\mu ^n_f),\mu ^n_g)\), \(\tau ^n_h=\Pi (h_n(\cdot \mid \cdot ,\mu ^n_f),\mu ^n_h)\) and \(\tau ^n_f=\Pi (f(\cdot \mid \cdot ,\mu ^n_f),\mu ^n_f)\). If there exists a sequence \(\{ n_m\}\) such that \(\tau ^{n_m}_g\Rightarrow _{m\rightarrow \infty }\tau ^*_g\), \(\tau ^{n_m}_h\Rightarrow _{m\rightarrow \infty }\tau ^*_h\) and \(\tau ^{n_m}_f\Rightarrow _{m\rightarrow \infty }\tau ^*_f\) for some \(\tau ^*_g,\tau ^*_h,\tau ^*_f\in \Delta (S\times A)\), then for any continuous function \(u:S\times A\times \Delta (S\times A)\rightarrow \mathbb {R}\) the following is true:$$\begin{aligned}&\int _{S^{n_m}}\int _{A^{n_m}}u\left( s_i,a_i,\frac{1}{n_m}\sum _{k=1}^{n_m}\delta _{(s_k,a_k)}\right) g\left( \mathrm{{d}}a_i\mid s_i,\frac{1}{n_m}\sum _{k=1}^{n_m}\delta _{s_k}\right) \nonumber \\&\quad \times \, h\left( \mathrm{{d}}a_l\mid s_l,\frac{1}{n_m}\sum _{k=1}^{n_m}\delta _{s_k}\right) \Pi _{j\ne i,l}f\left( \mathrm{{d}}a_j\mid s_j,\frac{1}{n_m}\sum _{k=1}^{n_m}\delta _{s_k}\right) \nonumber \\&\quad \times \, \mu _g^{n_m}(\mathrm{{d}}s_i)\mu _h^{n_m}(\mathrm{{d}}s_l)\Pi _{j\ne i,l}\mu _f^{n_m}(\mathrm{{d}}s_j)\rightarrow _{m\rightarrow \infty }\int _S \int _Au(s_i,a_i,\tau ^*_f)\tau ^*_g(\mathrm{{d}}s_i\times \mathrm{{d}}a_i)\qquad \quad \end{aligned}$$(39)
 (b)If for each n, \(g_n=g\), then the RHS of (39) can be written as$$\begin{aligned} \int _S\int _Au\big (s_i,a_i,\tau ^*_f\big )g\big (\mathrm{{d}}a_i\mid s_i,\big (\tau ^*_f\big )_S\big )\big (\tau ^*_g\big )_S(\mathrm{{d}}s_i). \end{aligned}$$
Proof
In the last lemma, we prove the convergence of the unique invariant measures of the process of individual states of a player, corresponding to given strategies of the player and his opponents, in the n-person counterparts of the mean-field game to those in the mean-field game.
Lemma 8
Proof
Let now \(\tau ^n_g:=\Pi (g,p^{(n)}_{g,[\overline{f}_{-i},g]})\) and \(\tau ^n_f:=\Pi (f,p^{(n)}_{f,[\overline{f}_{-i},g]})\). As \(\Delta (S\times A)\) is compact metric, every sequence \(\{(\tau ^{n_m}_g,\tau ^{n_m}_f)\}\) must contain a convergent subsequence. Let \(\tau ^*_g=\lim _{l\rightarrow \infty } \tau ^{n_{m_l}}_g\) and \(\tau ^*_f=\lim _{l\rightarrow \infty }\tau ^{n_{m_l}}_f\).
So far we have shown that \((\tau ^{n_m}_g)_S=p^{(n_m)}_{g,[\overline{f}_{-i},g]}\) has a subsequence converging to \(\mu _{gf}\). However, as the subsequence \(\tau ^{n_m}_g\) was arbitrary, this proves that the entire sequence \((\tau ^n_g)_S=p^{(n)}_{g,[\overline{f}_{-i},g]}\) converges to \(\mu _{gf}\). \(\square \)
Proof of Theorem 3
Remark 6
6 Concluding Remarks
In this paper, we have presented a model of a discrete-time mean-field game with compact state and action spaces and average reward. Under a strong ergodicity assumption, we have shown that it possesses a stationary mean-field equilibrium. Next, we have presented an example showing that in the case of the average-reward criterion the usual approximation of n-person games by their mean-field counterpart may fail. Finally, we have identified some cases when stationary equilibria of the mean-field game can approximate well the Nash equilibria of its n-person stochastic game counterparts. As we have seen, some strong additional assumptions were required to obtain results of this kind. A natural question arises whether there are other conditions that can give a good approximation of n-person models by their counterpart with a continuum of players. One of the directions that we can follow in answering this question is limiting ourselves to games played on subsets of the real line. In that case, considering some assumptions of ordinal type rather than general topological properties may give a good result. Other natural questions are whether the results from this article can be extended to games played on general, noncompact state and action sets, and whether considering Markov strategies instead of stationary ones can result in a larger class of models where the mean-field limit approximates well its n-person counterparts when n is large. All these questions seem both interesting and highly nontrivial.
Footnotes
 1.
Here and in the sequel, the Borel \(\sigma \)algebra on a given set X is denoted by \(\mathcal {B}(X)\), while the set of probability distributions on \((X,\mathcal {B}(X))\) is denoted by \(\Delta (X)\).
 2.
We shall use similar notation also in the case of general stationary strategies from \(\mathcal {F}\). In that case, \(\Pi (f(\cdot \mid \cdot ,\mu _1),\mu _2)(D)\) will denote \(\int _D f(\mathrm{{d}}a\mid s,\mu _1)\mu _2(\mathrm{{d}}s)\).
 3.
Here we omit the superscript i used to define the measure \(\mathbb {P}^{\mu _0,Q,f}\), as the situation is symmetric.
 4.
Here and in the sequel, for any \(\tau \in \Delta (S\times A)\), \(\tau _S\) denotes the Smarginal of the measure \(\tau \).
 5.
With the source space \(\Delta (S)\) endowed with the weak convergence topology.
 6.The notation \(P=P_1\cdots P_n\) stands here and in the sequel for the product measure \(P\in \Delta (S^n)\) defined by the formula$$\begin{aligned} P(B)=\int _BP_1(\mathrm{{d}}s_1)\cdot \ldots \cdot P_n(\mathrm{{d}}s_n)\quad \text{ for } B\in \mathcal {B}(S^n). \end{aligned}$$
 7.
See also Theorems A.6 and 2.3 in [14], defining the constants appearing in Corollary 2.4. The fact that the constants \(C_1\) and \(C_2\) can be taken independently from n follows from compactness of \(S\times A\)—then K in Theorem 2.3 can be taken equal to \(S\times A\) and a in Theorem A.6 may be arbitrary.
 8.
We make use here of the assumption that \(g_{n}=g\) for each n.
Acknowledgements
The author would like to thank two anonymous referees for their constructive remarks which helped to significantly improve the presentation of the results. He is also greatly indebted to professor Andrzej S. Nowak for his help during the writing of this article.
References
 1. Achdou Y, Capuzzo-Dolcetta I (2010) Mean field games: numerical methods. SIAM J Numer Anal 48(3):1136–1162
 2. Adlakha S, Johari R (2013) Mean field equilibrium in dynamic games with strategic complementarities. Oper Res 61(4):971–989
 3. Aliprantis CD, Border KC (1999) Infinite dimensional analysis. A hitchhiker's guide. Springer, Berlin
 4. Almulla N, Ferreira R, Gomes DA (2017) Two numerical approaches to stationary mean-field games. Dyn Games Appl 7(4):657–682
 5. Arapostathis A, Biswas A, Carroll J (2017) On solutions of mean field games with ergodic cost. J Math Pures Appl 107(2):205–251
 6. Bardi M, Feleqi E (2016) Nonlinear elliptic systems and mean-field games. NoDEA Nonlinear Differ Equ Appl 23(4):44
 7. Bardi M, Priuli FS (2014) Linear-quadratic \(N\)-person and mean-field games with ergodic cost. SIAM J Control Optim 52(5):3022–3052
 8. Bensoussan A, Frehse J, Yam P (2013) Mean field games and mean field type control theory. Springer, New York
 9. Bergin J, Bernhardt D (1992) Anonymous sequential games with aggregate uncertainty. J Math Econ 21:543–562
 10. Bergin J, Bernhardt D (1995) Anonymous sequential games: existence and characterization of equilibria. Econ Theory 5(3):461–489
 11. Bertsekas DP, Shreve SE (1978) Stochastic optimal control: the discrete time case. Academic Press, New York
 12. Biswas A (2015) Mean field games with ergodic cost for discrete time Markov processes. arXiv:1510.08968
 13. Bogachev VI (2007) Measure theory, vol II. Springer, Berlin
 14. Boissard E (2011) Simple bounds for convergence of empirical and occupation measures in 1-Wasserstein distance. Electron J Probab 16:2296–2333
 15. Briceño-Arias L, Kalise D, Silva FJ (2018) Proximal methods for stationary mean field games with local couplings. SIAM J Control Optim 56(2):801–836
 16. Calderone D, Sastry SS (2017) Infinite-horizon average-cost Markov decision process routing games. In: IEEE 20th international conference on intelligent transportation systems (ITSC), Yokohama, pp 16–19
 17. Campi L, Fischer M (2018) \(N\)-player games and mean-field games with absorption. Ann Appl Probab 28(4):2188–2242
 18. Cardaliaguet P (2013) Long time average of first order mean field games and weak KAM theory. Dyn Games Appl 3(4):473–488
 19. Cardaliaguet P, Lasry JM, Lions PL, Porretta A (2012) Long time average of mean field games. Netw Heterog Media 7(2):279–301
 20. Cardaliaguet P, Lasry JM, Lions PL, Porretta A (2013) Long time average of mean field games with a nonlocal coupling. SIAM J Control Optim 51(5):3558–3591
 21. Carmona R, Delarue F (2018) Probabilistic theory of mean field games with applications. Springer, Berlin
 22. Chakrabarti SK (2003) Pure strategy Markov equilibrium in stochastic games with a continuum of players. J Math Econ 39(7):693–724
 23. Cirant M (2015) Multi-population mean field games systems with Neumann boundary conditions. J Math Pures Appl 103(5):1294–1315
 24. Cirant M (2016) Stationary focusing mean-field games. Commun Partial Differ Equ 41(8):1324–1346
 25. Dragoni F, Feleqi E (2018) Ergodic mean field games with Hörmander diffusions. Calc Var Partial Differ Equ 57:116. https://doi.org/10.1007/s00526-018-1391-1
 26. Dudley RM (2004) Real analysis and probability. Cambridge University Press, Cambridge
 27. Elliot R, Li X, Ni Y (2013) Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica 49:3222–3233
 28. Feleqi E (2013) The derivation of ergodic mean field game equations for several populations of players. Dyn Games Appl 3(4):523–536
 29. Glicksberg IL (1952) A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points. Proc Am Math Soc 3:170–174
 30. Gomes DA, Mitake H (2015) Existence for stationary mean-field games with congestion and quadratic Hamiltonians. NoDEA Nonlinear Differ Equ Appl 22(6):1897–1910
 31. Gomes DA, Patrizi S, Voskanyan V (2014) On the existence of classical solutions for stationary extended mean field games. Nonlinear Anal 99:49–79
 32. Gomes DA, Saúde J (2014) Mean field games models – a brief survey. Dyn Games Appl 4(2):110–154
 33. Granas A, Dugundji J (2003) Fixed point theory. Springer, New York
 34. Haurie A, Krawczyk JB, Zaccour G (2012) Games and dynamic games. World Scientific, Singapore
 35. Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes: basic optimality criteria. Springer, Berlin
 36. Hinderer K (1970) Foundations of non-stationary dynamic programming with discrete time parameter. Lecture notes in operations research and mathematical systems, vol 33. Springer, Berlin
 37. Huang M, Malhamé RP, Caines PE (2006) Large population stochastic dynamic games: closed-loop McKean–Vlasov systems and the Nash certainty equivalence principle. Commun Inf Syst 6:221–252
 38. Jovanovic B, Rosenthal RW (1988) Anonymous sequential games. J Math Econ 17:77–87
 39. Lasry JM, Lions PL (2007) Mean field games. Jpn J Math 2(1):229–260
 40. Mészáros AR, Silva FJ (2017) On the variational formulation of some stationary second-order mean field games systems. SIAM J Math Anal 50(1):1255–1277
 41. Nowak A (1998) A generalization of Ueno's inequality for \(n\)-step transition probabilities. Appl Math 25(4):295–299
 42. Pimentel EA, Voskanyan V (2017) Regularity for second-order stationary mean-field games. Indiana Univ Math J 66:1–22
 43. Rieder U (1979) On non-discounted dynamic programming with arbitrary state space. University of Ulm, Ulm
 44. Royden HL (1968) Real analysis. Macmillan, London
 45. Saldi N, Başar T, Raginsky M (2016) Markov–Nash equilibria in mean-field games with discounted cost. arXiv:1612.07878
 46. Serfozo R (1982) Convergence of Lebesgue integrals with varying measures. Sankhyā: Indian J Stat Ser A 44(3):380–402
 47. Ueno T (1957) Some limit theorems for temporally discrete Markov process. J Fac Sci Univ Tokyo 7:449–462
 48. Więcek P, Altman E (2015) Stationary anonymous sequential games with undiscounted rewards. J Optim Theory Appl 166(2):686–710
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.