Discrete-Time Ergodic Mean-Field Games with Average Reward on Compact Spaces

We present a model of a discrete-time mean-field game with compact state and action spaces and average reward. Under a strong ergodicity assumption, we show that it possesses a stationary mean-field equilibrium. We present an example showing that, in general, an equilibrium of this game may not be a good approximation of Nash equilibria of the n-person stochastic game counterparts of the mean-field game for large n. Finally, we identify two cases where the approximation is good.


Introduction
Mean-field game theory has been developed independently by Lasry and Lions [39] and by Huang et al. [37] to study non-cooperative differential games with a large number of identical players. The main idea behind their models was that by approximating the game with a limit where the number of players is infinite, we can reduce the game problem, which for a large finite number of players becomes intractable, to a much simpler single-agent decision problem. The idea has been widely accepted by the differential game community, which has resulted in a huge number of publications on the topic over the last decade. The reader interested in the differential-type mean-field game models discussed so far is referred to the books [8,21] or the survey [32].
Our focus in this paper is, however, on similar discrete-time models, which, surprisingly, appeared in the game-theoretic literature long before the pioneering works on mean-field games. In the seminal paper by Jovanovic and Rosenthal [38], each player controls an individual discrete-time Markov chain, while the global state of the game, defined as the probability distribution over individual states of all the players, becomes deterministic. While the tools used there were significantly different from those considered in the differential mean-field game literature, the general principle, which was to simplify the original large game problem by considering an approximation with one-agent optimization models, stayed the same. Some generalizations of the model of Jovanovic and Rosenthal were given in [2,9,10,22,27,45]. All of these papers considered games with discounted rewards (costs). Discounted discrete-time mean-field games were also studied in a number of economic applications; see the references in [2].
Our paper deals with a different reward criterion: the long-run average reward (sometimes also called the ergodic reward), often used in Markov decision process and dynamic game problems, yet hardly present in the discrete-time mean-field game literature. To the best of our knowledge, there are only three papers dealing with this kind of problem in a discrete-time setting, discussed in more detail below. The literature on differential-type mean-field games with this payoff criterion is a lot more extensive. In [28,39], results about the relation between games with a large finite number of players and mean-field games of this type are proved. [18-20] discuss the relation between the solutions of ergodic mean-field games and mean-field games with a large fixed time horizon. Existence and uniqueness of solutions to average-reward mean-field games are addressed in many articles including [5-7,23-25,30,31,39,40,42] and a number of preprints. Finally, [1,4,15] provide some numerical methods for solving games of this type. The first model of a discrete-time mean-field game with average reward was introduced in [48], where the existence of a stationary mean-field equilibrium was proved under an ergodicity assumption in the case when the state and action spaces of the players are finite. Under the additional assumption that the individual transitions of the players do not depend on the empirical distribution of states or actions of all the players, it also shows that the mean-field model approximates well the n-person models for n large enough. A similar assumption has also been made in [12], where average-reward games with σ-compact Polish individual state spaces were studied. The problem is that, apart from this assumption, the results in [12] used some strong regularity conditions stated in terms of a specific metric topology on the set of stationary policies, which seem to be too strong to be satisfied under any reasonable assumptions. In the last paper we need to mention here, [16], average-reward discrete-time mean-field games were used to study a dynamic routing model. The main contribution of that paper was a linear-programming formulation of the problem of finding a stationary equilibrium in games of this type.
In our paper, we do not consider a setting as general as that in [12], limiting ourselves to games with compact state and action spaces. In return, within this framework we make assumptions that are satisfied by a large class of models. Moreover, we state them in terms of the basic primitives of the model, making them rather easy to verify. Finally, in general we do not require the independence of the individual transitions from the empirical distribution of states and actions of the players. In our article, we give results of two types. First, under the assumptions given in Sect. 3, we show that the mean-field game has a stationary equilibrium. Then, we provide several results, both positive and negative, linking equilibria in the model with a continuum of players with ε-equilibria in its n-person stochastic counterparts when n is large.
The organization of the paper is as follows: In Sect. 2, we present the general framework we are going to work with and define what kind of solutions we will be looking for. In Sect. 3, we present our assumptions. Sections 4 and 5 provide our main results: in Sect. 4 we prove the existence of a stationary equilibrium in the mean-field game model, while in Sect. 5 we give results linking equilibria in the mean-field game with approximate equilibria in games with a large finite number of players. We end the paper with conclusions in Sect. 6.

Discrete-Time Mean-Field Games
A discrete-time mean-field game is described by the following objects:
- The game is played in discrete time, that is, t ∈ {1, 2, ...}.
- The game is played by an infinite number (continuum) of players. Each player has a private state s ∈ S, changing over time. We assume that the set of individual states S is the same for each player and that it is a non-empty compact metric space. The private state of player i at time t is denoted by s^i_t. If we refer to an arbitrary player, we skip the superscript i.
- The actions of the players come from a compact metric action space A, common to all the players; the set of actions available to a player with private state s when the global state is μ is a non-empty compact set A(s, μ) ⊆ A.
- The global state of the game is a probability distribution μ over the Borel subsets of S.
- The reward function r : S × A × Δ(S × A) → R gives the reward of a player at any stage of the game when his private state is s, his action is a, and the distribution of state-action pairs among the entire player population is τ.
- Transitions are defined for each individual separately by a transition kernel Q : S × A × Δ(S × A) → Δ(S); Q(B|·, ·, τ) is product measurable for any B ∈ B(S) and any τ ∈ Δ(S × A).
- The global state at time t + 1 is obtained by aggregating the individual transitions of all the players; since this aggregation averages over the continuum of players, the transition of the global state is deterministic.
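In discrete-time mean-field models of this kind, the aggregation step typically takes the following standard form (a sketch, not a quotation of the paper's display; Φ denotes the global-state update map):

```latex
\mu_{t+1}(B) \;=\; \Phi(B \mid \tau_t) \;:=\; \int_{S \times A} Q(B \mid s, a, \tau_t)\, \tau_t(ds \times da),
\qquad B \in \mathcal{B}(S),
```

where τ_t ∈ Δ(S × A) is the state-action distribution of the population at time t. Since the right-hand side contains no randomness once τ_t is fixed, the determinism of the global-state transition is immediate.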
A function f : S × Δ(S) → Δ(A), such that f(B|·, μ) is measurable for any B ∈ B(A) and any μ ∈ Δ(S), satisfying f(A(s, μ)|s, μ) = 1 for every s ∈ S and μ ∈ Δ(S), is called a stationary strategy. The set of all stationary strategies is denoted by F. In the paper, we never consider general (history-dependent) strategies. When we talk about mean-field games, we also use stationary strategies depending only on the individual state of the player. Since in general the set of feasible actions is also a function of the global state, we define F(μ) as the set of functions f : S → Δ(A) such that f(B|·) is measurable for any B ∈ B(A), satisfying f(A(s, μ)|s) = 1 for every s ∈ S. We can identify any f ∈ F(μ) with the class of all stationary strategies f̃ ∈ F satisfying f̃(·|s, μ) = f(·|s) for any s ∈ S.
Next, let Ψ(f, μ) denote the state-action distribution of the players in the mean-field game corresponding to a global state μ and a stationary strategy f. Given the evolution of the global state, which depends on the strategies of the players in a deterministic manner, we can define the individual history of a player i as the sequence of his consecutive individual states and actions h = (s^i_0, a^i_0, s^i_1, a^i_1, ...). By the Ionescu-Tulcea theorem (see Chap. 7 in [11]), for any stationary strategies f of player i and g of the other players and any initial individual state distribution μ_0, there exists a unique probability measure P_{μ_0,Q,f,g} on the set of all infinite histories of the game H = (S × A)^∞ endowed with its Borel σ-algebra, consistent with f and Q along any partial history for any B ∈ B(S) and D ∈ B(A), with state-action distributions defined recursively by τ_0 = Ψ(g, μ_0) and τ_{t+1} = Ψ(g, Φ(·|τ_t)) for t = 1, 2, .... We can thus define the long-time average reward of a player using policy f ∈ F when all the other players use policy g ∈ F and the initial state distribution (both of the player and of his opponents) is μ_0, where τ_0 = Ψ(g, μ_0) and τ_{t+1} = Ψ(g, Φ(·|τ_t)) for t = 1, 2, .... Next, we define the solution we will be looking for: Definition 1 A stationary strategy f and a measure μ ∈ Δ(S) form a stationary mean-field equilibrium in the long-time average reward game if f ∈ F(μ), μ is invariant under the play of f, and f yields at least the average reward of every other stationary strategy g ∈ F(μ).
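In standard notation (a sketch rather than a quotation; Ψ(g, μ) denotes the state-action distribution induced by strategy g at global state μ, and Φ the global-state update), the average reward and the equilibrium conditions read:

```latex
J(\mu_0, f, g) \;=\; \liminf_{T \to \infty} \frac{1}{T}\,
\mathbb{E}^{\mu_0, Q, f, g}\!\left[\sum_{t=0}^{T-1} r\big(s_t, a_t, \tau_t\big)\right],
\qquad \tau_0 = \Psi(g, \mu_0),\quad \tau_{t+1} = \Psi\big(g, \Phi(\cdot \mid \tau_t)\big);

(f, \mu)\ \text{is a stationary mean-field equilibrium if}\quad f \in F(\mu),\quad
\Phi\big(\cdot \mid \Psi(f, \mu)\big) = \mu,\quad
J(\mu, f, f) \;\ge\; J(\mu, g, f)\ \ \forall\, g \in F(\mu).
```

Whether the displayed definition uses lim or liminf cannot be confirmed from this text; liminf is the common convention when the limit need not exist.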

n-Person Stochastic Games
The main reason to consider mean-field games is that, usually under some fairly mild assumptions, they can approximate well some n-person dynamic games defined with the same data when n is large enough. It is similar in our case. The n-person games that will be approximated by our model are discrete-time n-person stochastic games as defined in [34]. In our case, we consider n-person stochastic counterparts of the mean-field game defined by the following objects:
- The state space is S^n and the action space for each player is A. Similarly as in the case of the mean-field game, the set of actions available to player i in state s = (s_1, ..., s_n) depends on his own state s_i and on the empirical distribution of the players' states.
- The reward function of player i = 1, ..., n is defined for any profile of players' states s = (s_1, ..., s_n) and any profile of players' actions a = (a_1, ..., a_n).
- The transition probability Q_n : S^n × A^n → Δ(S^n) can be defined for any s ∈ S^n and a ∈ A^n coordinatewise (for the clarity of exposition we write it only for Borel rectangles, which obviously defines the product measure).
- In the n-person game, we consider stationary strategies f : S^n → Δ(A) satisfying, for each player i, the two standard measurability and feasibility conditions. The set of all stationary strategies for player i is denoted by F^i_n.
- The functional maximized by each player is his average reward, defined for any initial state s_0 ∈ S^n and any profile of stationary strategies f = (f_1, ..., f_n), with P_{s_0,Q_n,f} denoting the measure on the set of all infinite histories of the game corresponding to s_0, Q_n and f, defined with the help of the Ionescu-Tulcea theorem similarly as in the case of the mean-field game.
- Finally, the solution we will be looking for in the n-person counterparts of the mean-field game is the Nash equilibrium, which is the standard solution concept considered in the stochastic game literature: no player i ∈ {1, ..., n} can improve his average reward from any initial state s by a unilateral deviation to any g ∈ F^i_n.
The notation [f^{-i}, g] denotes, here and in the sequel, the profile of strategies f with its ith component replaced by g. If we only show that the above inequality is true for strategies g from some subclasses F^i_n(0) ⊂ F^i_n, we say that f is a Nash equilibrium in the class F^1_n(0) × ... × F^n_n(0). Remark 1 Note that for any n and any i ∈ {1, ..., n}, F can be viewed as a subset of F^i_n. Moreover, it can easily be seen that in case all the players except some player i in an n-person counterpart of the mean-field game use strategies from F, the best response of i is also to use a strategy from F. This immediately implies that a Nash equilibrium in the class (F)^n is in fact a Nash equilibrium in F^1_n × ... × F^n_n. For that reason, in the sequel we will no longer use general strategies from F^i_n when we talk about n-person games, concentrating on strategies from F or from some subsets of this set.
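Since ε-equilibria are referred to in Sect. 5, it may help to spell the conditions out (a standard formulation, assuming J^n_i(s, f) denotes player i's average reward from initial state s under profile f; the paper's displayed definition is not restated here):

```latex
\textbf{Nash equilibrium: } J^n_i(s, f) \;\ge\; J^n_i\big(s, [f^{-i}, g]\big)
\quad \text{for all } s \in S^n,\ g \in F^i_n,\ i \in \{1, \dots, n\};

\textbf{$\varepsilon$-Nash equilibrium: } J^n_i(s, f) \;\ge\; J^n_i\big(s, [f^{-i}, g]\big) - \varepsilon
\quad \text{for all such } s,\ g,\ i.
```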

Notation
As we have written, we assume that the state and action spaces S and A are compact metric. The metric on S will be denoted by d_S, while that on A by d_A. Whenever we refer to a metric on a product space, we mean the sum of the metrics on its coordinates.
The convergence of probability measures defined on one of these spaces may be of three types. The one that we will use most often is weak convergence, which we will always denote by the symbol ⇒. It is known that for a compact metric set X, Δ(X) endowed with the weak convergence topology is compact and metrizable (see e.g. Prop. 7.22 in [11]). There are several metrics consistent with the weak convergence topology. In all of our considerations, whenever we use a metric on Δ(X) defining weak convergence, we use the metric ρ (see Theorem 11.3.3 in [26]), where μ_1, μ_2 ∈ Δ(X) and ‖·‖_BL is the norm on the set of bounded Lipschitz continuous functions from X to R. To make a distinction between metrics defining weak convergence on different sets, we will also use subscripts S, A, etc. The second type of convergence used in the paper is convergence in the total variation norm ‖·‖_v (usually simply called 'norm convergence'), defined for any finite signed measure μ on (X, B(X)). When writing about this type of convergence, we will refer directly to the norm.
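The metric in question is presumably the Dudley (bounded-Lipschitz) metric of Theorem 11.3.3 in [26], which for μ_1, μ_2 ∈ Δ(X) reads:

```latex
\rho(\mu_1, \mu_2) \;=\; \sup\left\{ \left| \int_X f\, d\mu_1 - \int_X f\, d\mu_2 \right|
\;:\; \|f\|_{BL} \le 1 \right\},
\qquad
\|f\|_{BL} \;=\; \sup_{x} |f(x)| \;+\; \sup_{x \ne y} \frac{|f(x) - f(y)|}{d_X(x, y)}.
```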
The last type of convergence we will be using is strong (or setwise) convergence, denoted by →: μ_n → μ iff μ_n(B) → μ(B) for every B ∈ B(X). It is weaker than norm convergence, but the topology it defines is neither metrizable nor sequential, which makes it much less useful in practice.
Finally, in some proofs, we will also make use of the 1-Wasserstein distance, defined for measures on (X, B(X)) with finite 1st moment. If X is compact, each probability measure has a finite 1st moment; hence, the 1-Wasserstein distance can be used for any μ_1, μ_2 ∈ Δ(X). One of the equivalent definitions of the 1-Wasserstein distance W_1 is its dual formulation (see p. 234 in [13]). It is clear from the definitions of ρ, ‖·‖_v and W_1 that for any μ_1, μ_2 ∈ Δ(S), ρ is dominated by each of the other two distances. We will make use of these inequalities several times in our proofs.
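The equivalent definition alluded to is presumably the Kantorovich-Rubinstein dual formulation, and the comparisons between the three distances then follow directly from the definitions:

```latex
W_1(\mu_1, \mu_2) \;=\; \sup\left\{ \int_X g\, d\mu_1 - \int_X g\, d\mu_2
\;:\; g\ \text{is 1-Lipschitz} \right\};
\qquad
\rho(\mu_1, \mu_2) \;\le\; W_1(\mu_1, \mu_2),
\quad
\rho(\mu_1, \mu_2) \;\le\; \|\mu_1 - \mu_2\|_v ,
```

since every f with ‖f‖_BL ≤ 1 is both 1-Lipschitz and bounded by 1 in absolute value.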
Whenever we speak about continuity of correspondences, we refer to the standard definitions: let X and Y be two metric spaces; a correspondence F : X ⇒ Y is said to be continuous iff it is both upper and lower semicontinuous. For more on (semi)continuity of correspondences, see [35], "Appendix D", or [3], Chapter 17.2.
Further, we define k-step transitions in the mean-field and n-person models. For any stationary strategy f ∈ F and any constant state-action distribution τ ∈ Δ(S × A), we can define the k-step individual transition probability corresponding to Q when the player uses strategy f against the state-action distribution τ of the others. Next, let us define the k-step transition probability in the n-person counterpart of the mean-field game corresponding to Q_n and the profile of stationary strategies f = (f_1, ..., f_n) ∈ F^n when the initial states of the players are s_1, ..., s_n (for the clarity of exposition we again write it only for Borel rectangles). As before, we use the convention that the 0-step transition probability is the Dirac measure at the initial state.
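The k-step kernels are presumably defined by the standard recursion (a sketch in the notation used later for p_{f,τ}):

```latex
Q^{1}_{f,\tau}(B \mid s) \;=\; \int_A Q(B \mid s, a, \tau)\, f(da \mid s),
\qquad
Q^{k+1}_{f,\tau}(B \mid s) \;=\; \int_S Q^{1}_{f,\tau}(B \mid s')\, Q^{k}_{f,\tau}(ds' \mid s),
```

with the convention Q^0_{f,τ}(· | s) = δ_s(·); the n-person kernel is built analogously from Q_n, coordinate by coordinate on Borel rectangles.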

Assumptions
In this section, we present our main assumptions, which will be used in the case of both mean-field games and their stochastic counterparts. Unlike in [12], all the assumptions are directly related to the primitives of the model.
(A1) The reward function r : S × A × Δ(S × A) → R is continuous.
(A2) The transition kernel Q is weakly continuous in all of its arguments. Moreover, for any fixed s and any sequence (a_n, τ_n) converging to (a, τ), Q(·|s, a_n, τ_n) converges strongly to Q(·|s, a, τ).
(A3) (minorization property) There exist a constant γ > 0 and a probability measure P ∈ Δ(S) such that Q(D|s, a, τ) ≥ γ P(D) for every s ∈ S, a ∈ A, τ ∈ Δ(S × A) and any Borel set D ⊂ S.
(A4) The correspondence A is continuous.
A weaker version of assumption (A2) will be used in several places:
(A2') The transition kernel Q is weakly continuous in all of its arguments.
Remark 2 While assumptions (A1) and (A4) are both quite easy to check and satisfied for a wide variety of models, for many readers it may not be obvious what kind of stochastic kernels satisfy assumptions (A2-A3). In the following, we try to answer this question. The most natural type of stochastic kernel satisfying (A2) is defined by a density, that is, by formula (4), where q : S × S × A × S × A → R_+ ∪ {0} is a measurable probability density function continuous with respect to (s, a, s', a') for every fixed z ∈ S, and μ is any fixed σ-finite measure on S. This already gives quite a large class of transition probabilities satisfying (A2), including as a particular case any kernel concentrated on a fixed discrete subset of S.
It can be further extended by considering stochastic kernels that are convex combinations, with continuous weight functions λ_i(s, a, τ), of several kernels of form (4) (possibly defined with the help of different measures μ_i) and kernels of the two following forms (in both cases the transition depends on neither a nor τ): the deterministic kernel Q(·|s, a, τ) = δ_{h(s)}(·), where h : S → S is continuous; and the kernel Q(B|s, a, τ) = ν({y ∈ Y : F(s, y) ∈ B}), where Y is some Borel space, F : S × Y → S is a measurable function such that F(·, y) is continuous on S for every fixed y ∈ Y, and ν is a probability distribution on Y. If we assume that for some i_0, Q_{i_0}(B|s, a, τ) ≡ μ_{i_0}(B) for some probability measure μ_{i_0} [this is obviously a specific case of a kernel of type (4)] and λ_{i_0} > 0, the transition probability obtained automatically satisfies the minorization property (A3) with P = μ_{i_0} and γ = min_{(s,a,τ) ∈ S × A × Δ(S × A)} λ_{i_0}(s, a, τ).
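The minorization mechanism described above is easy to check numerically. The sketch below (illustrative only, with made-up numbers, not a model from the paper) builds a finite-state kernel of the mixture form Q = λ μ_{i_0} + (1 − λ) D with a deterministic part D, and verifies the contraction in total variation that (A3) is known to imply: any two rows of the k-step kernel are within 2(1 − γ)^k of each other.

```python
# Illustrative finite-state kernel: Q = lam * P + (1 - lam) * D, where P is a fixed
# probability vector (the minorizing measure) and D is a deterministic kernel.
# Under Q(.|s) >= lam * P(.), the Dobrushin coefficient of Q is at most 1 - lam,
# so any two rows of Q^k are within 2 * (1 - lam)**k in total variation.

def k_step(Q, k):
    # Q: list of rows (each a list of probabilities); returns Q^k by naive multiplication.
    n = len(Q)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = [[sum(R[i][m] * Q[m][j] for m in range(n)) for j in range(n)] for i in range(n)]
    return R

def tv(p, q):
    # total variation distance, norm convention sum |p_i - q_i| (as for ||.||_v)
    return sum(abs(x - y) for x, y in zip(p, q))

lam = 0.3
P = [0.5, 0.25, 0.25]                   # minorizing measure (gamma = lam in (A3))
D = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # deterministic part: a cyclic shift of states
Q = [[lam * P[j] + (1 - lam) * D[i][j] for j in range(3)] for i in range(3)]

for k in (1, 3, 6):
    Qk = k_step(Q, k)
    worst = max(tv(Qk[i], Qk[j]) for i in range(3) for j in range(3))
    assert worst <= 2 * (1 - lam) ** k + 1e-12
```

With this particular D, the bound is attained exactly: the rows of Q^k differ only through the surviving deterministic part of weight (1 − λ)^k.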
A stochastic kernel satisfying (A2') and (A3) can be constructed in a similar manner, but here we may consider convex combinations of kernels of types (4) and (5) together with kernels in which the deterministic or random map is also allowed to depend (continuously) on the action, e.g. Q(·|s, a, τ) = δ_{h(s,a)}(·) with h : S × A → S continuous. It is a standard result in dynamic programming [43] that, for a time-invariant Markov decision process, the minorization property is equivalent to another property, uniform geometric ergodicity. In the following, we present a lemma that adapts this result to our case, linking the constants appearing in the two assumptions. It also summarizes some other useful properties implied by (A3).

Lemma 1 Suppose the transition probability Q satisfies assumption (A3). Then:
(a) for any f ∈ F and any fixed state-action distribution of the other players τ ∈ Δ(S × A), there exists a unique invariant measure p_{f,τ} ∈ Δ(S) of the Markov chain of individual states of a player using f against τ, and the k-step transition probabilities converge to p_{f,τ} geometrically fast, uniformly in the initial state;
(b) an analogous statement holds in the n-person counterparts of the game, where the limit distributions of the individual states of the players i = 1, . . ., n depend only on the individual strategy of the player and the profile f; in particular, they are equal for any two players using the same strategy.
The proof of this lemma is given in "Appendix".
Remark 3 Note that using (6) we can obtain a quantitative version of this convergence. As the Markov chain of individual states of a player using f against τ is, by Lemma 1, geometrically ergodic, it is known that for any strategy f ∈ F, any distribution μ_0 of the initial individual state and any τ ∈ Δ(S × A) fixed over time, the long-run average reward limit exists and is independent of μ_0, with the expectation on the LHS taken with respect to the unique probability measure P_{μ_0,Q,f} defined with the help of the Ionescu-Tulcea theorem.
Similarly, we can show that (7) implies the analogous convergence of average rewards for any s_0 ∈ S^n and any profile f in the n-person game. These are important properties that we will repeatedly use to compute average rewards corresponding to strategies in both the mean-field game and its n-person stochastic counterparts.
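Concretely, the geometric ergodicity and the resulting formula for the average reward presumably read as follows (a sketch; the first bound is the standard consequence of the minorization property, with γ as in (A3)):

```latex
\left\| Q^{k}_{f,\tau}(\cdot \mid s) - p_{f,\tau} \right\|_v \;\le\; 2\,(1 - \gamma)^k
\quad \text{for all } s \in S,\ k \in \mathbb{N};

\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}^{\mu_0, Q, f}\!\left[ \sum_{t=0}^{T-1} r(s_t, a_t, \tau) \right]
\;=\; \int_S \int_A r(s, a, \tau)\, f(da \mid s)\, p_{f,\tau}(ds),
```

independently of the initial distribution μ_0.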
Example 1 It is important to note that the thesis of part (a) of Lemma 1 cannot be strengthened by showing that the limit measure p_{f,τ} depends only on the strategies used by the players and not on the initial global state μ_0 = τ_S. Suppose S = {0, 1} and the transition kernel Q depends only on the global state of the game (thus, whatever the strategy, it does not affect the transitions). It is easy to check that for any α ∈ (0, 1), such a Q can be chosen to satisfy all the assumptions of our model; in particular, assumption (A3) is satisfied with γ = α and P = δ_0. Clearly, however, for μ = δ_0 the individual state of the player moves after one step to 0 and stays there forever, while for other initial global states the limit distribution of the individual state is different. The fact that, unlike in the n-person games considered in case (b) of the lemma, the limit distribution of individual states of a player may depend on the initial global state of the mean-field game suggests that, in general, the stationary behaviour of the mean-field game will not approximate well the limit behaviour of its n-person counterparts for large n.

The Existence of a Stationary Mean-Field Equilibrium
In this section, we address the problem of the existence of an equilibrium of discrete-time mean-field games with long-run average payoff.Its main result is given as follows.
Theorem 1 Any discrete-time mean-field game with long-run average payoff satisfying assumptions (A1-A4) has a stationary mean-field equilibrium.
Remark 4 Some ergodicity assumption is necessary for the existence of an equilibrium in discrete-time average-payoff mean-field games; see Example 3.1 in [48]. It is a matter of discussion, though, whether one can assume less than (A3).
We precede the proof of the theorem with three lemmas.
Lemma 2 Suppose assumption (A4) holds. Then for any μ ∈ Δ(S) and ε > 0 there exist K ∈ N and Borel-measurable functions α^μ_1, ..., α^μ_K : S → A such that for every s ∈ S the set {α^μ_1(s), ..., α^μ_K(s)} is an ε-net of A(s, μ).
Proof The map A(s, μ) is continuous with non-empty compact values, and the functions a → d_A(a, a_i) are continuous. Hence, by Theorem 18.19 in [3], each A^μ_i admits a Borel-measurable selection. Let α^μ_i be a measurable selector from A^μ_i. Then, by the definition of an ε/2-net, for any s ∈ S and any a ∈ A(s, μ) there exists an i such that d_A(a, a_i) < ε/2. But for such an i, by the definition of A^μ_i, d_A(a, α^μ_i(s)) < ε.
In the previous lemma, we have proved the existence of a finite set of measurable functions α^μ_i such that for any s ∈ S and μ ∈ Δ(S) the set of values of these functions at s is an ε-net of A(s, μ). In the next one, for any sequence of state-action distributions η_n ⇒ η and any strategy f ∈ F(η_S), we construct strategies f_n ∈ F((η_n)_S), using at any point (s, μ) only actions from the set {α^μ_1(s), ..., α^μ_K(s)}, which approximate the strategy f well in a suitable sense. This will be used to prove that the graph of the best response correspondence is closed in the weak convergence topology.
where the α_i are the functions defined in Lemma 2, for any s ∈ S. Thus, proving that f_n ∈ F((η_n)_S) requires only showing the measurability of f_n(B|·) for any fixed B ∈ B(A); to this end we only need to show that for every n and i the relevant sets are Borel. Since f is a Borel-measurable stochastic kernel, according to Proposition 7.29 in [11], it suffices to prove that the map in question is Borel-measurable. Clearly, for any C^n_i, the preimage in question is C^n_i itself, its complement or the empty set. Thus, all we need to show is that for any n and i the set C^n_i ∈ B(S × A). To this end, first note that C^n_i can be written as a finite intersection of sets. The first set is the graph of A(·, η_S), which is closed by (A4). To show that each of the K − 1 other sets is Borel, we only need to note that for any two functions g : A × A → R and h : S → A such that g is continuous and h is Borel-measurable, the set {(s, a) ∈ S × A : g(h(s), a) < 0} is Borel, as (s, a) → g(h(s), a) is a composition of Borel functions and hence also a Borel function. This leads us to the conclusion that each C^n_i is Borel as a finite intersection of Borel sets, which proves that the functions f_n(B|·) are measurable.
Next, let us define the approximation errors ε_n. We will show that ε_n → 0 as n → ∞. Suppose it is not the case, which means that there exists a subsequence of {ε_n} converging to some β > 0. Without loss of generality, we may assume that it is the entire sequence {ε_n} that converges to β. This implies that for n big enough there exist s_n ∈ S and a_n ∈ A(s_n, η_S) realizing the minimum min_{i:1≤i≤K} of the corresponding distances. Since A and S are compact, there exists a subsequence {s_{n_k}, a_{n_k}} of {s_n, a_n} converging to some (s*, a*). The values of A are closed, so a* ∈ A(s*, η_S). Next, since by assumption (A4) A is continuous, there exists another sequence {ã_{n_k}} such that ã_{n_k} ∈ A(s_{n_k}, (η_{n_k})_S) for each k and lim_{k→∞} ã_{n_k} = a*. From the definition of the functions α_i, we know that for each k there exists an i_k approximating ã_{n_k}. However, this, together with (13) and the fact that {a_{n_k}} and {ã_{n_k}} have the same limit, implies that for k large enough the minimum min_{i:1≤i≤K} is smaller than β, which contradicts (12). Now, using the above fact about the sequence {ε_n}, we prove the convergence of the corresponding k-step transition probabilities. We do it in three steps. In step 1, we prove by induction that the convergence holds for any fixed values of k ∈ N and s ∈ S. Let us take any ε > 0. For k = 1 and any B ∈ B(S), the function Q(B|s, ·, ·) is by (A2) continuous on the compact domain A × Δ(S × A), hence uniformly continuous. Then there exists a ζ > 0 such that the required bound holds for any a_1, a_2 that are ζ-close. Assuming the claim holds for k, we will prove the same is true for k + 1. As before, we fix B ∈ B(S).
For any s, by the first step of the induction, Prop. C.12 in [35] (see also [44], p. 232) implies that (14) goes to zero as n goes to infinity, proving the claim for any k. The next step of the proof is showing that p_{f_n,η_n} → p_{f,η}. Take an ε > 0 and fix any B ∈ B(S) and s_0 ∈ S. By Lemma 1, the k-step transitions are close to the respective invariant measures for k big enough, say k ≥ k_0. From what we have already shown, we can also find an n_0 ∈ N such that the corresponding bound holds for n ≥ n_0. If we add (15)-(17) side by side, we obtain the required estimate. The value of ε was arbitrary, so this proves that p_{f_n,η_n} → p_{f,η}. To end the proof of the lemma, we only need to show the convergence of the corresponding state-action measures. Take any bounded continuous function w : S × A → R and split the difference of the integrals of w(s, a) f(da|s) p_{f,η}(ds) into two terms. The first term goes to zero as n goes to infinity, as ∫_A w(s, a) f(da|s) is a bounded measurable function and, as we have just shown, p_{f_n,η_n} → p_{f,η}. To prove that the second term also converges to zero as n → ∞, take any ε > 0. w is a continuous function defined on a compact domain, hence uniformly continuous. Let thus ζ > 0 be such that for a_1, a_2 ∈ A and s ∈ S, |w(s, a_1) − w(s, a_2)| < ε if d_A(a_1, a_2) < ζ, and let n_0 be such that ε_n < ζ for n ≥ n_0. Then (19) is smaller than ε. As ε was arbitrary, this proves that the second term in (18) goes to zero as n goes to infinity, ending the proof. In the next lemma, we show that any state-action distribution satisfying a certain invariance property can be disintegrated into a stationary strategy and an invariant measure [as introduced in part (a) of Lemma 1] corresponding to this strategy. This will allow us to construct the best response correspondence used in the proof of Theorem 1 as a correspondence on the set of state-action measures rather than on a set of strategies.

Lemma 4 Let τ ∈ Δ(S × A) and suppose η ∈ Δ(S × A) satisfies the invariance conditions (20) and (21). Then there exists a stationary strategy f ∈ F(τ_S) disintegrating η. Moreover, for any initial distribution μ_0 ∈ Δ(S) of the private state,
∫_{S×A} r(s, a, τ) η(ds × da) = lim_{T→∞} (1/T) E[Σ_{t=0}^{T−1} r(s_t, a_t, τ)],
where the expectation corresponds to μ_0, Q and f.
Proof It is known, from e.g. [36], p. 89, that η satisfying (21) can be disintegrated into a stochastic kernel f ∈ F(τ_S) and its marginal on S, η_S, that is, η(D) = ∫_S ∫_A 1_D(s, a) f(da|s) η_S(ds) for any D ∈ B(S × A). If we plug this into (20), we obtain an invariance equation for η_S; iterating this equation k times, we obtain its k-step version. Now take any B ∈ B(S). By (23) and part (a) of Lemma 1, passing to the limit as k → ∞ we obtain that η_S = p_{f,τ}. Now, (22) follows from (9).
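The displayed conditions of Lemma 4 are presumably of the following standard form, consistent with the way they are used in the proof (the numbering follows the references in the text, but is reconstructed):

```latex
\text{(20)}\quad \int_{S \times A} Q(B \mid s, a, \tau)\, \eta(ds \times da) \;=\; \eta_S(B)
\quad \text{for all } B \in \mathcal{B}(S);
\qquad
\text{(21)}\quad \eta\big(\{(s, a) : a \in A(s, \tau_S)\}\big) = 1.

\text{Plugging } \eta(ds \times da) = f(da \mid s)\, \eta_S(ds) \text{ into (20) and iterating:}
\quad \eta_S(B) \;=\; \int_S Q^{k}_{f,\tau}(B \mid s)\, \eta_S(ds),
```

so that letting k → ∞ and using Lemma 1(a) yields η_S = p_{f,τ}.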

Proof of Theorem 1
Let us consider the best-response-type correspondence Θ defined on Δ(S × A). We will show that Θ has a fixed point, and then that this fixed point corresponds to a stationary mean-field equilibrium in the game. First note that for any τ ∈ Δ(S × A) and any stationary strategy f, a chain of equalities holds in which the first and the last equalities follow from the definition of Ψ(·, ·), the second and the penultimate ones follow from Lemma 1, the third from the definition of the (k + 1)-step transition probability, and the fourth from the fact that Q(B|·, f, τ) is a measurable function bounded by 1.
Next we show that the graph of Θ is closed in the weak convergence topology. To prove that, first note that for any bounded continuous function w : S → R, ∫_S w(s)Q(ds|·, ·, ·) is, by the weak continuity of Q, a continuous function. This then implies that for any sequences η_n, τ_n ∈ Δ(S × A) such that η_n ∈ Θ(τ_n) with η_n ⇒ η and τ_n ⇒ τ, ∫_S w(s)Q(ds|·, ·, τ_n) converges continuously to ∫_S w(s)Q(ds|·, ·, τ); hence, by Theorem 3.3 in [46], the corresponding integrals converge. From the uniqueness of the limit, this implies that η = ∫_{S×A} Q(·|s, a, τ) η(ds × da); hence the invariance condition in the definition of Θ is preserved in the limit.
Then, by Lemma 3, there exist stationary strategies approximating the given one, with the corresponding invariant measures converging. On the other hand, we can easily show that for each n the invariance equation holds. Suppose it is not the case. Then there exist a B ∈ B(S) and a ζ > 0 such that the defect of invariance on B is at least ζ. However, by the definition of p_{f^n_σ,τ_n} and the fact that Q(B|·, f^n_σ, τ_n) is a bounded measurable function, this can be rewritten for some s ∈ S in a form yielding an obvious contradiction. Combining (25)-(27), we obtain a bound on ∫_{S×A} r(s, a, τ) σ(ds × da) which contradicts (24), ending the proof that the graph of Θ is closed. The existence of a fixed point of Θ now follows from Glicksberg's fixed point theorem [29].
Let τ* be this fixed point. By Lemma 4, there exists a stationary strategy f* ∈ F(τ*_S) disintegrating τ*. We will show that (f*, p_{f*,τ*}) is a stationary mean-field equilibrium in our game. Clearly, as τ* ∈ Θ(τ*), μ_0 = p_{f*,τ*} implies μ_t = p_{f*,τ*} for any t ∈ N. Next, take any g ∈ F(τ*_S). Using exactly the same arguments as in the proof that the graph of Θ is closed, we bound the reward of g; by Lemma 4, this can be rewritten as a limit of average rewards, where both sides of the inequality are independent of the initial state distribution μ_0, which implies that f* is a best response, completing the proof. Remark 5 Note that the strong continuity part of assumption (A2) was only used in the proof of Lemma 3, which, in turn, was used to prove that the graph of Θ is closed. If we assume that the feasible action correspondence A(s, μ) does not depend on μ, then we do not need Lemma 3 for that (as F(μ) ≡ F in that case). Hence, in that case the conclusion of Theorem 1 holds under assumptions (A1), (A2'), (A3) and (A4).

Approximate Equilibria of n-Person Stochastic Games
In this section, we present two results showing that, under some additional assumptions, stationary equilibria of the mean-field games considered in the previous section approximate well stationary-strategy Nash equilibria of their n-person stochastic counterparts when n is large enough. The main problem with making such an approximation is that stationary mean-field equilibria only specify the behaviour of the players for one value of the global state of the game. This may be enough for the mean-field game, as there we can guarantee that this initial global state does not change over the course of the game, but it is certainly not enough in the case of its n-person counterparts. What we can do there, whenever the game is in a global state different from the one specified by the mean-field equilibrium, is to approximate the equilibrium behaviour using the values of the equilibrium strategy specified for the mean-field equilibrium stationary global state. It turns out that, in general, this is not enough to obtain a good approximation of equilibrium for the n-person stochastic counterparts of the mean-field game, as shown by the following example. It is worth mentioning here that we know of only one other result of this kind in the mean-field game literature [17]. In that paper, however, the failure of the usual approximation of the n-player game by its mean-field counterpart is a result of absorbing states in the model, whereas in the present paper this phenomenon seems to come from the ergodic cost structure.
Example 2 Consider an average-reward mean-field game with S = {0, 1} = A, defined with an individual transition kernel Q and a reward function r depending only on the state and the action of the individual and the global state μ of the game, rather than on the state-action distribution τ. Q and r clearly satisfy (A1-A4). We will show that f* ∈ F, prescribing always to take action 0, and the stationary distribution μ* = (1/3)δ_0 + (2/3)δ_1 form a stationary mean-field equilibrium in this game. μ* is clearly a stationary distribution corresponding to f*; hence, if the game starts in global state μ* and all the players use strategy f*, the global state does not change. Suppose that a player uses a stationary strategy g ∈ F(μ*) defined by the formula g(·|s) = α_s δ_0 + (1 − α_s)δ_1, where α_0, α_1 ∈ [0, 1], against the constant global state μ*. It is easy to see that this gives the unique stationary distribution ((5 − 2α_1)/(9 + 2α_0 − 2α_1), (4 + 2α_0)/(9 + 2α_0 − 2α_1)). Thus, the average reward corresponding to strategy g and global state μ* equals an explicit rational function of α_0 and α_1. It is tedious but elementary to show that it attains its maximum over [0, 1]^2 at α_0 = α_1 = 1, which corresponds to strategy f*; this shows that indeed (f*, μ*) is a stationary mean-field equilibrium in our game. Now suppose all the players in the n-person counterpart of this game use strategy f*. Note that the situation when all the individual states are zeros is clearly an absorbing state of the Markov chain of states of the n-person game. Also, regardless of the initial state of the game, the probability of not reaching it within t stages of the game is no more than (1 − (1/3)^n)^t, which goes to zero as t goes to infinity. This clearly implies that after a finite number of stages all private states become zeros with probability 1. Hence, the average reward corresponding to the profile consisting of strategies f* in the n-person counterpart of the mean-field game is 0.
Now suppose that one of the players changes his strategy to g(·|s, μ) = δ_1(·). Then the game is still absorbed at the state in which all private states equal 0, but the ergodic reward of the player using strategy g is 1, so the profile of f* strategies is not an ε-stationary Nash equilibrium in the n-person game for any ε < 1.
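The closed-form computations in Example 2 can be sanity-checked numerically. Below is a minimal sketch, assuming only the formulas displayed above; the helper name stationary_distribution is ours, and (1 − (1/3)^n)^t is our reading of the absorption bound in the example. At α_0 = α_1 = 1 the strategy g coincides with f*, so the stationary distribution should reduce to μ* = (1/3)δ_0 + (2/3)δ_1, and the absorption bound should vanish as t grows.

```python
from fractions import Fraction

def stationary_distribution(a0, a1):
    """Stationary distribution ((5-2a1)/(9+2a0-2a1), (4+2a0)/(9+2a0-2a1))
    of the individual-state chain from Example 2, as exact fractions."""
    a0, a1 = Fraction(a0), Fraction(a1)
    den = 9 + 2 * a0 - 2 * a1
    return ((5 - 2 * a1) / den, (4 + 2 * a0) / den)

# At alpha_0 = alpha_1 = 1 the strategy g coincides with f*,
# so the distribution should equal mu* = (1/3, 2/3).
p0, p1 = stationary_distribution(1, 1)
assert (p0, p1) == (Fraction(1, 3), Fraction(2, 3))

# The probability of NOT yet being absorbed after t stages is at most
# (1 - (1/3)**n)**t, which tends to 0 as t grows, for any fixed n.
n = 5
bounds = [(1 - (1 / 3) ** n) ** t for t in (1, 10, 100, 1000)]
assert all(b2 < b1 for b1, b2 in zip(bounds, bounds[1:]))
```

Exact rational arithmetic avoids any floating-point doubt in the consistency check between the formula for g and the equilibrium distribution μ*.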
In the following, we present two results showing that under some additional assumptions the mean-field approximation of n-person anonymous stochastic games is good. In the first one, we consider the case where the individual transitions are independent of the global state of the game. This kind of assumption often appears in the mean-field game literature. Notably, it is used in both existing papers on discrete-time mean-field games with average rewards [12,48].
Theorem 2 Suppose that (f*, μ*) is a mean-field equilibrium in a discrete-time mean-field game with long-run average payoff satisfying assumptions (A1), (A2'), (A3) and (A4). Assume further that the individual transitions of the players satisfy Q(·|s, a, τ) = Q(·|s, a) for any s ∈ S, a ∈ A and τ ∈ (S × A), and that the feasible action correspondence A(s, μ) does not depend on μ. Then for any ε > 0 there exists an n_0 such that for any n ≥ n_0 the profile of strategies where each player uses strategy f is an ε-Nash equilibrium in the n-person counterpart of the mean-field game.

The proof of this theorem is preceded by a lemma.
Lemma 5 Suppose that Q(·|s, a, τ) = Q(·|s, a) for any s ∈ S, a ∈ A and τ ∈ (S × A), and that the feasible action correspondence A(s, μ) does not depend on μ. Then for any strategies f_1, . . .,

Proof We prove the result by induction. First note that for any B_1, . . ., B_n ∈ B(S) and any τ ∈ (S × A), Next assume that the statement of the lemma is true for k and consider k + 1.
which by the induction principle shows that

Proof of Theorem 2 Before we start the actual proof, note that since the individual transitions do not depend on the global state-action distribution τ, neither does p_{f*,τ} (the same is true for any other strategy). Moreover, since by (8) p_{f*,τ} must be the invariant distribution of the Markov chain of individual states of the player corresponding to strategy f, and μ* is one by the definition of stationary mean-field equilibrium, On the other hand, if we combine the results of Lemmas 1 and 5, we immediately see that for any g ∈ F, Now let us take an ε > 0. By (9), (28) and the fact that p_{g,τ} does not depend on τ, for any g ∈ F we have Similarly, by (11) and (29), Let us denote here and in the sequel by m(f*, μ*), m ∈ N, the random measure describing the empirical distribution of state-action pairs when m players employ the global-state-independent strategy f* and their states are drawn according to μ*. Then (31) can be written as We can now write, using (30) and (32), that for any g ∈ F, We will now show that the first term on the RHS of (33) is smaller than ε/6 for n large enough and that the second one is at most twice as large.
To show this for the first term, note that for any bounded continuous w : S × A → R and any measure τ ∈ (S × A), w(s, a) τ(ds × da) If we now take n_1 such that for every s ∈ S, a ∈ A and we immediately obtain that the first term on the RHS of (33) is smaller than ε/6. To show the inequality for the second term, note that by Corollary 2.5 in [14] there exist positive constants C_1 and C_2 such that If we take n_2 ≥ n_1 such that C_1 e^{−C_2 n_2} < ε/(12‖r‖_∞), we can rewrite the second term on the RHS of (33) as for n ≥ n_2, where the inequality follows from the definition of n_2 and the fact that W_1 majorizes ρ. This shows that for n ≥ n_2, for any g ∈ F and s ∈ S^n. By the definition of stationary mean-field equilibrium, for any g ∈ F, If we combine this with (34) applied to strategies g and f*, we obtain, for n ≥ n_2, which shows that for such an n the profile of f* strategies is an ε-Nash equilibrium in the n-person stochastic counterpart of the mean-field game.
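The proof rests on the empirical state-action distribution of n players concentrating around its mean-field limit, with the exponential rate C_1e^{−C_2n} borrowed from [14]. On a two-point state space, the W_1 distance between the empirical distribution of n i.i.d. draws from μ* = (1/3)δ_0 + (2/3)δ_1 and μ* itself is simply |p̂ − 2/3|, where p̂ is the empirical frequency of state 1, so the concentration is easy to observe numerically. A small illustrative simulation follows; the sampling setup and function name are ours, not the construction used in the proof:

```python
import random

def mean_w1_error(n, trials=200, p=2/3, seed=0):
    """Average W1 distance between the empirical distribution of n i.i.d.
    draws from mu* = (1/3, 2/3) on {0, 1} and mu* itself.  On a two-point
    space with d(0, 1) = 1, W1 reduces to |phat - p|."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        phat = sum(rng.random() < p for _ in range(n)) / n
        total += abs(phat - p)
    return total / trials

errors = {n: mean_w1_error(n) for n in (10, 100, 1000)}
# The mean W1 distance shrinks (roughly like 1/sqrt(n)) as n grows,
# which is what makes the n-person approximation work here.
assert errors[1000] < errors[100] < errors[10]
```

The actual proof needs the much stronger non-asymptotic exponential bound from [14], but the qualitative effect is the same: larger populations track the mean-field state more closely.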
It turns out that when we assume that the transitions of the players depend on the global state-action distribution, obtaining a result linking equilibria in the mean-field game with ε-equilibria in its n-person counterparts requires some very strong assumptions, both about the transition kernel Q and about the mean-field game equilibrium strategy, which can imply the independence from τ of the invariant measure of the Markov chain governed by the transition probability Q(·|s, g, τ) for any given strategy g. Conditions of this kind are used in the next theorem. What is worse, in that case we can no longer show that the profile of mean-field equilibrium strategies is an ε-equilibrium in the n-person counterpart of the mean-field game for n large enough within the class F of all stationary strategies of the players; instead, we need to limit ourselves to the class defined as follows.
Then for any ε > 0 and L > 0 there exists an n_0 such that for any n ≥ n_0 the profile of strategies where each player uses strategy f is an ε-Nash equilibrium in the class (F_L)^n in the n-person counterpart of the mean-field game.
The proof of the theorem is preceded by three lemmas. In the first one, we prove that under the assumptions of Theorem 3 the invariant measure of the process of individual states of any given player in the mean-field game is uniquely determined by the strategy of this player and that of his opponents, which, as shown in Example 1, is not true in general.

Lemma 6
Suppose that all the assumptions of Theorem 3 are satisfied. Then for any g ∈ F there exists exactly one μ^g_f ∈ (S) satisfying (36) for any B ∈ B(S). Moreover, μ^g_f = p_{g,(f,μ^f_f)}.
Proof We start by defining the operator M_f : (S) → (S) as follows: In what follows, we will show that M_f is a contraction mapping. Let w : S × A → R be a function with ‖w‖_BL ≤ 1 and let μ be an arbitrary element of (S). We define For any s_1, s_2 ∈ S, we have where the last inequality follows from the Lipschitz continuity of f and w. This proves that where the second equality is true because f does not depend on the global state, while the penultimate inequality makes use of the Lipschitz continuity of w^{μ_2}_f. Obviously, this implies that ρ_{S×A}((f, μ_1), (f, μ_2)) ≤ (1 and further that where the last inequality follows from (37) and (35). Next, (38), (52) and Corollary 2 in [41] imply that where β := , this implies that M_f is a contraction mapping from (S) into itself. As (S) is compact metric and hence complete, the Banach fixed point theorem [33] implies that M_f has a unique fixed point, say μ^f_f. Note, however, that by (8), μ^f_f satisfies (36). Moreover, if some μ ≠ μ^f_f satisfied (36), it would be an invariant distribution of the Markov chain of individual states of a player corresponding to f and μ, and hence (by the uniqueness of the invariant measure for a geometrically ergodic Markov chain) it would be equal to p_{f,(f,μ)}. Then (8) would imply that it is a fixed point of M_f, which contradicts the uniqueness of such a fixed point. This establishes the first part of the lemma for g = f.
To prove the lemma for g ≠ f, note that by (8), p_{g,(f,μ^f_f)} is an invariant measure of the Markov chain of individual states of a player when the behaviour of the other players is distributed according to (f, μ^f_f), so it satisfies (36). As, by Lemma 1, the chain is geometrically ergodic, its invariant measure is unique; hence μ^g_f = p_{g,(f,μ^f_f)}.
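The structure of the argument above (an operator on distributions that is a contraction, hence has a unique fixed point by the Banach fixed point theorem, and that fixed point is the invariant distribution) can be illustrated on a finite state space, where the analogue of M_f simply maps μ to μP for a fixed row-stochastic matrix P. The following sketch works under these simplifying assumptions; the kernel P is hypothetical and not taken from the paper:

```python
def invariant_distribution(P, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration mu <- mu P for a row-stochastic matrix P,
    mimicking the contraction argument behind Lemma 6."""
    n = len(P)
    mu = [1.0 / n] * n  # arbitrary starting distribution
    for _ in range(max_iter):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    raise RuntimeError("no convergence")

# A hypothetical ergodic two-state kernel; its invariant distribution
# happens to be (1/3, 2/3), like mu* in Example 2.
P = [[0.50, 0.50],
     [0.25, 0.75]]
mu = invariant_distribution(P)
assert abs(mu[0] - 1/3) < 1e-9 and abs(mu[1] - 2/3) < 1e-9
# Invariance check: mu P = mu.
assert all(abs(sum(mu[i] * P[i][j] for i in range(2)) - mu[j]) < 1e-9
           for j in range(2))
```

For an ergodic kernel the map μ ↦ μP contracts in total variation, so the iteration converges geometrically, which is the finite-state shadow of the geometric ergodicity used throughout this section.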
The next lemma provides a strong technical result which will be used repeatedly to prove the convergence of the utilities in the n-person counterparts of the mean-field game to those in the mean-field game as n goes to infinity.

Lemma 7 (a) Suppose f is as given in Theorem 3 and let g
(b) If for each n, g_n = g, then the RHS of (39) can be written as

Proof First note that the function in question is clearly continuous, as for τ_n ⇒ τ and η_n ⇒ η we have To complete the proof of the lemma, let us introduce some additional notation. Let τ^n_{f,k} be a random measure describing the empirical distribution when k players' behaviour is consistent with the distribution τ^n_f, that is, where the last inequality makes use of (41), (42) and the fact that W_1 dominates ρ. As ε was arbitrary, this ends the proof of part (a) of the lemma.
To prove part (b), first note that τ^{n_m}_g ⇒ τ*_g as m → ∞ clearly implies μ^{n_m}_g = (τ^{n_m}_g)_S ⇒ (τ*_g)_S as m → ∞. Then note that if we replace the last term on the LHS of (43) with⁸ and show that it is still smaller than ε/3 for m big enough, we obtain the thesis of part (b) of the lemma. Note, however, that for any sequence of elements of S with s^n_i → s_i as n → ∞, goes to zero as n → ∞ by Theorem 3.3 in [46]. Then we can use the same theorem once more to obtain (44). We can now take m_1 ≥ m_0 such that the quantity in (44) is smaller than ε/3 for m ≥ m_1, which gives the thesis of part (b) of the lemma.
In the last lemma, we prove the convergence of the unique invariant measures of the process of individual states of a player, corresponding to given strategies of the player and his opponents, in the n-person counterparts of the mean-field game to those in the mean-field game.

Lemma 8
Suppose that all the assumptions of Theorem 3 are satisfied. Then for any g ∈ F_c, p^{(n)}_{g,[f^{−i},g]} converges to μ^g_f as n → ∞.

Proof
To start the proof, first note that for any bounded continuous function, where the first equality follows from part (b) of Lemma 1, while the second from (10). Let now τ^n_g := (g, p We can now use Lemma 7 for the sequences μ^{n_m}_g = p which, in view of (45) and part (b) of Lemma 1, implies that and consequently Using the same reasoning, but this time taking τ^n_g := τ^n_f, τ^n_h := τ^n_g, μ^{n_m}_g = μ^{n_m}_f := p By Lemma 6, μ^f_f is the only probability measure satisfying this equation; hence τ*_f = μ^f_f. Then, if we substitute τ*_f = μ^f_f into (46), we obtain which, again by Lemma 6, implies that τ*_g = μ^g_f. So far we have shown that (τ^{n_m}_g)_S = p^{(n_m)}_{g,[f^{−i},g]} has a subsequence converging to μ^g_f. However, as the subsequence τ^{n_m}_g was arbitrary, this proves that the entire sequence (τ^n_g)_S = p^{(n)}_{g,[f^{−i},g]} converges to μ^g_f.

Proof of Theorem 3 Take any g ∈ F_L. We start by computing the rewards corresponding to one player using strategy g against f used by everyone else, in the mean-field game and in its n-person counterpart. Note that by the definition of the mean-field equilibrium and Lemma 6, μ* = μ^f_f = p_{g,(f,μ^f_f)}; hence by (9), J(μ*, g, f) = ∫_S ∫_A r(s, a, (f, μ^f_f)) g(da|s, μ^f_f) μ^g_f(ds). We can also show (using some straightforward computations) that τ*_f can be disintegrated into f and (τ*_f)_S. Now we can mimic the proof of Lemma 8 (we only need to replace g in the definitions of τ^n_g, τ^n_f, μ^n_g and μ^n_f with g_n there; the rest of the proof is identical) to show that (τ*_f)_S = μ^f_f and (τ*_g)_S = μ^g_f. Inputting this into (50), we obtain lim ∫_S ∫_A r(s_i, a_i, (f, μ^f_f)) g(da_i|s_i) μ^g_f(ds_i) = J(μ*, g, f).
Thus, we can pass to the limit in (49), obtaining which is a contradiction, as (μ*, f) was a stationary mean-field equilibrium in the mean-field game.
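The disintegration step used in the proof of Theorem 3 (a state-action distribution τ splits into its state marginal τ_S and a conditional strategy kernel g(a|s), and the two reassemble to τ) is easy to illustrate on a finite state and action space. The joint table below is a hypothetical example, not data from the paper:

```python
# Disintegration of a joint distribution tau on S x A into its state
# marginal tau_S and a conditional kernel g(a|s), followed by reassembly.
S, A = 2, 2
tau = [[0.10, 0.20],   # tau[s][a], a hypothetical joint distribution
       [0.30, 0.40]]

tau_S = [sum(tau[s]) for s in range(S)]           # state marginal
g = [[tau[s][a] / tau_S[s] for a in range(A)]     # conditional kernel g(a|s)
     for s in range(S)]

# Reassembling recovers the joint: tau(s, a) = g(a|s) * tau_S(s).
for s in range(S):
    for a in range(A):
        assert abs(g[s][a] * tau_S[s] - tau[s][a]) < 1e-12
```

In the proof this is applied with g playing the role of the conditional kernel and (τ*_f)_S, (τ*_g)_S the marginals identified via Lemma 8.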
Remark 6 If, in addition to all the assumptions of Theorem 3, we assume that the reward function r is Lipschitz continuous, we may prove (only slightly complicating the proofs of Lemmas 6 and 7) that the thesis of the theorem remains true under a weaker assumption on the stationary strategy f, of the form: there exists a stationary strategy f ∈ F such that f(·|s, μ) = f*(·|s) for any s ∈ S and satisfying Then the constants β_f, β*_f, β_Q need to satisfy β_Q(1 + 2β_f + β*_f) < γ/2. Assumptions of this kind are still very strong, but more likely to be satisfied by a stationary strategy in a mean-field game when the correspondence A depends on the global state of the game.

Concluding Remarks
In the paper, we have presented a model of discrete-time mean-field game with compact state and action spaces and average reward. Under some strong ergodicity assumption, we have shown that it possesses a stationary mean-field equilibrium. Next, we have presented an example showing that under the average-reward criterion the usual approximation of n-person games by their mean-field counterpart may fail. Finally, we have identified some cases in which stationary equilibria of the mean-field game approximate well the Nash equilibria of its n-person stochastic game counterparts. As we have seen, some strong additional assumptions were required to obtain results of this kind. A natural question is whether there are other conditions that guarantee a good approximation of n-person models by their counterpart with a continuum of players. One direction that could be followed in answering this question is to limit attention to games played on subsets of the real line; in that case, assumptions of an ordinal type, rather than general topological properties, may give good results. Other natural questions are whether the results of this article can be extended to games played on general, non-compact state and action sets, and whether considering Markov strategies instead of stationary ones results in a larger class of models where the mean-field limit approximates well its n-person counterparts when n is large. All these questions seem both interesting and highly nontrivial.
is not true, that is, there exist s, s′ ∈ S such that ‖p^s_{f,τ} − p^{s′}_{f,τ}‖_v > β > 0. But clearly there exists an m such that for k ≥ m and, by (52), there exists a k_0 such that for k ≥ k_0. Combining these inequalities for k = max{m, k_0}, we obtain which is a contradiction.
To prove part (b), first note that for any s ∈ S^n, μ ∈ (S) and B ∈ B(S^n), where P^n denotes the product measure on (S^n, B(S^n)) induced by the measure P. The rest of the proof looks exactly the same as the proof of the main part of (a). To see that p^n_f is a product measure, note that by definition, for any k, Q^k_n(·|s, μ, f) is a product measure, and the norm-limit of product measures must also be a product measure. To see that p^n_{f_i,f} = p^n_{f_j,f} if f_i = f_j, note that the Markov chain of states of the game when the strategy profile f is applied is symmetric, in the sense that the transitions of the individual states of players i and j are the same if their initial individual states are the same, which results in the same ergodic behaviour in this case. However, in view of the independence of p^n_f from the initial state s, p^n_{f_i,f} = p^n_{f_j,f} for any initial state of the chain.

Theorem 3
weakly continuous and for any s ∈ S, f(·|s, ·) is weakly Lipschitz continuous with constant L}. Suppose that (f*, μ*) is a mean-field equilibrium in a discrete-time mean-field game with long-run average payoff satisfying assumptions (A1-A4). Assume further that: (a) The stationary strategy f defined by the formula f(·|s, μ) = f*(·|s) for any s ∈ S and μ ∈ (S) is an element of F. Moreover, it is weakly Lipschitz continuous with constant β_f as a function of s. (b) The transition kernel Q satisfies, for any s ∈ S, a_1, a_2 ∈ A and τ_1, τ_2 ∈ (S × A),

∫_{S×A} u(s, a, τ_n) η_n(ds × da) − ∫_{S×A} u(s, a, τ) η(ds × da) → 0 as n → ∞ by Theorem 3.3 in [46].
of S is called a global state of the game. It describes the proportion of the population which is in each of the individual states. The global state at time t will be denoted by μ_t. We assume that at every stage of the game each player knows both his private state and the global state, and that his knowledge about the individual states of his opponents is limited to the global state.
- The set of actions available to any player in state (s, μ) is given by A(s, μ), with A := ⋃_{(s,μ)∈S×(S)} A(s, μ), a compact metric space. A(·, ·) is a non-empty valued correspondence.
- The global distribution of the state-action pairs is denoted by τ ∈ (S × A). If we refer to the global state-action distribution at a specific time t, we write τ_t.
- An individual's immediate reward is given by a bounded measurable function r :