Multiple-Population Discrete-Time Mean Field Games with Discounted and Total Payoffs: The Existence of Equilibria

In the paper we present a model of discrete-time mean-field games with several populations of players. Mean-field games with multiple populations of players have so far been studied in the literature only in the continuous-time setting. The main results of this article are the first stationary and Markov mean-field equilibrium existence theorems for discrete-time mean-field games of this type. We consider two payoff criteria: discounted payoff and total payoff. The results are provided under some rather general assumptions on the one-step reward functions and the individual transition kernels of the players. In addition, the results for the total payoff case, when applied to a single population, further extend the theory of mean-field games by relaxing some strong assumptions used in the existing literature.

Dynamic games with a large number of players are a natural tool to model dynamic interactions in many areas of science, yet they do not attract much attention due to the complexity of such models. One of the natural ways to deal with problems involving a large number of agents, developed in different fields of research, is to replace such complex models with relatively simpler ones with a continuum of infinitesimal players. Approximations of this kind have appeared in one-step games at least since the two seminal papers by Wardrop [58] and Schmeidler [55], but for a long time they were not introduced into dynamic game models. The situation changed with a series of papers by Lasry and Lions [43,44] and by Huang, Caines and Malhamé [38,39,40], where models of noncooperative differential games with a continuum of identical players were introduced. The idea underlying these models is that, in the limit with an infinite number of players, the game problem can be reduced to a much simpler single-agent decision problem. A huge number of publications on the topic have followed during the last decade and the literature is still growing fast. A review of the existing results on differential-type mean-field games can be found in the books [8,18] or the survey [29].
Similar discrete-time models appeared in the literature significantly earlier, in the paper by Jovanovic and Rosenthal [41], under the name of anonymous sequential games, but they have not attracted as much attention as their continuous-time counterparts. However, since then some further theoretical results on this type of games have appeared. The models with discounted payoff criterion have been studied in [10,11,20,3,51,23,52,53,54]. Conditions under which Nash equilibria in finite-player discounted-utility games converge to equilibria of the respective anonymous models were analyzed in [31,32,37,49,51]. In [59,60,50] long-time average payoff has been considered, while [59] also treated the games with total reward criterion. In [5,6], algorithms for computing mean-field equilibria in both discounted and average reward games have been presented.
All of the papers enumerated above have considered the case of only one population of symmetric players. There is no reason, however, not to consider mean-field games with a larger number of populations. As long as this number is small, considering this kind of limit model rather than a game with a huge finite number of players should be a significant simplification of the problem. In the continuous-time case, models of this type were introduced in [38] and further studied in [25,21,22,1,7,9,46]. As far as we know, there have been no papers on discrete-time mean-field games with multiple populations of players. In this article, we try to fill that gap by introducing two models of games of this type: one with discounted payoff, the other with total payoff. In both cases we provide results on the existence of mean-field equilibria in such games under some natural assumptions. It is worth mentioning here that some of the results we present, notably all those concerning the total payoff criterion, are proved under much less restrictive assumptions than those used in the existing literature on single-population mean-field games. As single-population games are just a specific case of the model presented here, in that way the paper also extends the theory of single-population mean-field games. This is further discussed when the relevant results are presented.
The organization of the paper is as follows: In Section 2 we present our model of discrete-time mean-field games with several populations of players. In Section 3 we introduce some notation used in the remainder of the article. In Sections 4 and 5 we present several mean-field equilibrium existence theorems for the cases of discounted and total payoff, respectively. Finally, in Section 6 we give some concluding remarks.

The model
Mean-field game models were designed to approximate dynamic game situations with a large number of symmetric agents. In multi-population mean-field games we still assume that the number of agents is large, but they are homogeneous only within a smaller group called a population. The number of populations is finite and fixed, and their mutual interactions enter through each individual's rewards and transitions. Each population has its own reward function and transition kernel (which may or may not operate on the same state space), which makes the model significantly different from those considered in the literature on discrete-time mean-field games. Below we describe the model formally.
A multi-population discrete-time mean-field game is described by the following objects:
• The game is played in discrete time, that is, t ∈ {1, 2, ...}.
• The game is played by an infinite number (continuum) of players divided into N populations. Each player has a private state s, changing over time.
We assume that the set of individual states S_i is the same for each player in population i (i = 1, ..., N), and that it is a nonempty closed subset of a locally compact Polish space S.
• A vector µ = (µ^1, ..., µ^N) ∈ Π_{i=1}^N ∆(S_i) of N probability distributions over the Borel subsets of S_i, i = 1, ..., N, is called a global state of the game. Its i-th component describes the proportion of the i-th population in each of the individual states. We assume that at every stage of the game each player knows both his private state and the global state, and that his knowledge about the individual states of his opponents is limited to the global state.
• The set of actions available to a player from population i in state (s, µ) is given by A^i(s), with A := ∪_{i∈{1,...,N}} ∪_{s∈S_i} A^i(s) a compact metric space. For any i, A^i(·) is a nonempty compact-valued correspondence such that its graph D^i := {(s, a) : s ∈ S_i, a ∈ A^i(s)} is a measurable set. Note that we assume that the set of actions available to a player depends only on his private state and not on the global state of the game.
• The global distribution of the state-action pairs is denoted by τ = (τ^1, ..., τ^N) ∈ Π_{i=1}^N ∆(D^i). Again, its i-th component gives the distribution of state-action pairs within the i-th population, i = 1, ..., N.
• The immediate reward of an individual from population i is given by a measurable function r^i : D^i × Π_{j=1}^N ∆(D^j) → R; r^i(s, a, τ) gives the reward of a player at any stage of the game when his private state is s, his action is a and the distribution of state-action pairs across all the populations is τ.
• Transitions are defined for each individual separately, with stochastic kernels Q^i(· | s, a, τ) ∈ ∆(S_i) for (s, a) ∈ D^i, τ ∈ Π_{j=1}^N ∆(D^j) and i ∈ {1, ..., N}.
• The global state at time t + 1, µ_{t+1}, is given by the aggregation of the individual transitions of the players according to the formula

µ^i_{t+1}(B) = Φ^i(B | τ_t) := ∫_{D^i} Q^i(B | s, a, τ_t) τ^i_t(ds × da), B ∈ B(S_i),

(see the numerical sketch following these definitions). As can be clearly seen, the transition of the global state is deterministic.
• A sequence (π^i_t)_{t≥1} of stochastic kernels on A given S_i such that s ↦ π^i_t(B | s) is measurable for any B ∈ B(A) and any t, satisfying π^i_t(A^i(s) | s) = 1 for every s ∈ S_i and every t, is called a Markov strategy for a player of population i. A Markov strategy whose kernels do not depend on t is called a stationary strategy. The set of all Markov strategies for players from the i-th population is denoted by M^i, while that of stationary strategies by F^i. As in MDPs, stationary strategies can be seen as a specific case of Markov strategies. In the paper we never consider general (history-dependent) strategies.
Next, let Π^i_t(π^i, µ^i) denote the state-action distribution of the i-th population players at time t in the mean-field game corresponding to the distribution µ^i of individual states in population i and a Markov strategy π^i ∈ M^i of the players of population i, that is,

Π^i_t(π^i, µ^i)(B × E) := ∫_B π^i_t(E | s) µ^i(ds), B ∈ B(S_i), E ∈ B(A).

The vector (Π^1_t(π^1, µ^1), ..., Π^N_t(π^N, µ^N)) will be denoted by Π_t(π, µ). When we use this notation for stationary strategies, we skip the subscript t.
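To fix ideas, the following is a minimal numerical sketch of the two objects just introduced for a single population on finite spaces: the map Π (a strategy and a state distribution induce a state-action distribution) and the aggregation Φ from the list of model components above. All function names, array shapes and the toy data are our own illustrative assumptions, not part of the model.

```python
import numpy as np

def Pi(pi, mu):
    """State-action distribution induced by strategy pi and state distribution mu:
    Pi(pi, mu)(s, a) = pi(a | s) * mu(s), on finite spaces."""
    return mu[:, None] * pi

def Phi(Q, tau):
    """Deterministic global-state update: mu'(s') = sum_{s,a} Q(s'|s,a) tau(s,a).
    The dependence of Q on the full vector tau is frozen into the array here."""
    return np.einsum("sa,sap->p", tau, Q)

# Toy data: 2 states, 2 actions, one population.
pi = np.array([[0.7, 0.3], [0.4, 0.6]])            # stationary strategy
mu0 = np.array([0.5, 0.5])
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
tau0 = Pi(pi, mu0)
mu1 = Phi(Q, tau0)                                  # next global state
assert np.isclose(mu1.sum(), 1.0)
```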
Given the evolution of the global state, which depends on the strategies of the players in a deterministic manner, we can define the individual history of a player α (from any given population i) as the sequence of his consecutive individual states and actions h = (s^α_0, a^α_0, s^α_1, a^α_1, ...). By the Ionescu-Tulcea theorem (see Chap. 7 in [12]), for any Markov strategy π^α of player α and strategies σ^1, ..., σ^N of the other players (including all the other players of the same population), any initial global state µ_0 and any initial private state s of player α, there exists a unique probability measure P_{s,µ_0,Q,π^α,σ} on the set H = (D^i)^∞ of all infinite individual histories of the game, endowed with its Borel σ-algebra, consistent with π^α and the kernels Q^i, with the state-action distributions defined by τ^j_0 = Π^j_0(σ^j, µ^j_0) and τ^j_{t+1} = Π^j_{t+1}(σ^j, Φ^j(· | τ_t)) for t = 0, 1, ... and j = 1, ..., N.
Now we are ready to define the two types of reward we shall consider in this paper. For β ∈ (0, 1), the β-discounted reward for a player α from population i using policy π^i ∈ M^i when the other players use policies σ^j ∈ M^j (depending on the population j they belong to) and the initial global state is µ_0, with the initial individual state of player α being s^i_0, is defined as follows:

J^i_β(s^i_0, µ_0, π^i, σ) := E^{s^i_0,µ_0,Q,π^i,σ} [ Σ_{t=0}^∞ β^t r^i(s^i_t, a^i_t, τ_t) ],

where τ^j_0 = Π^j_0(σ^j, µ^j_0) and τ^j_{t+1} = Π^j_{t+1}(σ^j, Φ^j(· | τ_t)) for t = 0, 1, ... and j = 1, ..., N.
To define the total reward in our game, let us distinguish one state in S, say s_*, isolated from S \ {s_*}, and assume that A^i(s_*) = {a_*} independently of i ∈ {1, ..., N} for some fixed a_* isolated from A \ {a_*}. Moreover, let us assume that s_* ∈ S_i for i = 1, ..., N. Then the total reward of a player from population i using policy π^i ∈ M^i when the other players apply policies σ = (σ^1, ..., σ^N) and the initial global state is µ_0, with the initial individual state of player α being s^i_0, is defined in the following way:

J^i(s^i_0, µ_0, π^i, σ) := E^{s^i_0,µ_0,Q,π^i,σ} [ Σ_{t=0}^{T^i − 1} r^i(s^i_t, a^i_t, τ_t) ],

where τ^j_0 and τ^j_{t+1} are defined as above, while T^i is the moment of the first arrival of the process {s^i_t} at s_*. The total reward is interpreted as the reward accumulated by the player over his entire lifetime. State s_* is an artificial state (as is action a_*) denoting that a player is dead; µ_0 corresponds to the distribution of the states across the population when he is born, while s^i_0 is his own state when he is born. The fact that after some time the state of a player can again become different from s_* should be interpreted as the player being replaced, after some time, by a new-born one.
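The deterministic interplay between the crowd's flow τ_t and an individual's payoff can likewise be sketched on a finite single-population model. The sketch below truncates the discounted sum at a finite horizon and, purely for brevity, lets r depend on (s, a) only; all names and shapes are hypothetical, not the paper's notation.

```python
import numpy as np

def discounted_reward(r, Q, pi, mu0, s0, beta, T):
    """Truncated beta-discounted reward of a tagged player in a finite
    single-population model (a hedged sketch under our own assumptions).

    The crowd and the tagged player use the same stationary strategy pi, so
    the crowd flow tau_t and the player's state distribution rho_t evolve by
    the same kernel. In the paper r^i also depends on tau_t; here it does not.
    """
    S, A = r.shape
    mu = mu0.copy()                    # global state of the crowd
    rho = np.zeros(S); rho[s0] = 1.0   # tagged player's state distribution
    total = 0.0
    for t in range(T):
        tau = mu[:, None] * pi         # tau_t(s, a) = mu_t(s) pi(a | s)
        total += beta**t * np.einsum("s,sa,sa->", rho, pi, r)
        rho = np.einsum("s,sa,sap->p", rho, pi, Q)   # tagged player's transition
        mu = np.einsum("sa,sap->p", tau, Q)          # mu_{t+1} = Phi(. | tau_t)
    return total
```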
Next, we define the solutions we will be looking for. We say that stationary strategies π^1, ..., π^N and measures µ^1_*, ..., µ^N_* (with µ_* := (µ^1_*, ..., µ^N_*)) form a stationary mean-field equilibrium in the β-discounted reward game if for any i, any s^i_0 ∈ S_i and every other stationary strategy g^i ∈ F^i of a player from population i,

J^i_β(s^i_0, µ_*, π^i, π) ≥ J^i_β(s^i_0, µ_*, g^i, π),

and, moreover, if µ_0 = µ_*, then µ_t = µ_* for every t ≥ 1 when strategies π^1, ..., π^N are used by all the players. A Markov mean-field equilibrium is defined analogously for Markov strategies π^1, ..., π^N and a flow of global states (µ^*_t)_{t≥0}: for every t the shifted profile ^tπ must be a best response against the shifted flow (^tµ^*), with ^ta denoting, for any infinite vector a = (a_0, a_1, ...), the vector (a_t, a_{t+1}, ...). Moreover, if µ_0 = µ^*_0, then µ_t = µ^*_t for every t ≥ 1 if strategies π^1, ..., π^N are used by all the players.

Preliminaries
As we have written, we assume that S and A are metric spaces. The metric on S will be denoted by d_S and that on A by d_A. Whenever we refer to a metric on a product space, we mean the sum of the metrics on its coordinates. Some of the assumptions presented below will be given with respect to a moment function w_0 : S → [1, ∞), that is, a continuous function satisfying

lim_{n→∞} inf_{s ∉ K_n} w_0(s) = ∞

for some increasing sequence {K_n}_{n≥1} of compact subsets of S. Moreover, w_0(s) ≥ 1 + d_S(s, s_0)^p for some p ≥ 1 and some s_0 ∈ S.
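For instance, on S = R^d one may take w_0(s) = 1 + ‖s − s_0‖^p with K_n = {s : ‖s − s_0‖ ≤ n}: then inf_{s ∉ K_n} w_0(s) = 1 + n^p → ∞ as n → ∞, and the required lower bound w_0(s) ≥ 1 + d_S(s, s_0)^p holds with equality.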
In order to study both bounded and unbounded one-stage reward functions, we use a continuous weight function w : S → [1, ∞) defined in terms of w_0. For any function h : S → R we define its w-norm as

‖h‖_w := sup_{s ∈ S} |h(s)| / w(s).
Whenever we speak of functions defined on a product of S and some other space, their w-norm is defined similarly, with the help of the same function w.
By B_w(S) we denote the space of all measurable functions from S to R with finite w-norm, and by C_w(S) the space of all continuous functions in B_w(S). Clearly, both B_w(S) and C_w(S) are Banach spaces. The same can be said of B_w(S × A) and C_w(S × A), the spaces of measurable and continuous functions, respectively, from S × A to R with finite w-norm. Analogously, for any finite signed measure µ on S, we define the w-norm of µ as

‖µ‖_w := ∫_S w(s) |µ|(ds),

where |µ| denotes the total variation of µ. It should be noted that in the case w ≡ 1, ‖µ‖_w is the total variation norm (see e.g. [35], Section 7.2). There are two standard types of convergence of probability measures used in the paper: the weak convergence, denoted by ⇒, and the strong (or setwise) convergence, denoted by → and defined (for any Borel space (X, B(X))) by µ_n → µ ⇐⇒ µ_n(B) → µ(B) for any B ∈ B(X).
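On a finite grid the two norms are easy to illustrate; the data below are our own toy example, not tied to any model in the paper.

```python
import numpy as np

# Hedged finite-grid illustration of the w-norms defined above.
s = np.linspace(-3.0, 3.0, 7)          # a finite grid standing in for S
w = 1.0 + s**2                          # weight function, w >= 1
h = s**2 - 2.0 * s                      # a function with finite w-norm

h_w_norm = np.max(np.abs(h) / w)        # ||h||_w = sup |h(s)| / w(s)

mu = np.array([0.3, -0.1, 0.2, 0.1, -0.2, 0.4, 0.3])   # a signed measure
mu_w_norm = np.sum(w * np.abs(mu))      # ||mu||_w = integral of w d|mu|

# With w == 1 the measure norm reduces to the total variation norm:
assert np.isclose(np.sum(np.abs(mu)), np.sum(1.0 * np.abs(mu)))
```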
It is known (see e.g. [47], Theorem 6.6) that the weak topology can be metrized using the metric

ρ(µ, ν) := Σ_{i=1}^∞ 2^{−i} | ∫_S φ_i dµ − ∫_S φ_i dν |,

where {φ_i}_{i≥1} is a sequence of continuous bounded functions from S to R whose elements form a dense subset of the unit ball in C(S). The strong convergence topology is, in general, not metrizable.
Next, let ∆_w(S) := {µ ∈ ∆(S) : ∫_S w(s) µ(ds) < ∞}. It has been shown in [51] that ∆_w(S) can be metrized using the metric

ρ_w(µ, ν) := ρ(µ, ν) + | ∫_S w dµ − ∫_S w dν |.

It can be shown that, under the assumptions we make about w, ∆_w(S) with the metric ρ_w is a Polish space (see [51,14] for more on that). We will use the topology defined by this metric (called the w-topology in the sequel) as the standard topology on ∆_w(S).
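The following sketch illustrates ρ and ρ_w on a finite grid, with the infinite family {φ_i} truncated to a few continuous functions from the unit ball of C(S); all choices are illustrative assumptions.

```python
import numpy as np

s = np.linspace(0.0, 1.0, 11)
w = 1.0 + s**2
phis = [np.ones_like(s), s, np.cos(np.pi * s), np.sin(np.pi * s)]  # truncated family

def rho(mu, nu):
    # weak-convergence metric, truncated to finitely many test functions
    return sum(2.0**-(i + 1) * abs(np.dot(phi, mu - nu))
               for i, phi in enumerate(phis))

def rho_w(mu, nu):
    # w-metric: weak part plus the difference of w-moments
    return rho(mu, nu) + abs(np.dot(w, mu - nu))

mu = np.full(11, 1 / 11)                 # uniform distribution on the grid
nu = np.zeros(11); nu[0] = 1.0           # point mass
print(rho(mu, nu), rho_w(mu, nu))
```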
We will also use the notation ∆_w(S × A) and, for M > 0, ∆^M_w(S) := {µ ∈ ∆_w(S) : ∫_S w(s) µ(ds) ≤ M}, with analogously defined metrics, also denoted by ρ (the metric defining weak convergence) and ρ_w (the w-metric), as well as similar notation for subsets of S or S × A.
Whenever we speak about continuity of correspondences, we refer to the following definitions: let X and Y be two metric spaces and F : X ⇉ Y a correspondence. F is upper semicontinuous if for every x ∈ X and every open set U ⊇ F(x) there exists a neighbourhood V of x such that F(x') ⊆ U for all x' ∈ V; F is lower semicontinuous if for every x ∈ X and every open set U with F(x) ∩ U ≠ ∅ there exists a neighbourhood V of x such that F(x') ∩ U ≠ ∅ for all x' ∈ V. F is said to be continuous iff it is both upper and lower semicontinuous. For more on (semi)continuity of correspondences see [34], Appendix D, or [4], Chapter 17.2.
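A standard example illustrating the difference: the correspondence F : R ⇉ R with F(x) = {0} for x ≠ 0 and F(0) = [−1, 1] is upper semicontinuous but not lower semicontinuous at 0, while G(x) = [0, 1] for x ≠ 0 and G(0) = {0} is lower but not upper semicontinuous there.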
4 The existence of stationary and Markov mean-field equilibria in the discounted payoff game

Assumptions
In this section, we address the problem of the existence of an equilibrium in discrete-time mean-field games with β-discounted payoff. We begin by presenting the set of assumptions used in our results.

Main results
In the first two main results of this section we prove the existence of stationary mean-field equilibria in discounted discrete-time mean-field games.
Theorem 1 Suppose that assumptions (A1–A3) are satisfied. Then for any β ∈ (0, 1) the multi-population discrete-time mean-field game with β-discounted payoff defined by r^i, Q^i, S_i and A^i, i = 1, ..., N, has a stationary mean-field equilibrium.
In the proof we adapt the techniques introduced in [41] to our case. We precede the proof of the theorem with two lemmas.
Lemma 2 For any τ ∈ Π_{j=1}^N ∆_w(D^j), let

V^i_{β,τ}(s) := sup_{π^i ∈ M^i} J^i_β(s, π^i, τ),

that is, let V^i_{β,τ} be the optimal value for the β-discounted Markov decision process of a player from population i when the behaviour of all the other players is described by the state-action measure τ, fixed over time. Under assumptions (A1–A3), for each i ∈ {1, ..., N}, V^i_{β,τ}(s) is jointly continuous in (s, τ) and ‖V^i_{β,τ}(·)‖_w ≤ R/(1−β).

Proof: Let us fix an i ∈ {1, ..., N} and define for any τ ∈ Π_{j=1}^N ∆_w(D^j) and u ∈ B_w(S_i)

(T^i_τ u)(s) := sup_{a ∈ A^i(s)} [ r^i(s, a, τ) + β ∫_{S_i} u(s') Q^i(ds' | s, a, τ) ].

Note that clearly (by assumptions (A1) and (A2)(b)) T^i_τ maps B_w(S_i) into itself. Moreover, for any u_1, u_2 ∈ B_w(S_i),

‖T^i_τ u_1 − T^i_τ u_2‖_w ≤ β sup_{s ∈ S_i} (1/w(s)) sup_{a ∈ A^i(s)} ∫_{S_i} |u_1(s') − u_2(s')| Q^i(ds' | s, a, τ) ≤ β ‖u_1 − u_2‖_w sup_{s ∈ S_i} (1/w(s)) sup_{a ∈ A^i(s)} ∫_{S_i} w(s') Q^i(ds' | s, a, τ) ≤ αβ ‖u_1 − u_2‖_w,   (3)

where the penultimate inequality follows from the definition of the w-norm, while the last one from assumption (A2)(b). Hence, as αβ < 1, T^i_τ is a contraction defined on a complete space. By the Banach fixed point theorem it has a unique fixed point, which is by Theorem 4.2.3 in [34] equal to V^i_{β,τ}. Moreover, this fixed point can be obtained as lim_{n→∞} (T^i_τ)^n(u_0) for any given u_0 ∈ B_w(S_i).
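On finite state and action spaces the operator T^i_τ and its fixed point can be computed directly; the sketch below (with our own toy data, and with w ≡ 1 so that the contraction modulus is simply β) illustrates the value iteration invoked above, not an implementation of the paper's Borel-space setting.

```python
import numpy as np

def bellman_operator(u, r, Q, beta):
    """One application of T_tau on a finite model: (T u)(s) = max_a [ r(s,a)
    + beta * sum_{s'} Q(s'|s,a) u(s') ]. Dependence on tau is frozen into r, Q."""
    return np.max(r + beta * (Q @ u), axis=1)

def value_iteration(r, Q, beta, tol=1e-10):
    """Iterate T_tau from u_0 = 0; the Banach fixed point theorem guarantees
    geometric convergence to V_{beta,tau} (here with w == 1)."""
    u = np.zeros(r.shape[0])
    while True:
        u_next = bellman_operator(u, r, Q, beta)
        if np.max(np.abs(u_next - u)) < tol:
            return u_next
        u = u_next

# Toy data: 2 states, 2 actions.
r = np.array([[1.0, 0.0], [0.5, 2.0]])
Q = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.3, 0.7], [0.6, 0.4]]])
V = value_iteration(r, Q, beta=0.9)
```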
Let u^τ_0 ≡ 0 and define for n = 1, 2, ..., u^τ_n := T^i_τ(u^τ_{n−1}). We will next show that for each n, u^τ_n(s) is continuous in (s, τ) and ‖u^τ_n‖_w ≤ R/(1−β). We prove these statements by induction on n. For n = 0 both claims are obvious. Suppose they hold for n = k − 1. Then by Theorem 3.3 in [56] (see also Remark 3.4 (ii) there; the assumptions given there are satisfied with g = (R/(1−β))w), ∫_{S_i} u^τ_{k−1}(s') Q^i(ds' | s, a, τ) is jointly continuous in (s, a, τ), hence, by Proposition 7.32 in [12],

u^τ_k(s) = sup_{a ∈ A^i(s)} [ r^i(s, a, τ) + β ∫_{S_i} u^τ_{k−1}(s') Q^i(ds' | s, a, τ) ]

is also (jointly) continuous. Moreover, a direct estimate of |u^τ_k(s)| (using assumption (A1) to bound the reward term, assumption (A2)(b) to bound the integral term, and the inductive bound on ‖u^τ_{k−1}‖_w) yields ‖u^τ_k‖_w ≤ R/(1−β). Thus, the second claim has been proved for n = k.
To finish the proof, let us take convergent sequences {s_k}_{k≥1} in S_i and {τ_k}_{k≥1} in Π_{j=1}^N ∆_w(D^j) such that s_k → s_* and τ_k ⇒ τ_*. We will show that V^i_{β,τ_k}(s_k) → V^i_{β,τ_*}(s_*). We start the proof by noticing that the set K := {s_k : k ≥ 1} ∪ {s_*} is clearly compact, hence there exists a value W such that W ≥ |w(s)| for s ∈ K. Now, fix any ε > 0 and let n_0 be such that (αβ)^{n_0} (R/(1−β)) W < ε/3. Clearly, by repeated use of (3) for u_1 = u^{τ_*}_0 and u_2 = V^i_{β,τ_*}, we obtain

|u^{τ_*}_{n_0}(s_*) − V^i_{β,τ_*}(s_*)| < ε/3.   (4)

Similarly we obtain that for any k ≥ 1,

|u^{τ_k}_{n_0}(s_k) − V^i_{β,τ_k}(s_k)| < ε/3.   (5)

Finally, from the joint continuity of u^•_{n_0}(·), there exists a k_0 ∈ N such that for any k ≥ k_0,

|u^{τ_k}_{n_0}(s_k) − u^{τ_*}_{n_0}(s_*)| < ε/3.   (6)

Now, combining (4), (5) and (6), we obtain that for any k ≥ k_0, |V^i_{β,τ_k}(s_k) − V^i_{β,τ_*}(s_*)| < ε, which ends the proof.

Lemma 3 Suppose assumptions (A1–A3) hold and M > 0 is such that for each i ∈ {1, ..., N} the set ∆^M_w(S_i) is nonempty. Then for each τ ∈ Π_{j=1}^N ∆(D^j), i = 1, ..., N and any stationary strategy f^i ∈ F^i, there exists a µ_{f^i,τ} ∈ ∆^M_w(S_i) such that for any B ∈ B(S_i),

µ_{f^i,τ}(B) = ∫_{S_i} ∫_{A^i(s)} Q^i(B | s, a, τ) f^i(da | s) µ_{f^i,τ}(ds).   (7)
Proof: Let us fix i ∈ {1, ..., N} and note that for any τ ∈ Π_{j=1}^N ∆(D^j) and any stationary strategy f^i ∈ F^i, the transition probability Q^i_{f^i,τ}(B | s) := ∫_{A^i(s)} Q^i(B | s, a, τ) f^i(da | s) is clearly strongly continuous. Next, suppose that the initial distribution of the individual state of a player from population i is ρ^i_0 ∈ ∆^M_w(S_i). We prove by induction that the same is true for ρ^i_t, the distribution of his state at t = 1, 2, ..., if he uses strategy f^i and the behaviour of the other players is described by τ. Suppose the thesis is true for t.
Then by assumption (A2)(b) we have ∫_{S_i} w(s) ρ^i_{t+1}(ds) ≤ M, so ρ^i_{t+1} ∈ ∆^M_w(S_i). By Remark 1 in [33] we know that the sequence of measures (1/n) Σ_{t=0}^{n−1} ρ^i_t (whose elements clearly belong to ∆^M_w(S_i), as it is a convex set) has a subsequence weakly converging to an invariant measure µ_{f^i,τ} of the Markov chain with transition probability Q^i_{f^i,τ}. As w_0 is a moment function, ∆^M_w(S_i) is tight, hence, by Prohorov's theorem (Theorem 6.1 in [13]), relatively compact. But this also implies that ∆^M_w(S_i) is closed in the weak convergence topology, hence µ_{f^i,τ} ∈ ∆^M_w(S_i).

Proof of Theorem 1: Let M be as in Lemma 3 and for i = 1, ..., N let ∆^M_w(D^i) := {η ∈ ∆_w(D^i) : ∫_{D^i} w(s) η(ds × da) ≤ M}. Let us further define the correspondences from Π_{j=1}^N ∆^M_w(D^j) to ∆^M_w(D^i), i = 1, ..., N:

Θ^i(τ) := { η ∈ ∆^M_w(D^i) : η_{S_i}(B) = ∫_{D^i} Q^i(B | s, a, τ) η(ds × da) for every B ∈ B(S_i) },

Ψ^i_β(τ) := { η ∈ Θ^i(τ) : ∫_{D^i} [ r^i(s, a, τ) + β ∫_{S_i} V^i_{β,τ}(s') Q^i(ds' | s, a, τ) ] η(ds × da) = ∫_{S_i} V^i_{β,τ}(s) η_{S_i}(ds) }.

We will now verify that for each i, Θ^i and Ψ^i_β have some useful properties. We fix i ∈ {1, ..., N} for all these considerations. First note that η = Π^i(f^i, µ_{f^i,τ}) clearly belongs to Θ^i(τ) for any f^i ∈ F^i, as for any B ∈ B(S_i),

η_{S_i}(B) = µ_{f^i,τ}(B) = ∫_{S_i} ∫_{A^i(s)} Q^i(B | s, a, τ) f^i(da | s) µ_{f^i,τ}(ds) = ∫_{D^i} Q^i(B | s, a, τ) η(ds × da),

where the first and the last equality follow from the definition of Π^i(·, ·), while the middle one comes from the definition of an invariant measure and (7). Moreover, by Lemma 3, η ∈ ∆^M_w(D^i), so Θ^i has nonempty values.

We next show that the graph of Θ^i is closed in the weak convergence topology. To prove that, first note that the equality defining Θ^i(τ) is equivalent to the requirement that for any bounded continuous function u : S_i → R,

∫_{S_i} u(s) η_{S_i}(ds) = ∫_{D^i} ∫_{S_i} u(s') Q^i(ds' | s, a, τ) η(ds × da).

Hence, if η_n ∈ Θ^i(τ_n) with η_n ⇒ η and τ_n ⇒ τ, then by Theorem 3.3 in [56] we may pass to the limit on both sides of this equality, and from the uniqueness of the limit this implies that η ∈ Θ^i(τ).

By Theorem 4.2.3 in [34] there exists an optimal stationary policy f^i_* in the optimization problem of a player from population i maximizing his discounted reward when the behaviour of all the other players is described by the state-action measure τ, fixed over time. Moreover, f^i_* is a measurable selector attaining the maximum on the RHS of the equation

V^i_{β,τ}(s) = sup_{a ∈ A^i(s)} [ r^i(s, a, τ) + β ∫_{S_i} V^i_{β,τ}(s') Q^i(ds' | s, a, τ) ].

Then we can write that η = Π^i(f^i_*, µ_{f^i_*,τ}) satisfies the equality defining Ψ^i_β(τ), hence Ψ^i_β has nonempty values. Next we show that the graph of Ψ^i_β is closed in the weak convergence topology. Let us take sequences τ_n ∈ Π_{j=1}^N ∆^M_w(D^j) and η_n ∈ ∆^M_w(D^i) such that η_n ∈ Ψ^i_β(τ_n) for every n ∈ N, with η_n ⇒ η and τ_n ⇒ τ. Since the graph of Θ^i is closed, to show that so is that of Ψ^i_β, we only need to prove that the equality defining the set Ψ^i_β(τ) is satisfied for τ and η. Note however that for each n,

∫_{D^i} [ r^i(s, a, τ_n) + β ∫_{S_i} V^i_{β,τ_n}(s') Q^i(ds' | s, a, τ_n) ] η_n(ds × da) = ∫_{S_i} V^i_{β,τ_n}(s) (η_n)_{S_i}(ds).   (9)

Then, by the continuity of Q^i and V^i_{β,τ_n} and Theorem 3.3 in [56] (see also Remark 3.4 (ii) there; by (A2)(b) the assumption presented there is true for g = (R/(1−β))w), and as r^i(·, ·, τ_n) converges continuously to r^i(·, ·, τ) by (A1), we may apply Theorem 3.3 in [56] again (again with g = (R/(1−β))w, which satisfies the assumption given in Remark 3.4 (ii) by (A1) and (A2)(b)) to pass to the limit in (9), obtaining the same equality for η and τ, which ends the proof that the graph of Ψ^i_β is closed.
Finally, we can also note that for each τ, the set Ψ^i_β(τ) is clearly convex. Next, let us define the following correspondence mapping Π_{i=1}^N ∆^M_w(D^i) into itself:

Ψ_β(τ) := Π_{i=1}^N Ψ^i_β(τ).

Our previous considerations imply that Ψ_β has nonempty and convex values and that its graph is closed. To finish the proof we need to note that the function w_0 (which is a moment on S) is also a moment on S × A (as A is compact), hence each ∆^M_w(D^i) is tight. Now Prohorov's theorem implies that ∆^M_w(D^i) is compact in the weak convergence topology for i = 1, ..., N, and Π_{j=1}^N ∆^M_w(D^j) is compact in the product topology. Therefore, by the Glicksberg fixed point theorem [26], Ψ_β has a fixed point.
Suppose τ_* is this fixed point. By a well-known result (see e.g. [36], p. 89), for each i ∈ {1, ..., N}, τ^i_* can be disintegrated into a stochastic kernel g^i_* ∈ F^i_0 and its marginal on S_i, (τ^i_*)_{S_i}, that is, a kernel satisfying for any D ∈ B(D^i),

τ^i_*(D) = ∫_{S_i} ∫_{A^i(s)} 1_D(s, a) g^i_*(da | s) (τ^i_*)_{S_i}(ds).   (10)

Let us further define

S^i_0 := { s ∈ S_i : ∫_{A^i(s)} [ r^i(s, a, τ_*) + β ∫_{S_i} V^i_{β,τ_*}(s') Q^i(ds' | s, a, τ_*) ] g^i_*(da | s) = V^i_{β,τ_*}(s) },

and note that (τ^i_*)_{S_i}(S^i_0) = 1, as otherwise the inequality in the definition of S^i_0 would imply an inequality in the equality defining Ψ^i_β. Let us thus define the strategy

f^i(· | s) := g^i_*(· | s) for s ∈ S^i_0 and f^i(· | s) := f^i_*(· | s) for s ∈ S_i \ S^i_0,

where f^i_* is an optimal stationary policy of a player from population i against τ_*. It is clear that for any s ∈ S_i,

∫_{A^i(s)} [ r^i(s, a, τ_*) + β ∫_{S_i} V^i_{β,τ_*}(s') Q^i(ds' | s, a, τ_*) ] f^i(da | s) = V^i_{β,τ_*}(s),   (11)

so f^i is a best response against τ_*. Then, for any D ∈ B(D^i) we can reason as follows:

τ^i_*(D) = ∫_{S_i} ∫_{A^i(s)} 1_D(s, a) g^i_*(da | s) (τ^i_*)_{S_i}(ds) = ∫_{S^i_0} ∫_{A^i(s)} 1_D(s, a) g^i_*(da | s) (τ^i_*)_{S_i}(ds) = ∫_{S^i_0} ∫_{A^i(s)} 1_D(s, a) f^i(da | s) (τ^i_*)_{S_i}(ds) = ∫_{S_i} ∫_{A^i(s)} 1_D(s, a) f^i(da | s) (τ^i_*)_{S_i}(ds) = Π^i(f^i, (τ^i_*)_{S_i})(D),

where the first equality follows from (10), the second and the fourth from the definition of the integral over a set (using (τ^i_*)_{S_i}(S^i_0) = 1), the third from the definition of the strategy f^i, and the last one from the definition of Π^i(·, ·). Note however that the equalities proved above, together with (11), imply that there exist strategies f^i and invariant measures µ^i_* = (τ^i_*)_{S_i} for each population i ∈ {1, ..., N}, such that f^i is a best response in the β-discounted game against τ_* and τ_* is the stationary global state-action distribution corresponding to the profile of strategies (f^1, ..., f^N) and the initial global state (µ^1_*, ..., µ^N_*), hence a stationary mean-field equilibrium in the β-discounted game.
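The existence argument above is non-constructive, but its two building blocks (a best response against a fixed τ, and an invariant state-action measure induced by that response) suggest a simple heuristic on finite models, sketched below. All names, the damping scheme and the congestion-type reward are our own assumptions, and the loop is not guaranteed to converge; the paper establishes existence via Glicksberg's theorem instead.

```python
import numpy as np

def stationary_equilibrium(r_fn, Q, beta, n_iter=500, damping=0.5):
    """Heuristic fixed-point iteration for a finite single-population model.

    r_fn : function tau -> (S, A) reward array (dependence of r on tau)
    Q    : (S, A, S) individual transition kernel
    """
    S, A = Q.shape[0], Q.shape[1]
    tau = np.full((S, A), 1.0 / (S * A))          # initial state-action measure
    for _ in range(n_iter):
        r = r_fn(tau)
        # value iteration for the best response against tau, fixed over time
        V = np.zeros(S)
        for _ in range(200):
            V = np.max(r + beta * (Q @ V), axis=1)
        best_a = np.argmax(r + beta * (Q @ V), axis=1)
        pi = np.eye(A)[best_a]                    # deterministic best response
        # invariant distribution of the induced chain P(s'|s) = Q[s, a*(s), s']
        P = Q[np.arange(S), best_a]
        mu = np.full(S, 1.0 / S)
        for _ in range(200):
            mu = mu @ P
        tau = (1 - damping) * tau + damping * mu[:, None] * pi
    return tau

# Example with a mild congestion effect: rewards fall with the crowd's mass.
Q = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r_fn = lambda tau: np.array([[1.0, 0.0], [0.0, 1.0]]) - tau  # hypothetical r(s,a,tau)
tau_star = stationary_equilibrium(r_fn, Q, beta=0.9)
```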
Remark 1 It should be noted here that the results given in Theorem 1, applied to the model with a single population, extend the existing results for such a case. The most general result of this type in the literature appears in [41] and concerns the case with compact individual state space.
The last result in this section gives the conditions under which Markov mean-field equilibria in our models exist. It is based on one of the theorems given in [51]. It should be noted here that the assumptions in that paper are slightly stronger than those in our model when applied to a single population. Namely, in our model the rewards and the transitions depend on the state-action distribution of the other players, while in [51] the dependence is only on the distribution of private states. Also, in our model we allow the set of feasible actions to depend on the player's private state, while in [51] there was no such dependence.
Remark 2 As we have already noted, in our model the rewards and the transitions may depend on the state-action distribution of the players, which differs from [51], where the dependence is only on the distribution of private states. Such an assumption is not new to the mean-field game literature. In the case of discrete-time games it was already used in the first paper on this type of games [41]. It has also been applied in [10,11,59,60]. As for the continuous-time case, models of this type were introduced by Gomes and Voskanyan in [30] under the name of extended mean-field games. Cardaliaguet and Lehalle proposed the name mean field games of controls in [17] for this type of framework. Some further results on the topic include [42,2,19,28,15,45].
We precede the proof of Theorem 4 with a counterpart of Lemma 2 for the Markov case. It requires some additional notation. First, let us define for t ≥ 0 the constants L_t bounding the w-norms of the optimal values at time t (their exact form is determined by the constants R, α, β and γ appearing in (A1') and (A2')). Using these constants, we define for i = 1, ..., N and t ≥ 0 the sets

C^t_i := { u ∈ C_w(S_i) : ‖u‖_w ≤ L_t },  C_i := Π_{t≥0} C^t_i.

It is easy to see that under (A1'), C_i with the metric

ρ_C((u_0, u_1, ...), (v_0, v_1, ...)) := sup_{t≥0} δ^{−t} ‖u_t − v_t‖_w,

where δ is chosen such that δ > γ and αβδ < 1, is a complete metric space.
Lemma 5 For any state-action measure flow (τ) := (τ_0, τ_1, ...) ∈ Ξ, let

V^{i,t}_{β,(τ)}(s) := sup_{π^i ∈ M^i} E^{s,π^i,(τ)} [ Σ_{k=t}^∞ β^{k−t} r^i(s^i_k, a^i_k, τ_k) ],

that is, let it be the optimal value at time t for the β-discounted Markov decision process of a player from population i when the behaviour of all the other players is described by the flow (τ). Under assumptions (A1'–A2') and (A3), for any i ∈ {1, ..., N} and t ≥ 0, V^{i,t}_{β,(τ)} ∈ C^t_i.
Proof: Let us fix an i ∈ {1, ..., N} and define for any (τ) ∈ Ξ and t ≥ 0

(T^{i,t}_{(τ)} u)(s) := sup_{a ∈ A^i(s)} [ r^i(s, a, τ_t) + β ∫_{S_i} u(s') Q^i(ds' | s, a, τ_t) ].

By Proposition 7.32 in [12], for any u ∈ C^{t+1}_i, T^{i,t}_{(τ)}(u) is continuous. Moreover, ‖T^{i,t}_{(τ)}(u)‖_w ≤ L_t, where the bound follows from (A1') and (A2')(b) (note that ∫_{D^i} w(s) τ^i_t(ds × da) < ∞ is implied by the assumption that µ^i_0 ∈ ∆_w(S_i) and part (b) of (A2') applied to the recursive formula for τ_t). Hence, T^{i,t}_{(τ)} maps C^{t+1}_i into C^t_i. Next, for any u_1, u_2 ∈ C^{t+1}_i, we have

‖T^{i,t}_{(τ)}(u_1) − T^{i,t}_{(τ)}(u_2)‖_w ≤ αβ ‖u_1 − u_2‖_w,   (12)

where the inequality follows from the definition of the w-norm and assumption (A2')(b).
We next define the operator T^i_{(τ)} : C_i → C_i with the formula

T^i_{(τ)}(u_0, u_1, ...) := ( T^{i,0}_{(τ)}(u_1), T^{i,1}_{(τ)}(u_2), ... ).

From what we have shown, it indeed maps C_i into itself. As (12) implies that for any (u_0, u_1, ...) and (v_0, v_1, ...) in C_i, ρ_C(T^i_{(τ)}(u), T^i_{(τ)}(v)) ≤ αβδ ρ_C((u), (v)), it is an αβδ-contraction defined on a complete space. By the Banach fixed point theorem it has a unique fixed point. By Theorems 14.4 and 17.1 in [36] the components of this fixed point are equal to V^{i,t}_{β,(τ)}, t ≥ 0, which ends the proof that the optimal value functions satisfy V^{i,t}_{β,(τ)} ∈ C^t_i for t ≥ 0.

Now we are ready to pass to the main part of the proof of Theorem 4.

Proof of Theorem 4: We start by defining the correspondences from Ξ into Ξ^i (i = 1, ..., N) with the formulas:

Θ^i((τ)) := { (η) ∈ Ξ^i : (η_0)_{S_i} = µ^i_0 and (η_{t+1})_{S_i} = Φ^i(· | (τ^{−i}_t, η_t)) for t = 0, 1, ... },

Ψ^i_β((τ)) := { (η) ∈ Θ^i((τ)) : for every t ≥ 1, ∫_{D^i} [ r^i(s, a, τ_{t−1}) + β ∫_{S_i} V^{i,t}_{β,(τ)}(s') Q^i(ds' | s, a, τ_{t−1}) ] η_{t−1}(ds × da) = ∫_{S_i} V^{i,t−1}_{β,(τ)}(s) (η_{t−1})_{S_i}(ds) },

where (τ^{−i}_t, η_t) denotes the vector τ_t with its i-th coordinate replaced by η_t. We next prove that Θ^i and Ψ^i_β have some useful properties. We fix i ∈ {1, ..., N} for these considerations. We start by showing that for any Markov strategy π^i ∈ M^i the flow (η^i) defined with the recurrence

η^i_0 := Π^i_0(π^i, µ^i_0),  η^i_{t+1} := Π^i_{t+1}(π^i, Φ^i(· | (τ^{−i}_t, η^i_t))),   (13)

is an element of Θ^i((τ)). We do it by induction on t. For t = 0 both (η^i_0)_{S_i} = µ^i_0 and ∫_{D^i} w(s) η^i_0(ds × da) ≤ M are obvious (by the definition of Π^i_0 and assumption (A1')). Now suppose ∫_{D^i} w(s) η^i_t(ds × da) ≤ M α^t. Then by the definition of Φ^i, (η^i_{t+1})_{S_i} = Φ^i(· | (τ^{−i}_t, η^i_t)). Moreover, by (A2')(b) we have

∫_{D^i} w(s) η^i_{t+1}(ds × da) ≤ α ∫_{D^i} w(s) η^i_t(ds × da) ≤ M α^{t+1},

which shows that η^i_{t+1} ∈ Ξ^i_{t+1} and, by the induction principle, that (η^i) ∈ Ξ^i. Next we prove that the graph of Θ^i is closed. To do that, we take convergent sequences (τ^{(n)}) ∈ Ξ and (η^{(n)}) ∈ Ξ^i with (η^{(n)}) ∈ Θ^i((τ^{(n)})), converging componentwise to (τ) and (η), respectively. Hence, by Theorem 3.3 in [56], we may pass to the limit in the equalities defining Θ^i, which proves that (η) ∈ Θ^i((τ)).

Next note that if π^i_β is an optimal deterministic Markov policy in the optimization problem of a player from population i maximizing his β-discounted reward when the behaviour of all the other players at each stage is described by the flow (τ), then for each t ≥ 0 and s ∈ S_i it satisfies

V^{i,t}_{β,(τ)}(s) = r^i(s, π^{i,t}_β(s), τ_t) + β ∫_{S_i} V^{i,t+1}_{β,(τ)}(s') Q^i(ds' | s, π^{i,t}_β(s), τ_t),

which implies that for any t ≥ 1, the equality defining Ψ^i_β holds for the measure flow (η^i) defined by (13) with π^i = π^i_β, so Ψ^i_β has nonempty values. Proving that the graph of Ψ^i_β is closed only requires proving that the equalities defining Ψ^i_β hold for (η) and (τ) whenever (η^{(n)}) ∈ Ψ^i_β((τ^{(n)})) with (η^{(n)}) and (τ^{(n)}) converging componentwise to (η) and (τ). Let us fix t ≥ 1. The definition of Ψ^i_β implies that for each n,

∫_{D^i} [ r^i(s, a, τ^{(n)}_{t−1}) + β ∫_{S_i} V^{i,t}_{β,(τ^{(n)})}(s') Q^i(ds' | s, a, τ^{(n)}_{t−1}) ] η^{(n)}_{t−1}(ds × da) = ∫_{S_i} V^{i,t−1}_{β,(τ^{(n)})}(s) (η^{(n)}_{t−1})_{S_i}(ds).   (14)

By the continuity of Q^i and V^{i,t}_{β,(τ^{(n)})} and Theorem 3.3 in [56] (the assumption presented in Remark 3.4 (ii) there is true for g = L_t w by Lemma 5), and as r^i(·, ·, τ^{(n)}_{t−1}) converges continuously to r^i(·, ·, τ_{t−1}) by (A1'), using Theorem 3.3 in [56] once more (now with g = L_{t−1} w in Remark 3.4 (ii) there, again by Lemma 5), we can pass to the limit in (14), obtaining the same equality for (η) and (τ). As t was arbitrary, this ends the proof that the graph of Ψ^i_β is closed.

To finalize the proof, we define the correspondence from Ξ into itself:

Ψ_β((τ)) := Π_{i=1}^N Ψ^i_β((τ)).

What we have shown already implies that Ψ_β has nonempty values and that its graph is closed. Convexity of the values of Ψ_β is obvious. As w is a moment function, each Ξ^i_t is tight, hence, by Prohorov's theorem, compact in the weak convergence topology. This implies that Ξ is compact in the product topology. Therefore, by the Glicksberg fixed point theorem [26], Ψ_β has a fixed point. Suppose (τ_*) is this fixed point. Disintegrating τ^{*i}_t gives, for i = 1, ..., N and t = 0, 1, ..., stochastic kernels π^{*i}_t and measures µ^{*i}_t which (after modifications similar to those in the proof of Theorem 1) correspond to the Markov strategies and global state flows of a Markov mean-field equilibrium in our game.

5 The existence of stationary and Markov mean-field equilibria in the total payoff game

Assumptions
In this section, we address the problem of the existence of an equilibrium in mean-field games with total payoff. In the main results we add a new assumption, (A4) or (A4''), to those defined in Section 4.1. Its formulation requires defining, for i = 1, ..., N, s ∈ S_i, a ∈ A^i(s) and τ ∈ Π_{j=1}^N ∆(D^j), the modified transition probabilities Q^i_*:

Q^i_*(· | s, a, τ) := Q^i(· | s, a, τ) for s ∈ S_i \ {s_*},  Q^i_*({s_*} | s_*, a_*, τ) := 1,

that is, Q^i_* coincides with Q^i except that the state s_* is made absorbing. The first new assumption will be used to prove the existence of a stationary mean-field equilibrium in total reward models.
In the case of the results about the existence of Markov mean-field equilibria in the discounted case, two of the assumptions, (A1') and (A2'), referred to the discount factor β, which does not exist in the total reward model. Hence, apart from the new assumption (A4''), new versions of these two assumptions will be necessary. For technical reasons, some additional restrictions are also added.
(A1'') For i = 1, ..., N, r^i is continuous and bounded above by some constant R on D^i × Π_{i=1}^N ∆(D^i). Moreover, there exist non-negative constants α, γ, M satisfying α ≤ γ, αγ < 1 and ∫_S w(s) µ^i_0(ds) ≤ M for i = 1, ..., N, and such that the growth conditions of (A1') hold with these constants for i = 1, ..., N, s ∈ S_i and t = 0, 1, 2, .... Here and in the sequel, the notation for i = 1, ..., N, s ∈ S_i, π^i ∈ M^i and (τ) = (τ_0, τ_1, ...) ∈ Π_{t=0}^∞ Π_{j=1}^N ∆(D^j), as well as its counterpart for a single τ ∈ Π_{j=1}^N ∆(D^j), is used analogously to the discounted case.
Remark 3 By assuming (A4) or (A4''), we build upon the framework of transient total-reward Markov decision processes introduced by Veinott [57] in the context of finite state and action spaces and generalized to Borel spaces in [48,24]. The optimization problem faced by an individual from population i in our model of the total reward mean-field game would, for every fixed global state-action distribution τ, be transient if

sup_{π^i ∈ M^i} ‖ Σ_{t=0}^∞ E^{(·),π^i,τ}_* w(s^i_t) ‖_w < ∞,

where E^{s,π^i,τ}_* denotes the expectation for the chain governed by Q^i_* started at s; this is clearly true under (A2)(b) and (A4). Roughly speaking, this would mean that for a reward function such that ‖r^i(·, ·, τ)‖_w < ∞, the total reward is finite for any Markov strategy applied. In (A4) we strengthen this assumption by requiring that the convergence of the total reward to its value is uniform across all Markov strategies with respect to the w-norm. (A4'') is an adjustment of this condition to the case when the decision-maker optimizes his behaviour against a flow (τ).
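On a finite model the modification Q^i_* and the transience requirement of Remark 3 are easy to visualize: once s_* is made absorbing, transience amounts to the sub-kernel on the living states having spectral radius below one, in which case the accumulated weight is finite. The sketch below uses w ≡ 1 on the living states, so the accumulated weight is just the expected lifetime; all data are illustrative.

```python
import numpy as np

S = 3                      # states 0, 1 and the death state s* = 2
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.5, 0.5, 0.0]])   # original kernel: rebirth from s* is allowed

P_star = P.copy()
P_star[2] = [0.0, 0.0, 1.0]       # Q*: make s* absorbing

T = P_star[:2, :2]                # transitions among the living states
assert np.max(np.abs(np.linalg.eigvals(T))) < 1.0   # transience check
# expected accumulated weight (here: expected lifetime) before death:
lifetime = np.linalg.solve(np.eye(2) - T, np.ones(2))
```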

Main results
Theorem 6 Suppose that assumptions (A1–A4) are satisfied. Then the multi-population discrete-time mean-field game with total payoff defined by r^i, Q^i, S_i and A^i, i = 1, ..., N, has a stationary mean-field equilibrium.
Let us start by noticing that the total reward of a player from population i using a given strategy π^i when the behaviour of the others is constant over time and described by τ is, in the MDP model with transition probability Q^i_*, the same as the reward accumulated until reaching the state s_* in the model with transition probability Q^i. Let us next define for any i ∈ {1, ..., N} and τ ∈ Π_{j=1}^N ∆(D^j)

V^{i*}_τ(s) := max_{π^i ∈ M^i} E^{s,π^i,τ}_* [ Σ_{t=0}^∞ r^i(s^i_t, a^i_t, τ) ],

that is, the optimal value for the total-reward Markov decision process of a player from population i when the behaviour of all the other players is described by the state-action measure τ, fixed over time. Crucial properties of the function V^{i*}_•(·) are given in the lemma below.
Lemma 7 Under assumptions (A1–A4), for each i ∈ {1, ..., N}, V^{i*}_τ(s) is jointly continuous in (s, τ). Moreover, there exists a constant L such that ‖V^{i*}_τ(·)‖_w ≤ L for any τ ∈ Π_{j=1}^N ∆_w(D^j).

Proof: Fix i and τ, and note that under (A2)(b) and (A4),

ζ^i(s, τ) := sup_{π^i ∈ M^i} Σ_{t=0}^∞ E^{s,π^i,τ}_* w(s^i_t)

is finite for i = 1, ..., N. Hence, we immediately see that the total-reward Markov decision process defined by S_i, A^i, r^i and Q^i_* satisfies the assumptions of Theorem 12 in [24]. Therefore, with the help of this theorem and Proposition 1 in [24] we can define a discount factor β ∈ (0, 1) together with a reward function r^i_* and a transition kernel Q^i_{**} such that S_i, A^i, r^i_* and Q^i_{**} define a β-discounted Markov decision process whose value V^i_{β,τ} satisfies

V^{i*}_τ(s) = V^i_{β,τ}(s) ζ^i(s, τ)   (16)

for s ∈ S_i, τ ∈ Π_{j=1}^N ∆_w(D^j). Moreover, optimal stationary strategies exist and coincide in both MDPs.
Next note that if ζ^i is a continuous function, then the model defined by S_i, A^i, r^i_* and Q^i_{**} satisfies assumptions (A1–A3) with the function w replaced by w_* ≡ 1 (in particular, (A2) follows from the fact that ζ^i(s, ·) ≥ w(s) for s ∈ S_i). This implies, by Lemma 2, that V^i_{β,τ}(s) is continuous in (s, τ) and ‖V^i_{β,τ}(·)‖_{w_*} ≤ R/(1−β). Hence, combining (16) with the fact that ζ^i is continuous and ζ^i(s, ·) ≤ L_i w(s), we obtain that V^{i*}_τ(s) is also continuous in (s, τ). Moreover, for any τ ∈ Π_{j=1}^N ∆_w(D^j),

‖V^{i*}_τ(·)‖_w ≤ (R/(1−β)) max_{j=1,...,N} L_j =: L,

which proves the thesis of the lemma, provided ζ^i is continuous. Hence, all we need to do is to show that ζ^i is continuous.
To do that, we note that ζ^i is clearly the limit of the sequence of functions {w^τ_n}_{n≥0}, defined by the following recurrence: w^τ_0 := w, w^τ_n := T^{i*}_τ(w^τ_{n−1}) for n = 1, 2, ..., where for any τ ∈ Π_{j=1}^N ∆_w(D^j),

(T^{i*}_τ u)(s) := w(s) + sup_{a ∈ A^i(s)} ∫_{S_i} u(s') Q^i_*(ds' | s, a, τ).

We next show by induction that each w^τ_n(s) is continuous in (s, τ). For n = 0 the claim is true by the definition of w. Suppose it holds for n = k − 1. Then by Theorem 3.3 in [56] (the assumptions given in Remark 3.4 (ii) there are satisfied with g = L_i w), ∫_{S_i} w^τ_{k−1}(s') Q^i_*(ds' | s, a, τ) is jointly continuous in (s, a, τ), hence, by Proposition 7.32 in [12], w^τ_k(s) = (T^{i*}_τ w^τ_{k−1})(s) is also (jointly) continuous. Therefore, the claim is true for any n ≥ 1.
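On finite spaces the iteration defining ζ^i can be run directly; the sketch below assumes the death state is absorbing under Q_* and carries zero weight, so that the limit is finite precisely in the transient case. Names and data are our own.

```python
import numpy as np

def zeta(w, Q_star, n_iter=200):
    """Finite-model sketch of the iteration w_n = T*(w_{n-1}) used above:
    (T* u)(s) = w(s) + max_a sum_{s'} Q*(s'|s,a) u(s'), whose limit is the
    maximal expected accumulated weight before absorption at s*.

    w      : (S,) weight vector, with w[s*] = 0 in this Q*-based sketch
    Q_star : (S, A, S) kernel with the death state absorbing
    """
    u = w.copy()
    for _ in range(n_iter):
        u = w + np.max(Q_star @ u, axis=1)
    return u

# Toy check: 2 living states + absorbing s*, one action, unit weight.
Q_star = np.array([[[0.6, 0.3, 0.1]], [[0.2, 0.5, 0.3]], [[0.0, 0.0, 1.0]]])
w = np.array([1.0, 1.0, 0.0])
print(zeta(w, Q_star))   # expected lifetimes, matching (I - T)^{-1} 1 on living states
```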
To finish the proof, let us take convergent sequences {s_k}_{k≥1} in S_i and {τ_k}_{k≥1} in Π_{j=1}^N ∆_w(D^j) such that s_k → ŝ and τ_k ⇒ τ̂. We will show that ζ^i(s_k, τ_k) → ζ^i(ŝ, τ̂). Since the set K := {s_k : k ≥ 1} ∪ {ŝ} is clearly compact, there exists a value W such that W ≥ |w(s)| for s ∈ K. Now, fix any ε > 0. By (A4) there exists a t_* such that the tails of the accumulated weight after time t_* are, uniformly in the Markov strategies, smaller than ε/(3W) in w-norm. This immediately implies

|w^{τ̂}_{t_*}(ŝ) − ζ^i(ŝ, τ̂)| < ε/3,   (17)

and, for any k ≥ 1,

|w^{τ_k}_{t_*}(s_k) − ζ^i(s_k, τ_k)| < ε/3.   (18)

Finally, from the joint continuity of w^•_{t_*}(·), there exists a k_0 ∈ N such that for any k ≥ k_0,

|w^{τ_k}_{t_*}(s_k) − w^{τ̂}_{t_*}(ŝ)| < ε/3.   (19)

Combining (17), (18) and (19), we obtain that for any k ≥ k_0, |ζ^i(s_k, τ_k) − ζ^i(ŝ, τ̂)| < ε.

Proof of Theorem 6: As in the case of the discounted reward, we define the correspondences from Π_{j=1}^N ∆(D^j) to ∆(D^i): Θ^i, exactly as in the proof of Theorem 1, and Ψ^i_*, defined as Ψ^i_β there, but with β ∫_{S_i} V^i_{β,τ}(s') Q^i(ds' | s, a, τ) replaced by ∫_{S_i} V^{i*}_τ(s') Q^i_*(ds' | s, a, τ). Using arguments similar to those employed in the proof of Theorem 1 (with the difference that instead of Lemma 2 we apply Lemma 7 when necessary), we can prove for Ψ^i_* (we do not need to prove it for Θ^i, as it is defined in exactly the same way as in the case of the β-discounted reward) that it has nonempty convex values and that its graph is closed. Then we define the correspondence Ψ_*(τ) := Π_{i=1}^N Ψ^i_*(τ) and show that it has a fixed point, which, again using arguments similar to those in the proof of Theorem 1, can be proved to correspond to a stationary mean-field equilibrium in the total reward discrete-time mean-field game considered in the theorem.
In the last result of this section we give conditions under which a Markov mean-field equilibrium exists in the total-reward game.
Proof: Let us fix µ_0 and M satisfying (A1''). Recall the notation used in the proof of Theorem 4. Next, for any flow of measure-vectors (τ) := (τ_0, τ_1, ...) ∈ Ξ and any i ∈ {1, ..., N}, let us define

V^{i,t}_{*(τ)}(s) := max_{π^i ∈ M^i} E^{s,π^i,(τ)}_* [ Σ_{k=t}^∞ r^i(s^i_k, a^i_k, τ_k) ],

that is, the optimal value at time t ≥ 0 for the total-reward Markov decision process of a player from population i when the behaviour of all the other players at each stage is described by the flow (τ). Using the standard method of transforming a nonhomogeneous Markov decision process into a homogeneous one and Theorem 12 in [24], we may show that V^{i,t}_{*(τ)}(s) can be obtained from the optimal reward in a discounted Markov decision process with state space S_i × N and suitably redefined reward and transition kernel. In fact, if V^i_{β(τ)}(s, t) denotes the optimal value in the modified (discounted) model, then V^{i,t}_{*(τ)}(s) = V^i_{β(τ)}(s, t) ζ^i(s, t, (τ)) for any s ∈ S_i and t ≥ 0. Moreover, optimal stationary strategies in the new model (which exist by Theorem 12 in [24]) correspond to optimal Markov strategies in the original one. Finally, repeating the arguments used in the proof of Lemma 7 (the assumptions (A1–A4) used there are satisfied with w(s) replaced by w(s, t) := w(s) α^{−t}, which clearly is a moment function on S × N), we can show that V^i_{β(·)}(·, t) and ζ^i(·, t, ·) are continuous and their w-norms are bounded by L := R max_{j=1,...,N} L_j / (1 − β), which implies that for any s ∈ S_i and t ≥ 0, V^{i,t}_{*(τ)}(s) ≤ L λ^t w(s).
Then the single-stage rewards are defined independently of the time components of both the individual and the global state. The existence of Markov mean-field equilibria is then assured by Theorem 8 under assumptions (A1)–(A3) on the original primitives of the model (which correspond to (A1''), (A2'') and (A3) for the modified one); (A4'') is satisfied automatically. A similar transformation allows for considering the non-stationary model with finite horizon, as well as the case when the time horizon of each individual is a random variable with a finite expected value, independent of the Markov chain of his individual states.
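The time-augmentation trick mentioned above can be sketched on finite models as follows; the layout of the augmented state space (a copy of S for each time index, with the last layer absorbing) and all names are our own illustrative choices.

```python
import numpy as np

def augment(Q_list):
    """Embed a nonhomogeneous finite MDP with kernels Q_list[t] of shape
    (S, A, S), t = 0..T-1, into a homogeneous one on S x {0, ..., T} by
    moving the clock into the state."""
    T, (S, A, _) = len(Q_list), Q_list[0].shape
    n = S * (T + 1)
    Q_aug = np.zeros((n, A, n))
    for t, Qt in enumerate(Q_list):
        # from (s, t) the chain moves to (s', t + 1) with probability Qt(s'|s,a)
        Q_aug[t * S:(t + 1) * S, :, (t + 1) * S:(t + 2) * S] = Qt
    Q_aug[T * S:, :, T * S:] = np.eye(S)[:, None, :]   # final layer absorbing
    return Q_aug
```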
In the paper we have presented a model of discrete-time mean-field games with several populations of players. Games of this type have previously been studied in the literature only in the continuous-time setting. The main results presented in this article are stationary and Markov mean-field equilibrium existence theorems for two payoff criteria, β-discounted payoff and total payoff, proved under some rather general assumptions on the one-step reward functions and individual transition kernels of the players. It is also worth noting that games with total payoff have so far only been studied in the finite state space case, hence the results presented here also extend those for total-payoff mean-field games with a single population. The article is the first of two papers on multiple-population discrete-time mean-field games with discounted or total payoffs. In the second one we provide theorems showing that under some additional assumptions the equilibria obtained in the mean-field models are approximate equilibria in their n-person counterparts when n is large enough.
We also plan further research on the topic of discrete-time mean-field games with multiple populations of players which will concentrate on games with long-run average reward.