1 Introduction

Dynamic games with a large number of players are a natural tool for modeling dynamic interactions in many areas of science, yet the complexity of such models means that they have received relatively little attention. One natural way of dealing with problems involving a large number of agents, developed across different fields of research, is to replace such complex models with relatively simpler ones featuring a continuum of infinitesimal players. Approximations of this kind have appeared in one-step games at least since the two seminal papers by Wardrop [56] and Schmeidler [53], but for a long time they were not introduced into dynamic game models. The situation changed with a series of papers by Lasry and Lions [41, 42] and by Huang, Caines and Malhamé [36,37,38], in which models of non-cooperative differential games with a continuum of identical players were introduced. The idea underlying these models is that, in the limit of infinitely many players, the game problem reduces to a much simpler single-agent decision problem. A huge number of publications on the topic have followed during the last decade, and the literature is still growing fast. A review of the existing results on differential-type mean-field games can be found in the books [8, 16] or the survey [27].

Similar discrete-time models appeared in the literature significantly earlier, in the paper by Jovanovic and Rosenthal [39], under the name of anonymous sequential games, but did not attract as much attention as their continuous-time counterparts. Since then, however, some further theoretical results on games of this type have appeared. Models with the discounted payoff criterion have been studied in [3, 10, 11, 18, 21, 49,50,51,52]. Conditions under which Nash equilibria in finite-player discounted-utility games converge to equilibria of the respective anonymous models were analyzed in [29, 30, 35, 47, 49]. The long-time average payoff has been considered in [48, 57, 58], while [57] also treats games with the total reward criterion. In [5, 6], algorithms for computing mean-field equilibria in both discounted and average reward games are presented.

All of the papers enumerated above consider the case of a single population of symmetric players. There is no reason, however, not to consider mean-field games with a larger number of populations. As long as this number is small, considering a limit model of this kind rather than a game with a huge finite number of players should significantly simplify the problem. Moreover, there are natural applications of models of this type. For example, we may want to analyze the influence of two related industries on each other. If each of these industries consists of a large number of firms competing with each other over time, their interaction can be modeled using a mean-field model. But since we would like to model the interaction between the two industries, the aggregate behaviour of companies from each industry should affect the performance of the other one, so these two different populations need to be modeled within a single mean-field framework. Another natural application could be modeling opinion dynamics in social media. Herding behaviour in such a setting has been widely discussed in the literature; in particular, it has been analyzed using mean-field models. We know, however, that these dynamics change when several groups with different political views clash with each other. They can be introduced into the mean-field model by treating them as different populations. In the continuous-time case, multiple-population mean-field models were introduced in [36] and further studied in [1, 7, 9, 19, 20, 23, 44]. As far as we know, there have been no papers on discrete-time mean-field games with multiple populations of players. In this article, we try to fill that gap by introducing two models of games of this type: one with discounted payoff and another with total payoff. In both cases we provide results on the existence of mean-field equilibria in such games under some natural assumptions. It is worth mentioning here that some of the results we present, notably all of those concerning the total payoff criterion, are proved under much less restrictive assumptions than those used in the existing literature on single-population mean-field games. As single-population games are just a specific case of the model presented here, in this way the paper also extends the theory of single-population mean-field games. This is further discussed where the relevant results are presented.

The organization of the paper is as follows: In Sect. 2 we present the way to model discrete-time mean-field games with several populations of players. In Sect. 3 we introduce some notation used in the remainder of the article. In Sects. 4 and 5 we present several mean-field equilibrium existence theorems for the cases of discounted and total payoff, respectively. Finally, in Sect. 6 we give some concluding remarks.

2 The Model

Mean-field game models were designed to approximate dynamic game situations with a large number of symmetric agents. In multi-population mean-field games we still assume that the number of agents is large, but they are homogeneous only within a smaller group called a population. The number of populations is finite and fixed, and their mutual interactions are captured through each individual's rewards and transitions. Each population has its own reward function and transition kernel (which may or may not operate on the same state space), which makes the model significantly different from those considered in the literature on discrete-time mean-field games. Below we describe the model formally.

A multi-population discrete-time mean-field game is described by the following objects:

  • We assume that the game is played in discrete time, that is \(t\in \{ 1,2,\ldots \}\).

  • The game is played by an infinite number (continuum) of players divided into N populations. Each player has a private state s, changing over time. We assume that the set of individual states \(S^i\) is the same for each player in population i (\(i=1,\ldots ,N\)), and that it is a nonempty closed subset of a locally compact Polish space S.

  • A vector \(\overline{\mu }=(\mu ^1,\ldots ,\mu ^N)\in \Pi _{i=1}^N\Delta (S^i)\) of N probability distributions over the Borel sets of \(S^i\), \(i=1,\ldots ,N\), is called a global state of the game. Its ith component describes the proportion of population i occupying each of the individual states. We assume that at every stage of the game each player knows both his private state and the global state, and that his knowledge about the individual states of his opponents is limited to the global state.

  • The set of actions available to a player from population i in state \((s,\overline{\mu })\) is given by \(A^i(s)\), with \(A:=\bigcup _{i\in \{ 1,\ldots ,N\}, s\in S^i}A^i(s)\)—a compact metric space. For any i, \(A^i(\cdot )\) is a non-empty compact valued correspondence such that

    $$\begin{aligned} D^i:=\{ (s,a)\in S^i\times A: a\in A^i(s)\} \end{aligned}$$

    is a measurable set. Note that we assume that the sets of actions available to a player only depend on his private state and not on the global state of the game.

  • The global distribution of the state-action pairs is denoted by \(\overline{\tau }=(\tau ^1,\ldots ,\tau ^N)\in \Pi _{i=1}^N\Delta (D^i)\). Its ith component gives the distribution of state-action pairs within population i, \(i=1,\ldots ,N\).

  • The immediate reward of an individual from population i is given by a measurable function \(r^i:D^i\times \Pi _{i=1}^N\Delta (D^i)\rightarrow \mathbb {R}\); \(r^i(s,a,\overline{\tau })\) gives the reward of a player at any stage of the game when his private state is s, his action is a and the distribution of state-action pairs across all the populations is \(\overline{\tau }\).

  • Transitions are defined for each individual separately by stochastic kernels \(Q^i:D^i\times \Pi _{i=1}^N\Delta (D^i)\rightarrow \Delta (S^i)\) denoting the transition probability for players from the ith population. \(Q^i(B\mid \cdot ,\cdot ,\overline{\tau })\) is product-measurable for any \(B\in \mathcal {B}(S^i)\), any \(\overline{\tau }\in \Pi _{i=1}^N\Delta (D^i)\) and \(i\in \{ 1,\ldots ,N\}\).

  • The global state at time \(t+1\), \(\overline{\mu }_{t+1}\), is obtained by aggregating the individual transitions of the players according to the formula

    $$\begin{aligned} \mu _{t+1}^{i}(\cdot )=\Phi ^i(\cdot \mid \overline{\tau _t}):=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau _t})\tau _t^i(ds\times da). \end{aligned}$$

    As can be clearly seen, the transition of the global state is deterministic.

A sequence \(\pi ^i=\{\pi _t^i\}_{t=0}^\infty \) of functions \(\pi _t^i:S^i\rightarrow \Delta (A)\), such that \(\pi _t^i(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\) and any t, satisfying \(\pi _t^i(A^i(s)\mid s )=1\) for every \(s\in S^i\) and every t, is called a Markov strategy for a player of population i. A function \(f^i:S^i \rightarrow \Delta (A)\), such that \(f^i(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\), satisfying \(f^i(A^i(s)\mid s )=1\) for every \(s\in S^i\), is called a stationary strategy. The set of all Markov strategies for players from the ith population is denoted by \(\mathcal {M}^i\), while that of stationary strategies by \(\mathcal {F}^i\). As in Markov decision processes, stationary strategies can be seen as the special case of Markov strategies that do not depend on t. In this paper we never consider general (history-dependent) strategies.

Next, let \(\Pi ^i_t(\pi ^i,\mu ^i)\) denote the state-action distribution of the players of population i at time t in the mean-field game, corresponding to the distribution \(\mu ^i\) of individual states in population i and a Markov strategy \(\pi ^i\in \mathcal {M}^i\) for players of population i, that is

$$\begin{aligned} \Pi _t^i(\pi ^i,{\mu }^i)(B):=\int _B \pi _t^i(da\mid s)\mu ^i(ds)\quad \text{ for } B\in \mathcal {B}(D^i). \end{aligned}$$

The vector \((\Pi _t^1(\pi ^1,{\mu }^1),\ldots ,\Pi _t^N(\pi ^N,{\mu }^N))\) will be denoted by \(\overline{\Pi }_t(\overline{\pi },\overline{\mu })\). When we use this notation for stationary strategies, we skip the subscript t.
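Although the model is formulated for general Polish state spaces, the objects \(\Pi _t^i(\pi ^i,\mu ^i)\) and \(\Phi ^i(\cdot \mid \overline{\tau })\) are easy to visualize numerically. The following minimal Python sketch is purely illustrative and not part of the formal development; it assumes a single population, finitely many states and actions, and a transition kernel that, for simplicity, ignores \(\overline{\tau }\). It computes the state-action distribution induced by a stationary strategy and the resulting deterministic update of the global state.

```python
# Illustrative only: one population, 3 states, 2 actions, kernel independent
# of tau.  Computes Pi(f, mu) and the deterministic update Phi(. | tau).
import numpy as np

n_s, n_a = 3, 2
rng = np.random.default_rng(0)

f = rng.dirichlet(np.ones(n_a), size=n_s)          # f[s, a] = f(a | s), stationary strategy
mu = np.array([0.5, 0.3, 0.2])                     # mu[s]: distribution of private states
Q = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # Q[s, a, s2] = Q(s2 | s, a)

tau = f * mu[:, None]                              # tau(s, a) = f(a | s) * mu(s), i.e. Pi(f, mu)
mu_next = np.einsum("sa,sab->b", tau, Q)           # Phi(. | tau): the next global state

assert np.isclose(tau.sum(), 1.0) and np.isclose(mu_next.sum(), 1.0)
print(mu_next)
```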

Given the evolution of the global state, which depends on the strategies of the players in a deterministic manner, we can define the individual history of a player \(\alpha \) (from any given population i) as the sequence of his consecutive individual states and actions \(h=(s^\alpha _0,a^\alpha _0,s^\alpha _1,a^\alpha _1,\ldots )\). By the Ionescu-Tulcea theorem (see Chap. 7 in [12]), for any Markov strategies \(\pi ^\alpha \) of player \(\alpha \) and \(\sigma ^1,\ldots ,\sigma ^N\) of other players (including all other players of the same population), any initial global state \(\overline{\mu _0}\) and any initial private state of player \(\alpha \), s, there exists a unique probability measure \(\mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}\) on the set of all infinite individual histories of the game \(H=(D^i)^\infty \) endowed with Borel \(\sigma \)-algebra, such that for any \(B\in \mathcal {B}(S^i)\), \(E\in \mathcal {B}(A)\) and any partial history \(h^\alpha _t=(s^\alpha _0,a^\alpha _0,\ldots ,s^\alpha _{t-1},a^\alpha _{t-1},s^\alpha _t)\in (D^i)^t\times S^i=:H_t\), \(t\in \mathbb {N}\),

$$\begin{aligned}{} & {} \mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}(h\in H: s^\alpha _0=s)=1, \end{aligned}$$
(1)
$$\begin{aligned}{} & {} \mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}(h\in H: a^\alpha _t\in E\mid h^\alpha _t)=\pi _t^\alpha (E\mid s^\alpha _t), \end{aligned}$$
(2)
$$\begin{aligned} \mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}(h\in H: s^\alpha _{t+1}\in B\mid (h^\alpha _t,a^\alpha _t))=Q^i(B\mid s^\alpha _t,a^\alpha _t,\overline{\tau ^t}), \end{aligned}$$

with state-action distributions defined by \(\tau ^j_0=\Pi ^j_0(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi ^j_{t+1}(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau ^t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\).

Now we are ready to define the two types of reward we shall consider in this paper. For \(\beta \in (0,1)\), the \(\beta \)-discounted reward for a player \(\alpha \) from population i using policy \(\pi ^i\in \mathcal {M}^i\), when the other players use policies \(\sigma ^j\in \mathcal {M}^j\) (depending on the population j they belong to), the initial global state is \(\overline{\mu _0}\) and the initial individual state of player \(\alpha \) is \(s^i_0\), is defined as follows:

$$\begin{aligned} J^i_\beta (s^i_0,\overline{\mu _0},\pi ^i,\overline{\sigma })=\mathbb {E}^{s^i_0,\overline{\mu _0},\overline{Q},\pi ^i,\overline{\sigma }}\sum _{t=0}^\infty \beta ^tr^i(s_t^i,a_t^i,\overline{\tau ^t}), \end{aligned}$$

where \(\tau ^j_0=\Pi _0^j(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi _{t+1}^j(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau ^t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\).
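For readers who prefer a computational picture, the following short sketch (same simplifying assumptions as before: one population, finite state and action sets, a kernel that ignores \(\overline{\tau }\); all quantities are invented for illustration) propagates the flow \(\tau _0,\tau _1,\ldots \) forward from a given initial state distribution and a Markov strategy, exactly along the recursion just stated.

```python
# Illustrative only: forward propagation of the flow tau_0, tau_1, ... under
# a Markov strategy (one population, finite model, kernel independent of tau).
import numpy as np

n_s, n_a, horizon = 3, 2, 6
rng = np.random.default_rng(1)
Q = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))                      # Q[s, a, s']
pi = [rng.dirichlet(np.ones(n_a), size=n_s) for _ in range(horizon)]  # pi_t[s, a] = pi_t(a | s)
mu = np.array([1.0, 0.0, 0.0])                                        # mu_0

taus = []
for t in range(horizon):
    tau_t = pi[t] * mu[:, None]               # tau_t = Pi_t(pi, mu_t)
    taus.append(tau_t)
    mu = np.einsum("sa,sab->b", tau_t, Q)     # mu_{t+1} = Phi(. | tau_t)
print(np.round(taus[-1], 3))
```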

To define the total reward in our game, let us distinguish one state in S, say \(s^*\), isolated from \(S{\setminus }\{ s^*\}\), and assume that \(A^i(s^*)=\{ a^*\}\) for every \(i\in \{ 1,\ldots ,N\}\), where \(a^*\) is some fixed action isolated from \(A{\setminus }\{ a^*\}\). Moreover, let us assume that \(s^*\in S^i\) for \(i=1,\ldots ,N\). Then the total reward of a player \(\alpha \) from population i using policy \(\pi ^i\in \mathcal {M}^i\), when the other players apply policies \(\overline{\sigma }=(\sigma ^1,\ldots ,\sigma ^N)\), the initial global state is \(\overline{\mu _0}\) and the initial individual state of player \(\alpha \) is \(s_0^i\), is defined in the following way:

$$\begin{aligned} J^i_*(s^i_0,\overline{\mu _0},\pi ^i,\overline{\sigma })=\mathbb {E}^{s^i_0,\overline{\mu _0},\overline{Q},\pi ^i,\overline{\sigma }}\sum _{t=0}^{\mathcal {T}^i-1}r^i(s_t^i,a_t^i,\overline{\tau ^t}), \end{aligned}$$

where \(\tau ^j_0=\Pi _0^j(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi _{t+1}^j(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau ^t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\), while \(\mathcal {T}^i\) is the moment of the first arrival of the process \(\{ s_t^i\}\) at \(s^*\) (we assume it is finite with probability 1). The total reward is interpreted as the reward accumulated by the player over his entire lifetime. State \(s^*\) is an artificial state (and so is action \(a^*\)) denoting that a player is dead. \(\overline{\mu _0}\) corresponds to the distribution of the states across the population when he is born, while \(s^i_0\) is his own state when he is born. The fact that after some time the state of a player can again become different from \(s^*\) should be interpreted as meaning that after some time the player is replaced by a new-born one.

As the type of reward introduced above is not commonly used in the literature, below we present an example of a situation that can be modeled as a mean-field game with this type of optimality criterion.

Example 1

Suppose a water region is monitored using a large population of wireless sensors. The job of each sensor is to make measurements of the state of the water and send them to some base station. Each sensor is powered by a battery whose capacity is limited. The private state of a sensor is two-dimensional: it consists of its last measurement and its battery level. The action of a sensor is the frequency at which it sends its data to the base station. The higher the frequency, the better the quality of the monitoring service, but at the same time the faster the battery depletes. Thus the goal of each sensor is to use its available power efficiently. This can be modeled by summing its one-step rewards for the monitoring activity (which are monotonic in the urgency of the data sent, that is, if the measurements point to something harmful, the speed of sending them becomes important, and hence the one-step reward becomes high) over the lifetime of its battery, which corresponds to the total reward defined above. Note that in such a setting the reward for each sensor is computed over a different time frame, from the moment its battery is replaced (we may assume this is done one time unit after the battery is emptied) until the moment it is depleted again. At the same time the players are symmetric and the situation is stationary as far as the primitives of the model are concerned.
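A toy numerical version of this example may help fix ideas. The sketch below is only an illustration (all constants, the two-level urgency variable and the battery dynamics are invented); it simulates the total reward accumulated by a single sensor until its battery, playing the role of the artificial state \(s^*\), is depleted.

```python
# Toy version of Example 1 (all numbers invented): total reward of a sensor
# accumulated until its battery, playing the role of s*, is depleted.
import numpy as np

rng = np.random.default_rng(2)
B = 5                                   # full battery capacity
drain = {"low": 1, "high": 2}           # battery used per period
reward = {"low": 0.3, "high": 1.0}      # higher frequency -> better monitoring

def average_total_reward(policy, n_runs=1000):
    totals = []
    for _ in range(n_runs):
        b, total = B, 0.0
        while b > 0:                    # stop at the "dead" state (battery empty)
            u = rng.integers(0, 2)      # urgency of the current measurement (0 or 1)
            a = policy(u, b)
            total += reward[a] * (1 + u)  # urgent data is worth more
            b -= drain[a]
        totals.append(total)
    return np.mean(totals)

# send at high frequency only when the measurement is urgent
print(average_total_reward(lambda u, b: "high" if u == 1 else "low"))
```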

Next, we define the solutions we will be looking for:

Definition 1

Stationary strategies \(f^1\in \mathcal {F}^1,\ldots f^N\in \mathcal {F}^N\) and a global state \(\overline{\mu }\in \Pi _{i=1}^N\Delta (S^i)\) form a stationary mean-field equilibrium in the \(\beta \)-discounted reward game if for any i, \(s^i_0\in S^i\), and every other stationary strategy of a player from population i, \(g^i\in \mathcal {F}^i\)

$$\begin{aligned} J^i_\beta (s^i_0,\overline{\mu },f^i,\overline{f})\ge J^i_\beta (s^i_0,\overline{\mu },g^i,\overline{f}) \end{aligned}$$

and, if \(\overline{\mu }_0=\overline{\mu }\) and strategies \(f^1,\ldots ,f^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }\) for every \(t\ge 1\).

Markov strategies \(\pi ^1\in \mathcal {M}^1,\ldots \pi ^N\in \mathcal {M}^N\) and a global state flow \((\overline{\mu }_0^*,\overline{\mu }_1^*,\ldots )\in (\Pi _{i=1}^N\Delta (S^i))^\infty \) form a Markov mean-field equilibrium in the \(\beta \)-discounted reward game if for any i, \(s^i_0\in S^i\), and every other Markov strategy of a player from population i, \(\sigma ^i\in \mathcal {M}^i\)

$$\begin{aligned} J^i_\beta (s_0^i,\overline{\mu }^*_0,\pi ^i,\overline{\pi })\ge J^i_\beta (s_0^i,\overline{\mu }^*_0,\sigma ^i,\overline{\pi }) \end{aligned}$$

and, if \(\overline{\mu }_0=\overline{\mu }^*_0\) and strategies \(\pi ^1,\ldots ,\pi ^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }^*_t\) for every \(t\ge 1\).

Similarly,

Definition 2

Stationary strategies \(f^1\in \mathcal {F}^1,\ldots ,f^N\in \mathcal {F}^N\) and a global state \(\overline{\mu }\in \Pi _{i=1}^N\Delta (S^i)\) form a stationary mean-field equilibrium in the total reward game if for any i, \(s^i_0\in S^i\), and every other stationary strategy of a player from population i, \(g^i\in \mathcal {F}^i\)

$$\begin{aligned} J^i_*(s^i_0,\overline{\mu },f^i,\overline{f})\ge J^i_*(s^i_0,\overline{\mu },g^i,\overline{f}). \end{aligned}$$

Moreover, if \(\overline{\mu }_0=\overline{\mu }\) and strategies \(f^1,\ldots ,f^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }\) for every \(t\ge 1\).

Markov strategies \(\pi ^1\in \mathcal {M}^1,\ldots ,\pi ^N\in \mathcal {M}^N\) and a global state flow \((\overline{\mu }_0^*,\overline{\mu }_1^*,\ldots )\in (\Pi _{i=1}^N\Delta (S^i))^\infty \) form a Markov mean-field equilibrium in the total reward game if for any i, t, \(s^i_t\in S^i\) and every other Markov strategy of a player from population i, \(\sigma ^i\in \mathcal {M}^i\),

$$\begin{aligned} J^i_*(s^i_t,\overline{\mu }_t^*,{}^t\pi ^i,{}^t\overline{\pi })\ge J^i_*(s^i_t,\overline{\mu }_t^*,{}^t\sigma ^i,{}^t\overline{\pi }), \end{aligned}$$

with \({}^ta\) denoting, for any infinite vector \(a=(a_0,a_1,\ldots )\), the vector \((a_t,a_{t+1},\ldots )\). Moreover, if \(\overline{\mu }_0=\overline{\mu }^*_0\) and strategies \(\pi ^1,\ldots ,\pi ^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }^*_t\) for every \(t\ge 1\).
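In a model with finitely many states and actions, the two conditions of Definition 1 can be checked directly. The sketch below is a purely illustrative stand-in (single population, reward and kernel independent of \(\overline{\tau }\), invented data): it verifies that a candidate pair \((f,\overline{\mu })\) leaves the global state invariant and that f puts mass only on actions maximizing the discounted value.

```python
# Illustrative check of Definition 1 in a finite single-population model with
# reward and kernel independent of tau (all data invented).
import numpy as np

n_s, n_a, beta = 3, 2, 0.9
rng = np.random.default_rng(3)
Q = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # Q[s, a, s']
r = rng.random((n_s, n_a))                         # r[s, a]

def is_stationary_mfe(f, mu, tol=1e-6):
    tau = f * mu[:, None]                          # Pi(f, mu)
    mu_next = np.einsum("sa,sab->b", tau, Q)       # Phi(. | tau)
    invariant = np.allclose(mu, mu_next, atol=tol) # the global state stays put
    V = np.zeros(n_s)                              # value iteration for a tagged player
    for _ in range(3000):
        V = (r + beta * Q @ V).max(axis=1)
    q_sa = r + beta * Q @ V
    best = q_sa >= q_sa.max(axis=1, keepdims=True) - tol
    best_response = bool(np.all(f[~best] <= tol))  # f charges only maximizing actions
    return invariant and best_response
```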

3 Preliminaries

As we have written, we assume that S and A are metric spaces. The metric on S will be denoted by \(d_S\) and that on A by \(d_A\). Whenever we refer to a metric on a product space, we mean the sum of the metrics on its coordinates. Some of the assumptions presented below will be stated in terms of a moment function \(w_0:S\rightarrow [1,\infty )\), that is, a continuous function satisfying

$$\begin{aligned} \lim _{n\rightarrow \infty }\inf _{s\in S{\setminus } K_n}w_0(s)=\infty \end{aligned}$$

for some sequence \(\{ K_n\} _{n\ge 1}\) of compact subsets of S.

In order to study both bounded and unbounded one-stage reward functions, we define the following function:

$$\begin{aligned} w:=\left\{ \begin{array}{ll} 1,&{} \text{ if } \text{ each } r^i \text{ is } \text{ bounded }\\ w_0,&{} \text{ otherwise }\end{array}\right. \end{aligned}$$

For any function \(h:S\rightarrow \mathbb {R}\) we define its w-norm as

$$\begin{aligned} \left\| h\right\| _w:=\sup _{s\in S}\left|\frac{h(s)}{w(s)}\right|. \end{aligned}$$

Whenever we speak of functions defined on a product of S and some other space, their w-norm is defined similarly, with the help of the same function w.

By \(B_w(S)\) we denote the space of all measurable functions from S to \(\mathbb {R}\) with finite w-norm, and by \(C_w(S)\) the space of all continuous functions in \(B_w(S)\). Clearly, both \(B_w(S)\) and \(C_w(S)\) are Banach spaces. The same can be said of \(B_w(S\times A)\) and \(C_w(S\times A)\), the spaces of measurable and continuous functions from \(S\times A\) to \(\mathbb {R}\) with finite w-norm.

Analogously, for any finite signed measure \(\mu \) on S, we define the w-norm of \(\mu \) as

$$\begin{aligned} \left\| \mu \right\| _w=\sup _{g\in B_w(S),\Vert g\Vert _w\le 1}\left|\int _S g(s)\mu (ds)\right|. \end{aligned}$$

It should be noted that in the case \(w\equiv 1\), \(\Vert \mu \Vert _w\) is the total variation norm (see e.g. [33], Section 7.2).

There are two standard types of convergence of probability measures which are used in the paper: the weak convergence denoted by \(\Rightarrow \) and the strong (or setwise) convergence denoted by \(\rightarrow \) and defined (for any Borel space \((X,\mathcal {B}(X))\)) by

$$\begin{aligned} \mu _n\rightarrow \mu \quad \Longleftrightarrow \quad \mu _n(B)\rightarrow \mu (B) \text{ for } \text{ any } B\in \mathcal {B}(X). \end{aligned}$$

It is known (see e.g. [45], Theorem 6.6) that the weak topology can be metrized using the metric

$$\begin{aligned} \rho (\mu ,\nu ):=\sum _{m=1}^\infty 2^{-m}\left|\int _S\phi _m(s)\mu (ds)-\int _S\phi _m(s)\nu (ds)\right|, \end{aligned}$$

where \(\{\phi _m\}_{m\ge 1}\) is a sequence of continuous bounded functions from S to \(\mathbb {R}\) whose elements form a dense subset of the unit ball in C(S). The strong convergence topology is in general not metrizable.

Next, let

$$\begin{aligned} \Delta _w(S):=\left\{ \mu \in \Delta (S):\int _Sw(s)\mu (ds)<\infty \right\} . \end{aligned}$$

It has been shown in [49] that \(\Delta _w(S)\) can be metrized using the metric

$$\begin{aligned} \rho _w(\mu ,\nu ):=\rho (\mu ,\nu )+\left|\int _Sw(s)\mu (ds)-\int _Sw(s)\nu (ds)\right|\end{aligned}$$

We will use the topology defined by this metric (called w-topology in the sequel) as the standard topology on \(\Delta _w(S)\).

We will also use the notation

$$\begin{aligned} \Delta _w(S\times A):=\left\{ \tau \in \Delta (S\times A):\int _{S\times A}w(s)\tau (ds\times da)<\infty \right\} \end{aligned}$$

with analogously defined metrics also denoted by \(\rho \) (metric defining weak convergence) and \(\rho _w\) (w-metric) as well as similar notation for subsets of S or \(S\times A\).
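To illustrate the metrics \(\rho \) and \(\rho _w\) numerically, the following sketch evaluates them for two distributions on a finite set. It is only a stand-in: the series defining \(\rho \) is truncated to a few hand-picked test functions from the unit ball (rather than a dense sequence), and the weight w is chosen arbitrarily.

```python
# Illustrative only: rho and rho_w for two distributions on a 5-point set,
# with the series defining rho truncated to four hand-picked test functions
# from the unit ball of C(S) and an arbitrary weight w.
import numpy as np

S = np.arange(5)
w = 1.0 + S.astype(float)                            # a moment-like weight on S
phis = [np.sin(S), np.cos(S), S / S.max(), np.ones(len(S))]

def rho(mu, nu):
    return sum(2.0 ** -(m + 1) * abs(phi @ (mu - nu)) for m, phi in enumerate(phis))

def rho_w(mu, nu):
    return rho(mu, nu) + abs(w @ (mu - nu))

mu = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
nu = np.array([0.5, 0.3, 0.1, 0.1, 0.0])
print(rho(mu, nu), rho_w(mu, nu))
```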

Whenever we speak about continuity of correspondences, we refer to the following definitions:

Let X and Y be two metric spaces and \(F:X\rightarrow Y\), a correspondence. Let \(F^{-1}(G)=\{ x\in X: F(x)\cap G\ne \emptyset \}\). We say that F is upper semicontinuous iff \(F^{-1}(G)\) is closed for any closed \(G\subset Y\). F is lower semicontinuous iff \(F^{-1}(G)\) is open for any open \(G\subset Y\). F is said to be continuous iff it is both upper and lower semicontinuous. For more on (semi)continuity of correspondences see [32], Appendix D or [4], Chapter 17.2.

4 The Existence of Stationary and Markov Mean-Field Equilibria in the Discounted Payoff Game

4.1 Assumptions

In this section, we address the problem of the existence of an equilibrium in discrete-time mean-field games with \(\beta \)-discounted payoff. We begin by presenting the set of assumptions used in our results.

(A1):

For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, for \(i=1,\ldots ,N\) and \(s\in S^i\),

$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w(D^i)}r^i(s,a,\overline{\tau })\ge -Rw(s). \end{aligned}$$
(A2):

For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta (D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,

(a):

for \(i=1,\ldots ,N\) the functions

$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$

are continuous in \((s,a,\overline{\tau })\),

(b):

for \(i=1,\ldots ,N\) and \(s\in S^i\)

$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le w(s). \end{aligned}$$
(A3):

For \(i=1,\ldots ,N\), correspondences \(A^i\) are continuous.

In some theorems weaker versions of assumptions (A1) and (A2) will be used:

(A1’):

For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, there exist non-negative constants \(\alpha \), \(\gamma \), M satisfying \(\alpha \beta \gamma <1\) and

$$\begin{aligned} \int _Sw(s)\mu _0^i(ds)\le M\quad \text{ for } i=1,\ldots ,N, \end{aligned}$$

and such that for \(i=1,\ldots ,N\), \(s\in S^i\) and \(t=0,1,2,\ldots \),

$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w^{(t)}(D^i)}r^i(s,a,\overline{\tau })\ge -R\gamma ^tw(s) \end{aligned}$$

with \(\Delta _w^{(t)}(D^i):=\left\{ \tau ^i\in \Delta _w(D^i): \int _{D^i}w(s)\tau ^i(ds\times da)\le \alpha ^tM\right\} \).

(A2’):

For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta _w(D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\Rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,

(a):

for \(i=1,\ldots ,N\) the functions

$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$

are continuous in \((s,a,\overline{\tau })\),

(b):

for \(i=1,\ldots ,N\) and \(s\in S^i\)

$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le \alpha w(s). \end{aligned}$$

4.2 Main Results

In the first main result of this section we prove the existence of a stationary mean-field equilibrium in discounted discrete-time mean-field games.

Theorem 1

Suppose that the assumptions (A1–A3) are satisfied. Then for any \(\beta \in (0,1)\) the multi-population discrete-time mean-field game with \(\beta \)-discounted payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a stationary mean-field equilibrium.

Remark 1

It should be noted here that the results given in Theorem 1, when applied to the model with a single population, extend the existing results for that case. The most general result of this type in the literature appears in [39] and concerns the case of a compact individual state space. Here, the individual state spaces are arbitrary closed subsets of a locally compact Polish space. As a consequence, the reward functions are also not necessarily bounded (we only assume that they are bounded above and have finite w-norm).

In the proof of the theorem we adapt the techniques introduced in [39] to our case. We precede the proof of the theorem with two lemmas.

Lemma 2

For any \(\overline{\tau }\in \Pi _{i=1}^N\Delta (D^i)\) let

$$\begin{aligned} V^i_{\beta ,\overline{\tau }}(s):=\max _{f^i\in \mathcal {F}^i}\mathbb {E}^{\delta _s,\overline{Q},f^i}\sum _{t=0}^\infty \beta ^tr^i(s_t^i,a_t^i,\overline{\tau }), \end{aligned}$$

that is, let \(V^i_{\beta ,\overline{\tau }}\) be the optimal value of the \(\beta \)-discounted Markov decision process of a player from population i when the behaviour of all the other players is described by the state-action measure \(\overline{\tau }\), fixed over time. Under assumptions (A1–A3), \(V^i_{\beta ,\overline{\tau }}(s)\) is jointly continuous in \((s,\overline{\tau })\) on \(S^i\times \Pi _{i=1}^N\Delta _w(D^i)\) and \(\Vert V^i_{\beta ,\overline{\tau }}\Vert _w\le \frac{R}{1-\beta }\).

Proof

Let us fix an \(i\in \{ 1,\ldots ,N\}\) and define for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\)

$$\begin{aligned} T^i_{\overline{\tau }}(u)(s):=\sup _{a\in A^i(s)}\left[ r^i(s,a,\overline{\tau })+\beta \int _Su(s')Q^i(ds'\mid s,a,\overline{\tau })\right] . \end{aligned}$$

Note that, by assumptions (A1) and (A2) (b), \(T^i_{\overline{\tau }}\) clearly maps \(B_w(S^i)\) into itself. Moreover, for any \(u_1,u_2\in B_w(S^i)\),

$$\begin{aligned} \sup _{s\in S}\left|\frac{T^i_{\overline{\tau }}(u_1)(s)-T^i_{\overline{\tau }}(u_2)(s)}{w(s)}\right|\le & {} \sup _{s\in S, a\in A^i(s)}\frac{\beta \int _S\left|(u_1(s')-u_2(s'))Q^i(ds'\mid s,a,\overline{\tau })\right|}{w(s)}\nonumber \\\le & {} \beta \sup _{s\in S,a\in A^i(s)}\frac{\int _S\Vert u_1-u_2\Vert _ww(s')Q^i(ds'\mid s,a,\overline{\tau })}{w(s)}\nonumber \\\le & {} \beta \Vert u_1-u_2\Vert _w\sup _{s\in S }\frac{w(s)}{w(s)}=\beta \Vert u_1-u_2\Vert _w, \end{aligned}$$
(3)

where the penultimate inequality follows from the definition of the w-norm, while the last one from assumption (A2) (b). Hence, \(T^i_{\overline{\tau }}\) is a contraction defined on a complete space. By the Banach fixed point theorem it has a unique fixed point, which is by Theorem 4.2.3 in [32] equal to \(V^i_{\beta ,\overline{\tau }}\). Moreover, this fixed point can be obtained as \(\lim _{n\rightarrow \infty }\left( T^i_{\overline{\tau }}\right) ^n(u_0)\) for any given \(u_0\in B_w(S^i)\).

Let \(u_0^{\overline{\tau }}\equiv 0\) and define for \(n=1,2,\ldots \) \(u_n^{\overline{\tau }}:=T^i_{\overline{\tau }}(u_{n-1}^{\overline{\tau }})\). We will next show that for each n, \(u_n^{\overline{\tau }}(s)\) is continuous in \((s,{\overline{\tau }})\) and \(\Vert u_n^{\overline{\tau }}\Vert _w\le \frac{R}{1-\beta }\).

We prove these statements by induction on n. For \(n=0\) both claims are obvious. Suppose they hold for \(n=k-1\). Then by Theorem 3.3 in [54] (see also Remark 3.4 (ii) there—the assumptions given there are satisfied with \(g=\frac{R}{1-\beta }w\) by (A2) (b)), \(r^i(s,a,\overline{\tau })+\beta \int _Su_{k-1}^{\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\) is jointly continuous in \((s,a,\overline{\tau })\), hence, by Proposition 7.32 in [12]

$$\begin{aligned} u_{k}^{\overline{\tau }}(s)=\sup _{a\in A^i(s)}\left[ r^i(s,a,\overline{\tau })+\beta \int _Su_{k-1}^{\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\right] \end{aligned}$$

is also (jointly) continuous. We also have (here the third inequality is a consequence of assumption (A2) (b), while the fourth one follows from (A1) and our inductive assumption):

$$\begin{aligned}&\left\| T^i_{\overline{\tau }}(u_{k-1}^{\overline{\tau }})\right\| _w\le \sup _{(s,a)\in D^i}\frac{|r^i(s,a,\overline{\tau })|+\beta \left|\int _Su_{k-1}^{\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\right|}{w(s)}\\&\quad \le \sup _{(s,a)\in D^i}\frac{|r^i(s,a,\overline{\tau })|}{w(s)}+\sup _{(s,a)\in D^i}\frac{\beta \Vert u_{k-1}^{\overline{\tau }}\Vert _w\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })}{w(s)}\\&\quad \le \sup _{(s,a)\in D^i}\frac{|r^i(s,a,\overline{\tau })|}{w(s)}+\sup _{(s,a)\in D^i}\frac{\beta \Vert u_{k-1}^{\overline{\tau }}\Vert _ww(s)}{w(s)}\le R+\frac{\beta R}{1-\beta }=\frac{R}{1-\beta }. \end{aligned}$$

Thus, the second claim has been proved for \(n=k\).

To finish the proof, let us take convergent sequences \(\{ s_k\}_{k\ge 1}\) in \(S^i\) and \(\{ \overline{\tau }_k\}_{k\ge 1}\) in \(\Pi _{j=1}^N\Delta _w(D^j)\) such that \(s_k\rightarrow s_*\) and \(\overline{\tau }_k\Rightarrow \overline{\tau }_*\). We will show that \(V^i_{\beta ,\overline{\tau }_k}(s_k)\rightarrow V^i_{\beta ,\overline{\tau }_*}(s_*)\). We start the proof by noticing that the set \(K:=\{ s_k: k\ge 1\}\cup \{ s_*\}\) is clearly compact, hence there exists a value W such that \(W\ge |w(s)|\) for \(s\in K\). Now, fix any \(\varepsilon >0\) and let \(n_0\) be such that

$$\begin{aligned} W\beta ^{n_0}\frac{R}{1-\beta }<\frac{\varepsilon }{3}. \end{aligned}$$

Clearly, by repeated use of (3) for \(u_1=u_0^{\overline{\tau }_*}\) and \(u_2=V^i_{\beta ,\overline{\tau }_*}\), we obtain

$$\begin{aligned} \Vert u_{n_0}^{\overline{\tau }_*}-V^i_{\beta ,\overline{\tau }_*}\Vert _w\le \beta ^{n_0}\left( \Vert u_{0}^{\overline{\tau }_*}-V^i_{\beta ,\overline{\tau }_*}\Vert _w\right) =\beta ^{n_0}\Vert V^i_{\beta ,\overline{\tau }_*}\Vert _w\le \beta ^{n_0}\frac{R}{1-\beta }, \end{aligned}$$

hence

$$\begin{aligned} \left|u_{n_0}^{\overline{\tau }_*}(s_*)-V^i_{\beta ,\overline{\tau }_*}(s_*)\right|\le w(s_*)\Vert u_{n_0}^{\overline{\tau }_*}-V^i_{\beta ,\overline{\tau }_*}\Vert _w\le W\beta ^{n_0}\frac{R}{1-\beta }<\frac{\varepsilon }{3}. \end{aligned}$$
(4)

Similarly we obtain that for any \(k\ge 1\),

$$\begin{aligned} \left|u_{n_0}^{\overline{\tau }_k}(s_k)-V^i_{\beta ,\overline{\tau }_k}(s_k)\right|\le w(s_k)\Vert u_{n_0}^{\overline{\tau }_k}-V^i_{\beta ,\overline{\tau }_k}\Vert _w\le W\beta ^{n_0}\frac{R}{1-\beta }<\frac{\varepsilon }{3}. \end{aligned}$$
(5)

Finally, from the joint continuity of \(u_{n_0}^{\cdot }(\cdot )\), there exists a \(k_0\in \mathbb {N}\) such that for any \(k\ge k_0\)

$$\begin{aligned} \left|u_{n_0}^{\overline{\tau }_k}(s_k)-u_{n_0}^{\overline{\tau }_*}(s_*)\right|<\frac{\varepsilon }{3}. \end{aligned}$$
(6)

Now, combining (4), (5) and (6), we obtain that for any \(k\ge k_0\)

$$\begin{aligned}{} & {} \left|V^i_{\beta ,\overline{\tau }_k}(s_k)-V^i_{\beta ,\overline{\tau }_*}(s_*)\right|\le \left|u_{n_0}^{\overline{\tau }_k}(s_k)-V^i_{\beta ,\overline{\tau }_k}(s_k)\right|\\{} & {} \quad +\left|u_{n_0}^{\overline{\tau }_k}(s_k)-u_{n_0}^{\overline{\tau }_*}(s_*)\right|+\left|u_{n_0}^{\overline{\tau }_*}(s_*)-V^i_{\beta ,\overline{\tau }_*}(s_*)\right|<\varepsilon , \end{aligned}$$

which ends the proof that \(V^i_{\beta ,\cdot }(\cdot )\) is continuous. The proof that \(\Vert V^i_{\beta ,\overline{\tau }}\Vert _w\le \frac{R}{1-\beta }\) is elementary. \(\square \)
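The mechanics of the proof of Lemma 2 can also be illustrated numerically. In the sketch below (finite state space, so w can be taken identically equal to 1 and the w-norm is the supremum norm; kernel and reward fixed, i.e., \(\overline{\tau }\) suppressed; all data invented) the operator \(T^i_{\overline{\tau }}\) is iterated to its fixed point, and the bound \(\Vert V\Vert \le R/(1-\beta )\) and the \(\beta \)-contraction property (3) are checked.

```python
# Illustrative only: value iteration for a fixed tau in a finite model, where
# w = 1 and the w-norm is the sup norm (all data invented, |r| <= R).
import numpy as np

n_s, n_a, beta = 4, 3, 0.9
rng = np.random.default_rng(4)
Q = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # Q[s, a, s']
r = rng.uniform(-1.0, 1.0, size=(n_s, n_a))        # bounded rewards
R = np.abs(r).max()

def T(u):                                          # the operator T for fixed tau
    return (r + beta * Q @ u).max(axis=1)

u, prev = np.zeros(n_s), np.full(n_s, np.inf)
while np.max(np.abs(u - prev)) > 1e-12:            # iterate to the fixed point V
    prev, u = u, T(u)
print(np.max(np.abs(u)), R / (1 - beta))           # the bound from the lemma

u1, u2 = rng.normal(size=n_s), rng.normal(size=n_s)
# the contraction property (3): ||T u1 - T u2|| <= beta ||u1 - u2||
assert np.max(np.abs(T(u1) - T(u2))) <= beta * np.max(np.abs(u1 - u2)) + 1e-12
```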

In the next lemma we show that for any i and any stationary strategy from \(\mathcal {F}^i\), there exists an invariant measure of the Markov chain of individual states lying in the subset of probability measures on \(S^i\) whose w-norm is bounded by M.

Lemma 3

Suppose assumptions (A1–A3) hold and \(M>0\) is such that for each \(i\in \{ 1,\ldots ,N\}\) the set

$$\begin{aligned} \Delta _w^M(S^i):=\left\{ \mu \in \Delta (S^i): \int _{S^i}w(s)\mu (ds)\le M\right\} \end{aligned}$$

is nonempty. Then for each \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\), \(i=1,\ldots ,N\) and any stationary strategy \(f^i\in \mathcal {F}^i\), there exists a \(\mu _{f^i,\overline{\tau }}\in \Delta _w^M(S^i)\) such that

$$\begin{aligned} \mu _{f^i,\overline{\tau }}(B) =\int _{S^i}\int _{A^i(s)}Q^i(B\mid s,a,\overline{\tau })f^i(da\mid s)\mu _{f^i,\overline{\tau }}(ds) \end{aligned}$$

for any \(B\in \mathcal {B}(S^i)\).

Proof

Let us fix \(i\in \{ 1,\ldots ,N\}\) and note that for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\), and any stationary strategy \(f^i\in \mathcal {F}^i\), the transition probability

$$\begin{aligned} Q^i(\cdot \mid s,f^i,\overline{\tau }):=\int _{A^i(s)}Q^i(\cdot \mid s,a,\overline{\tau })f^i(da\mid s) \end{aligned}$$
(7)

is clearly strongly continuous.

Next, suppose that the initial distribution of the individual state of a player from population i is \(\rho _0^i\in \Delta _w^M(S^i)\). We prove by induction that the same is true for \(\rho _t^i\), the distribution of his state at time \(t=1,2,\ldots \), if he uses strategy \(f^i\) and the behaviour of the other players is described by \(\overline{\tau }\). Suppose the claim is true for t. Then by assumption (A2) (b) we have

$$\begin{aligned}{} & {} \int _{S^i}w(s)\rho _{t+1}^i(ds)=\int _{S^i}\int _{A^i(s)}\int _{S^i}w(s')Q^i(ds'\mid s,a,\overline{\tau })f^i(da\mid s)\rho _t^i(ds)\\{} & {} \quad \le \int _{S^i}\int _{A^i(s)}\left( \sup _{(a',\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _{S^i}w(s')Q^i(ds'\mid s,a',\overline{\tau })\right) f^i(da\mid s)\rho _t^i(ds)\\{} & {} \quad =\int _{S^i}\left( \sup _{(a',\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _{S^i}w(s')Q^i(ds'\mid s,a',\overline{\tau })\right) \rho _t^i(ds)\\{} & {} \quad \le \int _{S^i}w(s)\rho _t^i(ds)\le M \end{aligned}$$

By Remark 1 in [31] we know that the sequence of measures

$$\begin{aligned} \nu ^T(\cdot ):=\frac{1}{T}\sum _{t=0}^{T-1}\rho _t^i(\cdot ) \end{aligned}$$

(whose elements clearly belong to \(\Delta _w^M(S^i)\), as it is a convex set) has a subsequence weakly converging to an invariant measure of the Markov chain with transition probability \(Q^i(\cdot \mid \cdot ,f^i,\overline{\tau })\). Let us call it \(\mu _{f^i,\overline{\tau }}\). It can easily be shown that \(\Delta _w^M(S^i)\) is tight, hence, by Prohorov’s theorem (Theorem 6.1 in [13]), relatively compact. Moreover, \(\Delta _w^M(S^i)\) is closed in the weak convergence topology (as the map \(\mu \mapsto \int _{S^i}w(s)\mu (ds)\) is weakly lower semicontinuous), hence the invariant measure \(\mu _{f^i,\overline{\tau }}\in \Delta _w^M(S^i)\). \(\square \)
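The construction used in the proof of Lemma 3 is easy to reproduce in a finite model. The following sketch (one population, finite state and action sets, kernel independent of \(\overline{\tau }\), invented data) forms the Cesàro averages \(\nu ^T\) of the state distributions under a stationary strategy and checks that the resulting measure is approximately invariant for the kernel (7).

```python
# Illustrative only: Cesaro averages of the state distributions under a
# stationary strategy converge to an invariant measure of the kernel (7)
# (one population, finite model, kernel independent of tau, invented data).
import numpy as np

n_s, n_a = 4, 2
rng = np.random.default_rng(5)
Q = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # Q[s, a, s']
f = rng.dirichlet(np.ones(n_a), size=n_s)          # stationary strategy f[s, a]
P = np.einsum("sa,sab->sb", f, Q)                  # Q(. | s, f), cf. (7)

rho_t = np.array([1.0, 0.0, 0.0, 0.0])             # rho_0
T, avg = 20000, np.zeros(n_s)
for _ in range(T):
    avg += rho_t / T                               # building nu^T
    rho_t = rho_t @ P
print(np.max(np.abs(avg - avg @ P)))               # approximately invariant
```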

Proof of Theorem 1:

Let M be as in Lemma 3 and for \(i=1,\ldots ,N\) let

$$\begin{aligned} \Delta _w^M(D^i):=\left\{ \tau \in \Delta (D^i): \int _{D^i}w(s)\tau (ds\times da)\le M\right\} . \end{aligned}$$

Let us further define the correspondences from \(\Pi _{j=1}^N\Delta _w^M(D^j)\) to \(\Delta _w^M(D^i)\), \(i=1,\ldots ,N\):

$$\begin{aligned} \Theta ^i(\overline{\tau }):= & {} \left\{ \eta ^i\in \Delta _w^M(D^i): \eta ^i_{S^i}(\cdot )=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau })\eta ^i(ds\times da) \right\} , \\ \Psi ^i_\beta (\overline{\tau }):= & {} \left\{ \eta ^i\in \Theta ^i(\overline{\tau }): \int _{S^i}V^i_{\beta ,\overline{\tau }}(s)\eta ^i_{S^i}(ds)\right. \\ {}= & {} \left. \int _{D^i}\left[ r^i(s,a,\overline{\tau })+\beta \int _{S^i}V^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\right] \eta ^i(ds\times da) \right\} \end{aligned}$$

We will now verify that, for each i, \(\Theta ^i\) and \(\Psi ^i_\beta \) have some useful properties. We fix \(i\in \{ 1,\ldots ,N\}\) for all these considerations.

First note that \(\eta =\Pi ^i(f^i,\mu _{f^i,\overline{\tau }})\) clearly belongs to \(\Theta ^i(\overline{\tau })\), as for any \(B\in \mathcal {B}(S^i)\),

$$\begin{aligned}{} & {} \left( \Pi ^i(f^i,\mu _{f^i,\overline{\tau }})\right) _{S^i}(B)=\mu _{f^i,\overline{\tau }}(B) =\int _{S^i} Q^i(B\mid s,f^i,\overline{\tau })\mu _{f^i,\overline{\tau }}(ds)\\{} & {} \quad =\int _{D^i} Q^i(B\mid s,a,\overline{\tau })f^i(da\mid s)\mu _{f^i,\overline{\tau }}(ds)=\int _{D^i}Q^i(B\mid s,a,\overline{\tau })\Pi ^i(f^i,\mu _{f^i,\overline{\tau }})(ds\times da) \end{aligned}$$

where the first and the last equality follow from the definition of \(\Pi ^i(\cdot ,\cdot )\), the second comes from the definition of an invariant measure, while the third one follows from (7). Moreover, by Lemma 3, \(\mu _{f^i,\overline{\tau }}\in \Delta _w^M(S^i)\), which immediately implies that \(\eta =\Pi ^i(f^i,\mu _{f^i,\overline{\tau }})\in \Delta _w^M(D^i)\).

We next show that the graph of \(\Theta ^i\) is closed in weak convergence topology. To prove that, first note that for any bounded continuous function \(u:S^i\rightarrow \mathbb {R}\), \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\cdot )\) is, by the strong continuity of \(Q^i\), a continuous function, so for any sequences \(\eta _n^i\in \Delta (D^i)\) and \(\overline{\tau _n}\in \Pi _{j=1}^N\Delta (D^j)\) such that \(\eta _n^i\in \Theta ^i(\overline{\tau _n})\) with \(\eta _n^i\Rightarrow \eta ^i\) and \(\overline{\tau _n}\Rightarrow \overline{\tau }\), \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau _n})\) converges continuously to \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau })\). Hence, by Theorem 3.3 in [54], we have

$$\begin{aligned} \int _{D^i}\int _{S^i}u(s)Q^i(ds\mid \widehat{s},\widehat{a},\overline{\tau _n})\eta _n^i(d\widehat{s}\times d\widehat{a})\rightarrow _{n\rightarrow \infty }\int _{D^i}\int _{S^i}u(s)Q^i(ds\mid \widehat{s},\widehat{a},\overline{\tau })\eta ^i(d\widehat{s}\times d\widehat{a}), \end{aligned}$$

which means that \(\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau _n})\eta _n^i(ds\times da)\Rightarrow \int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau })\eta ^i(ds\times da)\). From the uniqueness of the limit this implies that \(\left( \eta ^i\right) _{S^i}=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau })\eta ^i(ds\times da)\), hence \(\eta ^i\in \Theta ^i(\overline{\tau })\), which implies that the graph of \(\Theta ^i\) is closed.

By Theorem 4.2.3 in [32] there exists an optimal stationary policy \(f^i_*\) in the optimization problem of a player from population i maximizing his discounted reward when the behaviour of all the other players is described by the state-action measure \(\overline{\tau }\), fixed over time. Moreover, \(f^i_*\) is a measurable selector attaining the maximum on the right-hand side of the equation

$$\begin{aligned} V^i_{\beta ,\overline{\tau }}(s)=\max _{a\in A^i(s)}\left[ r^i(s,a,\overline{\tau })+\beta \int _SV^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\right] . \end{aligned}$$
(8)

Then we can write that

$$\begin{aligned}{} & {} \int _{S^i}V^i_{\beta ,\overline{\tau }}(s)\left( \Pi ^i(f^i_*,\mu _{f^i_*,\overline{\tau }})\right) _{S^i}(ds)=\int _{S^i}V^i_{\beta ,\overline{\tau }}(s)\mu _{f^i_*,\overline{\tau }}(ds)\\{} & {} \quad =\int _{S^i}\left[ r^i(s,f^i_*(s),\overline{\tau })+\beta \int _{S^i}V^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid s,f^i_*(s),\overline{\tau })\right] \mu _{f^i_*,\overline{\tau }}(ds)\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau })+\beta \int _{S^i}V^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\right] \Pi ^i(f^i_*,\mu _{f^i_*,\overline{\tau }})(ds\times da), \end{aligned}$$

which implies that \(\Pi ^i(f^i_*,\mu _{f^i_*,\overline{\tau }})\in \Psi ^i_\beta (\overline{\tau })\).

Next we show that the graph of \(\Psi ^i_\beta \) is closed in the weak convergence topology. Let us take sequences \(\overline{\tau _n}\in \Pi _{j=1}^N\Delta _w^M(D^j)\) and \(\eta _n\in \Delta _w^M(D^i)\) such that \(\eta _n\in \Psi ^i_\beta (\overline{\tau _n})\) for every \(n\in \mathbb {N}\), with \(\eta _n\Rightarrow \eta \) and \(\overline{\tau _n}\Rightarrow \overline{\tau }\). Since the graph of \(\Theta ^i\) is closed, to show that so is that of \(\Psi ^i_\beta \), we only need to prove that the equality defining the set \(\Psi ^i_\beta (\overline{\tau })\) is satisfied for \(\overline{\tau }\) and \(\eta \). Note however that for each n

$$\begin{aligned}{} & {} \int _{S^i}V^i_{\beta ,\overline{\tau _n}}(s)(\eta _n)_{S^i}(ds)\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau _n})+\beta \int _{S^i}V^i_{\beta ,\overline{\tau _n}}(s')Q^i(ds'\mid s,a,\overline{\tau _n})\right] \eta _n(ds\times da)\nonumber \end{aligned}$$
(9)

Then by the continuity of \(Q^i\) and \(V^i_{\beta ,\overline{\tau _n}}\) and Theorem 3.3 in [54] (see also Remark 3.4 (ii) there—by (A2) (b) the assumption presented there is true for \(g=\frac{R}{1-\beta }w\)), \(\int _{S^i}V^i_{\beta ,\overline{\tau _n}}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau _n})\) converges continuously to \(\int _{S^i}V^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau })\). As \(r^i(\cdot ,\cdot ,\overline{\tau _n})\) also converges continuously to \(r^i(\cdot ,\cdot ,\overline{\tau })\) by (A1), we may apply Theorem 3.3 in [54] again (with \(g=\frac{R}{1-\beta }w\), which satisfies the assumption given in Remark 3.4 (ii) by (A1) and (A2) (b)) to pass to the limit in (9), obtaining

$$\begin{aligned} \int _{S^i}V^i_{\beta ,\overline{\tau }}(s)(\eta )_{S^i}(ds)=\int _{D^i}\left[ r^i(s,a,\overline{\tau })+\beta \int _{S^i}V^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\right] \eta (ds\times da) \end{aligned}$$

which ends the proof that the graph of \(\Psi ^i_\beta \) is closed.

Finally, we can also note that for each \(\overline{\tau }\), the set \(\Psi ^i_\beta (\overline{\tau })\) is clearly convex.

Next, let us define the following correspondence mapping \(\Pi _{i=1}^N\Delta _w^M(D^i)\) into itself:

$$\begin{aligned} \overline{\Psi }_\beta (\overline{\tau }):=\Psi ^1_\beta (\overline{\tau })\times \cdots \times \Psi ^N_\beta (\overline{\tau }). \end{aligned}$$

It is obvious that our previous considerations imply that \(\overline{\Psi }_\beta \) also has nonempty and convex values and that its graph is closed. To finish the proof we need to note that the function \(w_0\) (which is a moment on S) is also a moment on \(S\times A\) (as A is compact), hence each \(\Delta _w^M(D^i)\) is tight. Now Prohorov’s theorem implies that \(\Delta _w^M(D^i)\) is compact in the weak convergence topology for \(i=1,\ldots ,N\), and \(\Pi _{j=1}^N\Delta _w^M(D^j)\) is compact in the product topology. Therefore, by the Glicksberg fixed point theorem [24], \(\overline{\Psi }_\beta \) has a fixed point.

Suppose \(\overline{\tau }_*\) is this fixed point. By a well-known result (see e.g. [34], p. 89), for each \(i\in \{ 1,\ldots ,N\}\), \(\tau ^i_*\) can be disintegrated into a stochastic kernel \(g^i_*\in \mathcal {F}^i_0\) and its marginal on \(S^i\), \((\tau ^i_*)_{S^i}\), that is, satisfying for any \(D\in \mathcal {B}(D^i)\)

$$\begin{aligned} \tau ^i_*(D)=\int _D g^i_*(da\mid s)(\tau ^i_*)_{S^i}(ds). \end{aligned}$$
(10)

Let us further define

$$\begin{aligned} S^i_0:=\left\{ s\in S^i:\int _{A^i(s)}\left[ r^i(s,a,\overline{\tau _*})+\beta \int _{S^i} V^i_{\beta ,\overline{\tau _*}}(s')Q^i(ds'\mid s,a,\overline{\tau _*})\right] g^i_*(da\mid s)<V^i_{\beta ,\overline{\tau _*}}(s)\right\} . \end{aligned}$$

Then, since \(\tau ^i_*\in \Psi ^i_\beta (\overline{\tau }_*)\),

$$\begin{aligned} \tau ^i_*(S^i_0)=0, \end{aligned}$$
(11)

otherwise the inequality in the definition of \(S^i_0\) would imply a strict inequality in the equality defining \(\Psi ^i_\beta \).

Let us thus define the strategy

$$\begin{aligned} \widehat{f^i}(s)=\left\{ \begin{array}{ll} g^i_*(s),&{} \text{ if } s\in S^i{\setminus } S^i_0\\ f^i_*(s),&{} \text{ if } s\in S^i_0\end{array}\right. \end{aligned}$$

It is clear that for any \(s\in S^i\),

$$\begin{aligned} V^i_{\beta ,\overline{\tau }_*}(s)=\int _{A^i(s)}\left[ r^i(s,a,\overline{\tau }_*)+\beta \int _{S^i}V^i_{\beta ,\overline{\tau }_*}(s')Q^i(ds'\mid s,a,\overline{\tau }_*)\right] \widehat{f^i}(da\mid s). \end{aligned}$$

Then, for any \(D\in \mathcal {B}(D^i)\) we can reason as follows:

$$\begin{aligned} \tau ^i_*(D)= & {} \int _D g^i_*(da\mid s)(\tau ^i_*)_{S^i}(ds)=\int _{S^i}\int _{A^i(s)} \mathbbm {1}_D(s,a)g^i_*(da\mid s)(\tau ^i_*)_{S^i}(ds)\\= & {} \int _{S^i_0}\int _{A^i(s)} \mathbbm {1}_D(s,a)g^i_*(da\mid s)(\tau ^i_*)_{S^i}(ds)\\{} & {} \quad +\int _{S^i{\setminus } S^i_0}\int _{A^i(s)} \mathbbm {1}_D(s,a)g^i_*(da\mid s)(\tau ^i_*)_{S^i}(ds)\\= & {} 0+\int _{S^i{\setminus } S^i_0}\int _{A^i(s)} \mathbbm {1}_D(s,a)\widehat{f^i}(da\mid s)(\tau ^i_*)_{S^i}(ds)\\= & {} \int _{S^i_0}\int _{A^i(s)} \mathbbm {1}_D(s,a)\widehat{f^i}(da\mid s)(\tau ^i_*)_{S^i}(ds)\\{} & {} \quad +\int _{S^i{\setminus } S^i_0}\int _{A^i(s)} \mathbbm {1}_D(s,a)\widehat{f^i}(da\mid s)(\tau ^i_*)_{S^i}(ds)\\= & {} \int _D\widehat{f^i}(da\mid s)(\tau ^i_*)_{S^i}(ds), \end{aligned}$$

where the first equality follows from (10), the second and the last one—from the definition of integral over a set, while the third and the fourth one use the definition of strategy \(\widehat{f^i}\) and (11).

Note, however, that the last two displayed equalities imply that there exist strategies \(\widehat{f^i}\) and invariant measures \(\mu ^i_*=(\tau ^i_*)_{S^i}\) for each population \(i\in \{ 1,\ldots ,N\}\) such that \(\widehat{f^i}\) is a best response in the \(\beta \)-discounted game against \(\overline{\tau }_*\), and \(\overline{\tau }_*\) is the stationary global state-action distribution corresponding to the profile of strategies \((\widehat{f^1},\ldots ,\widehat{f^N})\) and the initial global state \((\mu ^1_*,\ldots ,\mu ^N_*)\); hence \((\widehat{f^1},\ldots ,\widehat{f^N})\) together with \((\mu ^1_*,\ldots ,\mu ^N_*)\) form a stationary mean-field equilibrium in the \(\beta \)-discounted game. \(\Box \)
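The fixed-point argument above is non-constructive, but in small finite models one can look for a fixed point of \(\overline{\Psi }_\beta \) by a simple heuristic iteration. The sketch below (single population, invented data, a mild crowding-type dependence of the reward on \(\tau \), and damping between iterations) alternates between a best response to the current \(\overline{\tau }\) and the state-action measure it induces, mimicking the construction \(\Pi ^i(f^i_*,\mu _{f^i_*,\overline{\tau }})\in \Psi ^i_\beta (\overline{\tau })\); unlike the proof, this simple iteration is not guaranteed to converge.

```python
# Heuristic companion to the proof of Theorem 1 (one population, finite
# model, invented data): alternate between a best response to tau and the
# stationary state-action measure it induces.  No convergence guarantee.
import numpy as np

n_s, n_a, beta = 3, 2, 0.9
rng = np.random.default_rng(6)
Q = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # Q[s, a, s'] (independent of tau)
r_base = rng.random((n_s, n_a))

def reward(tau):                                   # r(s, a, tau): a mild crowding penalty
    return r_base - 0.5 * tau.sum(axis=0)[None, :]

tau = np.full((n_s, n_a), 1.0 / (n_s * n_a))
for _ in range(300):
    r = reward(tau)
    V = np.zeros(n_s)
    for _ in range(1000):                          # value iteration for fixed tau
        V = (r + beta * Q @ V).max(axis=1)
    a_star = (r + beta * Q @ V).argmax(axis=1)     # a deterministic best response f*
    P = Q[np.arange(n_s), a_star]                  # induced chain Q(. | s, f*)
    mu = np.full(n_s, 1.0 / n_s)
    for _ in range(500):                           # its invariant state distribution
        mu = mu @ P
    tau_new = np.zeros((n_s, n_a))
    tau_new[np.arange(n_s), a_star] = mu           # Pi(f*, mu), as in the proof
    tau = 0.9 * tau + 0.1 * tau_new                # damped update
print(np.round(tau, 3))
```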

The last result in this section gives conditions under which Markov mean-field equilibria in our models exist. It is based on one of the theorems given in [49]. It should be noted here that the assumptions in that paper are slightly stronger than in our model when applied to a single population. Namely, in our model the rewards and the transitions depend on the state-action distribution of the other players, while in [49] the dependence is only on the distribution of private states. Also, in our model we allow the set of feasible actions to depend on the player’s private state, while in [49] there was no such dependence.

Theorem 4

Suppose that the assumptions (A1’–A2’) and (A3) are satisfied. Then for any \(\beta \in (0,1)\) and any \(\overline{\mu }_0\in \Pi _{j=1}^N\Delta _w(S^j)\) the multi-population discrete-time mean-field game with \(\beta \)-discounted payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a Markov mean-field equilibrium.

Remark 2

As we have already noted, in our model the rewards and the transitions may depend on the state-action distribution of the players, which differs from [49], where the dependence is only on the distribution of private states. Such an assumption is not new to the mean-field game literature. In the case of discrete-time games it was already used in the first paper on this type of games [39]. It has also been applied in [10, 11, 57, 58]. As for the continuous-time case, models of this type were introduced by Gomes and Voskanyan [28] under the name of extended mean-field games. Cardaliaguet and Lehalle proposed the name mean field games of controls in [15] for this type of framework. Some further results on the topic include [2, 14, 17, 26, 40, 43].

We precede the proof of Theorem 4 by a counterpart of Lemma 2 for the Markov case. It requires some additional notation. First, let us define for \(i=1,\ldots ,N\) the sets

$$\begin{aligned}{} & {} \Xi ^i:=\Pi _{t=0}^\infty \Delta _w^{(t)}(D^i), \\{} & {} \Xi :=\Pi _{i=1}^N\Xi ^i. \end{aligned}$$

Next, let for \(t\ge 0\)

$$\begin{aligned} L_t:=\sum _{k=t}^\infty (\alpha \beta )^{k-t}\gamma ^kR=\frac{\gamma ^tR}{1-\alpha \beta \gamma }. \end{aligned}$$

Using these constants, we define for \(i=1,\ldots ,N\) and \(t\ge 0\) the sets

$$\begin{aligned}{} & {} \mathcal {C}_i^t:=\left\{ u\in C_w(S^i): \Vert u\Vert _w\le L_t\right\} , \\{} & {} \mathcal {C}_i:=\Pi _{t=0}^\infty \mathcal {C}_i^t. \end{aligned}$$

It is easy to see that under (A1’), \(\mathcal {C}_i\) with metric

$$\begin{aligned} \rho _{\mathcal {C}}((u_0,u_1,\ldots ),(v_0,v_1,\ldots )):=\sum _{t=0}^\infty \delta ^{-t}\Vert u_t-v_t\Vert _w, \end{aligned}$$

where \(\delta \) is chosen such that \(\delta >\gamma \) and \(\alpha \beta \delta <1\), is a complete metric space.

Lemma 5

For any state-action measure flow \(\left( \overline{\tau }\right) :=\left( \overline{\tau }_0,\overline{\tau }_1,\ldots \right) \in \Xi \), let

$$\begin{aligned} V^{i,t}_{\beta ,\left( \overline{\tau }\right) }(s):=\max _{\pi ^i\in \mathcal {M}^i}\mathbb {E}^{\delta _s,\overline{Q},\pi ^i}\sum _{k=t}^\infty \beta ^{k-t}r^i(s_k^i,a_k^i,\overline{\tau }_k), \end{aligned}$$

that is, let it be the optimal value at time t for the \(\beta \)-discounted Markov decision process of player from population i when the behaviour of all the other players is described by the flow \(\left( \overline{\tau }\right) \). Under assumptions (A1’–A2’) and (A3) for any \(i\in \{ 1,\ldots ,N\}\) and \(t\ge 0\), \(V^{i,t}_{\beta ,\left( \overline{\tau }\right) }\in \mathcal {C}_i^t\).

Proof

Let us fix an \(i\in \{ 1,\ldots ,N\}\) and define for any \(\left( \overline{\tau }\right) \in \Xi \) and \(t\ge 0\)

$$\begin{aligned} T^{i,t}_{\left( \overline{\tau }\right) }(u)(s):=\sup _{a\in A^i(s)}\left[ r^i(s,a,\overline{\tau }_t)+\beta \int _Su(s')Q^i(ds'\mid s,a,\overline{\tau }_t)\right] . \end{aligned}$$

By Proposition 7.32 in [12], for any \(u\in \mathcal {C}_i^{t+1}\), \(T^{i,t}_{\left( \overline{\tau }\right) }(u)\) is continuous. Moreover,

$$\begin{aligned} \left\| T^{i,t}_{\left( \overline{\tau }\right) }(u)\right\| _w\le & {} \sup _{(s,a)\in D^i}\frac{\left|r^i(s,a,\overline{\tau }_t)+\beta \int _Su(s')Q^i(ds'\mid s,a,\overline{\tau }_t)\right|}{w(s)}\\\le & {} \sup _{(s,a)\in D^i}\frac{R\gamma ^tw(s)+\beta \alpha L_{t+1}w(s)}{w(s)}=R\gamma ^t+\frac{\beta \alpha R\gamma ^{t+1}}{1-\alpha \beta \gamma }=L_t, \end{aligned}$$

where the last inequality follows from (A1’) and (A2’) (b) (note that \(\tau _t^i\in \Delta ^{(t)}_w(D^i)\) follows from the bound on \(\int _Sw(s)\mu _0^i(ds)\) in (A1’) and part (b) of (A2’) applied to the recursive formula for \(\overline{\tau }_t\)). Hence, \(T^{i,t}_{\left( \overline{\tau }\right) }\) maps \(\mathcal {C}_i^{t+1}\) into \(\mathcal {C}_i^{t}\). Next, for any \(u_1,u_2\in \mathcal {C}_i^{t+1}\), we have

$$\begin{aligned} \sup _{s\in S}\left|\frac{T^{i,t}_{\left( \overline{\tau }\right) }(u_1)(s)-T^{i,t}_{\left( \overline{\tau }\right) }(u_2)(s)}{w(s)}\right|\le & {} \sup _{s\in S, a\in A^i(s)}\frac{\beta \int _S\left|(u_1(s')-u_2(s'))Q^i(ds'\mid s,a,\overline{\tau }_t)\right|}{w(s)}\nonumber \\\le & {} \beta \alpha \Vert u_1-u_2\Vert _w\sup _{s\in S }\frac{w(s)}{w(s)}=\alpha \beta \Vert u_1-u_2\Vert _w, \end{aligned}$$
(12)

where the last inequality follows from the definition of the w-norm and the assumption (A2’) (b).

We next define the operator \(T^{i}_{\left( \overline{\tau }\right) }:\mathcal {C}_i\rightarrow \mathcal {C}_i\) with the formula

$$\begin{aligned} \left( T^{i}_{\left( \overline{\tau }\right) }(u_0,u_1,\ldots )\right) _t:=T^{i,t}_{\left( \overline{\tau }\right) }(u_{t+1})\quad \text{ for } t\ge 0. \end{aligned}$$

From what we have shown, it indeed maps \(\mathcal {C}_i\) into itself. As (12) implies that for any \((u_0,u_1,\ldots )\) and \((v_0,v_1,\ldots )\) in \(\mathcal {C}_i\),

$$\begin{aligned} \rho _{\mathcal {C}}\left( T^{i}_{\left( \overline{\tau }\right) }(u_0,u_1,\ldots ),T^{i}_{\left( \overline{\tau }\right) }(v_0,v_1,\ldots )\right)\le & {} \sum _{t=0}^\infty \delta ^{-t}\alpha \beta \Vert u_{t+1}-v_{t+1}\Vert _w\\\le & {} \alpha \beta \delta \rho _{\mathcal {C}}\left( (u_0,u_1,\ldots ),(v_0,v_1,\ldots )\right) , \end{aligned}$$

it is an \(\alpha \beta \delta \)-contraction defined on a complete space. By the Banach fixed point theorem it has a unique fixed point. By Theorems 14.4 and 17.1 in [34] the elements of this fixed point are equal to \(V^{i,t}_{\beta ,\left( \overline{\tau }\right) }\), \(t\ge 0\), which ends the proof that the optimal value functions \(V^{i,t}_{\beta ,\left( \overline{\tau }\right) }\in \mathcal {C}_i^t\) for \(t\ge 0\). \(\square \)
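Numerically, the fixed point described in Lemma 5 corresponds to the familiar backward recursion for non-stationary values. The sketch below (single population, finite model, a finite horizon used as a truncation of the infinite flow, invented data standing in for \(r^i(\cdot ,\cdot ,\overline{\tau }_t)\) and \(Q^i(\cdot \mid \cdot ,\cdot ,\overline{\tau }_t)\)) computes \(V^{t}=T^{t}(V^{t+1})\) backwards in time for a given flow.

```python
# Illustrative only: backward recursion V^t = T^t(V^{t+1}) for a given flow,
# with a finite horizon used as a truncation (one population, finite model,
# invented data that stand in for r(., ., tau_t) and Q(. | ., ., tau_t)).
import numpy as np

n_s, n_a, beta, horizon = 3, 2, 0.9, 50
rng = np.random.default_rng(7)
Q = [rng.dirichlet(np.ones(n_s), size=(n_s, n_a)) for _ in range(horizon)]
r = [rng.random((n_s, n_a)) for _ in range(horizon)]

V = np.zeros(n_s)                 # terminal value used as the truncation
values = [None] * horizon
for t in reversed(range(horizon)):
    V = (r[t] + beta * Q[t] @ V).max(axis=1)   # V^t = T^t(V^{t+1})
    values[t] = V
print(np.round(values[0], 3))
```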

Now we are ready to pass to the main part of the proof of Theorem 4.

Proof of Theorem 4:

We start by defining the correspondences from \(\Xi \) into \(\Xi ^i\) (\(i=1,\ldots ,N\)) with the formulas:

$$\begin{aligned} \widetilde{\Theta }^i((\overline{\tau })):=\left\{ (\eta ^i)\in \Xi ^i\!: \left( \eta ^i_0\right) _{S^i}\!=\!\mu _0^i \text{ and } \!\left( \eta ^i_t\right) _{S^i}(\cdot )\!=\!\int _{D^i}\! Q^i(\cdot \mid s,a,\overline{\tau }_{t-1})\tau ^i_{t-1}(ds\times da) \right\} \!, \end{aligned}$$
$$\begin{aligned}&\widetilde{\Psi }^i_{\beta }((\overline{\tau })):= \left\{ (\eta ^i)\in \widetilde{\Theta }^i((\overline{\tau })): \int _{S^i}V^{i,t-1}_{\beta ,(\overline{\tau })}(s)\left( \eta ^i_{t-1}\right) _{S^i}(ds)\right. \\&\quad =\left. \int _{D^i}\left[ r^i(s,a,\overline{\tau }_{t-1})+\beta \int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid s,a,\overline{\tau }_{t-1})\right] \eta ^i_{t-1}(ds\times da) \text{ for } t\ge 1 \right\} . \end{aligned}$$

We next prove that \(\widetilde{\Theta }^i\) and \(\widetilde{\Psi }^i_{\beta }\) have some useful properties. We fix \(i\in \{ 1,\ldots ,N\}\) for these considerations. We start by showing that for any Markov strategy \(\pi ^i\in \mathcal {M}^i\) the flow \((\eta ^i)\) defined with the recurrence

$$\begin{aligned} \eta ^i_0:=\Pi _0^i(\pi ^i,\mu _0^i),\quad \eta ^i_{t+1}:=\Pi _t^i(\pi ^i,\Phi ^i(\cdot \mid \overline{\tau }_t)) \text{ for } t=0,1,2,\ldots \end{aligned}$$
(13)

is an element of \(\widetilde{\Theta }^i((\overline{\tau }))\). We do it by induction on t. For \(t=0\) both \(\left( \eta ^i_0\right) _{S^i}=\mu _0^i\) and \(\int _{D^i}w(s)\eta ^i_0(ds\times da)\le M\) are obvious (by the definition of \(\Pi _0^i\) and Assumption (A1’)). Now suppose \(\int _{D^i}w(s)\eta ^i_{t}(ds\times da)\le M\alpha ^{t}\). Then by the definition of \(\Phi ^i\),

$$\begin{aligned} \left( \eta ^i_{t+1}\right) _{S^i}(\cdot )=\Phi ^i(\cdot \mid \overline{\tau }_t)=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t})\tau ^i_{t}(ds\times da). \end{aligned}$$

Moreover, by (A2’) (b) we have

$$\begin{aligned}{} & {} \int _{D^i}w(s)\eta ^i_{t+1}(ds\times da)=\int _{S^i}w(s)\Phi ^i(ds\mid \overline{\tau }_t)=\int _{D^i}\int _{S^i}w(s')Q^i(ds'\mid s,a,\overline{\tau }_t)\tau _t^i(ds\times da)\\{} & {} \quad \le \int _{D^i}\left( \sup _{(a',\overline{\sigma }_t)\in A^i(s)\times \Xi ^i_t}\int _{S^i}w(s')Q^i(ds'\mid s,a',\overline{\sigma }_t)\right) \tau _t^i(ds\times da)\\{} & {} \quad =\int _{S^i}\left( \sup _{(a',\overline{\sigma }_t)\in A^i(s)\times \Xi ^i_t}\int _{S^i}w(s')Q^i(ds'\mid s,a',\overline{\sigma }_t)\right) \left( \tau _t^i\right) _{S^i}(ds)\\{} & {} \quad \le \int _{S^i}\alpha w(s)\left( \tau _t^i\right) _{S^i}(ds)\le \alpha ^{t+1}M, \end{aligned}$$

which shows that \(\eta ^i_{t+1}\in \Xi ^i_{t+1}\) and by the induction principle that \((\eta ^i)\in \Xi ^i\).

Next we prove that the graph of \(\widetilde{\Theta }^i\) is closed. To do that, we take convergent sequences \(\left\{ (\overline{\tau }^{(n)})\right\} _{n\ge 0}\) in \(\Xi \) and \(\left\{ (\eta ^{i,(n)})\right\} _{n\ge 0}\) in \(\Xi ^i\) such that \((\eta ^{i,(n)})\in \widetilde{\Theta }^i((\overline{\tau }^{(n)}))\) for each n. Moreover, \((\eta ^{i,(n)})\Rightarrow (\eta ^i)\) and \((\overline{\tau }^{(n)})\Rightarrow (\overline{\tau })\) as \(n\rightarrow \infty \) for some \((\eta ^i)\in \Xi ^i\) and \((\overline{\tau })\in \Xi \). Now fix \(t\ge 1\). By the joint strong continuity of \(Q^i\) for any bounded continuous function \(u:S^i\rightarrow \mathbb {R}\), \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau }_{t-1}^{(n)})\) converges continuously to \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau }_{t-1})\). Hence, by Theorem 3.3 in [54], we have

$$\begin{aligned}{} & {} \int _{D^i}\int _{S^i}u(s)Q^i(ds\mid \widehat{s},\widehat{a},\overline{\tau }_{t-1}^{(n)})\tau ^{i,(n)}_{t-1}(d\widehat{s}\times d\widehat{a}) \\{} & {} \quad \rightarrow _{n\rightarrow \infty }\int _{D^i}\int _{S^i}u(s)Q^i(ds\mid \widehat{s},\widehat{a},\overline{\tau }_{t-1})\tau ^i_{t-1}(d\widehat{s}\times d\widehat{a}), \end{aligned}$$

which means that \(\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t-1}^{(n)})\tau ^{i,(n)}_{t-1}(ds\times da)\Rightarrow \int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t-1})\tau ^i_{t-1}(ds\times da)\). Therefore, we have \(\left( \eta ^i_t\right) _{S^i}=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t-1})\tau ^i_{t-1}(ds\times da)\) for each \(t\ge 1\) (while \(\left( \eta ^i_0\right) _{S^i}=\mu _0^i\) is preserved in the limit), hence \((\eta ^i)\in \widetilde{\Theta }^i((\overline{\tau }))\), which implies that the graph of \(\widetilde{\Theta }^i\) is closed.

Next note that if \(\pi ^i_\beta \) is an optimal deterministic Markov policy in the optimization problem of a player from population i maximizing his \(\beta \)-discounted reward when the behaviour of all the other players at each stage is described by the flow \((\overline{\tau })\), then for each \(t\ge 1\) and \(s\in S^i\) it satisfies

$$\begin{aligned} V^{i,t-1}_{\beta ,(\overline{\tau })}(s)=\left[ r^i(s,\pi ^i_{\beta ,t-1}(s),\overline{\tau }_{t-1})+\beta \int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid s,\pi ^i_{\beta ,t-1}(s),\overline{\tau }_{t-1})\right] , \end{aligned}$$

which implies that for any \(t\ge 1\),

$$\begin{aligned}&\int _{S^i}V^{i,t-1}_{\beta ,(\overline{\tau })}(s)\left( \Pi ^i_{t-1}(\pi ^i_\beta ,\Phi ^i(\cdot \mid \overline{\tau }_{t-1}))\right) _{S^i}(ds)=\int _{S^i}V^{i,t-1}_{\beta ,(\overline{\tau })}(s)\Phi ^i(ds\mid \overline{\tau }_{t-1})\\&\quad =\int _{S^i}\left[ r^i(s,\pi ^i_{\beta ,t-1}(s),\overline{\tau }_{t-1})+\beta \int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid s,\pi ^i_{\beta ,t-1}(s),\overline{\tau }_{t-1})\right] \Phi ^i(ds\mid \overline{\tau }_{t-1})\\&\quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau }_{t-1})+\beta \int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid s,a,\overline{\tau }_{t-1})\right] \Pi ^i_{t-1}(\pi ^i_\beta ,\Phi ^i(\cdot \mid \overline{\tau }_{t-1}))(ds\times da). \end{aligned}$$

Hence, the measure flow \((\eta ^i)\) defined by (13) with \(\pi ^i:=\pi ^i_\beta \) is an element of \(\widetilde{\Psi }^i_\beta ((\overline{\tau }))\).

Next we prove that the graph of \(\widetilde{\Psi }^i_\beta \) is closed. Let \(\left\{ (\overline{\tau }^{(n)})\right\} _{n\ge 0}\) in \(\Xi \) and \(\left\{ (\eta ^{i,(n)})\right\} _{n\ge 0}\) in \(\Xi ^i\) be convergent sequences such that \((\eta ^{i,(n)})\in \widetilde{\Psi }^i_\beta ((\overline{\tau }^{(n)}))\) for each n. Moreover, let \((\eta ^{i,(n)})\Rightarrow (\eta ^i)\in \Xi ^i\) and \((\overline{\tau }^{(n)})\Rightarrow (\overline{\tau })\in \Xi \) as \(n\rightarrow \infty \). Since the graph of \(\widetilde{\Theta }^i\) is closed, showing that \(\widetilde{\Psi }^i_\beta \) has the same property only requires proving that the equalities defining \(\widetilde{\Psi }^i_\beta \) hold for \((\eta ^i)\) and \((\overline{\tau })\). Let us fix \(t\ge 1\). The definition of \(\widetilde{\Psi }^i_\beta \) implies that for each n

$$\begin{aligned}{} & {} \int _{S^i}V^{i,t-1}_{\beta ,(\overline{\tau }^{(n)})}(s)\left( \eta ^{i,(n)}_t\right) _{S^i}(ds)\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau }^{(n)}_{t-1})+\beta \int _{S^i}V^{i,t}_{\beta ,(\overline{\tau }^{(n)})}(s')Q^i(ds'\mid s,a,\overline{\tau }^{(n)}_{t-1})\right] \eta ^{i,(n)}_{t-1}(ds\times da)\nonumber \end{aligned}$$
(14)

By the continuity of \(Q^i\) and \(V^{i,t}_{\beta ,(\overline{\tau }^{(n)})}\), and Theorem 3.3 in [54] (the assumption presented in Remark 3.4 (ii) there is true for \(g=L_tw\) by Lemma 5), \(\int _{S^i}V^{i,t}_{\beta ,(\overline{\tau }^{(n)})}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(\int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau }_{t-1})\). As also \(r^i(\cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(r^i(\cdot ,\cdot ,\overline{\tau }_{t-1})\) by (A1’), using Theorem 3.3 in [54] once more (now with \(g=L_{t-1}w\) in Remark 3.4 (ii) there, again by Lemma 5), we can pass to the limit in (14), obtaining

$$\begin{aligned}{} & {} \int _{S^i}V^{i,t-1}_{\beta ,(\overline{\tau })}(s)\left( \eta ^i_t\right) _{S^i}(ds)\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau }_{t-1})+\beta \int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid s,a,\overline{\tau }_{t-1})\right] \eta ^i_{t-1}(ds\times da) \end{aligned}$$

As t was arbitrary, this ends the proof that the graph of \(\widetilde{\Psi }^i_\beta \) is closed.

To finalize the proof, we define the correspondence from \(\Xi \) into itself:

$$\begin{aligned} \widetilde{\Psi }_\beta ((\overline{\tau })):=\widetilde{\Psi }^1_\beta ((\overline{\tau }))\times \cdots \times \widetilde{\Psi }^N_\beta ((\overline{\tau })). \end{aligned}$$

What we have shown already implies that \(\widetilde{\Psi }_\beta \) has nonempty values and that its graph is closed. The convexity of the values of \(\widetilde{\Psi }_\beta \) is obvious. As w is a moment function, each \(\Delta _w^{(t)}(D^i)\) is tight, hence, by Prohorov’s theorem it is compact. This implies that \(\Xi \) is compact in the product topology. Therefore, by the Glicksberg fixed point theorem [24], \(\widetilde{\Psi }_\beta \) has a fixed point. Let \((\overline{\tau ^*})\) be this fixed point. Disintegrating \(\tau ^{*i}_t\) gives for \(i=1,\ldots ,N\) and \(t=0,1,\ldots \) stochastic kernels \(\pi ^{*i}_t\) and measures \(\mu ^{*i}_t\) which (after modifications similar to those in the proof of Theorem 1) correspond to the Markov strategies and global state flows in a mean-field Markov equilibrium of our game. \(\square \)
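The proof above is purely existential, but for a finite model the objects it manipulates (flows of state-action measures, best responses against a flow, and the fixed point of \(\widetilde{\Psi }_\beta \)) are easy to experiment with numerically. The following Python sketch runs a damped best-response iteration on a hypothetical two-population model with two individual states and two actions; all model data are invented for illustration, the horizon is truncated, and the iteration is a heuristic with no convergence guarantee; it is not the argument used in the proof.

```python
import numpy as np

N_POP, S, A, T, BETA = 2, 2, 2, 40, 0.9

# Hypothetical transition kernels Q[i][s, a, s'] (independent of the flow).
Q = [np.array([[[0.8, 0.2], [0.3, 0.7]],
               [[0.6, 0.4], [0.1, 0.9]]]) for _ in range(N_POP)]

def reward(i, s, a, mean_other):
    # Hypothetical reward: state 1 pays more when the other population
    # concentrates in state 1; action 1 is costly.
    return s * mean_other - 0.1 * a + (0.05 if i == 0 else 0.0)

def best_response(i, mean_flow_other):
    """Backward induction against the time-indexed mean state of the other
    population (finite-horizon truncation of the beta-discounted problem)."""
    V = np.zeros(S)
    policy = np.zeros((T, S), dtype=int)
    for t in reversed(range(T)):
        q = np.array([[reward(i, s, a, mean_flow_other[t]) + BETA * Q[i][s, a] @ V
                       for a in range(A)] for s in range(S)])
        policy[t] = q.argmax(axis=1)
        V = q.max(axis=1)
    return policy

def induced_mean_flow(i, policy, mu0):
    """Mean-state flow generated by mu0 and a deterministic Markov policy."""
    mu, means = mu0.copy(), []
    for t in range(T):
        means.append(mu @ np.arange(S))                    # mean state at time t
        P = np.array([Q[i][s, policy[t, s]] for s in range(S)])
        mu = mu @ P
    return np.array(means)

mu0 = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
mean_flow = [np.full(T, 0.5) for _ in range(N_POP)]        # initial guess

for it in range(200):
    policies = [best_response(i, mean_flow[1 - i]) for i in range(N_POP)]
    new_flow = [induced_mean_flow(i, policies[i], mu0[i]) for i in range(N_POP)]
    gap = max(np.max(np.abs(new_flow[i] - mean_flow[i])) for i in range(N_POP))
    mean_flow = [0.5 * mean_flow[i] + 0.5 * new_flow[i] for i in range(N_POP)]  # damping
    if gap < 1e-6:
        break

print(f"best-response gap after {it + 1} iterations: {gap:.2e}")
```

If the iteration stalls at a positive gap, this does not contradict Theorem 4, which guarantees the existence of an equilibrium but says nothing about the convergence of such heuristics.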

5 The Existence of Stationary and Markov Mean Field Equilibria in the Total Payoff Game

5.1 Assumptions

In this section, we address the problem of the existence of an equilibrium in mean-field games with total payoff. In its main results we add a new assumption, (A4) or (A4”), to those defined in Sect. 4.1. Their formulation requires defining, for \(i=1,\ldots ,N\), \(s\in S^i\), \(a\in A^i(s)\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\), the modified transition probabilities \(Q^i_*\):

$$\begin{aligned} Q^i_*(\cdot \mid s,a,\overline{\tau }):=\left\{ \begin{array}{ll} Q^i(\cdot \mid s,a,\overline{\tau }),&{} \text{ if } s\ne s^*\\ \delta _{s^*},&{} \text{ if } s=s^*\end{array}\right. \end{aligned}$$

The first new assumption will be used to prove the existence of a stationary mean-field equilibrium in total reward models.

(A4):

For \(i=1,\ldots ,N\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\),

$$\begin{aligned} \lim _{T\rightarrow \infty }\sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=T}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right\| _w=0. \end{aligned}$$

In the case of the results about the existence of Markov mean-field equilibria in the discounted model, two of the assumptions, (A1’) and (A2’), referred to the discount factor \(\beta \), which is not present in the total reward model. Hence, apart from the new assumption (A4”), new versions of these two assumptions will be necessary. For technical reasons, some additional restrictions are also added.

(A1”):

For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, there exist non-negative constants \(\alpha \), \(\gamma \), M satisfying \(\alpha \le \gamma \), \(\alpha \gamma <1\) and

$$\begin{aligned} \int _Sw(s)\mu _0^i(ds)\le M\quad \text{ for } i=1,\ldots ,N, \end{aligned}$$

and such that for \(i=1,\ldots ,N\), \(s\in S^i\) and \(t=0,1,2,\ldots \),

$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w^{(t)}(D^i)}r^i(s,a,\overline{\tau })\ge -R\gamma ^tw(s) \end{aligned}$$

with \(\Delta _w^{(t)}(D^i)=\left\{ \tau ^i\in \Delta _w(D^i): \int _{D^i}w(s)\tau ^i(ds\times da)\le \alpha ^tM\right\} \).

(A2”):

For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta _w(D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,

  1. (a)

    for \(i=1,\ldots ,N\) the functions

    $$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$

    are continuous in \((s,a,\overline{\tau })\),

  2. (b)

    for \(i=1,\ldots ,N\) and \(s\in S^i\)

$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le \alpha w(s). \end{aligned}$$
(A4”):

For \(i=1,\ldots ,N\),

$$\begin{aligned} \lim _{T\rightarrow \infty }\sup _{\begin{array}{c} \pi ^i\in \mathcal {M}^i,\\ (\overline{\tau })\in \Pi _{t=0}^\infty \Pi _{j=1}^N\Delta (D^j) \end{array}}\left\| \sum _{t=T}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\alpha ^{-t}\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,(\overline{\tau }))\right\| _w=0. \end{aligned}$$

Here and in the sequel for \(i=1,\ldots ,N\), \(s\in S^i\), \(\pi ^i\in \mathcal {M}^i\) and \((\overline{\tau })=(\overline{\tau }_0,\overline{\tau }_1,\ldots )\in \Pi _{t=0}^\infty \Pi _{j=1}^N\Delta (D^j)\),

$$\begin{aligned} \left( Q^i_*\right) ^1(\cdot \mid s,\pi ^i,(\overline{\tau })):=\int _{a\in A^i(s)}Q^i_*(\cdot \mid s,a,\overline{\tau }_0)\pi _0^i(da\mid s) \end{aligned}$$

and for \(t\ge 2\):

$$\begin{aligned} \left( Q^i_*\right) ^t(\cdot \mid s,\pi ^i,(\overline{\tau })):=\int _{S^i}\int _{A^i(s')}Q^i_*(\cdot \mid s',a,\overline{\tau }_{t-1})\pi _{t-1}^i(da\mid s')\left( Q^i_*\right) ^{t-1}(ds'\mid s,\pi ^i,(\overline{\tau })). \end{aligned}$$

For \(i=1,\ldots ,N\), \(s\in S^i\), \(\pi ^i\in \mathcal {M}^i\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\),

$$\begin{aligned} \left( Q^i_*\right) ^t(\cdot \mid s,\pi ^i,\overline{\tau }):=\left( Q^i_*\right) ^t(\cdot \mid s,\pi ^i,(\overline{\tau },\overline{\tau },\ldots )). \end{aligned}$$
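For a finite state space the iterated kernels \(\left( Q^i_*\right) ^t\) are simply products of one-step transition matrices obtained by averaging \(Q^i_*\) over the policy. The short Python sketch below (with hypothetical data for a single population, and with the dependence of the kernel on \(\overline{\tau }_t\) suppressed) makes the recursion explicit.

```python
import numpy as np

S, A, ABSORBING = 3, 2, 2          # state 2 plays the role of s*

def Q_star(tau_bar_t):
    """Hypothetical one-step kernel Q*(. | s, a, tau_bar_t); s* is absorbing.
    The dependence on tau_bar_t is suppressed in this toy example."""
    return np.array([[[0.6, 0.2, 0.2], [0.3, 0.4, 0.3]],
                     [[0.2, 0.5, 0.3], [0.1, 0.5, 0.4]],
                     [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])

def one_step_matrix(pi_t, tau_bar_t):
    """P_t[s, s'] = sum_a Q*(s' | s, a, tau_bar_t) pi_t(a | s)."""
    return np.einsum('saj,sa->sj', Q_star(tau_bar_t), pi_t)

def iterated_kernel(t, pi, tau_flow):
    """(Q*)^t(. | s, pi, (tau_bar)) as an S x S matrix, for t >= 1."""
    P = one_step_matrix(pi[0], tau_flow[0])
    for k in range(1, t):
        P = P @ one_step_matrix(pi[k], tau_flow[k])
    return P

pi = [np.full((S, A), 1.0 / A) for _ in range(10)]   # uniformly randomising policy
tau_flow = [None] * 10                               # dummy flow (unused above)
P5 = iterated_kernel(5, pi, tau_flow)
print("mass remaining outside s* after 5 steps:", 1.0 - P5[:, ABSORBING])
```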

Remark 3

By assuming (A4) or (A4”), we build upon the framework of transient total-reward Markov decision processes introduced by Veinott [55] in the context of finite state and action spaces and generalized to Borel spaces in [22, 46]. The optimization problem faced by an individual from population i in our total-reward mean-field game model would be transient, for every fixed global state-action distribution \(\overline{\tau }\), if

$$\begin{aligned} \sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right\| _w<\infty , \end{aligned}$$

which is clearly true under (A2) (b) and (A4). Roughly speaking, this means that for any reward function with \(\left\| r^i(\cdot ,\cdot ,\overline{\tau })\right\| _w<\infty \), the total reward is finite for every Markov strategy applied. In (A4) we strengthen this assumption by requiring that the convergence of the total reward to its value is uniform across all Markov strategies with respect to the w-norm. (A4”) is an adjustment of this condition to the case when the decision-maker optimizes his behaviour against a flow \((\overline{\tau })\).

5.2 Main Results

Theorem 6

Suppose that the assumptions (A1–A4) are satisfied. Then the multi-population discrete-time mean-field game with total payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a stationary mean-field equilibrium.

Let us start by noticing that the total reward of a player from population i using a given strategy \(\pi \), when the behaviour of the others is constant over time and described by \(\overline{\tau }\), is the same in the MDP model with transition probability \(Q^i_*\) as the reward accumulated until reaching state \(s^*\) in the model with transition probability \(Q^i\). Let us next define for any \(i\in \{ 1,\ldots ,N\}\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\)

$$\begin{aligned} V^i_{*\overline{\tau }}(s):=\max _{f^i\in \mathcal {F}^i}\mathbb {E}^{\delta _s,\overline{Q_*},f^i}\sum _{t=0}^\infty r^i(s_t^i,a_t^i,\overline{\tau }), \end{aligned}$$

that is, the optimal value for the total-reward Markov decision process of a player from population i when the behaviour of all the other players is described by the state-action measure \(\overline{\tau }\), fixed over time. The crucial properties of the function \(V^i_{*\cdot }(\cdot )\) are given in the lemma below.

Lemma 7

Under assumptions (A1–A4) for each \(i\in \{ 1,\ldots ,N\}\), \(V^i_{*\overline{\tau }}(s)\) is jointly continuous in \((s,\overline{\tau })\). Moreover, there exists a constant L such that \(\left\| V^i_{*\overline{\tau }}(\cdot )\right\| _w\le L\) for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\).

Proof

Fix i and \(\overline{\tau }\), and note that under (A2) (b) and (A4),

$$\begin{aligned} L_i:=\sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right\| _w \end{aligned}$$

is finite for \(i=1,\ldots ,N\). Hence, we immediately see that the total-reward Markov decision process defined with \(S^i\), \(A^i\), \(r^i\) and \(Q^i_*\) satisfies the assumptions of Theorem 12 in [22] (see footnote 8). Therefore, with the help of this theorem and Proposition 1 in [22] we can define:

  1. (a)

    The function \(\zeta ^i:S^i\times \Pi _{j=1}^N\Delta _w(D^j)\rightarrow [0,\infty )\)

    $$\begin{aligned} \zeta ^i(s,\overline{\tau }):=\sup _{\pi ^i\in \mathcal {M}^i}\left[ \sum _{t=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right] \end{aligned}$$
    (15)

    satisfying for \(s\in S^i\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\), \(\zeta ^i(s,\overline{\tau })\in [w(s),L_iw(s)]\),

  2. (b)

    The discount factor \(\beta :=\max _{j=1,\ldots ,N}\frac{L_j-1}{L_j}\),

  3. (c)

    Modified one-stage rewards \(r^i_*\):

    $$\begin{aligned} r^i_*(s,a,\overline{\tau }):=\frac{r^i(s,a,\overline{\tau })}{\zeta ^i(s,\overline{\tau })}, \end{aligned}$$
  4. (d)

    Modified transition probabilities \(Q^i_{**}\) given by

    $$\begin{aligned}&Q^i_{**}(B\mid s,a,\overline{\tau }):=\\&\left\{ \begin{array}{ll} \frac{1}{\beta \zeta ^i(s,\overline{\tau })}\int _{B}\zeta ^i(s',\overline{\tau })Q^i_*(ds'\mid s,a,\overline{\tau }),&{} \text{ if } B\in \mathcal {B}(S^i{\setminus }\{ s^*\}),\\ &{} (s,a)\in D^i{\setminus }\{ (s^*,a^*)\}\\ 1-\frac{1}{\beta \zeta ^i(s,\overline{\tau })}\int _{S^i{\setminus }\{ s^*\}}\zeta ^i(s',\overline{\tau })Q^i_*(ds'\mid s,a,\overline{\tau }),&{} \text{ if } B=\{ s^*\},\\ {} &{}(s,a)\in D^i{\setminus }\{ (s^*,a^*)\}\\ 1,&{} \text{ if } B=\{ s^*\}, (s,a)=(s^*,a^*) \end{array}\right. \end{aligned}$$

such that \(S^i\), \(A^i\), \(r^i_*\) and \(Q^i_{**}\) define a \(\beta \)-discounted Markov decision process with a value

$$\begin{aligned} V^i_{\beta \overline{\tau }}(s)=\frac{V^i_{*\overline{\tau }}(s)}{\zeta ^i(s,\overline{\tau })} \end{aligned}$$
(16)

for \(s\in S^i\), \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\). Moreover, optimal stationary strategies exist and coincide in both MDPs.

Next note that if \(\zeta ^i\) is a continuous function, then the model defined by \(S^i\), \(A^i\), \(r^i_*\) and \(Q^i_{**}\) satisfies assumptions (A1–A3) with function w replaced by \(w_*\equiv 1\) (in particular, (A2) follows from the fact that \(\zeta ^i(s,\cdot )\ge w(s)\) for \(s\in S^i\)). This implies that, by Lemma 2, \(V^i_{\beta \overline{\tau }}(s)\) is continuous in \((s,\overline{\tau })\) and \(\left\| V^i_{\beta \overline{\tau }}(\cdot )\right\| \le \frac{R}{1-\beta }\). Hence, combining (16) with the fact that \(\zeta ^i\) is continuous and \(\zeta ^i(s,\cdot )\le L_iw(s)\) we obtain that \(V^i_{*\overline{\tau }}(s)\) is also continuous in \((s,\overline{\tau })\). Moreover, for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\),

$$\begin{aligned} \left\| V^i_{*\overline{\tau }}(\cdot )\right\| _w\le \frac{L_iR}{1-\beta }\le \frac{\max _{j=1,\ldots ,N}L_jR}{1-\beta }=:L, \end{aligned}$$

which proves the assertion of the lemma. Hence, all that remains is to show that \(\zeta ^i\) is continuous.

To do that, we note that \(\zeta ^i\) is clearly the limit of the sequence of functions \(\left\{ w_n^{\overline{\tau }}\right\} _{n\ge 0}\), defined with the following recurrence: \(w_0^{\overline{\tau }}:=w\), \(w_n^{\overline{\tau }}:=T^i_{*\overline{\tau }}(w_{n-1}^{\overline{\tau }})\) for \(n=1,2,\ldots \), where for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\),

$$\begin{aligned} T^i_{*\overline{\tau }}(u)(s):=\sup _{a\in A^i(s)}\left[ w(s)+\int _{S^i{\setminus }\{ s^*\}}u(s')Q^i_*(ds'\mid s,a,\overline{\tau })\right] . \end{aligned}$$

We next show by induction that each \(w_n^{\overline{\tau }}(s)\) is continuous in \((s,\overline{\tau })\). For \(n=0\) the claim is true by the definition of w. Suppose it holds for \(n=k-1\). Then by Theorem 3.3 in [54] (the assumptions given in Remark 3.4 (ii) there are satisfied with \(g=L_iw\) because \(w_{k-1}^{\overline{\tau }}(\cdot )\le \zeta ^i(\cdot ,\overline{\tau })\le L_iw(\cdot )\)), \(w(s)+\int _{S^i{{\setminus }}\{ s^*\}}w_{k-1}^{\overline{\tau }}(s')Q^i_*(ds'\mid s,a,\overline{\tau })\) is jointly continuous in \((s,a,\overline{\tau })\), hence, by Proposition 7.32 in [12]

$$\begin{aligned} w_{k}^{\overline{\tau }}(s)=\sup _{a\in A^i(s)}\left[ w(s)+\int _{S^i{\setminus }\{ s^*\}}w_{k-1}^{\overline{\tau }}(s')Q^i_*(ds'\mid s,a,\overline{\tau })\right] \end{aligned}$$

is also (jointly) continuous. Therefore, the claim is true for any \(n\ge 1\).

To finish the proof, let us take convergent sequences \(\{ s_k\}_{k\ge 1}\) in \(S^i\) and \(\{ \overline{\tau }_k\}_{k\ge 1}\) in \(\Pi _{j=1}^N\Delta _w(D^j)\) such that \(s_k\rightarrow s_*\) and \(\overline{\tau }_k\Rightarrow \overline{\tau }_*\). We will show that \(\zeta ^i(s_k,\overline{\tau }_k)\rightarrow \zeta ^i(s_*,\overline{\tau }_*)\). Since the set \(K:=\{ s_k: k\ge 1\}\cup \{ s_*\}\) is clearly compact, there exists a value W such that \(W\ge |w(s)|\) for \(s\in K\). Now, fix any \(\varepsilon >0\). By (A4) there exists a \(t^*\) such that

$$\begin{aligned} \sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=t^*+1}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right\| _w<\frac{\varepsilon }{3W}. \end{aligned}$$

This immediately implies

$$\begin{aligned}&\left|w_{t^*}^{\overline{\tau }_*}(s_*)-\zeta ^i(s_*,\overline{\tau }_*)\right|\le \left|\sup _{\pi ^i\in \mathcal {M}^i}\sum _{t=t^*+1}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s_*,\pi ^i,\overline{\tau })\right|\nonumber \\&\quad \le w(s_*)\sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=t^*+1}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s_*,\pi ^i,\overline{\tau })\right\| _w \le W\frac{\varepsilon }{3W}=\frac{\varepsilon }{3}. \end{aligned}$$
(17)

and, for any \(k\ge 1\),

$$\begin{aligned}&\left|w_{t^*}^{\overline{\tau }_k}(s_k)-\zeta ^i(s_k,\overline{\tau }_k)\right|\nonumber \\&\quad \le w(s_k)\sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=t^*+1}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s_k,\pi ^i,\overline{\tau })\right\| _w\le \frac{\varepsilon }{3} \end{aligned}$$
(18)

Finally, from the joint continuity of \(w_{t^*}^{\cdot }(\cdot )\), there exists a \(k_0\in \mathbb {N}\) such that for any \(k\ge k_0\)

$$\begin{aligned} \left|w_{t^*}^{\overline{\tau }_k}(s_k)-w_{t^*}^{\overline{\tau }_*}(s_*)\right|<\frac{\varepsilon }{3}. \end{aligned}$$
(19)

Combining (17), (18) and (19), we obtain that for any \(k\ge k_0\)

$$\begin{aligned}{} & {} \left|\zeta ^i(s_k,\overline{\tau }_k)-\zeta ^i(s_*,\overline{\tau }_*)\right|\le \left|\zeta ^i(s_k,\overline{\tau }_k)-w_{t^*}^{\overline{\tau }_k}(s_k)\right|\\{} & {} \quad +\left|w_{t^*}^{\overline{\tau }_k}(s_k)-w_{t^*}^{\overline{\tau }_*}(s_*)\right|+\left|w_{t^*}^{\overline{\tau }_*}(s_*)-\zeta ^i(s_*,\overline{\tau }_*)\right|<\varepsilon , \end{aligned}$$

which ends the proof that \(\zeta ^i(\cdot ,\cdot )\) is continuous. \(\square \)
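For readers who prefer to see the transformation (a)–(d) and identity (16) in action, here is a small numerical sketch for a finite state space, with hypothetical data for a single population and a fixed \(\overline{\tau }\) absorbed into \(r\) and \(Q^i_*\); the function \(\zeta \) is computed by iterating the operator \(T^i_{*\overline{\tau }}\) from the proof, and the identity \(V_{*}=V_{\beta }\,\zeta \) is verified by value iteration in both models. This is an illustration of the construction, not an implementation of the general model.

```python
import numpy as np

S, A = 4, 2                      # state S-1 plays the role of s*
rng = np.random.default_rng(0)

# Hypothetical transient kernel Q*: from every non-absorbing state, each
# action reaches s* with probability exactly 0.2; s* is absorbing.
Qs = np.zeros((S, A, S))
Qs[:, :, :-1] = 0.8 * rng.dirichlet(np.ones(S - 1), size=(S, A))
Qs[:, :, -1] = 0.2
Qs[-1] = 0.0
Qs[-1, :, -1] = 1.0
r = rng.normal(size=(S, A))
r[-1] = 0.0                      # zero reward at s*
w = np.ones(S)                   # weight function (bounded case)

# (a) zeta as the limit of the iteration w_n = T(w_{n-1}) from the proof.
zeta = w.copy()
for _ in range(500):
    zeta = w + (Qs[:, :, :-1] @ zeta[:-1]).max(axis=1)

# (b) discount factor, (c) modified rewards, (d) modified transitions Q**.
L = (zeta / w).max()
beta = (L - 1.0) / L
r_mod = r / zeta[:, None]
Q_mod = np.zeros_like(Qs)
Q_mod[:, :, :-1] = Qs[:, :, :-1] * zeta[None, None, :-1] / (beta * zeta[:, None, None])
Q_mod[:, :, -1] = 1.0 - Q_mod[:, :, :-1].sum(axis=-1)

# Total-reward value iteration with Q* and discounted value iteration with Q**.
V_total = np.zeros(S)
for _ in range(500):
    V_total = (r + Qs[:, :, :-1] @ V_total[:-1]).max(axis=1)
V_disc = np.zeros(S)
for _ in range(500):
    V_disc = (r_mod + beta * Q_mod @ V_disc).max(axis=1)

print("max |V_* - V_beta * zeta| =", np.abs(V_total - V_disc * zeta).max())
```

Up to the tolerance of the value iterations, the printed difference is zero, illustrating (16).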

Proof of Theorem 6

As in the case of the discounted reward, we define the correspondences from \(\Pi _{j=1}^N\Delta (D^j)\) to \(\Delta (D^i)\):

$$\begin{aligned} \Theta ^i(\overline{\tau }):= & {} \left\{ \eta ^i\in \Delta _w^M(D^i): \eta ^i_{S^i}(\cdot )=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau })\eta ^i(ds\times da) \right\} , \\ \Psi ^i_*(\overline{\tau }):= & {} \left\{ \eta ^i\in \Theta ^i(\overline{\tau }): \int _{S^i}V^i_{*\overline{\tau }}(s)\eta ^i_{S^i}(ds)\right. \\= & {} \left. \int _{D^i}\left[ r^i(s,a,\overline{\tau })+\int _{S^i}V^i_{*\overline{\tau }}(s')Q^i_*(ds'\mid s,a,\overline{\tau })\right] \eta ^i(ds\times da) \right\} \end{aligned}$$

Using arguments similar to those employed in the proof of Theorem 1 (with the difference that instead of Lemma 2 we apply Lemma 7 when necessary), we can prove that \(\Psi ^i_*\) has nonempty convex values and that its graph is closed (we do not need to prove this for \(\Theta ^i\), as it is defined in exactly the same way as in the case of the \(\beta \)-discounted reward). Then we define the correspondence

$$\begin{aligned} \overline{\Psi }_*(\overline{\tau }):=\Psi ^1_*(\overline{\tau })\times \cdots \times \Psi ^N_*(\overline{\tau }) \end{aligned}$$

and show that it has a fixed point, which, again using arguments similar to those in the proof of Theorem 1, can be shown to correspond to a stationary mean-field equilibrium in the total-reward discrete-time mean-field game considered in the theorem. \(\square \)

In the last result of this section we give conditions under which a Markov mean-field equilibrium exists in the total-reward game.

Theorem 8

Suppose that the assumptions (A1”), (A2”), (A3) and (A4”) are satisfied. Then for any \(\overline{\mu }_0\in \Pi _{j=1}^N\Delta _w(S^j)\) the multi-population discrete-time mean-field game with total payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a Markov mean-field equilibrium.

Proof

Let us fix \(\overline{\mu }_0\) and M satisfying (A1”). Recall the notation used in the proof of Theorem 4

$$\begin{aligned} \Xi ^i=\Pi _{t=0}^\infty \Delta _w^{(t)}(D^i),\quad \Xi =\Pi _{i=1}^N\Xi ^i. \end{aligned}$$

Next, for any flow of measure-vectors \((\overline{\tau }):=(\overline{\tau }_0,\overline{\tau }_1,\ldots )\in \Xi \) and any \(i\in \{ 1,\ldots ,N\}\) let us define

$$\begin{aligned} V^{i,t}_{*(\overline{\tau })}(s):=\max _{\pi ^i\in \mathcal {M}^i}\mathbb {E}^{\delta _s,\overline{Q_*},\pi ^i}\sum _{k=t}^\infty r^i(s_k^i,a_k^i,\overline{\tau }_k), \end{aligned}$$

that is, the optimal value at time \(t\ge 0\) for the total-reward Markov decision process of a player from population i when the behaviour of all the other players at each stage is described by the flow \((\overline{\tau })\). Using the standard method of transforming a nonhomogeneous Markov decision process into a homogeneous one and Theorem 12 in [22] (see footnote 9), we may show that \(V^{i,t}_{*(\overline{\tau })}(s)\) can be obtained from the optimal reward in the discounted Markov decision process with state space \(S^i\times \mathbb {N}\) and:

  1. (a)

    The function \(\widetilde{\zeta }^i:S^i\times \mathbb {N}\times \Xi \rightarrow [0,\infty )\)

    $$\begin{aligned} \widetilde{\zeta }^i(s,t,(\overline{\tau })):=\sup _{\pi ^i\in \mathcal {M}^i}\left[ \sum _{k=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\alpha ^{-k}\left( Q^i_*\right) ^k(ds'\mid s,\pi ^i,(\overline{\tau }))\right] \end{aligned}$$

    satisfying for \(s\in S^i\), \(t\ge 0\) and \((\overline{\tau })\in \Xi \), \(\widetilde{\zeta }^i(s,t,(\overline{\tau }))\in [\alpha ^{-t}w(s),\widetilde{L}_i\alpha ^{-t}w(s)]\) with (see footnote 10)

    $$\begin{aligned} \widetilde{L}_i:=\sup _{\begin{array}{c} \pi ^i\in \mathcal {M}^i,\\ (\overline{\tau })\in \Xi \end{array}}\left\| \sum _{t=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\alpha ^{-t}\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,(\overline{\tau }))\right\| _w \end{aligned}$$
  2. (b)

    The discount factor \(\widetilde{\beta }:=\max _{j=1,\ldots ,N}\frac{\widetilde{L}_j-1}{\widetilde{L}_j}\),

  3. (c)

    Modified one-stage rewards \(\widetilde{r}^i_*\):

    $$\begin{aligned} \widetilde{r}^i_*(s,t,a,(\overline{\tau })):=\frac{r^i(s,a,\overline{\tau }_t)}{\widetilde{\zeta }^i(s,t,(\overline{\tau }))}, \end{aligned}$$
  4. (d)

    Modified transition probabilities \(\widetilde{Q}^i_{**}\) on \(S^i\times \mathbb {N}\), defined analogously to \(Q^i_{**}\) in the proof of Lemma 7, with \(\zeta ^i\) and \(\beta \) replaced by \(\widetilde{\zeta }^i\) and \(\widetilde{\beta }\), and with the time component increased by one at each transition,

  5. (e)

    Sets of feasible actions given by \(\widetilde{A}^i(\cdot ,t):=A^i(\cdot )\).

In fact, if \(V^i_{\widetilde{\beta }(\overline{\tau })}(s,t)\) denotes the optimal value in the modified (discounted) model,

$$\begin{aligned} V^{i,t}_{*(\overline{\tau })}(s)=V^i_{\widetilde{\beta }(\overline{\tau })}(s,t)\widetilde{\zeta }^i(s,t,(\overline{\tau })) \end{aligned}$$

for any \(s\in S^i\) and \(t\ge 0\). Moreover, optimal stationary strategies in the new model (which exist by Theorem 12 in [22]) correspond to optimal Markov strategies in the original one. Finally, repeating the arguments used in the proof of Lemma 7 (the assumptions (A1–A4) used there are satisfied with w(s) replaced by \(\widetilde{w}(s,t):=w(s)\alpha ^{-t}\), which clearly is a moment function on \(S\times \mathbb {N}\)) we can show that \(V^i_{\widetilde{\beta }\cdot }(\cdot ,t)\) and \(\widetilde{\zeta }^i(\cdot ,t,\cdot )\) are continuous and their \(\widetilde{w}\)-norms are bounded by \(\widetilde{L}:=\frac{R\max _{j=1,\ldots ,N}\widetilde{L}_j}{1-\widetilde{\beta }}\), which implies that for any \(s\in S^i\) and \(t\ge 0\), \(V^{i,t}_{*(\overline{\tau })}(s)\le \widetilde{L}\lambda ^tw(s)\).

Next we define the correspondence \(\widetilde{\Psi }^i_*\) from \(\Xi \) into \(\Xi ^i\) (\(i=1,\ldots ,N\)). To do so, we recall the definition of the correspondence \(\widetilde{\Theta }^i\) introduced in the proof of Theorem 4

$$\begin{aligned} \widetilde{\Theta }^i((\overline{\tau }))=\left\{ (\eta ^i)\in \Xi ^i\!: \left( \eta ^i_0\right) _{S^i}\!=\!\mu _0^i \text{ and } \!\left( \eta ^i_t\right) _{S^i}(\cdot )\!=\!\int _{D^i}\! Q^i(\cdot \mid s,a,\overline{\tau }_{t-1})\tau ^i_{t-1}(ds\times da) \right\} \!. \end{aligned}$$

Now let

$$\begin{aligned}&\widetilde{\Psi }^i_*((\overline{\tau })):= \left\{ (\eta ^i)\in \widetilde{\Theta }^i((\overline{\tau })): \int _{S^i}V^{i,t-1}_{*(\overline{\tau })}(s)\left( \eta ^i_t\right) _{S^i}(ds)\right. \\&\quad =\left. \int _{D^i}\left[ r^i(s,a,\overline{\tau }_{t-1})+\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid s,a,\overline{\tau }_{t-1})\right] \eta ^i_{t-1}(ds\times da) \text{ for } t\ge 1 \right\} \end{aligned}$$

As we have shown in the proof of Theorem 4 (see footnote 11), for each i, the correspondence \(\widetilde{\Theta }^i\) has non-empty values and a closed graph. We next prove that for any fixed \(i\in \{ 1,\ldots ,N\}\), \(\widetilde{\Psi }^i_*\) has similar properties.

We start by noting that if \(\pi ^i_*\) is an optimal deterministic Markov policy in the optimization problem of a player from population i maximizing his total reward when the behaviour of all the other players at each stage is described by the flow \((\overline{\tau })\), then for each \(t\ge 1\) and \(s\in S^i\) it satisfies

$$\begin{aligned} V^{i,t-1}_{*(\overline{\tau })}(s)=\left[ r^i(s,\pi ^i_{*t-1}(s),\overline{\tau }_{t-1})+\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid s,\pi ^i_{*t-1}(s),\overline{\tau }_{t-1})\right] , \end{aligned}$$

which implies that for any \(t\ge 1\),

$$\begin{aligned}{} & {} \int _{S^i}V^{i,t-1}_{*(\overline{\tau })}(s)\left( \Pi ^i_{t-1}(\pi ^i_*,\Phi ^i(\cdot \mid \overline{\tau }_{t-1}))\right) _{S^i}(ds)=\int _{S^i}V^{i,t-1}_{*(\overline{\tau })}(s)\Phi ^i(ds\mid \overline{\tau }_{t-1})\\{} & {} \quad =\int _{S^i}\left[ r^i(s,\pi ^i_{*t-1}(s),\overline{\tau }_{t-1})+\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid s,\pi ^i_{*t-1}(s),\overline{\tau }_{t-1})\right] \Phi ^i(ds\mid \overline{\tau }_{t-1})\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau }_{t-1})+\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid s,a,\overline{\tau }_{t-1})\right] \Pi ^i_{t-1}(\pi ^i_*,\Phi ^i(\cdot \mid \overline{\tau }_{t-1}))(ds\times da). \end{aligned}$$

Hence, the measure flow \((\eta ^i)\) defined by (13) with \(\pi ^i:=\pi ^i_*\) is an element of \(\widetilde{\Psi }^i_*((\overline{\tau }))\).

In the penultimate part of the proof we show that the graph of \(\widetilde{\Psi }^i_*\) is closed. Let \(\left\{ (\overline{\tau }^{(n)})\right\} _{n\ge 0}\) in \(\Xi \) and \(\left\{ (\eta ^{i,(n)})\right\} _{n\ge 0}\) in \(\Xi ^i\) be convergent sequences such that \((\eta ^{i,(n)})\in \widetilde{\Psi }^i_*((\overline{\tau }^{(n)}))\) for each n. Moreover, let \((\eta ^{i,(n)})\Rightarrow (\eta ^i)\) and \((\overline{\tau }^{(n)})\Rightarrow (\overline{\tau })\) as \(n\rightarrow \infty \) for some \((\eta ^i)\in \Xi ^i\) and \((\overline{\tau })\in \Xi \). Since the graph of \(\widetilde{\Theta }^i\) is closed, showing that \(\widetilde{\Psi }^i_*\) has the same property only requires proving that the equalities defining \(\widetilde{\Psi }^i_*\) hold for \((\eta ^i)\) and \((\overline{\tau })\). Let us fix \(t\ge 1\). From the definition of \(\widetilde{\Psi }^i_*\) we know that for each n

$$\begin{aligned}{} & {} \int _{S^i}V^{i,t-1}_{*(\overline{\tau }^{(n)})}(s)\left( \eta ^{i,(n)}_t\right) _{S^i}(ds)\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau }^{(n)}_{t-1})+\int _{S^i}V^{i,t}_{*(\overline{\tau }^{(n)})}(s')Q^i_*(ds'\mid s,a,\overline{\tau }^{(n)}_{t-1})\right] \eta ^{i,(n)}_{t-1}(ds\times da)\nonumber \end{aligned}$$
(20)

Then by the continuity of \(Q^i_*\) and \(V^{i,t}_{*(\overline{\tau }^{(n)})}\) and Theorem 3.3 in [54] (see also Remark 3.4 (ii) there—by (A2”) (b) the assumption presented there is true for \(g=\widetilde{L}\alpha ^tw\)), \(\int _{S^i}V^{i,t}_{*(\overline{\tau }^{(n)})}(s')Q^i_*(ds'\mid \cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid \cdot ,\cdot ,\overline{\tau }_{t-1})\). As also \(r^i(\cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(r^i(\cdot ,\cdot ,\overline{\tau }_{t-1})\) by (A1”), using Theorem 3.3 in [54] again (again with \(g=\widetilde{L}\alpha ^tw\), which satisfies the assumption given in Remark 3.4 (ii) by (A1”) and (A2”) (b)), we can pass to the limit in (20), obtaining

$$\begin{aligned}{} & {} \int _{S^i}V^{i,t-1}_{*(\overline{\tau })}(s)\left( \eta ^i_t\right) _{S^i}(ds)\\{} & {} \quad =\int _{D^i}\left[ r^i(s,a,\overline{\tau }_{t-1})+\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid s,a,\overline{\tau }_{t-1})\right] \eta ^i_{t-1}(ds\times da) \end{aligned}$$

As t was arbitrary, this ends the proof that the graph of \(\widetilde{\Psi }^i_*\) is closed.

The remainder of the proof is identical to the argument presented in the proof of Theorem 4: We define the correspondence from \(\Xi \) into itself:

$$\begin{aligned} \widetilde{\Psi }_*((\overline{\tau })):=\widetilde{\Psi }^1_*((\overline{\tau }))\times \cdots \times \widetilde{\Psi }^N_*((\overline{\tau })). \end{aligned}$$

By what we have shown, \(\widetilde{\Psi }_*\) has nonempty values and its graph is closed. The convexity of the values of \(\widetilde{\Psi }_*\) is obvious. As we know, \(\Xi \) is compact in the product topology. Therefore, Glicksberg’s fixed point theorem [24] implies that \(\widetilde{\Psi }_*\) has a fixed point. Let \((\overline{\tau ^*})\) be this fixed point. Disintegrating \(\tau ^{*i}_t\) gives for \(i=1,\ldots ,N\) and \(t=0,1,\ldots \) stochastic kernels \(\pi ^{*i}_t\) and measures \(\mu ^{*i}_t\) which (after modifications similar to those in the proof of Theorem 1) correspond to the Markov strategies and global state flows in a mean-field Markov equilibrium of the total-reward game. \(\square \)

Remark 4

It should be noted here that Theorems 6 and 8 applied to a game with a single population extend the existing results about the existence of equilibria in single-population total-reward mean-field games. The only results of this type in the literature appear in [57] and concern models with finite state and action spaces. In [57] it was assumed that the probability of reaching \(s^*\) within some fixed number of stages is, for any strategies used by the players, bounded below by some \(p_0>0\). Assumptions (A4) and (A4”) used here can be seen as counterparts of this assumption for our model with the weight function w applied to the states. It is easy to see that in the finite state and action case (A4) reduces to the simpler assumption described above.
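To make the last claim explicit, consider the finite case with \(w\equiv 1\) and suppose that, from every state and under any strategies, \(s^*\) is reached within m stages with probability at least \(p_0>0\). The Markov property then gives \(\left( Q^i_*\right) ^t(S^i{\setminus }\{ s^*\}\mid s,\pi ^i,\overline{\tau })\le (1-p_0)^{\lfloor t/m\rfloor }\), so that

$$\begin{aligned} \sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=T}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right\| _w\le \sum _{t=T}^\infty (1-p_0)^{\lfloor t/m\rfloor }\le \frac{m}{p_0}(1-p_0)^{\lfloor T/m\rfloor }\xrightarrow [T\rightarrow \infty ]{}0, \end{aligned}$$

uniformly in \(\overline{\tau }\), which is exactly (A4). The same computation, with the additional factor \(\alpha ^{-t}\), gives (A4”) whenever \(\alpha ^{-m}(1-p_0)<1\).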

Remark 5

It is also worth noting that some total-reward mean-field game models that are not directly considered in the article can be treated as specific cases of our framework. Firstly, a total reward game without a replacement of dead players by new-born ones is a specific case with \(s^*\) being an absorbing state. For such a case the existence of Markov mean-field equilibrium is guaranteed by Theorem 8 without any modification of assumptions.

Another case that can be treated as an instance of our model is the total reward game with a finite horizon. In this case the application of our framework requires replacing the individual state spaces \(S^i\) of players of each population with \(\left( (S^i{\setminus }\{ s^*\})\times \{ 0,\ldots ,T\}\right) \cup \{ s^*\}\) (with T denoting the time horizon). If the stochastic kernel \(Q^i\) denotes the transition probability for population i in the original model, the transitions \(\widehat{Q}^i\) in the modified one are completely defined by the following formulas:

$$\begin{aligned} \widehat{Q}^i(B\times \{ t+1\}\mid (s,t),a,\overline{\tau })=Q^i(B\mid s,a,\overline{\tau }_S) \end{aligned}$$

for \(s\in S^i\), \(a\in A^i(s)\), \(t<T\) and \(B\in \mathcal {B}(S^i)\), with \(\overline{\tau }_S=(\tau ^1_{S^1},\ldots ,\tau ^N_{S^N})\) being a vector of marginals of measures \(\tau ^i\) on original individual state spaces \(S^i\), and

$$\begin{aligned} \widehat{Q}^i(\{ s^*\}\mid (\cdot ,T),\cdot ,\cdot )\equiv 1\equiv \widehat{Q}^i(\{ s^*\}\mid s^*,a^*,\cdot ). \end{aligned}$$

Then the single-stage rewards are defined independently of the time components of both the individual and the global state. The existence of Markov mean-field equilibria is then assured by Theorem 8 under assumptions (A1)–(A3) on the original primitives of the model (which correspond to (A1”), (A2”) and (A3) for the modified one), while (A4”) is satisfied automatically. A similar transformation allows one to consider the non-stationary model with a finite horizon, as well as the case when the time horizon of each individual is a random variable with a finite expected value, independent of the Markov chain of his individual states.
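As an illustration of this embedding, the following Python sketch assembles the augmented kernel \(\widehat{Q}^i\) for a hypothetical finite model with a single population, assuming for simplicity that the original kernel never reaches \(s^*\) before the horizon (otherwise that mass would simply be routed to the absorbing state as well).

```python
import numpy as np

S, A, T = 3, 2, 4                   # original states 0..S-2 plus s* = S-1
rng = np.random.default_rng(1)
# Hypothetical original kernel on S \ {s*} (no absorption before the horizon).
Q = rng.dirichlet(np.ones(S - 1), size=(S - 1, A))

n_states = (S - 1) * (T + 1) + 1    # pairs (s, t), t = 0..T, plus s*
STAR = n_states - 1

def idx(s, t):
    """Index of the pair (s, t) in the augmented state space."""
    return t * (S - 1) + s

Q_hat = np.zeros((n_states, A, n_states))
for t in range(T + 1):
    for s in range(S - 1):
        for a in range(A):
            if t < T:               # copy Q and advance the time component
                for s2 in range(S - 1):
                    Q_hat[idx(s, t), a, idx(s2, t + 1)] = Q[s, a, s2]
            else:                   # Q_hat({s*} | (., T), ., .) = 1
                Q_hat[idx(s, t), a, STAR] = 1.0
Q_hat[STAR, :, STAR] = 1.0          # Q_hat({s*} | s*, a*, .) = 1

assert np.allclose(Q_hat.sum(axis=-1), 1.0)
print("augmented model has", n_states, "states")
```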

6 Concluding Remarks

In this paper we have presented a model of discrete-time mean-field games with several populations of players. Games of this type have not previously been studied in the literature in the discrete-time setting. The main results presented in the article are stationary and Markov mean-field equilibrium existence theorems for two payoff criteria, \(\beta \)-discounted payoff and total payoff, proved under rather general assumptions on the one-step reward functions and the individual transition kernels of the players. It is also worth noting that games with total payoff have previously only been studied in the finite state space case, hence, the results presented here also extend those for total-payoff mean-field games with a single population. The article is the first of two papers on multiple-population discrete-time mean-field games with discounted or total payoff. In the second one we provide theorems showing that under some additional assumptions the equilibria obtained in the mean-field models are approximate equilibria in their n-person counterparts when n is large enough. We also plan further research on discrete-time mean-field games with multiple populations of players, concentrating on games with long-run average reward.