Abstract
In this paper we present a model of a discrete-time mean-field game with several populations of players. Mean-field games with multiple populations of players have so far only been studied in the literature in the continuous-time setting. The main results of this article are the first stationary and Markov mean-field equilibrium existence theorems for discrete-time mean-field games of this type. We consider two payoff criteria: \(\beta \)-discounted payoff and total payoff. The results are provided under rather general assumptions on the one-step reward functions and the individual transition kernels of the players. In addition, the results for the total payoff case, when applied to a single population, extend the theory of mean-field games by relaxing some strong assumptions used in the existing literature.
1 Introduction
Dynamic games with a large number of players are a natural tool for modeling dynamic interactions in many areas of science, yet they attract limited attention due to the complexity of such models. One natural way to deal with problems involving a large number of agents, developed across different fields of research, is to replace such complex models with relatively simpler ones with a continuum of infinitesimal players. Approximations of this kind have appeared in one-step games at least since the two seminal papers by Wardrop [56] and Schmeidler [53], but for a long time were not introduced into dynamic game models. The situation changed with a series of papers by Lasry and Lions [41, 42] and by Huang, Caines and Malhamé [36,37,38], where models of non-cooperative differential games with a continuum of identical players were introduced. The idea on which these models were founded is that in the limit of infinitely many players, the game problem can be reduced to a much simpler single-agent decision problem. A huge number of publications on the topic have followed during the last decade, and the literature is still growing fast. A review of the existing results on differential-type mean-field games can be found in the books [8, 16] or the survey [27].
Similar discrete-time models appeared in the literature significantly earlier, in the paper by Jovanovic and Rosenthal [39], under the name of anonymous sequential games, but have not attracted as much attention as their continuous-time counterparts. Since then, however, some further theoretical results on games of this type have appeared. Models with the discounted payoff criterion have been studied in [3, 10, 11, 18, 21, 49,50,51,52]. Conditions under which Nash equilibria in finite-player discounted-utility games converge to equilibria of the respective anonymous models were analyzed in [29, 30, 35, 47, 49]. In [48, 57, 58] the long-time average payoff has been considered, while [57] has also treated games with the total reward criterion. In [5, 6], algorithms for computing mean-field equilibria in both discounted and average reward games have been presented.
All of the papers enumerated above consider the case of a single population of symmetric players. There is, however, no reason not to consider mean-field games with a larger number of populations. As long as this number is small, considering this kind of limit model rather than a game with a huge finite number of players remains a significant simplification of the problem. Moreover, there are natural applications of models of this type. For example, we may want to analyze the influence of two related industries on each other. If each of these industries consists of a large number of firms competing with each other over time, their interaction can be modeled using a mean-field model. But as we would like to model the interaction between the two industries, the aggregate behaviour of companies from each industry should affect the performance of the other one, so these two different populations need to be modeled within a single mean-field framework. Another natural application could be in modeling opinion dynamics in social media. Herding behaviour in such a setting has been widely discussed in the literature; in particular, it has been analyzed using mean-field models. We know, however, that these dynamics change when several groups with different political views clash with each other. They can be introduced into the mean-field model by treating them as different populations. In the continuous-time case, multiple-population mean-field models were introduced in [36] and further studied in [1, 7, 9, 19, 20, 23, 44]. As far as we know, there have been no papers on discrete-time mean-field games with multiple populations of players. In this article, we fill that gap by introducing two models of games of this type: one with discounted payoff, the other with total payoff. In both cases we provide results on the existence of mean-field equilibria in such games under some natural assumptions.
It is worth mentioning here that some of the results we present, notably all of those concerning the total payoff criterion, are proved under much less restrictive assumptions than those used in the existing literature on single-population mean-field games. As single-population games are just a specific case of the model presented here, the paper thereby also extends the theory of single-population mean-field games. This is discussed further when the relevant results are presented.
The organization of the paper is as follows: In Sect. 2 we present the way we model discrete-time mean-field games with several populations of players. In Sect. 3 we introduce some notation used in the remainder of the article. In Sects. 4 and 5 we present several mean-field equilibrium existence theorems for the discounted and total payoff cases, respectively. Finally, in Sect. 6 we give some concluding remarks.
2 The Model
Mean-field game models were designed to approximate dynamic game situations with a large number of symmetric agents. In multi-population mean-field games we still assume that the number of agents is large, but they are homogeneous only within a smaller group called a population. The number of populations is finite and fixed, and their mutual interactions are captured by each individual’s rewards and transitions. Each population has its own reward function and transition kernel (which may or may not operate on the same state space), which makes the model significantly different from those considered in the literature on discrete-time mean-field games. Below we describe the model formally.
A multi-population discrete-time mean-field game is described by the following objects:
-
We assume that the game is played in discrete time, that is \(t\in \{ 1,2,\ldots \}\).
-
The game is played by an infinite number (continuum) of players divided into N populations. Each player has a private state s, changing over time. We assume that the set of individual states \(S^i\) is the same for each player in population i (\(i=1,\ldots ,N\)), and that it is a nonempty closed subset of a locally compact Polish space S.Footnote 1
-
A vector \(\overline{\mu }=(\mu ^1,\ldots ,\mu ^N)\in \Pi _{i=1}^N\Delta (S^i)\) of N probability distributions over Borel setsFootnote 2 of \(S^i\), \(i=1,\ldots ,N\), is called a global state of the game. Its ith component describes the proportion of the ith population in each of the individual states. We assume that at every stage of the game each player knows both his private state and the global state, and that his knowledge about the individual states of his opponents is limited to the global state.
-
The set of actions available to a player from population i in state \((s,\overline{\mu })\) is given by \(A^i(s)\), with \(A:=\bigcup _{i\in \{ 1,\ldots ,N\}, s\in S^i}A^i(s)\)—a compact metric space. For any i, \(A^i(\cdot )\) is a non-empty compact valued correspondence such that
$$\begin{aligned} D^i:=\{ (s,a)\in S^i\times A: a\in A^i(s)\} \end{aligned}$$is a measurable set. Note that we assume that the sets of actions available to a player only depend on his private state and not on the global state of the game.
-
The global distribution of the state-action pairs is denoted by \(\overline{\tau }=(\tau ^1,\ldots ,\tau ^N)\in \Pi _{i=1}^N\Delta (D^i)\). Again, its ith component gives the distribution of state-action pairs within the ith population, \(i=1,\ldots ,N\).
-
The immediate reward of an individual from population i is given by a measurable function \(r^i:D^i\times \Pi _{i=1}^N\Delta (D^i)\rightarrow \mathbb {R}\); \(r^i(s,a,\overline{\tau })\) gives the reward of a player at any stage of the game when his private state is s, his action is a and the distribution of state-action pairs across the entire population is \(\overline{\tau }\).
-
Transitions are defined for each individual separately via stochastic kernels \(Q^i:D^i\times \Pi _{i=1}^N\Delta (D^i)\rightarrow \Delta (S^i)\) denoting the transition probabilities for players from the ith population. \(Q^i(B\mid \cdot ,\cdot ,\overline{\tau })\) is product-measurable for any \(B\in \mathcal {B}(S^i)\), any \(\overline{\tau }\in \Pi _{i=1}^N\Delta (D^i)\) and \(i\in \{ 1,\ldots ,N\}\).
-
The global state at time \(t+1\), \(\overline{\mu _{t+1}}\), is obtained by aggregating the individual transitions of the players according to the formula
$$\begin{aligned} \mu _{t+1}^{i}(\cdot )=\Phi ^i(\cdot \mid \overline{\tau _t}):=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau _t})\tau _t^i(ds\times da). \end{aligned}$$As can be clearly seen, the evolution of the global state is deterministic.
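The aggregation formula above can be sketched computationally on a finite discretization. The example below is a hypothetical toy setup (two states, two actions, made-up numbers), not part of the model itself: `tau` plays the role of \(\tau ^i_t\) and `Q` of the kernel \(Q^i\), with the population index suppressed.

```python
import numpy as np

def aggregate(Q, tau):
    """Deterministic update of the population state distribution:
    mu'[s'] = sum over (s, a) of Q[s, a, s'] * tau[s, a]."""
    return np.einsum('sat,sa->t', Q, tau)

# 2 states, 2 actions (toy numbers): Q[s, a] is the next-state distribution
# for the pair (s, a); tau is the state-action distribution of the population.
Q = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.0, 1.0]]])
tau = np.array([[0.25, 0.25],
                [0.25, 0.25]])
mu_next = aggregate(Q, tau)
```

Since each row of `Q[s, a]` is a probability vector and `tau` sums to one, the output is again a probability distribution over individual states.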
A sequence \(\pi ^i=\{\pi _t^i\}_{t=0}^\infty \) of functions \(\pi _t^i:S^i\rightarrow \Delta (A)\), such that \(\pi _t^i(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\) and any t, satisfying \(\pi _t^i(A^i(s)\mid s )=1\) for every \(s\in S^i\) and every t, is called a Markov strategy for a player of population i. A function \(f^i:S^i \rightarrow \Delta (A)\), such that \(f^i(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\), satisfying \(f^i(A^i(s)\mid s )=1\) for every \(s\in S^i\) is called a stationary strategy. The set of all Markov strategies for players from ith population is denoted by \(\mathcal {M}^i\) while that of stationary strategies by \(\mathcal {F}^i\). As in MDPs, stationary strategies can be seen as a specific case of Markov strategies that do not depend on t. In the paper we never consider general (history-dependent) strategies.Footnote 3
Next, let \(\Pi ^i_t(\pi ^i,\mu ^i)\) denote the state-action distribution of the ith population players at time t in the mean-field game corresponding to the distribution of individual states in population i, \({\mu }^i\), and a Markov strategy for players of population i, \(\pi ^i\in \mathcal {M}^i\), that is, for any \(B\in \mathcal {B}(S^i)\) and \(E\in \mathcal {B}(A)\),
$$\begin{aligned} \Pi ^i_t(\pi ^i,\mu ^i)(B\times E):=\int _B\pi _t^i(E\mid s)\mu ^i(ds). \end{aligned}$$
The vector \((\Pi _t^1(\pi ^1,{\mu }^1),\ldots ,\Pi _t^N(\pi ^N,{\mu }^N))\) will be denoted by \(\overline{\Pi }_t(\overline{\pi },\overline{\mu })\). When we use this notation for stationary strategies, we skip the subscript t.
Given the evolution of the global state, which depends on the strategies of the players in a deterministic manner, we can define the individual history of a player \(\alpha \) (from any given population i) as the sequence of his consecutive individual states and actions \(h=(s^\alpha _0,a^\alpha _0,s^\alpha _1,a^\alpha _1,\ldots )\). By the Ionescu-Tulcea theorem (see Chap. 7 in [12]), for any Markov strategies \(\pi ^\alpha \) of player \(\alpha \) and \(\sigma ^1,\ldots ,\sigma ^N\) of other players (including all other players of the same population), any initial global state \(\overline{\mu _0}\) and any initial private state of player \(\alpha \), s, there exists a unique probability measure \(\mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}\) on the set of all infinite individual histories of the game \(H=(D^i)^\infty \) endowed with Borel \(\sigma \)-algebra, such that for any \(B\in \mathcal {B}(S^i)\), \(E\in \mathcal {B}(A)\) and any partial history \(h^\alpha _t=(s^\alpha _0,a^\alpha _0,\ldots ,s^\alpha _{t-1},a^\alpha _{t-1},s^\alpha _t)\in (D^i)^t\times S^i=:H_t\), \(t\in \mathbb {N}\),
with state-action distributions defined by \(\tau ^j_0=\Pi ^j_0(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi ^j_{t+1}(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau _t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\).
Now we are ready to define the two types of reward we shall consider in this paper. For \(\beta \in (0,1)\), the \(\beta \)-discounted rewardFootnote 4 for a player \(\alpha \) from population i using policy \(\pi ^i\in \mathcal {M}^i\) when other players use policies \(\sigma ^j\in \mathcal {M}^j\) (depending on the population j they belong to) and the initial global state is \(\overline{\mu _0}\), with the initial individual state of player \(\alpha \) being \(s^i_0\) is defined as follows:
where \(\tau ^j_0=\Pi _0^j(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi _{t+1}^j(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau _t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\).
To define the total reward in our game let us distinguish one state in S, say \(s^*\), isolated from \(S{\setminus }\{ s^*\}\) and assume that \(A^i(s^*)=\{ a^*\}\) independently of \(i\in \{ 1,\ldots ,N\}\) for some fixed \(a^*\) isolated from \(A{\setminus }\{ a^*\}\). Moreover, let us assume that \(s^*\in S^i\) for \(i=1,\ldots ,N\). Then the total reward of a player from population i using policy \(\pi ^i\in \mathcal {M}^i\) when other players apply policies \(\overline{\sigma }=(\sigma ^1,\ldots ,\sigma ^N)\) and the initial global state is \(\overline{\mu _0}\), with the initial individual state of player \(\alpha \) being \(s_0^i\), is defined in the following way:
where \(\tau ^j_0=\Pi _0^j(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi _{t+1}^j(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau _t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\), while \(\mathcal {T}^i\) is the moment of the first arrival of the process \(\{ s_t^i\}\) at \(s^*\) (we assume it is finite with probability 1Footnote 5). The total reward is interpreted as the reward accumulated by the player over the whole of his lifetime. State \(s^*\) is an artificial state (as is action \(a^*\)) denoting that a player is dead. \(\overline{\mu _0}\) corresponds to the distribution of the states across the population when he is born, while \(s^i_0\) is his own state when he is born. The fact that after some time the state of a player can again become different from \(s^*\) should be interpreted as the player being replaced, after some time, by a new-born one.
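For intuition, in a finite toy model (hypothetical, not from the paper) the expected reward accumulated before absorption can be computed in closed form: with state 0 standing in for the "dead" state \(s^*\) (zero reward there), the values on the transient states solve the linear system \(V=r+QV\), where Q collects the transitions among transient states.

```python
import numpy as np

def total_reward(P, r, absorbing=0):
    """Expected reward accumulated before the first arrival at the absorbing state."""
    keep = [s for s in range(len(r)) if s != absorbing]
    Q = P[np.ix_(keep, keep)]                       # transitions among transient states
    V = np.linalg.solve(np.eye(len(keep)) - Q, r[keep])
    out = np.zeros(len(r))
    out[keep] = V
    return out

# 3 states; state 0 is absorbing, and from states 1 and 2 the chain
# is absorbed with probability 0.5 at every step.
P = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
V = total_reward(P, r)
```

The linear solve is well-posed here because absorption happens almost surely, so \(I-Q\) is invertible.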
As the type of reward introduced above is not commonly used in the literature, below we present an example of a situation that can be modeled as a mean-field game with this type of optimality criterion.
Example 1
Suppose a water region is monitored using a large population of wireless sensors. The job of each sensor is to take measurements of the state of the water and send them to some base station. Each sensor is powered by a battery whose capacity is limited. The private state of a sensor is two-dimensional: it consists of its last measurement and its battery level. The action of a sensor is the frequency at which it sends its data to the base station. The higher the frequency, the better the quality of the monitoring service becomes, but at the same time the faster the battery depletes. Thus the goal of each sensor is to use its available power efficiently. This can be modeled by summing its one-step rewards for the monitoring activity (which are monotonic in the urgency of the data sent—that is, if the measurements point to something harmful, the speed of sending them becomes important, and hence the one-step reward becomes high) over the lifetime of its battery, which corresponds to the total reward defined above. Note that in such a setting the reward for each sensor is computed over a different time frame: from the moment its battery is replaced (we may assume this is done one time unit after the battery is emptied) until the moment it depletes again. At the same time the players are symmetric and the situation is stationary as far as the primitives of the model are concerned.
Next, we define the solutions we will be looking for:
Definition 1
Stationary strategies \(f^1\in \mathcal {F}^1,\ldots ,f^N\in \mathcal {F}^N\) and a global state \(\overline{\mu }\in \Pi _{i=1}^N\Delta (S^i)\) form a stationary mean-field equilibrium in the \(\beta \)-discounted reward game if for any i, \(s^i_0\in S^i\), and every other stationary strategy of a player from population i, \(g^i\in \mathcal {F}^i\)
and if \(\overline{\mu }_0=\overline{\mu }\), then \(\overline{\mu }_t=\overline{\mu }\) for every \(t\ge 1\) whenever strategies \(f^1,\ldots ,f^N\) are used by all the players.
Markov strategies \(\pi ^1\in \mathcal {M}^1,\ldots ,\pi ^N\in \mathcal {M}^N\) and a global state flow \((\overline{\mu }_0^*,\overline{\mu }_1^*,\ldots )\in (\Pi _{i=1}^N\Delta (S^i))^\infty \) form a Markov mean-field equilibrium in the \(\beta \)-discounted reward game if for any i, \(s^i_0\in S^i\), and every other Markov strategy of a player from population i, \(\sigma ^i\in \mathcal {M}^i\)
and if \(\overline{\mu }_0=\overline{\mu }^*_0\), then \(\overline{\mu }_t=\overline{\mu }^*_t\) for every \(t\ge 1\) whenever strategies \(\pi ^1,\ldots ,\pi ^N\) are used by all the players.
Similarly,
Definition 2
Stationary strategies \(f^1\in \mathcal {F}^1,\ldots ,f^N\in \mathcal {F}^N\) and a global state \(\overline{\mu }\in \Pi _{i=1}^N\Delta (S^i)\) form a stationary mean-field equilibrium in the total reward game if for any i, \(s^i_0\in S^i\), and every other stationary strategy of a player from population i, \(g^i\in \mathcal {F}^i\)
Moreover, if \(\overline{\mu }_0=\overline{\mu }\), then \(\overline{\mu }_t=\overline{\mu }\) for every \(t\ge 1\) whenever strategies \(f^1,\ldots ,f^N\) are used by all the players.
Markov strategies \(\pi ^1\in \mathcal {M}^1,\ldots ,\pi ^N\in \mathcal {M}^N\) and a global state flow \((\overline{\mu }_0^*,\overline{\mu }_1^*,\ldots )\in (\Pi _{i=1}^N\Delta (S^i))^\infty \) form a Markov mean-field equilibrium in the total reward game if for any i, t, \(s^i_t\in S^i\) and every other Markov strategy of a player from population i, \(\sigma ^i\in \mathcal {M}^i\),
where, for any infinite vector \(a=(a_0,a_1,\ldots )\), \({}^ta\) denotes the vector \((a_t,a_{t+1},\ldots )\). Moreover, if \(\overline{\mu }_0=\overline{\mu }^*_0\), then \(\overline{\mu }_t=\overline{\mu }^*_t\) for every \(t\ge 1\) whenever strategies \(\pi ^1,\ldots ,\pi ^N\) are used by all the players.
3 Preliminaries
As stated above, we assume that S and A are metric spaces. The metric on S will be denoted by \(d_S\) and that on A by \(d_A\). Whenever we refer to a metric on a product space, we mean the sum of the metrics on its coordinates. Some of the assumptions presented below will be given with respect to a moment function \(w_0:S\rightarrow [1,\infty )\), that is, a continuous function satisfying
$$\begin{aligned} \lim _{n\rightarrow \infty }\inf _{s\in S{\setminus } K_n}w_0(s)=+\infty \end{aligned}$$
for some sequence \(\{ K_n\} _{n\ge 1}\) of compact subsets of S.
In order to study both bounded and unbounded one-stage reward functions, we define the following function:
For any function \(h:S\rightarrow \mathbb {R}\) we define its w-norm as
$$\begin{aligned} \Vert h\Vert _w:=\sup _{s\in S}\frac{|h(s)|}{w(s)}. \end{aligned}$$
Whenever we speak of functions defined on a product of S and some other space, their w-norm is defined similarly, with the help of the same function w.
By \(B_w(S)\) we denote the space of all measurable functions from S to \(\mathbb {R}\) with finite w-norm, and by \(C_w(S)\)—the space of all continuous functions in \(B_w(S)\). Clearly, both \(B_w(S)\) and \(C_w(S)\) are Banach spaces. The same can be said of \(B_w(S\times A)\) and \(C_w(S\times A)\)—the spaces of measurable and continuous functions, respectively, from \(S\times A\) to \(\mathbb {R}\) with finite w-norm.Footnote 6
Analogously, for any finite signed measure \(\mu \) on S, we define the w-norm of \(\mu \) as
$$\begin{aligned} \Vert \mu \Vert _w:=\int _Sw(s)\,|\mu |(ds). \end{aligned}$$
It should be noted that in case \(w\equiv 1\), \(\Vert \mu \Vert _w\) is the total variation norm of \(\mu \) (see e.g. [33], Section 7.2).
There are two standard types of convergence of probability measures used in the paper: the weak convergence, denoted by \(\Rightarrow \), and the strong (or setwise) convergence, denoted by \(\rightarrow \) and defined (for any Borel space \((X,\mathcal {B}(X))\)) by
$$\begin{aligned} \mu _n\rightarrow \mu \quad \Longleftrightarrow \quad \mu _n(B)\rightarrow \mu (B)\ \text{ for } \text{ every } B\in \mathcal {B}(X). \end{aligned}$$
It is known (see e.g. [45], Theorem 6.6) that the weak topology can be metrized using the metric
$$\begin{aligned} \rho (\mu ,\nu ):=\sum _{i=1}^\infty \frac{\left| \int _S\phi _i\,d\mu -\int _S\phi _i\,d\nu \right| }{2^i}, \end{aligned}$$
where \(\{\phi _i\}_{i\ge 1}\) is a sequence of continuous bounded functions from S to \(\mathbb {R}\) whose elements form a dense subset of the unit ball in C(S). The strong convergence topology is in general not metrizable.
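To illustrate the metrization formula, here is a minimal numerical sketch on a hypothetical three-point state space; a finite family of test functions stands in for the dense sequence \(\{\phi _i\}\), so this is only a truncation of the actual metric.

```python
import numpy as np

def rho(mu, nu, phis):
    """Truncated version of rho(mu, nu) = sum_i 2^{-i} |int phi_i dmu - int phi_i dnu|."""
    return sum(abs(phi @ mu - phi @ nu) / 2 ** (i + 1) for i, phi in enumerate(phis))

# Hypothetical finite family of bounded test functions on a 3-point space;
# integrals against a discrete measure reduce to dot products.
phis = [np.array([1.0, 0.0, 0.0]),
        np.array([0.0, 1.0, 0.0]),
        np.array([0.5, 0.5, 1.0])]
mu = np.array([0.5, 0.5, 0.0])
nu = np.array([0.5, 0.25, 0.25])
d = rho(mu, nu, phis)
```

The geometric weights \(2^{-i}\) make the series converge for any bounded family, which is what allows the full (infinite) sum to define a metric.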
Next, let
$$\begin{aligned} \Delta _w(S):=\left\{ \mu \in \Delta (S):\int _Sw(s)\mu (ds)<\infty \right\} . \end{aligned}$$
It has been shown in [49] that \(\Delta _w(S)\) can be metrized using the metric
$$\begin{aligned} \rho _w(\mu ,\nu ):=\rho (\mu ,\nu )+\left| \int _Sw(s)\mu (ds)-\int _Sw(s)\nu (ds)\right| . \end{aligned}$$
We will use the topology defined by this metric (called w-topology in the sequel) as the standard topology on \(\Delta _w(S)\).
We will also use the notation
with analogously defined metrics also denoted by \(\rho \) (metric defining weak convergence) and \(\rho _w\) (w-metric) as well as similar notation for subsets of S or \(S\times A\).
Whenever we speak about continuity of correspondences, we refer to the following definitions:
Let X and Y be two metric spaces and let \(F:X\rightarrow Y\) be a correspondence. Let \(F^{-1}(G)=\{ x\in X: F(x)\cap G\ne \emptyset \}\). We say that F is upper semicontinuous iff \(F^{-1}(G)\) is closed for any closed \(G\subset Y\). F is lower semicontinuous iff \(F^{-1}(G)\) is open for any open \(G\subset Y\). F is said to be continuous iff it is both upper and lower semicontinuous. For more on (semi)continuity of correspondences see [32], Appendix D, or [4], Chapter 17.2.
4 The Existence of Stationary and Markov Mean Field Equilibria in Discounted Payoff Game
4.1 Assumptions
In this section, we address the problem of the existence of an equilibrium in discrete-time mean-field games with \(\beta \)-discounted payoff. We begin by presenting the set of assumptions used in our results.
- (A1):
-
For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, for \(i=1,\ldots ,N\) and \(s\in S^i\),
$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w(D^i)}r^i(s,a,\overline{\tau })\ge -Rw(s). \end{aligned}$$ - (A2):
-
For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta (D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,
- (a):
-
for \(i=1,\ldots ,N\) the functions
$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$are continuous in \((s,a,\overline{\tau })\),
- (b):
-
for \(i=1,\ldots ,N\) and \(s\in S^i\)
$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le w(s). \end{aligned}$$
- (A3):
-
For \(i=1,\ldots ,N\), correspondences \(A^i\) are continuous.
In some theorems weaker versions of assumptions (A1) and (A2) will be used:
- (A1’):
-
For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, there exist non-negative constants \(\alpha \), \(\gamma \), M satisfying \(\alpha \beta \gamma <1\) and
$$\begin{aligned} \int _Sw(s)\mu _0^i(ds)\le M\quad \text{ for } i=1,\ldots ,N, \end{aligned}$$and such that for \(i=1,\ldots ,N\), \(s\in S^i\) and \(t=0,1,2,\ldots \),
$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w^{(t)}(D^i)}r^i(s,a,\overline{\tau })\ge -R\gamma ^tw(s) \end{aligned}$$with \(\Delta _w^{(t)}(D^i):=\left\{ \tau ^i\in \Delta _w(D^i): \int _{D^i}w(s)\tau ^i(ds\times da)\le \alpha ^tM\right\} \).
- (A2’):
-
For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta _w(D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\Rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,
- (a):
-
for \(i=1,\ldots ,N\) the functions
$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$are continuous in \((s,a,\overline{\tau })\),
- (b):
-
for \(i=1,\ldots ,N\) and \(s\in S^i\)
$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le \alpha w(s). \end{aligned}$$
4.2 Main Results
In the first main result of this section we prove the existence of a stationary mean-field equilibrium in discounted discrete-time mean-field games.
Theorem 1
Suppose that the assumptions (A1–A3) are satisfied. Then for any \(\beta \in (0,1)\) the multi-population discrete-time mean-field game with \(\beta \)-discounted payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a stationary mean-field equilibrium.
Remark 1
It should be noted here that the results given in Theorem 1 applied to the model with a single population extend the existing results for such a case. The most general result of this type in the literature appears in [39] and concerns the case with compact individual state space. Here, the individual state spaces are arbitrary closed subsets of a locally compact Polish space. As a consequence, also the reward functions are not necessarily bounded (we only assume that they are bounded above and have a finite w-norm).
In the proof of the theorem we adapt the techniques introduced in [39] to our case. We precede the proof of the theorem with two lemmas.
Lemma 2
For any \(\overline{\tau }\in \Pi _{i=1}^N\Delta (D^i)\) letFootnote 7
that is, let \(V^i_{\beta ,\overline{\tau }}\) be the optimal value of the \(\beta \)-discounted Markov decision process of a player from population i when the behaviour of all the other players is described by the state-action measure \(\overline{\tau }\), fixed over time. Under assumptions (A1–A3), \(V^i_{\beta ,\overline{\tau }}(s)\) is jointly continuous in \((s,\overline{\tau })\) on \(S^i\times \Pi _{i=1}^N\Delta _w(D^i)\) and \(\Vert V^i_{\beta ,\overline{\tau }}\Vert _w\le \frac{R}{1-\beta }\).
Proof
Let us fix an \(i\in \{ 1,\ldots ,N\}\) and define for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\)
Note that, clearly (by assumptions (A1) and (A2) (b)), \(T^i_{\overline{\tau }}\) maps \(B_w(S^i)\) into itself. Moreover, for any \(u_1,u_2\in B_w(S^i)\),
where the penultimate inequality follows from the definition of the w-norm, while the last one from assumption (A2) (b). Hence, \(T^i_{\overline{\tau }}\) is a contraction defined on a complete space. By the Banach fixed point theorem it has a unique fixed point, which is by Theorem 4.2.3 in [32] equal to \(V^i_{\beta ,\overline{\tau }}\). Moreover, this fixed point can be obtained as \(\lim _{n\rightarrow \infty }\left( T^i_{\overline{\tau }}\right) ^n(u_0)\) for any given \(u_0\in B_w(S^i)\).
Let \(u_0^{\overline{\tau }}\equiv 0\) and define \(u_n^{\overline{\tau }}:=T^i_{\overline{\tau }}(u_{n-1}^{\overline{\tau }})\) for \(n=1,2,\ldots \). We will next show that for each n, \(u_n^{\overline{\tau }}(s)\) is continuous in \((s,{\overline{\tau }})\) and \(\Vert u_n^{\overline{\tau }}\Vert _w\le \frac{R}{1-\beta }\).
We prove these statements by induction on n. For \(n=0\) both claims are obvious. Suppose they hold for \(n=k-1\). Then by Theorem 3.3 in [54] (see also Remark 3.4 (ii) there—the assumptions given there are satisfied with \(g=\frac{R}{1-\beta }w\) by (A2) (b)), \(r^i(s,a,\overline{\tau })+\beta \int _Su_{k-1}^{\overline{\tau }}(s')Q^i(ds'\mid s,a,\overline{\tau })\) is jointly continuous in \((s,a,\overline{\tau })\), hence, by Proposition 7.32 in [12]
is also (jointly) continuous. We also have (here the third inequality is a consequence of assumption (A2) (b), while the fourth one follows from (A1) and our inductive assumption):
Thus, the second claim has been proved for \(n=k\).
To finish the proof, let us take convergent sequences \(\{ s_k\}_{k\ge 1}\) in \(S^i\) and \(\{ \overline{\tau }_k\}_{k\ge 1}\) in \(\Pi _{j=1}^N\Delta _w(D^j)\) such that \(s_k\rightarrow s_*\) and \(\overline{\tau }_k\Rightarrow \overline{\tau }_*\). We will show that \(V^i_{\beta ,\overline{\tau }_k}(s_k)\rightarrow V^i_{\beta ,\overline{\tau }_*}(s_*)\). We start the proof by noticing that the set \(K:=\{ s_k: k\ge 1\}\cup \{ s_*\}\) is clearly compact, hence there exists a value W such that \(W\ge |w(s)|\) for \(s\in K\). Now, fix any \(\varepsilon >0\) and let \(n_0\) be such that
Clearly, by repeated use of (3) for \(u_1=u_0^{\overline{\tau }_*}\) and \(u_2=V^i_{\beta ,\overline{\tau }_*}\), we obtain
hence
Similarly we obtain that for any \(k\ge 1\),
Finally, from the joint continuity of \(u_{n_0}^{\cdot }(\cdot )\), there exists a \(k_0\in \mathbb {N}\) such that for any \(k\ge k_0\)
Now, combining (4), (5) and (6), we obtain that for any \(k\ge k_0\)
which ends the proof that \(V^i_{\beta ,\cdot }(\cdot )\) is continuous. The proof that \(\Vert V^i_{\beta ,\overline{\tau }}\Vert _w\le \frac{R}{1-\beta }\) is elementary. \(\square \)
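The contraction argument in the proof above is constructive: iterating \(T^i_{\overline{\tau }}\) from \(u_0\equiv 0\) converges geometrically to \(V^i_{\beta ,\overline{\tau }}\). The following sketch runs this value iteration on a hypothetical finite model (single population, toy numbers, the fixed \(\overline{\tau }\) absorbed into r and Q).

```python
import numpy as np

def bellman(u, r, Q, beta):
    """One application of the operator T: (Tu)(s) = max_a [r(s,a) + beta * E_Q u(s')]."""
    return (r + beta * np.einsum('sat,t->sa', Q, u)).max(axis=1)

def value_iteration(r, Q, beta, tol=1e-10):
    """Iterate the beta-contraction T from u = 0 until (numerically) reaching its fixed point."""
    u = np.zeros(r.shape[0])
    while True:
        u_next = bellman(u, r, Q, beta)
        if np.max(np.abs(u_next - u)) < tol:
            return u_next
        u = u_next

# 2 states, 2 actions (hypothetical data): action a deterministically moves
# the player to state a; rewards depend on the state-action pair.
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
Q = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]])
V = value_iteration(r, Q, beta=0.5)
```

Because T is a \(\beta \)-contraction, the stopping rule bounds the true error by \(\frac{\beta }{1-\beta }\) times the tolerance.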
In the next lemma we show that for any i and any stationary strategy from \(\mathcal {F}^i\), the Markov chain of individual states has an invariant measure lying in the subset of probability measures on \(S^i\) whose w-norm is bounded by M.
Lemma 3
Suppose assumptions (A1–A3) hold and \(M>0\) is such that for each \(i\in \{ 1,\ldots ,N\}\) the set
$$\begin{aligned} \Delta _w^M(S^i):=\left\{ \mu \in \Delta (S^i):\int _{S^i}w(s)\mu (ds)\le M\right\} \end{aligned}$$
is nonempty. Then for each \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\), \(i=1,\ldots ,N\) and any stationary strategy \(f^i\in \mathcal {F}^i\), there exists a \(\mu _{f^i,\overline{\tau }}\in \Delta _w^M(S^i)\) such that
for any \(B\in \mathcal {B}(S)\).
Proof
Let us fix \(i\in \{ 1,\ldots ,N\}\) and note that for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\) and any stationary strategy \(f^i\in \mathcal {F}^i\), the transition probability
$$\begin{aligned} Q^i(B\mid s,f^i,\overline{\tau }):=\int _{A^i(s)}Q^i(B\mid s,a,\overline{\tau })f^i(da\mid s),\quad B\in \mathcal {B}(S^i), \end{aligned}$$
is clearly strongly continuous.
Next, suppose that the initial distribution of the individual state of a player from population i is \(\rho _0^i\in \Delta _w^M(S^i)\). We prove by induction that the same is true for \(\rho _t^i\), the distribution of his state at time \(t=1,2,\ldots \), if he uses strategy \(f^i\) and the behaviour of the other players is described by \(\overline{\tau }\). Suppose the claim is true for t. Then by assumption (A2) (b) we have
By Remark 1 in [31] we know that the sequence of measures
(whose elements clearly belong to \(\Delta _w^M(S^i)\), as it is a convex set) has a subsequence weakly converging to an invariant measure of the Markov chain with transition probability \(Q^i(\cdot \mid \cdot ,f^i,\overline{\tau })\). Let us call it \(\mu _{f^i,\overline{\tau }}\). It can easily be shown that \(\Delta _w^M(S^i)\) is tight, hence, by Prohorov’s theorem (Theorem 6.1 in [13]), relatively compact. Moreover, \(\Delta _w^M(S^i)\) is closed in the weak convergence topology, hence the invariant measure \(\mu _{f^i,\overline{\tau }}\in \Delta _w^M(S^i)\). \(\square \)
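The Cesàro-average construction used in the proof can be illustrated numerically on a hypothetical two-state chain: averaging the state distributions along the trajectory recovers an invariant measure of the transition kernel.

```python
import numpy as np

def cesaro_invariant(P, rho0, n=10000):
    """Average the state distributions rho_0, ..., rho_{n-1} along the chain."""
    rho = rho0.copy()
    avg = np.zeros_like(rho0)
    for _ in range(n):
        avg += rho
        rho = rho @ P            # rho_{t+1}(B) = sum_s rho_t(s) P(s, B)
    return avg / n

# Toy 2-state kernel; its unique invariant measure solves mu P = mu,
# which here gives mu = (2/7, 5/7).
P = np.array([[0.5, 0.5],
              [0.2, 0.8]])
mu = cesaro_invariant(P, np.array([1.0, 0.0]))
```

For this ergodic toy chain the plain distributions already converge, but the Cesàro averages are what the lemma's compactness argument works with in general.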
Proof of Theorem 1:
Let M be such as in Lemma 3 and for \(i=1,\ldots ,N\) let
Let us further define the correspondences from \(\Pi _{j=1}^N\Delta _w^M(D^j)\) to \(\Delta _w^M(D^i)\), \(i=1,\ldots ,N\):
We will now verify that for each i, \(\Theta ^i\) and \(\Psi ^i_\beta \) have some useful properties. We fix \(i\in \{ 1,\ldots ,N\}\) for all these considerations.
First note that \(\eta =\Pi ^i(f^i,\mu _{f^i,\overline{\tau }})\) clearly belongs to \(\Theta ^i(\overline{\tau })\), as for any \(B\in \mathcal {B}(S^i)\),
where the first and the last equality follow from the definition of \(\Pi ^i(\cdot ,\cdot )\), the second comes from the definition of invariant measure, while the third one from (7). Moreover, by Lemma 3, \(\mu _{f^i,\overline{\tau }}\in \Delta _w^M(S^i)\), which immediately implies that \(\eta =\Pi ^i(f^i,\mu _{f^i,\overline{\tau }})\in \Delta _w^M(D^i)\).
We next show that the graph of \(\Theta ^i\) is closed in weak convergence topology. To prove that, first note that for any bounded continuous function \(u:S^i\rightarrow \mathbb {R}\), \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\cdot )\) is, by the strong continuity of \(Q^i\), a continuous function, so for any sequences \(\eta _n^i\in \Delta (D^i)\) and \(\overline{\tau _n}\in \Pi _{j=1}^N\Delta (D^j)\) such that \(\eta _n^i\in \Theta ^i(\overline{\tau _n})\) with \(\eta _n^i\Rightarrow \eta ^i\) and \(\overline{\tau _n}\Rightarrow \overline{\tau }\), \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau _n})\) converges continuously to \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau })\). Hence, by Theorem 3.3 in [54], we have
which means that \(\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau _n})\eta _n^i(ds\times da)\Rightarrow \int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau })\eta ^i(ds\times da)\). From the uniqueness of the limit this implies that \(\left( \eta ^i\right) _{S^i}=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau })\eta ^i(ds\times da)\), hence \(\eta ^i\in \Theta ^i(\overline{\tau })\), which implies that the graph of \(\Theta ^i\) is closed.
By Theorem 4.2.3 in [32] there exists an optimal stationary policy \(f^i_*\) in the optimization problem of a player from population i maximizing his discounted reward when the behaviour of all the other players is described by the global state \(\overline{\tau }\), fixed over time. Moreover, \(f^i_*\) is a measurable selector attaining the maximum on the RHS of the equation
Then we can write that
which implies that \(\Pi ^i(f^i_*,\mu _{f^i_*,\overline{\tau }})\in \Psi ^i_\beta (\overline{\tau })\).
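A hedged finite sketch of the underlying dynamic-programming step (all data invented; value iteration stands in for the Borel-state result of Theorem 4.2.3 in [32]): the \(\beta \)-discounted value satisfies the Bellman equation, and an optimal stationary policy is obtained as a maximizing selector.

```python
import numpy as np

# Hedged finite sketch (invented data): the beta-discounted problem for a
# fixed environment has an optimal stationary policy attaining the max in
#   V(s) = max_a [ r(s,a) + beta * sum_s' Q(s'|s,a) V(s') ];
# value iteration converges because the Bellman operator is a contraction.
beta = 0.9
r = np.array([[1.0, 0.0],          # r[s, a]
              [0.0, 2.0]])
Q = np.array([[[0.8, 0.2],         # Q[s, a, s']
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.3, 0.7]]])

V = np.zeros(2)
for _ in range(500):
    V = (r + beta * Q @ V).max(axis=1)      # Bellman update

f_star = (r + beta * Q @ V).argmax(axis=1)  # maximizing selector f*

# V solves the Bellman equation (up to iteration error)
assert np.allclose(V, (r + beta * Q @ V).max(axis=1))
```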
Next we show that the graph of \(\Psi ^i_\beta \) is closed in the weak convergence topology. Let us take sequences \(\overline{\tau _n}\in \Pi _{j=1}^N\Delta _w^M(D^j)\) and \(\eta _n\in \Delta _w^M(D^i)\) such that \(\eta _n\in \Psi ^i_\beta (\overline{\tau _n})\) for every \(n\in \mathbb {N}\) with \(\eta _n\Rightarrow \eta \) and \(\overline{\tau _n}\Rightarrow \overline{\tau }\). Since the graph of \(\Theta ^i\) is closed, to show that so is that of \(\Psi ^i_\beta \), we only need to prove that the equality defining the set \(\Psi ^i_\beta (\overline{\tau })\) is satisfied for \(\overline{\tau }\) and \(\eta \). Note however that for each n
Then by the continuity of \(Q^i\) and \(V^i_{\beta ,\overline{\tau _n}}\) and Theorem 3.3 in [54] (see also Remark 3.4 (ii) there; by (A2) (b) the assumption presented there is true for \(g=\frac{R}{1-\beta }w\)), \(\int _{S^i}V^i_{\beta ,\overline{\tau _n}}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau _n})\) converges continuously to \(\int _{S^i}V^i_{\beta ,\overline{\tau }}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau })\). As also \(r^i(\cdot ,\cdot ,\overline{\tau _n})\) converges continuously to \(r^i(\cdot ,\cdot ,\overline{\tau })\) by (A1), we may apply Theorem 3.3 in [54] again (with \(g=\frac{R}{1-\beta }w\), which satisfies the assumption given in Remark 3.4 (ii) by (A1) and (A2) (b)) to pass to the limit in (9), obtaining
which ends the proof that the graph of \(\Psi ^i_\beta \) is closed.
Finally, we can also note that for each \(\overline{\tau }\), the set \(\Psi ^i_\beta (\overline{\tau })\) is clearly convex.
Next, let us define the following correspondence mapping \(\Pi _{i=1}^N\Delta (D^i)\) into itself:
Our previous considerations imply that \(\overline{\Psi }_\beta \) also has nonempty and convex values and that its graph is closed. To finish the proof, note that the function \(w_0\) (which is a moment on S) is also a moment on \(S\times A\) (as A is compact), hence each \(\Delta _w^M(D^i)\) is tight. Now Prohorov’s theorem implies that \(\Delta _w^M(D^i)\) is compact in the weak convergence topology for \(i=1,\ldots ,N\) and \(\Pi _{j=1}^N\Delta _w^M(D^j)\) is compact in the product topology. Therefore, by the Glicksberg fixed point theorem [24], \(\overline{\Psi }_\beta \) has a fixed point.
Suppose \(\overline{\tau }_*\) is this fixed point. By a well-known result (see e.g. [34], p. 89), for each \(i\in \{ 1,\ldots ,N\}\), \(\tau ^i_*\) can be disintegrated into a stochastic kernel \(g^i_*\in \mathcal {F}^i_0\) and its marginal on \(S^i\), \((\tau ^i_*)_{S^i}\), that is, into a pair satisfying for any \(D\in \mathcal {B}(D^i)\)
Let us further define
Then, since \(\tau ^i_*\in \Psi ^i_\beta (\overline{\tau }_*)\),
otherwise the inequality in the definition of \(S^i_0\) would imply a strict inequality in the definition of \(\Psi ^i_\beta \).
Let us thus define the strategy
It is clear that for any \(s\in S^i\),
Then, for any \(D\in \mathcal {B}(D^i)\) we can reason as follows:
where the first equality follows from (10), the second and the last from the definition of the integral over a set, while the third and the fourth use the definition of the strategy \(\widehat{f^i}\) and (11).
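The disintegration used above can be sketched in the finite case (invented numbers). Note that on states where the marginal vanishes (the set \(S^i_0\)) the kernel may be chosen arbitrarily, which is exactly the freedom used in defining \(\widehat{f^i}\).

```python
import numpy as np

# Finite-state sketch (illustrative numbers) of the disintegration step:
# a joint measure tau on S x A is written as tau(ds, da) =
# g(da | s) marginal(ds), i.e. as its S-marginal and a stochastic kernel.
tau = np.array([[0.10, 0.30],    # tau[s, a], a probability on S x A
                [0.15, 0.45]])

marginal = tau.sum(axis=1)               # (tau)_S, the marginal on S
g = tau / marginal[:, None]              # kernel g(a | s) (marginal > 0 here)

# Reassembling the kernel and the marginal recovers tau.
assert np.allclose(g * marginal[:, None], tau)
assert np.allclose(g.sum(axis=1), 1.0)   # g(. | s) is a distribution
```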
Note however that the last two equalities imply that there exist strategies \(\widehat{f^i}\) and invariant measures \(\mu ^i_*=(\tau ^i_*)_S\) for each population \(i\in \{ 1,\ldots ,N\}\) such that \(\widehat{f^i}\) is the best response in the \(\beta \)-discounted game against \(\overline{\tau }_*\) and \(\overline{\tau }_*\) is the stationary global state corresponding to the profile of strategies \((\widehat{f^1},\ldots ,\widehat{f^N})\) and the initial global state \((\mu ^1_*,\ldots ,\mu ^N_*)\), hence a stationary mean-field equilibrium in the \(\beta \)-discounted game. \(\square \)
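A toy single-population illustration of the whole fixed-point construction (all model data invented; transitions are made action-independent so that the damped iteration stabilizes): alternate a best response with the invariant measure of the induced chain until the state-action distribution reproduces itself.

```python
import numpy as np

# Toy single-population sketch of the fixed-point construction: iterate
# "best response + invariant measure" until the state-action distribution
# tau reproduces itself. All model data are invented for illustration.
beta = 0.9
base = np.array([[2.0, 0.0],
                 [3.0, 0.5]])                 # base reward r0[s, a]
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])                    # transitions (both actions)

def best_response(tau):
    # congestion reward: r(s, a, tau) = base[s, a] - 0.5 * tau(action a)
    r = base - 0.5 * tau.sum(axis=0)
    V = np.zeros(2)
    for _ in range(300):                      # value iteration
        V = (r + beta * (P @ V)[:, None]).max(axis=1)
    return (r + beta * (P @ V)[:, None]).argmax(axis=1)

def invariant(P):
    # left eigenvector of P for eigenvalue 1, normalised
    w, v = np.linalg.eig(P.T)
    mu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return mu / mu.sum()

tau = np.full((2, 2), 0.25)                   # initial guess on S x A
for _ in range(100):
    f = best_response(tau)
    mu = invariant(P)                         # chain under f (here: P itself)
    tau_f = np.zeros((2, 2))
    tau_f[np.arange(2), f] = mu               # tau' = mu combined with f
    tau = 0.5 * tau + 0.5 * tau_f             # damped update

assert np.allclose(tau, tau_f)                # tau reproduces itself
```

At the fixed point the best response to \(\tau \) generates \(\tau \) itself, which is the finite analogue of the equilibrium property established in the proof.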
The last result in this section gives conditions under which Markov mean-field equilibria exist in our models. It is based on one of the theorems given in [49]. It should be noted here that the assumptions in that paper are slightly stronger than those in our model when applied to a single population. Namely, in our model the rewards and the transitions depend on the state-action distribution of the other players, while in [49] the dependence is only on the distribution of private states. Also, in our model we allow the set of feasible actions to depend on the player’s private state, while in [49] there was no such dependence.
Theorem 4
Suppose that the assumptions (A1’–A2’) and (A3) are satisfied. Then for any \(\beta \in (0,1)\) and any \(\overline{\mu }_0\in \Pi _{j=1}^N\Delta _w(S^j)\) the multi-population discrete-time mean-field game with \(\beta \)-discounted payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a Markov mean-field equilibrium.
Remark 2
As we have already noted, in our model the rewards and the transitions may depend on the state-action distribution of the players, which differs from [49], where the dependence is only on the distribution of private states. Such an assumption is not new to the mean-field game literature. In the case of discrete-time games it was already used in the first paper on games of this type [39]. It has also been applied in [10, 11, 57, 58]. As for the continuous-time case, models of this type were introduced by Gomes and Voskanyan [28] under the name of extended mean-field games. Cardaliaguet and Lehalle proposed the name mean field games of controls for this type of framework in [15]. Some further results on the topic include [2, 14, 17, 26, 40, 43].
We precede the proof of Theorem 4 by a counterpart of Lemma 2 for the Markov case. It requires some additional notation. First, let us define for \(i=1,\ldots ,N\) the sets
Next, let for \(t\ge 0\)
Using these constants, we define for \(i=1,\ldots ,N\) and \(t\ge 0\) the sets
It is easy to see that under (A1’), \(\mathcal {C}_i\) with metric
where \(\delta \) is chosen such that \(\delta >\gamma \) and \(\alpha \beta \delta <1\), is a complete metric space.
Lemma 5
For any state-action measure flow \(\left( \overline{\tau }\right) :=\left( \overline{\tau }_0,\overline{\tau }_1,\ldots \right) \in \Xi \), let
that is, let it be the optimal value at time t of the \(\beta \)-discounted Markov decision process of a player from population i when the behaviour of all the other players is described by the flow \(\left( \overline{\tau }\right) \). Under assumptions (A1’–A2’) and (A3), for any \(i\in \{ 1,\ldots ,N\}\) and \(t\ge 0\), \(V^{i,t}_{\beta ,\left( \overline{\tau }\right) }\in \mathcal {C}_i^t\).
Proof
Let us fix an \(i\in \{ 1,\ldots ,N\}\) and define for any \(\left( \overline{\tau }\right) \in \Xi \) and \(t\ge 0\)
By Proposition 7.32 in [12] for any \(u\in \mathcal {C}^i_{t+1}\), \(T^{i,t}_{\left( \overline{\tau }\right) }(u)\) is continuous. Moreover,
where the last inequality follows from (A1’) and (A2’) (b) (note that \(\tau _t^i\in \Delta ^{(t)}_w(D^i)\) is implied by the assumption that \(\mu _0^i\in \Delta _w(S^i)\) and by part (b) of (A2’) applied to the recursive formula for \(\overline{\tau }_t\)). Hence, \(T^{i,t}_{\left( \overline{\tau }\right) }\) maps \(\mathcal {C}^i_{t+1}\) into \(\mathcal {C}^i_{t}\). Next, for any \(u_1,u_2\in \mathcal {C}^i_{t+1}\), we have
where the last inequality follows from the definition of the w-norm and the assumption (A2’) (b).
We next define the operator \(T^{i}_{\left( \overline{\tau }\right) }:\mathcal {C}_i\rightarrow \mathcal {C}_i\) with the formula
From what we have shown, it really maps \(\mathcal {C}_i\) into itself. As (12) implies that for any \((u_0,u_1,\ldots )\) and \((v_0,v_1,\ldots )\) in \(\mathcal {C}_i\),
it is an \(\alpha \beta \delta \)-contraction defined on a complete space. By the Banach fixed point theorem it has a unique fixed point. By Theorems 14.4 and 17.1 in [34], the elements of this fixed point are equal to \(V^{i,t}_{\beta ,\left( \overline{\tau }\right) }\), \(t\ge 0\), which ends the proof that the optimal value functions \(V^{i,t}_{\beta ,\left( \overline{\tau }\right) }\in \mathcal {C}_i^t\) for \(t\ge 0\). \(\square \)
Now we are ready to pass to the main part of the proof of Theorem 4.
Proof of Theorem 4:
We start by defining the correspondences from \(\Xi \) into \(\Xi ^i\), (\(i=1,\ldots ,N\)) with the formulas:
We next prove that \(\widetilde{\Theta }^i\) and \(\widetilde{\Psi }^i_{\beta }\) have some useful properties. We fix \(i\in \{ 1,\ldots ,N\}\) for these considerations. We start by showing that for any Markov strategy \(\pi ^i\in \mathcal {M}^i\) the flow \((\eta ^i)\) defined with the recurrence
is an element of \(\widetilde{\Theta }^i((\overline{\tau }))\). We proceed by induction on t. For \(t=0\), both \(\left( \eta ^i_0\right) _{S^i}=\mu _0^i\) and \(\int _{D^i}w(s)\eta ^i_0(ds\times da)\le M\) are obvious (by the definition of \(\Pi _0^i\) and assumption (A1’)). Now suppose \(\int _{D^i}w(s)\eta ^i_{t}(ds\times da)\le M\alpha ^{t}\). Then by the definition of \(\Phi ^i\),
Moreover, by (A2’) (b) we have
which shows that \(\eta ^i_{t+1}\in \Xi ^i_{t+1}\) and, by the induction principle, that \((\eta ^i)\in \Xi ^i\).
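The induction just performed can be mimicked numerically in a finite sketch (invented data, one action): under the drift condition of (A2’) (b), the w-moments of the flow \(\eta _t\) stay below \(M\alpha ^t\).

```python
import numpy as np

# Finite sketch (invented data) of the induction in the proof: if the
# drift condition  sum_s' w(s') Q(s'|s) <= alpha * w(s)  holds, then the
# w-moments of the flow eta_t generated by a Markov policy satisfy
# integral of w w.r.t. eta_t  <=  M * alpha**t.
w = np.array([1.0, 2.0, 4.0])                  # weight (moment) function
Q = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.3, 0.2],
              [0.4, 0.4, 0.2]])                # one-action kernel Q(s'|s)
alpha = float(np.max(Q @ w / w))               # smallest alpha in the drift bound

mu = np.array([0.5, 0.3, 0.2])                 # initial distribution mu_0
M = float(mu @ w)
moments = []
for t in range(20):
    moments.append(float(mu @ w))
    mu = mu @ Q                                # eta_{t+1} pushed through Q

for t, m in enumerate(moments):
    assert m <= M * alpha**t + 1e-12
```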
Next we prove that the graph of \(\widetilde{\Theta }^i\) is closed. To do that, we take convergent sequences \(\left\{ (\overline{\tau }^{(n)})\right\} _{n\ge 0}\) in \(\Xi \) and \(\left\{ (\eta ^{i,(n)})\right\} _{n\ge 0}\) in \(\Xi ^i\) such that \((\eta ^{i,(n)})\in \widetilde{\Theta }^i((\overline{\tau }^{(n)}))\) for each n. Moreover, \((\eta ^{i,(n)})\Rightarrow (\eta ^i)\) and \((\overline{\tau }^{(n)})\Rightarrow (\overline{\tau })\) as \(n\rightarrow \infty \) for some \((\eta ^i)\in \Xi ^i\) and \((\overline{\tau })\in \Xi \). Now fix \(t\ge 1\). By the joint strong continuity of \(Q^i\) for any bounded continuous function \(u:S^i\rightarrow \mathbb {R}\), \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau }_{t-1}^{(n)})\) converges continuously to \(\int _{S^i}u(s)Q^i(ds\mid \cdot ,\cdot ,\overline{\tau }_{t-1})\). Hence, by Theorem 3.3 in [54], we have
which means that \(\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t-1}^{(n)})\tau ^{i,(n)}_{t-1}(ds\times da)\Rightarrow \int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t-1})\tau ^i_{t-1}(ds\times da)\). Therefore, we have \(\left( \eta ^i_t\right) _{S^i}=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau }_{t-1})\tau ^i_{t-1}(ds\times da)\) for each \(t\ge 1\), hence \((\eta ^i)\in \widetilde{\Theta }^i((\overline{\tau }))\), which implies that the graph of \(\widetilde{\Theta }^i\) is closed.
Next note that if \(\pi ^i_\beta \) is an optimal deterministic Markov policy in the optimization problem of a player from population i maximizing his \(\beta \)-discounted reward when the behaviour of all the other players at each stage is described by the flow \((\overline{\tau })\), then for each \(t\ge 1\) and \(s\in S^i\) it satisfies
which implies that for any \(t\ge 1\),
Hence, the measure flow \((\eta ^i)\) defined by (13) with \(\pi ^i:=\pi ^i_\beta \) is an element of \(\widetilde{\Psi }^i_\beta ((\overline{\tau }))\).
Next we prove that the graph of \(\widetilde{\Psi }^i_\beta \) is closed. Let \(\left\{ (\overline{\tau }^{(n)})\right\} _{n\ge 0}\) in \(\Xi \) and \(\left\{ (\eta ^{i,(n)})\right\} _{n\ge 0}\) in \(\Xi ^i\) be convergent sequences such that \((\eta ^{i,(n)})\in \widetilde{\Psi }^i_\beta ((\overline{\tau }^{(n)}))\) for each n. Moreover, let \((\eta ^{i,(n)})\Rightarrow (\eta ^i)\in \Xi ^i\) and \((\overline{\tau }^{(n)})\Rightarrow (\overline{\tau })\in \Xi \) as \(n\rightarrow \infty \). Since the graph of \(\widetilde{\Theta }^i\) is closed, showing that \(\widetilde{\Psi }^i_\beta \) has the same property only requires proving that the equalities defining \(\widetilde{\Psi }^i_\beta \) hold for \((\eta ^i)\) and \((\overline{\tau })\). Let us fix \(t\ge 1\). The definition of \(\widetilde{\Psi }^i_\beta \) implies that for each n
By the continuity of \(Q^i\) and \(V^{i,t}_{\beta ,(\overline{\tau }^{(n)})}\), and Theorem 3.3 in [54] (the assumption presented in Remark 3.4 (ii) there is true for \(g=L_tw\) by Lemma 5), \(\int _{S^i}V^{i,t}_{\beta ,(\overline{\tau }^{(n)})}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(\int _{S^i}V^{i,t}_{\beta ,(\overline{\tau })}(s')Q^i(ds'\mid \cdot ,\cdot ,\overline{\tau }_{t-1})\). As also \(r^i(\cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(r^i(\cdot ,\cdot ,\overline{\tau }_{t-1})\) by (A1’), using Theorem 3.3 in [54] once more (now with \(g=L_{t-1}w\) in Remark 3.4 (ii) there, again by Lemma 5), we can pass to the limit in (14), obtaining
As t was arbitrary, this ends the proof that the graph of \(\widetilde{\Psi }^i_\beta \) is closed.
To finalize the proof, we define the correspondence from \(\Xi \) into itself:
What we have shown already implies that \(\widetilde{\Psi }_\beta \) has nonempty values and that its graph is closed. Convexity of the values of \(\widetilde{\Psi }_\beta \) is obvious. As w is a moment function, each \(\Delta _w^{(t)}(D^i)\) is tight, hence, by Prohorov’s theorem, compact. This implies that \(\Xi \) is compact in the product topology. Therefore, by the Glicksberg fixed point theorem [24], \(\widetilde{\Psi }_\beta \) has a fixed point. Suppose \((\overline{\tau ^*})\) is this fixed point. Disintegrating \(\tau ^{*i}_t\) gives for \(i=1,\ldots ,N\) and \(t=0,1,\ldots \) stochastic kernels \(\pi ^{*i}_t\) and measures \(\mu ^{*i}_t\) which (after modifications similar to those in the proof of Theorem 1) correspond to the Markov strategies and global state flows in a Markov mean-field equilibrium in our game. \(\square \)
5 The Existence of Stationary and Markov Mean Field Equilibria in Total Payoff Game
5.1 Assumptions
In this section, we address the problem of the existence of an equilibrium in mean-field games with total payoff. In the main results we add a new assumption, (A4) or (A4”), to those defined in Sect. 4.1. Its formulation requires defining, for \(i=1,\ldots ,N\), \(s\in S^i\), \(a\in A^i(s)\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\), the modified transition probabilities \(Q^i_*\):
The first new assumption will be used to prove the existence of a stationary mean-field equilibrium in total reward models.
- (A4):
-
For \(i=1,\ldots ,N\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\),
$$\begin{aligned} \lim _{T\rightarrow \infty }\sup _{\pi ^i\in \mathcal {M}^i}\left\| \sum _{t=T}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right\| _w=0. \end{aligned}$$
In the case of the results on the existence of Markov mean-field equilibria in the discounted model, two of the assumptions, (A1’) and (A2’), referred to the discount factor \(\beta \), which does not exist in the total reward model. Hence, apart from the new assumption (A4”), new versions of these two assumptions will be necessary. For technical reasons, some additional restrictions are also added.
- (A1”):
-
For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, there exist non-negative constants \(\alpha \), \(\gamma \), M satisfying \(\alpha \le \gamma \), \(\alpha \gamma <1\) and
$$\begin{aligned} \int _Sw(s)\mu _0^i(ds)\le M\quad \text{ for } i=1,\ldots ,N, \end{aligned}$$and such that for \(i=1,\ldots ,N\), \(s\in S^i\) and \(t=0,1,2,\ldots \),
$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w^{(t)}(D^i)}r^i(s,a,\overline{\tau })\ge -R\gamma ^tw(s) \end{aligned}$$with \(\Delta _w^{(t)}(D^i)=\left\{ \tau ^i\in \Delta _w(D^i): \int _{D^i}w(s)\tau ^i(ds\times da)\le \alpha ^tM\right\} \).
- (A2”):
-
For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta _w(D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,
-
(a)
for \(i=1,\ldots ,N\) the functions
$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$are continuous in \((s,a,\overline{\tau })\),
-
(b)
for \(i=1,\ldots ,N\) and \(s\in S^i\)
$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le \alpha w(s). \end{aligned}$$
- (A4”):
-
For \(i=1,\ldots ,N\),
$$\begin{aligned} \lim _{T\rightarrow \infty }\sup _{\begin{array}{c} \pi ^i\in \mathcal {M}^i,\\ (\overline{\tau })\in \Pi _{t=0}^\infty \Pi _{j=1}^N\Delta (D^j) \end{array}}\left\| \sum _{t=T}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\alpha ^{-t}\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,(\overline{\tau }))\right\| _w=0. \end{aligned}$$
Here and in the sequel for \(i=1,\ldots ,N\), \(s\in S^i\), \(\pi ^i\in \mathcal {M}^i\) and \((\overline{\tau })=(\overline{\tau }_0,\overline{\tau }_1,\ldots )\in \Pi _{t=0}^\infty \Pi _{j=1}^N\Delta (D^j)\),
and for \(t\ge 2\):
For \(i=1,\ldots ,N\), \(s\in S^i\), \(\pi ^i\in \mathcal {M}^i\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\),
Remark 3
By assuming (A4) or (A4”), we build upon the framework of transient total-reward Markov decision processes introduced by Veinott [55] in the context of finite state and action spaces and generalized to Borel spaces in [22, 46]. The optimization problem faced by an individual from population i in our model of total reward mean-field game would for every fixed global state-action distribution \(\overline{\tau }\) be transient, if
which is clearly true under (A2) (b) and (A4). Roughly speaking, this means that for a reward function such that \(\left\| r^i(\cdot ,\cdot ,\overline{\tau })\right\| _w<\infty \), the total reward is finite for any Markov strategy applied. In (A4) we strengthen this assumption by requiring that the convergence of the total reward to its value is uniform across all Markov strategies with respect to the w-norm. (A4”) is an adjustment of this condition to the case when the decision-maker optimizes his behaviour against a flow \((\overline{\tau })\).
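A finite-state sketch of the transience requirement (invented chain): with an absorbing state \(s^*\), the expected total occupation of the remaining states is \((I-Q_T)^{-1}\), and its tails vanish, which is the finite analogue of the displayed condition.

```python
import numpy as np

# Finite-state sketch of the transience condition behind (A4): for a chain
# absorbed in s* (last state below), the expected total occupation of the
# non-absorbing states, sum_t (Q_T)^t, is finite and its tails vanish.
# Numbers are illustrative only.
Q = np.array([[0.5, 0.2, 0.3],
              [0.1, 0.6, 0.3],
              [0.0, 0.0, 1.0]])       # s* = state 2, absorbing
Q_T = Q[:2, :2]                       # restriction to S \ {s*}
w = np.array([1.0, 1.5])              # weight function on S \ {s*}

# sum_{t>=0} Q_T^t = (I - Q_T)^{-1}  (finite since the spectral radius < 1)
N = np.linalg.inv(np.eye(2) - Q_T)
assert np.max(np.abs(np.linalg.eigvals(Q_T))) < 1.0

# tail sum_{t>=T} Q_T^t w = Q_T^T (I - Q_T)^{-1} w  ->  0 as T grows
tail = np.linalg.matrix_power(Q_T, 50) @ (N @ w)
assert np.all(tail < 1e-6)
```

In (A4) the same decay is required uniformly over all Markov strategies, not just for a single kernel as here.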
5.2 Main Results
Theorem 6
Suppose that the assumptions (A1–A4) are satisfied. Then the multi-population discrete-time mean-field game with total payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a stationary mean-field equilibrium.
Let us start by noticing that the total reward of a player from population i using a given strategy \(\pi \), when the behaviour of the others is constant over time and described by \(\overline{\tau }\), in the MDP model with transition probability \(Q^i_*\) is the same as the reward until reaching state \(s^*\) in the model with transition probability \(Q^i\). Let us next define for any \(i\in \{ 1,\ldots ,N\}\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\)
that is, the optimal value for the total-reward Markov decision process of a player from population i when the behaviour of all the other players is described by the state-action measure \(\overline{\tau }\), fixed over time. Crucial properties of the function \(V^i_{*\cdot }(\cdot )\) are given in the lemma below.
Lemma 7
Under assumptions (A1–A4) for each \(i\in \{ 1,\ldots ,N\}\), \(V^i_{*\overline{\tau }}(s)\) is jointly continuous in \((s,\overline{\tau })\). Moreover, there exists a constant L such that \(\left\| V^i_{*\overline{\tau }}(\cdot )\right\| _w\le L\) for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\).
Proof
Fix i and \(\overline{\tau }\), and note that under (A2) (b) and (A4),
is finite for \(i=1,\ldots ,N\). Hence, we immediately see that the total-reward Markov decision process defined with \(S^i\), \(A^i\), \(r^i\) and \(Q^i_*\) satisfies the assumptionsFootnote 8 of Theorem 12 in [22]. Therefore, with the help of this theorem and Proposition 1 in [22] we can define:
-
(a)
The function \(\zeta ^i:S^i\times \Pi _{j=1}^N\Delta _w(D^j)\rightarrow [0,\infty )\)
$$\begin{aligned} \zeta ^i(s,\overline{\tau }):=\sup _{\pi ^i\in \mathcal {M}^i}\left[ \sum _{t=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,\overline{\tau })\right] \end{aligned}$$(15)satisfying for \(s\in S^i\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\), \(\zeta ^i(s,\overline{\tau })\in [w(s),L_iw(s)]\),
-
(b)
The discount factor \(\beta :=\max _{j=1,\ldots ,N}\frac{L_j-1}{L_j}\),
-
(c)
Modified one-stage rewards \(r^i_*\):
$$\begin{aligned} r^i_*(s,a,\overline{\tau }):=\frac{r^i(s,a,\overline{\tau })}{\zeta ^i(s,\overline{\tau })}, \end{aligned}$$ -
(d)
Modified transition probabilities \(Q^i_{**}\) given by
$$\begin{aligned}&Q^i_{**}(B\mid s,a,\overline{\tau }):=\\&\left\{ \begin{array}{ll} \frac{1}{\beta \zeta ^i(s,\overline{\tau })}\int _{B}\zeta ^i(s',\overline{\tau })Q^i_*(ds'\mid s,a,\overline{\tau }),&{} \text{ if } B\in \mathcal {B}(S^i{\setminus }\{ s^*\}),\\ &{} (s,a)\in D^i{\setminus }\{ (s^*,a^*)\}\\ 1-\frac{1}{\beta \zeta ^i(s,\overline{\tau })}\int _{S^i{\setminus }\{ s^*\}}\zeta ^i(s',\overline{\tau })Q^i_*(ds'\mid s,a,\overline{\tau }),&{} \text{ if } B=\{ s^*\},\\ {} &{}(s,a)\in D^i{\setminus }\{ (s^*,a^*)\}\\ 1,&{} \text{ if } B=\{ s^*\}, (s,a)=(s^*,a^*) \end{array}\right. \end{aligned}$$
such that \(S^i\), \(A^i\), \(r^i_*\) and \(Q^i_{**}\) define a \(\beta \)-discounted Markov decision process with a value
for \(s\in S^i\), \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\). Moreover, optimal stationary strategies exist and coincide in both MDPs.
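A numerical sketch of items (a)-(d) (one action per state, invented data): the identity (16), expressing the total-reward value as \(\zeta ^i\) times the value of the modified \(\beta \)-discounted model, can be checked directly for a small transient chain.

```python
import numpy as np

# Numerical sketch (single action per state, invented data) of the
# transformation (a)-(d): a transient total-reward problem is converted
# into an equivalent beta-discounted one via the weight zeta.
w = np.array([1.0, 1.0])
Q_T = np.array([[0.2, 0.3],
                [0.1, 0.4]])          # transitions among S \ {s*}
r = np.array([1.0, 2.0])              # one-stage rewards (0 at s*)

# zeta(s) = sum_t (Q_T^t w)(s), here computable directly
zeta = np.linalg.solve(np.eye(2) - Q_T, w)           # -> [2., 2.]
L = float(np.max(zeta / w))                          # zeta <= L * w
beta = (L - 1.0) / L                                 # -> 0.5

# modified model: r*(s) = r(s) / zeta(s),
# Q**(s'|s) = zeta(s') Q_T(s'|s) / (beta * zeta(s)) on S \ {s*}
r_star = r / zeta
Q_star = (Q_T * zeta[None, :]) / (beta * zeta[:, None])

V_total = np.linalg.solve(np.eye(2) - Q_T, r)        # total-reward value
V_beta = np.linalg.solve(np.eye(2) - beta * Q_star, r_star)

assert np.allclose(zeta * V_beta, V_total)           # the identity (16)
```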
Next note that if \(\zeta ^i\) is a continuous function, then the model defined by \(S^i\), \(A^i\), \(r^i_*\) and \(Q^i_{**}\) satisfies assumptions (A1–A3) with the function w replaced by \(w_*\equiv 1\) (in particular, (A2) follows from the fact that \(\zeta ^i(s,\cdot )\ge w(s)\) for \(s\in S^i\)). This implies, by Lemma 2, that \(V^i_{\beta \overline{\tau }}(s)\) is continuous in \((s,\overline{\tau })\) and \(\left\| V^i_{\beta \overline{\tau }}(\cdot )\right\| \le \frac{R}{1-\beta }\). Hence, combining (16) with the fact that \(\zeta ^i\) is continuous and \(\zeta ^i(s,\cdot )\le L_iw(s)\), we obtain that \(V^i_{*\overline{\tau }}(s)\) is also continuous in \((s,\overline{\tau })\). Moreover, for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\),
which proves the thesis of the lemma. Hence, all we need to do is to show that \(\zeta ^i\) is continuous.
To do that, we note that \(\zeta ^i\) is clearly the limit of the sequence of functions \(\left\{ w_n^{\overline{\tau }}\right\} _{n\ge 0}\), defined with the following recurrence: \(w_0^{\overline{\tau }}:=w\), \(w_n^{\overline{\tau }}:=T^i_{*\overline{\tau }}(w_{n-1}^{\overline{\tau }})\) for \(n=1,2,\ldots \), where for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\),
We next show by induction that each \(w_n^{\overline{\tau }}(s)\) is continuous in \((s,\overline{\tau })\). For \(n=0\) the claim is true by the definition of w. Suppose it holds for \(n=k-1\). Then by Theorem 3.3 in [54] (the assumptions given in Remark 3.4 (ii) there are satisfied with \(g=L_iw\) because \(w_{k-1}^{\overline{\tau }}(\cdot )\le \zeta ^i(\cdot ,\overline{\tau })\le L_iw(\cdot )\)), \(w(s)+\int _{S^i{{\setminus }}\{ s^*\}}w_{k-1}^{\overline{\tau }}(s')Q^i_*(ds'\mid s,a,\overline{\tau })\) is jointly continuous in \((s,a,\overline{\tau })\), hence, by Proposition 7.32 in [12]
is also (jointly) continuous. Therefore, the claim is true for any \(n\ge 1\).
To finish the proof, let us take convergent sequences \(\{ s_k\}_{k\ge 1}\) in \(S^i\) and \(\{ \overline{\tau }_k\}_{k\ge 1}\) in \(\Pi _{j=1}^N\Delta _w(D^j)\) such that \(s_k\rightarrow s_*\) and \(\overline{\tau }_k\Rightarrow \overline{\tau }_*\). We will show that \(\zeta ^i(s_k,\overline{\tau }_k)\rightarrow \zeta ^i(s_*,\overline{\tau }_*)\). Since the set \(K:=\{ s_k: k\ge 1\}\cup \{ s_*\}\) is clearly compact, there exists a value W such that \(W\ge |w(s)|\) for \(s\in K\). Now, fix any \(\varepsilon >0\). By (A4) there exists a \(t^*\) such that
This immediately implies
and, for any \(k\ge 1\),
Finally, from the joint continuity of \(w_{t^*}^{\cdot }(\cdot )\), there exists a \(k_0\in \mathbb {N}\) such that for any \(k\ge k_0\)
Combining (17), (18) and (19), we obtain that for any \(k\ge k_0\)
which ends the proof that \(\zeta ^i(\cdot ,\cdot )\) is continuous. \(\square \)
Proof of Theorem 6
As in the case of the discounted reward, we define the correspondences from \(\Pi _{j=1}^N\Delta (D^j)\) to \(\Delta (D^i)\):
Using arguments similar to those employed in the proof of Theorem 1 (with the difference that we apply Lemma 7 instead of Lemma 2 when necessary), we can prove that \(\Psi ^i_*\) has nonempty convex values and that its graph is closed (we do not need to prove this for \(\Theta ^i\), as it is defined in exactly the same way as in the case of the \(\beta \)-discounted reward). Then we define the correspondence
and show that it has a fixed point, which, again using similar arguments as in the proof of Theorem 1, can be proved to correspond to a stationary mean-field equilibrium in the total reward discrete-time mean field game considered in the theorem. \(\square \)
In the last result of this section we give conditions under which a Markov mean-field equilibrium exists in the total-reward game.
Theorem 8
Suppose that the assumptions (A1”), (A2”), (A3) and (A4”) are satisfied. Then for any \(\overline{\mu }_0\in \Pi _{j=1}^N\Delta _w(S^j)\) the multi-population discrete-time mean-field game with total payoff defined with \(r^i\), \(Q^i\), \(S^i\) and \(A^i\), \(i=1,\ldots ,N\), has a Markov mean-field equilibrium.
Proof
Let us fix \(\overline{\mu }_0\) and M satisfying (A1”). Recall the notation used in the proof of Theorem 4
Next, for any flow of measure-vectors \((\overline{\tau }):=(\overline{\tau }_0,\overline{\tau }_1,\ldots )\in \Xi \) and any \(i\in \{ 1,\ldots ,N\}\) let us define
that is, the optimal value at time \(t\ge 0\) for the total-reward Markov decision process of a player from population i when the behaviour of all the other players at each stage is described by the flow \((\overline{\tau })\). Using the standard method of transforming a nonhomogeneous Markov decision process into a homogeneous one and Theorem 12 in [22],Footnote 9 we may show that \(V^i_{*(\overline{\tau })}(s)\) can be obtained from the optimal reward in the discounted Markov decision process with state space \(S^i\times \mathbb {N}\) and:
-
(a)
The function \(\widetilde{\zeta }^i:S^i\times \mathbb {N}\times \Xi \rightarrow [0,\infty )\)
$$\begin{aligned} \widetilde{\zeta }^i(s,t,(\overline{\tau })):=\sup _{\pi ^i\in \mathcal {M}^i}\left[ \sum _{k=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\alpha ^{-k}\left( Q^i_*\right) ^k(ds'\mid s,\pi ^i,(\overline{\tau }))\right] \end{aligned}$$satisfying for \(s\in S^i\), \(t\ge 0\) and \((\overline{\tau })\in \Xi \), \(\widetilde{\zeta }^i(s,t,(\overline{\tau }))\in [\alpha ^{-t}w(s),\widetilde{L}_i\alpha ^{-t}w(s)]\) withFootnote 10
$$\begin{aligned} \widetilde{L}_i:=\sup _{\pi ^i\in \mathcal {M}^i,\,(\overline{\tau })\in \Xi }\left\| \sum _{t=0}^\infty \int _{S^i{\setminus }\{ s^*\}}w(s')\alpha ^{-t}\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,(\overline{\tau }))\right\| _w \end{aligned}$$ -
(b)
The discount factor \(\widetilde{\beta }:=\max _{j=1,\ldots ,N}\frac{\widetilde{L}_j-1}{\widetilde{L}_j}\),
-
(c)
Modified one-stage rewards \(\widetilde{r}^i_*\):
$$\begin{aligned} \widetilde{r}^i_*(s,t,a,(\overline{\tau })):=\frac{r^i(s,a,\overline{\tau }_t)}{\widetilde{\zeta }^i(s,t,(\overline{\tau }))}, \end{aligned}$$ -
(d)
Modified transition probabilities \(\widetilde{Q}^i_{**}\) given by
-
(e)
Sets of feasible actions given by \(\widetilde{A}^i(\cdot ,t):=A^i(\cdot )\).
In fact, if \(V^i_{\widetilde{\beta }(\overline{\tau })}(s,t)\) denotes the optimal value in the modified (discounted) model,
for any \(s\in S^i\) and \(t\ge 0\). Moreover, optimal stationary strategies in the new model (which exist by Theorem 12 in [22]) correspond to optimal Markov strategies in the original one. Finally, repeating the arguments used in the proof of Lemma 7 (the assumptions (A1–A4) used there are satisfied with w(s) replaced by \(\widetilde{w}(s,t):=w(s)\alpha ^{-t}\), which is clearly a moment function on \(S\times \mathbb {N}\)), we can show that \(V^i_{\widetilde{\beta }\cdot }(\cdot ,t)\) and \(\widetilde{\zeta }^i(\cdot ,t,\cdot )\) are continuous and their \(\widetilde{w}\)-norms are bounded by \(\widetilde{L}:=\frac{R\max _{j=1,\ldots ,N}\widetilde{L}_j}{1-\widetilde{\beta }}\), which implies that for any \(s\in S^i\) and \(t\ge 0\), \(V^{i,t}_{*(\overline{\tau })}(s)\le \widetilde{L}\lambda ^tw(s)\).
Next we define the correspondence \(\widetilde{\Psi }^i_*\) from \(\Xi \) into \(\Xi ^i\), (\(i=1,\ldots ,N\)). In order to do it, we recall the definition of correspondence \(\widetilde{\Theta }^i\) introduced in the proof of Theorem 4
Now let
As we have shown in the proof of Theorem 4,Footnote 11 for each i, the correspondence \(\widetilde{\Theta }^i\) has non-empty values and closed graph. We next prove that for any fixed \(i\in \{ 1,\ldots ,N\}\), \(\widetilde{\Psi }^i_*\) has similar properties.
We start by noting that if \(\pi ^i_*\) is an optimal deterministic Markov policy in the optimization problem of a player from population i maximizing his total reward when the behaviour of all the other players at each stage is described by the flow \((\overline{\tau })\), then for each \(t\ge 1\) and \(s\in S^i\) it satisfies
which implies that for any \(t\ge 1\),
Hence, the measure flow \((\eta ^i)\) defined by (13) with \(\pi ^i:=\pi ^i_*\) is an element of \(\widetilde{\Psi }^i_*((\overline{\tau }))\).
In the penultimate part of the proof we show that the graph of \(\widetilde{\Psi }^i_*\) is closed. Let \(\left\{ (\overline{\tau }^{(n)})\right\} _{n\ge 0}\) in \(\Xi \) and \(\left\{ (\eta ^{i,(n)})\right\} _{n\ge 0}\) in \(\Xi ^i\) be convergent sequences such that \((\eta ^{i,(n)})\in \widetilde{\Psi }^i_*((\overline{\tau }^{(n)}))\) for each n. Moreover, let \((\eta ^{i,(n)})\Rightarrow (\eta ^i)\) and \((\overline{\tau }^{(n)})\Rightarrow (\overline{\tau })\) as \(n\rightarrow \infty \) for some \((\eta ^i)\in \Xi ^i\) and \((\overline{\tau })\in \Xi \). Since the graph of \(\widetilde{\Theta }^i\) is closed, showing that \(\widetilde{\Psi }^i_*\) has the same property only requires proving that the equalities defining \(\widetilde{\Psi }^i_*\) hold for \((\eta ^i)\) and \((\overline{\tau })\). Let us fix \(t\ge 1\). From the definition of \(\widetilde{\Psi }^i_*\) we know that for each n
Then by the continuity of \(Q^i_*\) and \(V^{i,t}_{*(\overline{\tau }^{(n)})}\) and Theorem 3.3 in [54] (see also Remark 3.4 (ii) there—by (A2”) (b) the assumption presented there is true for \(g=\widetilde{L}\alpha ^tw\)), \(\int _{S^i}V^{i,t}_{*(\overline{\tau }^{(n)})}(s')Q^i_*(ds'\mid \cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(\int _{S^i}V^{i,t}_{*(\overline{\tau })}(s')Q^i_*(ds'\mid \cdot ,\cdot ,\overline{\tau }_{t-1})\). As also \(r^i(\cdot ,\cdot ,\overline{\tau }^{(n)}_{t-1})\) converges continuously to \(r^i(\cdot ,\cdot ,\overline{\tau }_{t-1})\) by (A1”), using Theorem 3.3 in [54] again (again with \(g=\widetilde{L}\alpha ^tw\), which satisfies the assumption given in Remark 3.4 (ii) by (A1”) and (A2”) (b)), we can pass to the limit in (20), obtaining
As t was arbitrary, this ends the proof that the graph of \(\widetilde{\Psi }^i_*\) is closed.
The remainder of the proof is identical to the argument presented in the proof of Theorem 4: we define the correspondence \(\widetilde{\Psi }_*\) from \(\Xi \) into itself:
By what we have shown, \(\widetilde{\Psi }_*\) has nonempty values and its graph is closed. The convexity of the values of \(\widetilde{\Psi }_*\) is obvious. As we know, \(\Xi \) is compact in the product topology. Therefore, Glicksberg’s fixed point theorem [24] implies that \(\widetilde{\Psi }_*\) has a fixed point; let \((\overline{\tau ^*})\) denote it. Disintegrating \(\tau ^{*i}_t\) yields, for \(i=1,\ldots ,N\) and \(t=0,1,\ldots \), stochastic kernels \(\pi ^{*i}_t\) and measures \(\mu ^{*i}_t\) which (after modifications similar to those in the proof of Theorem 1) correspond to the Markov strategies and global state flows in a mean-field Markov equilibrium in the total-reward game. \(\square \)
Remark 4
It should be noted here that Theorems 6 and 8, applied to a game with a single population, extend the existing results about the existence of equilibria in single-population total-reward mean-field games. The only results of this type in the literature appear in [57] and concern a model with finite state and action spaces. In [57] it was assumed that, for any strategies used by the players, the probability of reaching \(s^*\) within some fixed number of stages is greater than some \(p_0>0\). Assumptions (A4) and (A4”) used here can be seen as counterparts of this assumption for our model, with the weight function w applied to the states. It is easy to see that in the finite state and action case (A4) reduces to the simpler assumption described above.
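In the finite state and action setting mentioned above, the abstract fixed-point construction underlying the existence theorems can be made concrete. The sketch below is an illustrative toy only, not the paper's construction: it uses a hypothetical single-population, \(\beta \)-discounted model with a congestion-type reward, and searches for a fixed point of the best-response map by damped iteration (convergence of this heuristic is not guaranteed in general).

```python
import numpy as np

# Hypothetical toy: finite single-population discounted mean-field game.
# All sizes, kernels, and rewards below are assumptions for illustration.
rng = np.random.default_rng(0)
nS, nA, beta = 3, 2, 0.9

Q = rng.dirichlet(np.ones(nS), size=(nA, nS))   # Q[a, s] = distribution of s'
base_r = rng.random((nS, nA))

def reward(mu):
    # Assumed congestion-type dependence: payoff decreases with crowding.
    return base_r - mu[:, None]

def best_response(mu, iters=500):
    """Value iteration for the single-agent problem at fixed distribution mu."""
    V = np.zeros(nS)
    for _ in range(iters):
        QV = np.einsum('ast,t->sa', Q, V)       # E[V(s') | s, a]
        V = np.max(reward(mu) + beta * QV, axis=1)
    return np.argmax(reward(mu) + beta * np.einsum('ast,t->sa', Q, V), axis=1)

# Damped fixed-point iteration: mu -> push-forward of mu under a best
# response to mu; a fixed point is a candidate stationary equilibrium.
mu = np.full(nS, 1.0 / nS)
for _ in range(300):
    policy = best_response(mu)
    P = Q[policy, np.arange(nS)]                # Markov chain induced by the policy
    mu = 0.9 * mu + 0.1 * (mu @ P)              # damping stabilises the iteration

print("candidate equilibrium distribution:", np.round(mu, 3))
```

The theorems in the paper establish existence via Glicksberg's theorem rather than any such iterative scheme; the code only illustrates the self-consistency condition a mean-field equilibrium must satisfy.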
Remark 5
It is also worth noting that some total-reward mean-field game models not directly considered in the article can be treated as special cases of our framework. First, a total-reward game in which dead players are not replaced by new-born ones is a special case with \(s^*\) an absorbing state. For such a case the existence of a Markov mean-field equilibrium is guaranteed by Theorem 8 without any modification of the assumptions.
Another case that can be treated as an instance of our model is the total-reward game with a finite horizon. Here the application of our framework requires replacing the individual state space \(S^i\) of each population with \(\left( (S^i{\setminus }\{ s^*\})\times \{ 0,\ldots ,T\}\right) \cup \{ s^*\}\) (with T denoting the time horizon). If the stochastic kernel \(Q^i\) denotes the transition probability for population i in the original model, the transitions \(\widehat{Q}^i\) in the modified one are completely defined by the following formulas:
for \(s\in S^i\), \(a\in A^i(s)\), \(t<T\) and \(B\in \mathcal {B}(S^i)\), with \(\overline{\tau }_S=(\tau ^1_{S^1},\ldots ,\tau ^N_{S^N})\) denoting the vector of marginals of the measures \(\tau ^i\) on the original individual state spaces \(S^i\), and
The single-stage rewards are then defined independently of the time components of both the individual and the global state. The existence of Markov mean-field equilibria is then ensured by Theorem 8 under assumptions (A1)–(A3) on the original primitives of the model (which correspond to (A1”), (A2”) and (A3) for the modified one); (A4”) is satisfied automatically. A similar transformation allows for considering the non-stationary model with finite horizon, as well as the case when the time horizon of each individual is a random variable with a finite expected value, independent of the Markov chain of his individual states.
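The state-space augmentation described above can be sketched concretely in a finite setting. The snippet below is a hedged illustration under simplifying assumptions: the mean-field argument of the kernel is dropped, all sizes are hypothetical, and \(s^*\) is kept absorbing (as in the no-replacement case) rather than equipped with a rebirth distribution.

```python
import numpy as np

# Sketch of the finite-horizon embedding: augmented states are s_star
# plus pairs (s, t) with s != s_star and t = 0..T; at t = T all mass
# moves to s_star. Sizes and the random kernel are assumptions.
nS, nA, T = 4, 2, 3
s_star = 0                                       # distinguished state
Q = np.random.default_rng(1).dirichlet(np.ones(nS), size=(nA, nS))  # original kernel

def idx(s, t):
    # Index of an augmented state: s_star -> 0, (s, t) -> a unique slot.
    return 0 if s == s_star else 1 + t * (nS - 1) + (s - 1)

n_aug = 1 + (T + 1) * (nS - 1)
Q_hat = np.zeros((nA, n_aug, n_aug))
for a in range(nA):
    for t in range(T):                           # t < T: advance the time component
        for s in range(1, nS):
            Q_hat[a, idx(s, t), 0] = Q[a, s, s_star]         # transition to s_star
            for s2 in range(1, nS):
                Q_hat[a, idx(s, t), idx(s2, t + 1)] = Q[a, s, s2]
    for s in range(1, nS):
        Q_hat[a, idx(s, T), 0] = 1.0             # horizon reached: move to s_star
    Q_hat[a, 0, 0] = 1.0                         # s_star kept absorbing in this sketch

assert np.allclose(Q_hat.sum(axis=2), 1.0)       # every row is a distribution
```

In the paper's actual construction \(\widehat{Q}^i\) additionally depends on the marginal flow \(\overline{\tau }_S\); the sketch only shows how the time component is threaded through the augmented state space.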
6 Concluding Remarks
In the paper we have presented a model of discrete-time mean-field games with several populations of players. Games of this type have not previously been studied in the literature in the discrete-time setting. The main results of the article are stationary and Markov mean-field equilibrium existence theorems for two payoff criteria, \(\beta \)-discounted payoff and total payoff, proved under rather general assumptions on the one-step reward functions and individual transition kernels of the players. It is also worth noting that games with total payoff have previously been studied only in the finite-state-space case; hence, the results presented here also extend those for total-payoff mean-field games with a single population. The article is the first of two papers on multiple-population discrete-time mean-field games with discounted or total payoff. In the second, we provide theorems showing that, under some additional assumptions, equilibria obtained in the mean-field models are approximate equilibria in their n-person counterparts when n is large enough. We also plan further research on discrete-time mean-field games with multiple populations of players, concentrating on games with long-run average reward.
Data Availability
No data are associated with the manuscript.
Notes
As can be clearly seen, the model encompasses in particular the situation when the state space for each population is the same and equal to S.
Here and in the sequel, for any set X, \(\Delta (X)\) denotes the set of probability distributions over the \(\sigma \)-algebra of Borel subsets of X, \(\mathcal {B}(X)\).
We do not introduce history-dependent strategies only in order to avoid additional complexity in the notation (which is rather involved as it is). It should be noted, though, that, as in the case of discounted- and total-reward stochastic games, the Markov and stationary mean-field equilibria whose existence we prove later are in fact equilibria in the class of all strategies. This is a consequence of the fact that at a Markov (or stationary) equilibrium each agent faces the problem of maximizing the reward in a discounted- (or total-) reward Markov decision process, which always admits a maximum at a Markov (stationary, if the problem is time-homogeneous) policy.
Here we replace the superscript \(\alpha \) used to define the measure \(\mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}\) by i, as the situation is symmetric within the population.
The assumptions that we impose on the model later will in fact imply that the series defining the reward is summable for any vector of strategies of the players.
We sometimes also use the notation \(B_w(S^i)\), \(C_w(S^i)\) for analogous sets of functions defined on these smaller domains.
The measure \(\mathbb {P}^{s,\overline{Q},f^i}\) is defined here similarly to the case of discounted rewards.
With weight function \(V=w\), \(K=L_i\) and \(\overline{c}=R\).
Here we apply this theorem for weight function \(V(s,t):=w(s)\alpha ^{-t}\), \(K:=\widetilde{L}_i\) and \(\overline{c}=R\).
Such a value will exist by (A2”) (b) and (A4”).
The assumptions (A1’) and (A2’) used there are in fact slightly stronger than (A1”) and (A2”) assumed here, but the proof remains valid in the present case.
References
Achdou Y, Bardi M, Cirant M (2017) Mean field games models of segregation. Math Models Methods Appl Sci 27(1):75–113
Achdou Y, Kobeissi Z (2021) Mean field games of controls: finite difference approximations. Math Eng 3(3):1–35
Adlakha S, Johari R (2013) Mean field equilibrium in dynamic games with strategic complementarities. Oper Res 61(4):971–989
Aliprantis CD, Border KC (1999) Infinite dimensional analysis. A Hitchhiker’s guide. Springer, Berlin
Anahtarci B, Kariksiz CD, Saldi N (2020) Value iteration algorithm for mean-field games. Syst Control Lett 143:104744
Anahtarci B, Kariksiz CD, Saldi N (2022) Q-learning in regularized mean-field games. Dyn Games Appl. https://doi.org/10.1007/s13235-022-00450-2
Bardi M, Cirant M (2018) Uniqueness of solutions in mean field games with several populations and Neumann conditions. In: Cardaliaguet P, Porretta A, Salvarani F (eds) PDE models for multi-agent phenomena. Springer INdAM series, vol 28. Springer, Cham
Bensoussan A, Frehse J, Yam P (2013) Mean field games and mean field type control theory. Springer, New York
Bensoussan A, Huang T, Laurière M (2018) Mean field control and mean field game models with several populations. Minimax Theory Appl 3(2):173–209
Bergin J, Bernhardt D (1992) Anonymous sequential games with aggregate uncertainty. J Math Econ 21:543–562
Bergin J, Bernhardt D (1995) Anonymous sequential games: existence and characterization of equilibria. Econ Theory 5(3):461–89
Bertsekas DP, Shreve SE (1978) Stochastic optimal control: the discrete time case. Academic Press, New York
Billingsley P (1999) Convergence of probability measures, 2nd edn. Wiley, New York
Bonnans JF, Hadikhanloo S, Pfeiffer L (2019) Schauder estimates for a class of potential mean field games of controls. Appl Math Optim 83:1431–1464
Cardaliaguet P, Lehalle C-A (2018) Mean field game of controls and an application to trade crowding. Math Financ Econ 12(3):335–363
Carmona R, Delarue F (2018) Probabilistic theory of mean field games with applications. Springer, Berlin
Carmona R, Lacker D (2015) A probabilistic weak formulation of mean field games and applications. Ann Appl Probab 25(3):1189–1231
Chakrabarti SK (2003) Pure strategy Markov equilibrium in stochastic games with a continuum of players. J Math Econ 39(7):693–724
Cirant M (2015) Multi-population mean field games systems with Neumann boundary conditions. J Math Pures Appl 103(5):1294–1315
Cirant M, Verzini G (2017) Bifurcation and segregation in quadratic two-populations mean field games systems. ESAIM Control Optim Calc Var 23(3):1145–1177
Elliot R, Li X, Ni Y (2013) Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica 49:3222–3233
Feinberg EA, Huang J (2019) On the reduction of total-cost and average-cost MDPs to discounted MDPs. Naval Res Logist 66(1):38–56
Feleqi E (2013) The derivation of ergodic mean field game equations for several populations of players. Dyn Games Appl 3(4):523–536
Glicksberg IL (1952) A further generalization of the Kakutani fixed point theorem with application to Nash equilibrium points. Proc Am Math Soc 3:170–174
Gomes DA, Mohr J, Souza RR (2010) Discrete time, finite state space mean field games. J Math Pures Appl 93(3):308–328
Gomes DA, Patrizi S, Voskanyan V (2014) On the existence of classical solutions for stationary extended mean field games. Nonlinear Anal Theory Methods Appl 99:49–79
Gomes DA, Saúde J (2014) Mean field games models—a brief survey. Dyn Games Appl 4(2):110–154
Gomes DA, Voskanyan VK (2016) Extended deterministic mean-field games. SIAM J Control Optim 54(2):1030–1055
Green E (1980) Noncooperative price taking in large dynamic markets. J Econ Theory 22:155–181
Green E (1984) Continuum and finite-player noncooperative models of competition. Econometrica 52:975–993
Hernández-Lerma O, Lasserre JB (1995) Invariant probabilities for Feller–Markov chains. J Appl Math Stoch Anal 8(4):341–345
Hernández-Lerma O, Lasserre JB (1996) Discrete-time Markov control processes: basic optimality criteria. Springer, Berlin
Hernández-Lerma O, Lasserre JB (1999) Further topics on discrete-time Markov control processes. Springer, Berlin
Hinderer K (1970) Foundations of non-stationary dynamic programming with discrete-time parameter. Lecture Notes Opr. Res. Math. Syst., vol 33. Springer, Berlin
Housman D (1988) Infinite player noncooperative games and the continuity of the Nash equilibrium correspondence. Math Oper Res 13:488–496
Huang M, Caines PE, Malhamé RP (2006) Large population stochastic dynamic games: closed-loop McKean–Vlasov systems and the Nash certainty equivalence principle. Commun Inf Syst 6(3):221–251
Huang M, Caines PE, Malhamé RP (2007) Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behavior and decentralized \(\varepsilon \)-Nash equilibria. IEEE Trans Autom Control 52(9):1560–1571
Huang M, Caines PE, Malhamé RP (2007) An Invariance principle in large population stochastic dynamic games. J Syst Sci Complex 20(2):162–172
Jovanovic B, Rosenthal RW (1988) Anonymous sequential games. J Math Econ 17:77–87
Kobeissi Z (2022) On classical solutions to the mean field game system of controls. Commun Partial Differ Equ 47(3):453–488
Lasry J-M, Lions P-L (2007) Large investor trading impacts on volatility. Ann Inst H Poincaré Anal Non Linéaire 24(2):311–323
Lasry J-M, Lions P-L (2007) Mean field games. Jpn J Math 2(1):229–260
Laurière M, Tangpi L (2022) Convergence of large population games to mean field games with interaction through the controls. SIAM J Math Anal 54(3):3535–3574
Morandotti M, Solombrino F (2020) Mean-field analysis of multipopulation dynamics with label switching. SIAM J Math Anal 52(2):1427–1462
Parthasarathy KR (1967) Probability measures on metric spaces. AMS Bookstore
Pliska SR (1978) On the transient case for Markov decision chains with general state spaces. In: Puterman ML (ed) Dynamic programming and its applications. Academic Press, New York, pp 335–349
Sabourian H (1990) Anonymous repeated games with a large number of players and random outcomes. J Econ Theory 51(1):92–110
Saldi N (2020) Discrete-time average-cost mean-field games on Polish spaces. Turk J Math 44:463–480
Saldi N, Başar T, Raginsky M (2018) Markov–Nash equilibria in mean-field games with discounted cost. SIAM J Control Optim 56(6):4256–4287
Saldi N, Başar T, Raginsky M (2019) Approximate Nash equilibria in partially observed stochastic games with mean-field interactions. Math Oper Res 44(3):1006–1033
Saldi N, Başar T, Raginsky M (2020) Approximate Markov–Nash equilibria for discrete-time risk-sensitive mean-field games. Math Oper Res 45(4):1596–1620
Saldi N, Başar T, Raginsky M (2022) Partially observed discrete-time risk-sensitive mean field games. Dyn Games Appl. https://doi.org/10.1007/s13235-022-00453-z
Schmeidler D (1973) Equilibrium points of nonatomic games. J Stat Phys 17:295–300
Serfozo R (1982) Convergence of Lebesgue integrals with varying measures. Sankhya Indian J Stat Ser A 44(3):380–402
Veinott AF (1969) Discrete dynamic programming with sensitive discount optimality criteria. Ann Math Stat 40:1635–1660
Wardrop JG (1952) Some theoretical aspects of road traffic research. Proc Inst Civ Eng Part 2:325–378
Więcek P, Altman E (2015) Stationary anonymous sequential games with undiscounted rewards. J Optim Theory Appl 166(2):686–710
Więcek P (2020) Discrete-time ergodic mean field games with average reward on compact spaces. Dyn Games Appl 10:222–256
Funding
This work was supported by the NCN Grant No. 2016/23/B/ST1/00425.
Contributions
Not applicable.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Ethical Approval
Not applicable.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Więcek, P. Multiple-Population Discrete-Time Mean Field Games with Discounted and Total Payoffs: The Existence of Equilibria. Dyn Games Appl (2024). https://doi.org/10.1007/s13235-024-00567-6
Keywords
- Mean-field game
- Discrete time
- Multiple-population game
- Stationary mean-field equilibrium
- Markov mean-field equilibrium
- Discounted payoff
- Total payoff