1 Introduction

This paper is a continuation of article [18], where we presented a model of a discrete-time mean-field game with multiple populations of players. There we provided conditions guaranteeing the existence of Markov or stationary equilibria in such games for two payoff criteria: the \(\beta \)-discounted payoff and the total payoff. These theorems were the first to address the existence of equilibrium in discrete-time mean-field games with several populations, extending also some of the theory for single-population discrete-time mean-field games. As mean-field games are only meant to serve as an approximation of real-life situations, where the populations of agents are large but finite, it is crucial to complement the equilibrium-existence results for the class of games under investigation with results showing that the equilibria obtained for the mean-field limit are approximate Markov–Nash equilibria in n-person counterparts of these mean-field games when the number of players in each population is large enough. Theorems of this type have been provided for the single-population case for different variants of the discrete-time mean-field game model in [10,11,12,13,14, 16, 17]. In this article, we build upon the theory presented in [11] to show that also in our case the Markov (or stationary) mean-field equilibria obtained in [18] are approximate equilibria in finite-player counterparts of the mean-field game when the number of players in each population goes to infinity.

The organization of the paper is as follows: In Sect. 2 we present the model of the discrete-time mean-field game with several populations of players and its counterparts with a finite number of players. In Sect. 3 we introduce some notation used in the remainder of the article. In Sect. 4 we give all the assumptions used in our theorems. Section 5 contains all the main results of the article. We end with some concluding remarks in Sect. 6.

2 The Model

2.1 Multi-population Mean-Field Game Model

A multi-population discrete-time mean-field game is described by the following objects:

  • We assume that the game is played in discrete time, that is \(t\in \{ 1,2,\ldots \}\).

  • The game is played by an infinite number (continuum) of players divided into N populations. Each player has a private state s, changing over time. We assume that the set of individual states \(S^i\) is the same for each player in population i (\(i=1,\ldots ,N\)), and that it is a nonempty closed subset of a locally compact Polish space S.Footnote 1

  • A vector \(\overline{\mu }=(\mu ^1,\ldots ,\mu ^N)\in \Pi _{i=1}^N\Delta (S^i)\) of N probability distributions over Borel setsFootnote 2 of \(S^i\), \(i=1,\ldots ,N\), is called a global state of the game. Its i-th component describes the proportion of the i-th population occupying each of the individual states.

    We assume that at every stage of the game each player knows both his private state and the global state, and that his knowledge about individual states of his opponents is limited to the global state.

  • The set of actions available to a player from population i in state \((s,\overline{\mu })\) is given by \(A^i(s)\), with \(A:=\bigcup _{i\in \{ 1,\ldots ,N\}, s\in S^i}A^i(s)\) – a compact metric space. For any i, \(A^i(\cdot )\) is a non-empty compact valued correspondence such that

    $$\begin{aligned} D^i:=\{ (s,a)\in S^i\times A: a\in A^i(s)\} \end{aligned}$$

    is a measurable set. Note that we assume that the sets of actions available to a player only depend on his private state and not on the global state of the game.

  • The global distribution of the state-action pairs is denoted by \(\overline{\tau }=(\tau ^1,\ldots ,\tau ^N)\in \Pi _{i=1}^N\Delta (D^i)\). Again, its i-th component gives the distribution of state-action pairs within population i, \(i=1,\ldots ,N\).

  • The immediate reward of an individual from population i is given by a measurable function \(r^i:D^i\times \Pi _{i=1}^N\Delta (D^i)\rightarrow \mathbb {R}\). \(r^i(s,a,\overline{\tau })\) gives the reward of a player at any stage of the game when his private state is s, his action is a and the distribution of state-action pairs among the entire player population is \(\overline{\tau }\).

  • Transitions are defined for each individual separately with stochastic kernels \(Q^i:D^i\times \Pi _{i=1}^N\Delta (D^i)\rightarrow \Delta (S^i)\) denoting the transition probability for players from the i-th population. \(Q^i(B\mid \cdot ,\cdot ,\overline{\tau })\) is product-measurable for any \(B\in \mathcal {B}(S^i)\), any \(\overline{\tau }\in \Pi _{i=1}^N\Delta (D^i)\) and \(i\in \{ 1,\ldots ,N\}\).

  • The global state at time \(t+1\), \(\overline{\mu _{t+1}}\), is given by the aggregation of the individual transitions of the players according to the formula

    $$\begin{aligned} \mu _{t+1}^{i}(\cdot )=\Phi ^i(\cdot \mid \overline{\tau _t}):=\int _{D^i}Q^i(\cdot \mid s,a,\overline{\tau _t})\tau _t^i(ds\times da). \end{aligned}$$

    As can be clearly seen, the transition of the global state is deterministic.

A sequence \(\pi ^i=\{\pi _t^i\}_{t=0}^\infty \) of functions \(\pi _t^i:S^i\rightarrow \Delta (A)\), such that \(\pi _t^i(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\) and any t, satisfying \(\pi _t^i(A^i(s)\mid s )=1\) for every \(s\in S^i\) and every t, is called a Markov strategy for a player of population i. A function \(f^i:S^i \rightarrow \Delta (A)\), such that \(f^i(B\mid \cdot )\) is measurable for any \(B\in \mathcal {B}(A)\), satisfying \(f^i(A^i(s)\mid s )=1\) for every \(s\in S^i\) is called a stationary strategy. The set of all Markov strategies for players from i-th population is denoted by \(\mathcal {M}^i\) while that of stationary strategies by \(\mathcal {F}^i\). As in MDPs, stationary strategies can be seen as a specific case of Markov strategies that do not depend on t. In the paper we never consider general (history-dependent) strategies.

Next, let \(\Pi ^i_t(\pi ^i,\mu ^i)\) denote the state-action distribution of the i-th population players at time t in the mean-field game corresponding to the distribution of individual states in population i, \({\mu }^i\) and a Markov strategy for players of population i, \(\pi ^i\in \mathcal {M}^i\), that is

$$\begin{aligned} \Pi _t^i(\pi ^i,{\mu }^i)(B):=\int _B \pi _t^i(da\mid s)\mu ^i(ds)\quad \text{ for } B\in \mathcal {B}(D^i). \end{aligned}$$

The vector \((\Pi _t^1(\pi ^1,{\mu }^1),\ldots ,\Pi _t^N(\pi ^N,{\mu }^N))\) will be denoted by \(\overline{\Pi }_t(\overline{\pi },\overline{\mu })\). When we use this notation for stationary strategies, we skip the subscript t.
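For illustration only, the following sketch (not part of the formal model) shows how the maps \(\Pi \) and \(\Phi \) interact in a toy finite setting with a single population and a stationary strategy; the sizes of the state and action sets, the kernel Q and the strategy f are arbitrary stand-ins, and the dependence of Q on \(\overline{\tau }\) is dropped for brevity.

```python
import numpy as np

# Illustrative only: one population, finite individual state and action sets,
# a stationary strategy, and a kernel Q that ignores tau for brevity.
nS, nA = 3, 2
rng = np.random.default_rng(0)
f = rng.dirichlet(np.ones(nA), size=nS)            # stationary strategy f(a|s)
Q = rng.dirichlet(np.ones(nS), size=(nS, nA))      # Q(s'|s,a), shape (nS, nA, nS)

def state_action_distribution(f, mu):
    """Pi(f, mu)(s, a) = f(a|s) * mu(s)."""
    return f * mu[:, None]

def aggregate(Q, tau):
    """Phi(.|tau) = sum_{s,a} Q(.|s,a) tau(s,a): the next (deterministic) global state."""
    return np.einsum("sap,sa->p", Q, tau)

mu = np.full(nS, 1.0 / nS)                         # initial global state mu_0
for t in range(5):
    tau = state_action_distribution(f, mu)         # tau_t = Pi(f, mu_t)
    mu = aggregate(Q, tau)                         # mu_{t+1} = Phi(.|tau_t)
print("global state after 5 steps:", mu)
```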

Given the evolution of the global state, which depends on the strategies of the players in a deterministic manner, we can define the individual history of a player \(\alpha \) (from any given population i) as the sequence of his consecutive individual states and actions \(h=(s^\alpha _0,a^\alpha _0,s^\alpha _1,a^\alpha _1,\ldots )\). By the Ionescu-Tulcea theorem (see Chap. 7 in [2]), for any Markov strategies \(\pi ^\alpha \) of player \(\alpha \) and \(\sigma ^1,\ldots ,\sigma ^N\) of other players (including all other players of the same population), any initial global state \(\overline{\mu _0}\) and any initial private state of player \(\alpha \), s, there exists a unique probability measure \(\mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}\) on the set of all infinite individual histories of the game \(H=(D^i)^\infty \) endowed with Borel \(\sigma \)-algebra, such that for any \(B\in \mathcal {B}(S^i)\), \(E\in \mathcal {B}(A)\) and any partial history \(h^\alpha _t=(s^\alpha _0,a^\alpha _0,\ldots ,s^\alpha _{t-1},a^\alpha _{t-1},s^\alpha _t)\in (D^i)^t\times S^i=:H_t\), \(t\in \mathbb {N}\),

$$\begin{aligned}{} & {} \mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}(h\in H: s^\alpha _0=s)=1, \end{aligned}$$
(1)
$$\begin{aligned}{} & {} \mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}(h\in H: a^\alpha _t\in E\mid h^\alpha _t)=\pi _t^\alpha (E\mid s^\alpha _t), \nonumber \\{} & {} \mathbb {P}^{s,\overline{\mu _0},\overline{Q},\pi ^\alpha ,\overline{\sigma }}(h\in H: s^\alpha _{t+1}\in B\mid (h^\alpha _t,a^\alpha _t))=Q^i(B\mid s^\alpha _t,a^\alpha _t,\overline{\tau _t}), \end{aligned}$$
(2)

with state-action distributions defined by \(\tau ^j_0=\Pi ^j_0(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi ^j_{t+1}(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau _t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\).

For \(\beta \in (0,1)\), the \(\beta \)-discounted rewardFootnote 3 for a player \(\alpha \) from population i using policy \(\pi ^i\in \mathcal {M}^i\) when other players use policies \(\sigma ^j\in \mathcal {M}^j\) (depending on the population j they belong to) and the initial global state is \(\overline{\mu _0}\), with the initial individual state of player \(\alpha \) being \(s^i_0\) is defined as follows:

$$\begin{aligned} J^i_\beta (s^i_0,\overline{\mu _0},\pi ^i,\overline{\sigma })=\mathbb {E}^{s^i_0,\overline{\mu _0},\overline{Q},\pi ^i,\overline{\sigma }}\sum _{t=0}^\infty \beta ^tr^i(s_t^i,a_t^i,\overline{\tau _t}), \end{aligned}$$

where \(\tau ^j_0=\Pi _0^j(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi _{t+1}^j(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau _t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\).
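As a rough illustration (under the same simplifying assumptions as in the sketch above: one population, finite sets, Q and r independent of \(\overline{\tau }\)), the discounted reward of a player using a stationary strategy can be estimated by a truncated Monte Carlo simulation; all names and values below are illustrative.

```python
import numpy as np

# Illustrative only: one population, finite individual state and action sets,
# and a kernel/reward that ignore the state-action distribution tau.
nS, nA, beta, T = 3, 2, 0.9, 100                   # T truncates the infinite sum
rng = np.random.default_rng(1)
f = rng.dirichlet(np.ones(nA), size=nS)            # stationary strategy f(a|s)
Q = rng.dirichlet(np.ones(nS), size=(nS, nA))      # Q(s'|s,a)
r = rng.normal(size=(nS, nA))                      # r(s,a)

def discounted_reward(s0, n_episodes=2000):
    """Monte Carlo estimate of J_beta(s0), truncated at horizon T."""
    total = 0.0
    for _ in range(n_episodes):
        s, acc = s0, 0.0
        for t in range(T):
            a = rng.choice(nA, p=f[s])             # action drawn from f(.|s)
            acc += beta**t * r[s, a]               # discounted immediate reward
            s = rng.choice(nS, p=Q[s, a])          # individual transition
        total += acc
    return total / n_episodes

print("estimated J_beta from state 0:", discounted_reward(0))
```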

To define the total reward in our game let us distinguish one state in S, say \(s^*\), isolated from \(S\setminus \{ s^*\}\) and assume that \(A^i(s^*)=\{ a^*\}\) independently of \(i\in \{ 1,\ldots ,N\}\) for some fixed \(a^*\) isolated from \(A\setminus \{ a^*\}\). Moreover, let us assume that \(s^*\in S^i\) for \(i=1,\ldots ,N\). Then the total reward of a player from population i using policy \(\pi ^i\in \mathcal {M}^i\) when other players apply policies \(\overline{\sigma }=(\sigma ^1,\ldots ,\sigma ^N)\) and the initial global state is \(\overline{\mu _0}\), with the initial individual state of player \(\alpha \) being \(s_0^i\), is defined in the following way:

$$\begin{aligned} J^i_*(s^i_0,\overline{\mu _0},\pi ^i,\overline{\sigma })=\mathbb {E}^{s^i_0,\overline{\mu _0},\overline{Q},\pi ^i,\overline{\sigma }}\sum _{t=0}^{\mathcal {T}^i-1}r^i(s_t^i,a_t^i,\overline{\tau _t}), \end{aligned}$$

where \(\tau ^j_0=\Pi _0^j(\sigma ^j,\mu ^j_0)\), \(\tau ^j_{t+1}=\Pi _{t+1}^j(\sigma ^j,\Phi ^j(\cdot \mid \overline{\tau _t}))\) for \(t=0,1,2,\ldots \) and \(j=1,\ldots ,N\), while \(\mathcal {T}^i\) is the moment of the first arrival of the process \(\{ s_t^i\}\) at \(s^*\). The total reward is interpreted as the reward accumulated by the player over his whole lifetime. State \(s^*\) is an artificial state (as is action \(a^*\)) denoting that a player is dead. \(\overline{\mu _0}\) corresponds to the distribution of the states across the population when he is born, while \(s^i_0\) is his own state when he is born. The fact that after some time the state of a player can again become different from \(s^*\) should be interpreted as the player being replaced, after some time, by a new-born one.
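The only change needed with respect to the discounted sketch above is to stop accumulating rewards once the artificial state \(s^*\) is reached. A minimal sketch, under the same illustrative assumptions, with the last state index playing the role of \(s^*\):

```python
import numpy as np

# Illustrative only: one population, finite sets, Q and r independent of tau;
# the last state index plays the role of the artificial "dead" state s*.
nS, nA = 4, 2
DEAD = nS - 1
rng = np.random.default_rng(2)
f = rng.dirichlet(np.ones(nA), size=nS)            # stationary strategy f(a|s)
Q = rng.dirichlet(np.ones(nS), size=(nS, nA))      # Q(s'|s,a)
r = rng.normal(size=(nS, nA))                      # r(s,a)

def total_reward(s0, n_episodes=2000, max_steps=10_000):
    """Monte Carlo estimate of J_*(s0): rewards summed up to the first visit to s*."""
    total = 0.0
    for _ in range(n_episodes):
        s, acc, t = s0, 0.0, 0
        while s != DEAD and t < max_steps:         # max_steps is only a safety guard
            a = rng.choice(nA, p=f[s])
            acc += r[s, a]
            s = rng.choice(nS, p=Q[s, a])
            t += 1
        total += acc
    return total / n_episodes

print("estimated total reward from state 0:", total_reward(0))
```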

Finally we define the solutions we will be looking for:

Definition 1

Stationary strategies \(f^1\in \mathcal {F}^1,\ldots f^N\in \mathcal {F}^N\) and a global state \(\overline{\mu }\in \Pi _{i=1}^N\Delta (S^i)\) form a stationary mean-field equilibrium in the \(\beta \)-discounted reward game if for any i, \(s^i_0\in S^i\), and every other stationary strategy of a player from population i, \(g^i\in \mathcal {F}^i\)

$$\begin{aligned} J^i_\beta (s^i_0,\overline{\mu },f^i,\overline{f})\ge J^i_\beta (s^i_0,\overline{\mu },g^i,\overline{f}) \end{aligned}$$

and, if \(\overline{\mu }_0=\overline{\mu }\) and strategies \(f^1,\ldots ,f^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }\) for every \(t\ge 1\).

Markov strategies \(\pi ^1\in \mathcal {M}^1,\ldots \pi ^N\in \mathcal {M}^N\) and a global state flow \((\overline{\mu }_0^*,\overline{\mu }_1^*,\ldots )\in (\Pi _{i=1}^N\Delta (S^i))^\infty \) form a Markov mean-field equilibrium in the \(\beta \)-discounted reward game if for any i, \(s^i_0\in S^i\), and every other Markov strategy of a player from population i, \(\sigma ^i\in \mathcal {M}^i\)

$$\begin{aligned} J^i_\beta (s_0^i,\overline{\mu }^*_0,\pi ^i,\overline{\pi })\ge J^i_\beta (s_0^i,\overline{\mu }^*_0,\sigma ^i,\overline{\pi }) \end{aligned}$$

and, if \(\overline{\mu }_0=\overline{\mu }^*_0\) and strategies \(\pi ^1,\ldots ,\pi ^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }^*_t\) for every \(t\ge 1\).

Similarly,

Definition 2

Stationary strategies \(f^1\in \mathcal {F}^1,\ldots , f^N\in \mathcal {F}^N\) and a global state \(\overline{\mu }\in \Pi _{i=1}^N\Delta (S^i)\) form a stationary mean-field equilibrium in the total reward game if for any i, \(s^i_0\in S^i\), and every other stationary strategy of a player from population i, \(g^i\in \mathcal {F}^i\)

$$\begin{aligned} J^i_*(s^i_0,\overline{\mu },f^i,\overline{f})\ge J^i_*(s^i_0,\overline{\mu },g^i,\overline{f}). \end{aligned}$$

Moreover, if \(\overline{\mu }_0=\overline{\mu }\) and strategies \(f^1,\ldots ,f^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }\) for every \(t\ge 1\).

Markov strategies \(\pi ^1\in \mathcal {M}^1,\ldots , \pi ^N\in \mathcal {M}^N\) and a global state flow \((\overline{\mu }_0^*,\overline{\mu }_1^*,\ldots )\in (\Pi _{i=1}^N\Delta (S^i))^\infty \) form a Markov mean-field equilibrium in the total reward game if for any i, t, \(s^i_t\in S^i\) and every other Markov strategy of a player from population i, \(\sigma ^i\in \mathcal {M}^i\),

$$\begin{aligned} J^i_*(s^i_t,\overline{\mu }_t^*,{}^t\pi ^i,{}^t\overline{\pi })\ge J^i_*(s^i_t,\overline{\mu }_t^*,{}^t\sigma ^i,{}^t\overline{\pi }), \end{aligned}$$

where, for any infinite vector \(a=(a_0,a_1,\ldots )\), \({}^ta\) denotes the vector \((a_t,a_{t+1},\ldots )\). Moreover, if \(\overline{\mu }_0=\overline{\mu }^*_0\) and strategies \(\pi ^1,\ldots ,\pi ^N\) are used by all the players, then \(\overline{\mu }_t=\overline{\mu }^*_t\) for every \(t\ge 1\).

2.2 n-Person Counterparts of a Mean-Field Game

The n-person games that will be approximated by our model are discrete-time n-person stochastic games as defined in [6]. Below we define n-person stochastic counterparts of the mean-field game for the multi-population case.

  • There are n players in the game belonging to N populations. The number of players in population i is denoted by \(n_i\), with \(\sum _{i=1}^Nn_i=n\). Hence, the state space is \(\Pi _{i=1}^N(S^i)^{n_i}\) while an arbitrary state in the game can be denoted by \(\overline{s}=(\overline{s}^1,\ldots ,\overline{s}^N)\) with \(\overline{s}^i=(s_1^i,\ldots ,s_{n_i}^i)\) for \(i=1,\ldots ,N\). We shall also use the notation \(\textbf{n}:=(n_1,\ldots ,n_N)\) with \(\textbf{n}\rightarrow \infty \) standing for \(n_i\rightarrow \infty \) for \(i=1,\ldots ,N\). Similarly as in the case of the mean-field game, the set of actions available to the kth player in population i in state \(\overline{s}\) is given by \(A^i\left( s_k^i\right) \). An arbitrary action of the kth player in population i will be denoted by \(a_k^i\) and an arbitrary profile of actions of all the players by \(\overline{a}=(\overline{a}^1,\ldots ,\overline{a}^N)\) with \(\overline{a}^i=(a_1^i,\ldots ,a_{n_i}^i)\) for \(i=1,\ldots ,N\).

  • We assume that for each i the initial individual states in the vector \(\overline{s}^i\) are i.i.d. random elements drawn from an arbitrary known distribution \(\mu _0^{*i}\). To simplify the notation, we will write \(\overline{\mu }_0^*\) for the vector of initial state distributions, or simply \(\overline{s_0}\sim \overline{\mu }_0^*\).

  • The empirical state-action distribution in the game is defined as

    $$\begin{aligned} \overline{\tau }(\overline{s},\overline{a})=(\tau ^1(\overline{s},\overline{a}),\ldots ,\tau ^N(\overline{s},\overline{a})) \end{aligned}$$

    with \(\tau ^i(\overline{s},\overline{a})=\frac{1}{n_i}\sum _{k=1}^{n_i}\delta _{(s_k^i,a_k^i)}\), \(i=1,\ldots ,N\) (a small numerical sketch of this construction is given at the end of this subsection).

  • The individual immediate reward of the kth player from population i, \(r^{i,k}_n:\Pi _{j=1}^N(D^j)^{n_j}\rightarrow \mathbb {R}\), \(i=1,\ldots ,N\), \(k=1,\ldots ,n_i\), is defined for any profile of players’ states \(\overline{s}\) and any profile of players’ actions \(\overline{a}\) by

    $$\begin{aligned} r^{i,k}_n(\overline{s},\overline{a}):=r^i\left( s_k^i,a_k^i,\overline{\tau }(\overline{s},\overline{a})\right) . \end{aligned}$$
  • The transition probability \(Q_n:\Pi _{j=1}^N(D^j)^{n_j}\rightarrow \Delta \left( \Pi _{j=1}^N(S^j)^{n_j}\right) \) can be defined for any \(\overline{s}\) and \(\overline{a}\) by the following formula (for clarity of exposition we write it only for Borel rectangles, which uniquely determines the product measure):

    $$\begin{aligned}&Q_n(B_1^1\times \ldots \times B_{n_1}^1\ldots \times B_1^N\times \ldots \times B_{n_N}^N\mid \overline{s},\overline{a})\\&\quad :=Q^1\left( B_1^1\mid s_1^1,a_1^1,\overline{\tau }(\overline{s},\overline{a})\right) \ldots Q^1\left( B_{n_1}^1\mid s_{n_1}^1,a_{n_1}^1,\overline{\tau }(\overline{s},\overline{a})\right) \\&\qquad \ldots Q^N\left( B_1^N\mid s_1^N,a_1^N,\overline{\tau }(\overline{s},\overline{a})\right) \ldots Q^N\left( B_{n_N}^N\mid s_{n_N}^N,a_{n_N}^N,\overline{\tau }(\overline{s},\overline{a})\right) . \end{aligned}$$
  • In the n-person game we assume that the players are limited to policies depending only on their own state, that is, each player from population i uses Markov policies from the set \(\mathcal {M}^i\) or (more specifically) stationary policies from \(\mathcal {F}^i\).

  • The functional maximized by each player is either his \(\beta \)-discounted reward or his total reward. The definitions of both are slight modifications of those for the mean-field model. The \(\beta \)-discounted reward of the kth player in population i is defined for any initial state \(\overline{s_0}\) and any profile of policies of all the players \(\overline{\pi }\) as

    $$\begin{aligned} J^{k,i}_{\beta ,n}( \overline{s_0},\overline{\pi })=\mathbb {E}^{ \overline{s_0},Q_n,\overline{\pi }}\sum _{t=0}^\infty \beta ^tr^{i,k}_n(\overline{s_t},\overline{a_t}), \end{aligned}$$

    with \(\mathbb {P}^{\overline{s_0},Q_n,\overline{\pi }}\) denoting the measure on the set of all infinite histories of the game corresponding to \(\overline{s_0}\), \(Q_n\) and \(\overline{\pi }\) defined with the help of the Ionescu-Tulcea theorem similarly as in case of the mean-field game.

    Similarly, the total reward of kth player in population i is defined for any initial state \(\overline{s_0}\) and any profile of policies of all the players \(\overline{\pi }\) as

    $$\begin{aligned} J^{k,i}_{*n}(\overline{s_0},\overline{\pi })=\mathbb {E}^{\overline{s_0},Q_n,\overline{\pi }}\sum _{t=0}^{\mathcal {T}^i_k-1}r^{i,k}_n(\overline{s_t},\overline{a_t}), \end{aligned}$$

    with \(\mathcal {T}_k^i\) denoting the moment of the first arrival of the process \(\{ s_{k,t}^i\}\) to \(s^*\).

  • Finally, the solution we will be looking for in the n-person counterparts of the mean-field game is a variant of Nash equilibrium, the standard solution concept used in the stochastic game literature:

Definition 3

A profile of strategies \(\overline{\pi }\in \Pi _{i=1}^N\left( \mathcal {M}^i\right) ^{n_i}\) is a Markov–Nash equilibrium in the n-person discounted-reward game if

$$\begin{aligned} \mathbb {E}\left[ J^{k,i}_{\beta ,n}(\overline{s_0},\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \ge \mathbb {E}\left[ J^{k,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,k},\widehat{\pi _k^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \end{aligned}$$
(3)

for any \(\widehat{\pi _k^i}\in \mathcal {M}^i\), and \(i\in \{ 1,\ldots , N\}\), \(k\in \{ 1,\ldots ,n_i\}\).

The notation \([\overline{\pi }_{-i,k},\widehat{\pi _k^i}]\) denotes here and in the sequel the profile of policies \(\overline{\pi }\) with the policy of the kth player in population i replaced by \(\widehat{\pi _k^i}\). If (3) holds with the right-hand side diminished by some \(\varepsilon >0\), we say that \(\overline{\pi }\) is an \(\varepsilon \)-Markov–Nash equilibrium.

In the case of the total reward, we will further reduce the requirements for our approximate solution. We will say that a profile of strategies \(\overline{\pi }\in \Pi _{i=1}^N\left( \mathcal {M}^i\right) ^{n_i}\) is an \((\varepsilon ,T)\)-Markov–Nash equilibrium in the n-person total-reward game if for \(t\in \{ 0,\ldots ,T\}\), \(i\in \{ 1,\ldots , N\}\), \(k\in \{ 1,\ldots ,n_i\}\) and \(\widehat{\pi _k^i}\in \mathcal {M}^i\):

$$\begin{aligned} \mathbb {E}\left[ J^{k,i}_{*n}(\overline{\mu _t},{}^t\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \ge \mathbb {E}\left[ J^{k,i}_{*n}(\overline{\mu _t},[{}^t\overline{\pi }_{-i,k},{}^t\widehat{\pi _k^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\varepsilon . \end{aligned}$$
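To fix ideas, the sketch announced earlier in this subsection illustrates, in a toy finite setting with one population, the empirical state-action measure \(\tau ^i(\overline{s},\overline{a})\) and one step of the product transition \(Q_n\); the kernel and the state and action profiles below are arbitrary stand-ins, and the dependence of the kernel on \(\overline{\tau }\) is again dropped for brevity.

```python
import numpy as np

# Illustrative only: one population of n_i players, finite sets, kernel ignoring tau.
nS, nA, n_i = 3, 2, 10
rng = np.random.default_rng(3)
Q = rng.dirichlet(np.ones(nS), size=(nS, nA))      # stand-in for Q^i(s'|s,a,tau)
states = rng.integers(nS, size=n_i)                # s_k^i, k = 1,...,n_i
actions = rng.integers(nA, size=n_i)               # a_k^i, k = 1,...,n_i

# Empirical state-action measure tau^i = (1/n_i) sum_k delta_{(s_k^i, a_k^i)}.
tau_i = np.zeros((nS, nA))
np.add.at(tau_i, (states, actions), 1.0 / n_i)
assert np.isclose(tau_i.sum(), 1.0)

# One step of Q_n: every player moves independently according to Q^i(.|s_k, a_k, tau).
next_states = np.array([rng.choice(nS, p=Q[s, a]) for s, a in zip(states, actions)])
print(tau_i)
print(next_states)
```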

3 Preliminaries

As we have written, we assume that S and A are metric spaces. The metric on S will be denoted by \(d_S\) while that on A by \(d_A\). Whenever we refer to a metric on a product space, we mean the sum of the metrics on its coordinates. Some of the assumptions presented below will be given with respect to the moment function \(w_0:S\rightarrow [1,\infty )\), that is, a continuous function satisfying

$$\begin{aligned} \lim _{n\rightarrow \infty }\inf _{s\in S\setminus K_n}w_0(s)=\infty \end{aligned}$$

for some sequence \(\{ K_n\} _{n\ge 1}\) of compact subsets of S. Moreover,

$$\begin{aligned} w_0(s)\ge 1+d_S(s,s_0)^p \end{aligned}$$
(4)

for some \(p\ge 1\) and \(s_0\in S\).

In order to study both bounded and unbounded one-stage reward functions, we define the following function:

$$\begin{aligned} w:=\left\{ \begin{array}{ll} 1,&{} \text{ if } \text{ each } r^i \text{ is } \text{ bounded }\\ w_0,&{} \text{ otherwise }\end{array}\right. \end{aligned}$$

For any function \(h:S\rightarrow \mathbb {R}\) we define its w-norm as

$$\begin{aligned} \left\| h\right\| _w:=\sup _{s\in S}\left|\frac{h(s)}{w(s)}\right|. \end{aligned}$$

Whenever we speak of functions defined on a product of S and some other space, their w-norm is defined similarly, with the help of the same function w.

By \(B_w(S)\) we denote the space of all measurable functions from S to \(\mathbb {R}\) with finite w-norm, and by \(C_w(S)\) – the space of all continuous functions in \(B_w(S)\). Clearly, both \(B_w(S)\) and \(C_w(S)\) are Banach spaces. The same can be said of \(B_w(S\times A)\) and \(C_w(S\times A)\) – the spaces of measurable and continuous functions from \(S\times A\) to \(\mathbb {R}\) with finite w-norm.

Analogously, for any finite signed measure \(\mu \) on S, we define the w-norm of \(\mu \) as

$$\begin{aligned} \left\| \mu \right\| _w=\sup _{g\in B_w(S),\Vert g\Vert _w\le 1}\left|\int _S g(s)\mu (ds)\right|. \end{aligned}$$

It should be noted that in case \(w\equiv 1\), \(\Vert \mu \Vert _w\) is the total variation norm of \(\mu \) (see e.g. [8], Section 7.2).
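For instance, when S is finite the supremum in the definition is attained at \(g(s)=w(s)\,\textrm{sign}(\mu (\{ s\}))\), so \(\Vert \mu \Vert _w=\sum _{s\in S}w(s)|\mu (\{ s\})|\); the following lines merely evaluate this closed form for illustrative data (the moment function w and the signed measure \(\mu \) are arbitrary stand-ins).

```python
import numpy as np

# Illustrative data: a stand-in moment function w >= 1 and a finite signed measure mu.
w = np.array([1.0, 2.0, 4.0])
mu = np.array([0.2, -0.5, 0.3])

w_norm = np.sum(w * np.abs(mu))    # attained at g(s) = w(s) * sign(mu(s))
print(w_norm)                      # 2.4; with w identically 1 this is the total variation norm
```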

There are two standard types of convergence of probability measures which are used in the paper: the weak convergence denoted by \(\Rightarrow \) and the strong (or setwise) convergence denoted by \(\rightarrow \) and defined (for any Borel space \((X,\mathcal {B}(X))\)) by

$$\begin{aligned} \mu _n\rightarrow \mu \quad \Longleftrightarrow \quad \mu _n(B)\rightarrow \mu (B) \text{ for } \text{ any } B\in \mathcal {B}(X). \end{aligned}$$

It is known (see e.g. [9], Theorem 6.6) that the weak topology can be metrized using the metric

$$\begin{aligned} \rho (\mu ,\nu ):=\sum _{m=1}^\infty 2^{-m}\left|\int _S\phi _m(s)\mu (ds)-\int _S\phi _m(s)\nu (ds)\right|, \end{aligned}$$

where \(\{\phi _m\}_{m\ge 1}\) is a sequence of continuous bounded functions from S to \(\mathbb {R}\) whose elements form a dense subset of the unit ball in C(S). The strong convergence topology is, in general, not metrizable.

Next, let

$$\begin{aligned} \Delta _w(S):=\left\{ \mu \in \Delta (S):\int _Sw(s)\mu (ds)<\infty \right\} . \end{aligned}$$

It has been shown in [11] that \(\Delta _w(S)\) can be metrized using the metric

$$\begin{aligned} \rho _w(\mu ,\nu ):=\rho (\mu ,\nu )+\left|\int _Sw(s)\mu (ds)-\int _Sw(s)\nu (ds)\right|\end{aligned}$$

It can be shown that under (4) \(\Delta _w(S)\) with metric \(\rho _w\) is a Polish space (see [3, 11] for more on that). We will use the topology defined by this metric (called w-topology in the sequel) as the standard topology on \(\Delta _w(S)\).
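A toy computation of (truncated versions of) \(\rho \) and \(\rho _w\) for measures on a finite state space may help fix the notation; the test functions \(\phi _m\) and the moment function w below are arbitrary stand-ins for the objects appearing in the definitions, and the series defining \(\rho \) is truncated at M terms.

```python
import numpy as np

# Illustrative only: finite state space, random test functions with sup-norm <= 1,
# and a stand-in moment function w >= 1; rho is truncated at M terms.
nS, M = 5, 50
rng = np.random.default_rng(4)
phi = rng.uniform(-1.0, 1.0, size=(M, nS))   # stand-ins for the dense family phi_m
w = 1.0 + np.arange(nS, dtype=float)         # stand-in moment function w

def rho(mu, nu):
    """Truncated version of rho(mu, nu) = sum_m 2^{-m} |int phi_m dmu - int phi_m dnu|."""
    return sum(2.0**-(m + 1) * abs(phi[m] @ (mu - nu)) for m in range(M))

def rho_w(mu, nu):
    """rho_w(mu, nu) = rho(mu, nu) + |int w dmu - int w dnu|."""
    return rho(mu, nu) + abs(w @ (mu - nu))

mu = rng.dirichlet(np.ones(nS))
nu = rng.dirichlet(np.ones(nS))
print("rho   =", rho(mu, nu))
print("rho_w =", rho_w(mu, nu))
```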

We will also use the notation

$$\begin{aligned} \Delta _w(S\times A):=\left\{ \tau \in \Delta (S\times A):\int _{S\times A}w(s)\tau (ds\times da)<\infty \right\} \end{aligned}$$

with analogously defined metrics also denoted by \(\rho \) (metric defining weak convergence) and \(\rho _w\) (w-metric) as well as similar notation for subsets of S or \(S\times A\).

Whenever we speak about continuity of correspondences, we refer to the following definitions:

Let X and Y be two metric spaces and \(F:X\rightarrow Y\), a correspondence. Let \(F^{-1}(G)=\{ x\in X: F(x)\cap G\ne \emptyset \}\). We say that F is upper semicontinuous iff \(F^{-1}(G)\) is closed for any closed \(G\subset Y\). F is lower semicontinuous iff \(F^{-1}(G)\) is open for any open \(G\subset Y\). F is said to be continuous iff it is both upper and lower semicontinuous. For more on (semi)continuity of correspondences see [7], Appendix D or [1], Chapter 17.2.

4 Assumptions

In this section, we present the set of assumptions used in our results. It contains the assumptions from [18], used there to prove the existence of Markov mean-field equilibria in games with either discounted or total payoff, and a new assumption (A5), necessary to show that these equilibria are approximate equilibria for games with a large finite number of players. The numbering of the assumptions (including prime symbols) is consistent with that used in [18], where the basic versions of the assumptions were the strongest ones, used to prove the existence of stationary equilibria in mean-field games. We start with the assumptions used in the discounted case.

(A1’):

For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, there exist non-negative constants \(\alpha \), \(\gamma \), M satisfying \(\alpha \beta \gamma <1\) and

$$\begin{aligned} \int _Sw(s)\mu _0^i(ds)\le M\quad \text{ for } i=1,\ldots ,N, \end{aligned}$$

and such that for \(i=1,\ldots ,N\), \(s\in S^i\) and \(t=0,1,2,\ldots \),

$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w^{(t)}(D^i)}r^i(s,a,\overline{\tau })\ge -R\gamma ^tw(s) \end{aligned}$$

with \(\Delta _w^{(t)}(D^i):=\left\{ \tau ^i\in \Delta _w(D^i): \int _{D^i}w(s)\tau ^i(ds\times da)\le \alpha ^tM\right\} \).

(A2’):

For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta _w(D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\Rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,

(a):

for \(i=1,\ldots ,N\) the functions

$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$

are continuous in \((s,a,\overline{\tau })\),

(b):

for \(i=1,\ldots ,N\) and \(s\in S^i\)

$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le \alpha w(s). \end{aligned}$$
(A3):

For \(i=1,\ldots ,N\), correspondences \(A^i\) are continuous.

Assumptions (A1’) and (A2’) are modified for the total payoff case and complemented by new assumption (A4”). Their formulation requires defining for \(i=1,\ldots ,N\), \(s\in S^i\), \(a\in A^i(s)\) and \(\overline{\tau }\in \Pi _{j=1}^N\Delta (D^j)\) the modified transition probabilities \(Q^i_*\):

$$\begin{aligned} Q^i_*(\cdot \mid s,a,\overline{\tau }):=\left\{ \begin{array}{ll} Q^i(\cdot \mid s,a,\overline{\tau }),&{} \text{ if } s\ne s^*\\ \delta _{s^*},&{} \text{ if } s=s^*\end{array}\right. \end{aligned}$$
(A1”):

For \(i=1,\ldots ,N\), \(r^i\) is continuous and bounded above by some constant R on \(D^i\times \Pi _{i=1}^N\Delta (D^i)\). Moreover, there exist non-negative constants \(\alpha \), \(\gamma \), M satisfying \(\alpha \le \gamma \), \(\alpha \gamma <1\) and

$$\begin{aligned} \int _Sw(s)\mu _0^i(ds)\le M\quad \text{ for } i=1,\ldots ,N, \end{aligned}$$

and such that for \(i=1,\ldots ,N\), \(s\in S^i\) and \(t=0,1,2,\ldots \),

$$\begin{aligned} \inf _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta _w^{(t)}(D^i)}r^i(s,a,\overline{\tau })\ge -R\gamma ^tw(s) \end{aligned}$$

with \(\Delta _w^{(t)}(D^i):=\left\{ \tau ^i\in \Delta _w(D^i): \int _{D^i}w(s)\tau ^i(ds\times da)\le \alpha ^tM\right\} \).

(A2”):

For \(i=1,\ldots ,N\) and any sequence \(\{ s_n,a_n,\overline{\tau }_n\}\subset D^i\times \Pi _{i=1}^N\Delta _w(D^i)\) such that \(s_n\rightarrow s_*\), \(a_n\rightarrow a_*\) and \(\overline{\tau }_n\Rightarrow \overline{\tau }^*\), \(Q^i(\cdot \mid s_n,a_n,\overline{\tau }_n)\rightarrow Q^i(\cdot \mid s_*,a_*,\overline{\tau }^*)\). Moreover,

(a):

for \(i=1,\ldots ,N\) the functions

$$\begin{aligned} \int _S w(s')Q^i(ds'\mid s,a,\overline{\tau }) \end{aligned}$$

are continuous in \((s,a,\overline{\tau })\),

(b):

for \(i=1,\ldots ,N\) and \(s\in S^i\)

$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw(s')Q^i(ds'\mid s,a,\overline{\tau })\le \alpha w(s). \end{aligned}$$
(A4”):

For \(i=1,\ldots ,N\),

$$\begin{aligned} \lim _{T\rightarrow \infty }\sup _{\begin{array}{c} \pi ^i\in \mathcal {M}^i,\\ (\overline{\tau })\in \Pi _{t=0}^\infty \Pi _{j=1}^N\Delta (D^j) \end{array}}\left\| \sum _{t=T}^\infty \int _{S^i\setminus \{ s^*\}}w(s')\alpha ^{-t}\left( Q^i_*\right) ^t(ds'\mid s,\pi ^i,(\overline{\tau }))\right\| _w=0. \end{aligned}$$

Finally, we present additional assumptions used to prove the approximation theorems. Beforehand, for \(i=1,\ldots ,N\) let us define the following moduli of continuity:

$$\begin{aligned} \omega _{Q^i}(\delta ):= & {} \sup _{(s,a)\in D^i}\sup _{\overline{\tau },\overline{\eta }:\sum _{j=1}^N\widetilde{\rho }_w(\tau ^j,\eta ^{j})\le \delta }\left\| Q^i(\cdot \mid s,a,\overline{\tau })-Q^i(\cdot \mid s,a,\overline{\eta })\right\| _w,\\ \omega _{r^i}(\delta ):= & {} \sup _{(s,a)\in D^i}\sup _{\overline{\tau },\overline{\eta }:\sum _{j=1}^N\widetilde{\rho }_w(\tau ^j,\eta ^{j})\le \delta }\left|r^i(s,a,\overline{\tau })-r^i(s,a,\overline{\eta })\right|, \end{aligned}$$

where \(\widetilde{\rho }_w=\rho _w\) if any \(r^i\) is unbounded, or \(\widetilde{\rho }_w=\rho \) otherwise.

For any function \(g:\Pi _{i=1}^N\Delta _w(D^i)\rightarrow \mathbb {R}\) we next define its w-norm as follows:

$$\begin{aligned} \Vert g\Vert ^*_w:=\sup _{\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)}\frac{|g(\overline{\tau })|}{\sum _{i=1}^N\int _{D^i}w(s)\tau ^i(ds\times da)}. \end{aligned}$$

Now we can formulate our additional assumptions. They adapt the assumptions used in [11] to our multi-population case.

(A5):
(a):

We assume that \(\omega _{Q^i}(\delta )\rightarrow 0\) and \(\omega _{r^i}(\delta )\rightarrow 0\) for \(i=1,\ldots ,N\) as \(\delta \rightarrow 0\). Moreover, the following real-valued functions defined on \(\Pi _{i=1}^N\Delta _w(D^i)\):

$$\begin{aligned} \Omega _Q^{\overline{\tau }}(\overline{\eta }):=\max _i\omega _{Q^i}(\sum _{j=1}^N\widetilde{\rho }_w(\eta ^j, \tau ^j)) \text{ and } \Omega _r^{\overline{\tau }}(\overline{\eta }):=\max _i\omega _{r^i}(\sum _{j=1}^N\widetilde{\rho }_w(\eta ^j, \tau ^j)) \end{aligned}$$

have finite \(\Vert \cdot \Vert ^*_w\) norm.

(b):

There exist non-negative real numbers B and \(B_0\) such that for \(i=1,\ldots ,N\) and \(s\in S^i\)

$$\begin{aligned} \sup _{(a,\overline{\tau })\in A^i(s)\times \Pi _{i=1}^N\Delta (D^i)}\int _Sw^2(s')Q^i(ds'\mid s,a,\overline{\tau })\le B w^2(s) \end{aligned}$$

and \(\int _Sw^2(s)\mu _0^{*i}(ds)\le B_0\).

5 Main Results

5.1 Results for the Discounted Payoff Case

In the first of our main results we address the case of the discounted-reward game.

Theorem 1

Suppose assumptions (A1’), (A2’), (A3) and (A5) hold and suppose \(\overline{\pi }\) and \((\overline{\mu }^*_0,\overline{\mu }^*_1,\ldots )\) form a Markov mean-field equilibrium in the multi-population discrete-time mean-field game, existing by Theorem 4 in [18]. If, in addition, for each \(t\ge 0\) and \(i=1,\ldots ,N\), \(\pi ^i_t\) is weakly continuous, then for any \(\varepsilon >0\) there exist positive integers \(n_i(\varepsilon )\), \(i=1,\ldots ,N\), such that the vector of strategies where each player from population i uses policy \(\pi ^i\) is an \(\varepsilon \)-Markov–Nash equilibrium in any n-person stochastic counterpart of the \(\beta \)-discounted mean-field game with \(n_i\ge n_i(\varepsilon )\), \(i=1,\ldots ,N\).

Remark 1

Note that a stationary mean-field equilibrium, which exists by Theorem 1 in [18], is a specific case of a Markov mean-field equilibrium with the stationarity condition imposed on the global states of the game at subsequent stages. Hence, the result provided by Theorem 1 also holds in this case.

The proof of Theorem 1 will adapt the techniques used in [11] to our model. It will require introducing some additional notation. Recall the Polish space \((\Delta _w(S\times A),\rho _w)\). Define the Wasserstein distance of order 1 on the set of probability measures over \(\Delta _w(S\times A)\), \(\Delta (\Delta _w(S\times A))\) by the formula:

$$\begin{aligned} W_1(\Phi ,\Psi ):=\inf _{\gamma \in \Gamma (\Phi ,\Psi )}\int _{\Delta _w(S\times A)\times \Delta _w(S\times A)}\rho _w(x,y)\; d\gamma (x,y), \end{aligned}$$

where \(\Gamma (\Phi ,\Psi )\) denotes the collection of all measures on \(\Delta _w(S\times A)\times \Delta _w(S\times A)\) with marginals \(\Phi \) and \(\Psi \) on the first and second coordinates.
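For finitely supported measures, \(W_1\) reduces to a small linear program over couplings with prescribed marginals. The following sketch is illustrative only: it uses SciPy's generic LP solver and a stand-in distance matrix in place of \(\rho _w\).

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_1(p, q, dist):
    """Order-1 Wasserstein distance between finitely supported measures.

    p, q: weight vectors; dist[i, j]: distance between atom i of p and atom j of q.
    """
    m, n = len(p), len(q)
    c = dist.reshape(-1)                   # coupling gamma, flattened row-major
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):                     # row marginals of gamma equal p
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):                     # column marginals of gamma equal q
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

p = np.array([0.5, 0.5])
q = np.array([0.25, 0.75])
dist = np.array([[0.0, 1.0], [1.0, 0.0]])  # stand-in for the underlying metric
print(wasserstein_1(p, q, dist))           # 0.25 in this toy example
```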

Next, for \(i=1,\ldots ,N\), define the following spaces:

$$\begin{aligned} \Delta _1^i(\Delta _w(S\times A)):= & {} \left\{ \Phi \in \Delta (\Delta _w(S\times A)):\; \int _{\Delta _w(S\times A)}\rho _w(\tau ,\Pi ^i_0(\pi ^i,\mu ^{*i}_0))\Phi (d\tau )<\infty \right\} ,\\ C_w(\Pi _{i=1}^N\Delta _w(D^i)):= & {} \left\{ g:\Pi _{i=1}^N\Delta _w(D^i)\rightarrow \mathbb {R}:\; g \text{ is } \text{ continuous } \text{ and } \Vert g\Vert ^*_w<\infty \right\} . \end{aligned}$$

Before we get to the actual proof of the theorem, note that the game is symmetric, hence, proving only that the inequality defining \(\varepsilon \)-Nash equilibrium holds for the first player in the first population will be enough to verify the theorem.

In our proof we shall use the following notation:

  • We will use the notation \(\overline{\pi }\) also to denote the vector of strategies in the n-person counterpart of the mean-field game where each player from population i uses strategy \(\pi ^i\). The strategy vector where the first player in the first population changes his strategy to an arbitrary weakly continuous Markov strategy \(\widehat{\pi ^1_1}\) will be denoted by \(\widehat{\pi }\).

  • The vector of states at time t in n-person game with \(n_i\) players in population i, \(i=1,\ldots ,N\) will be denoted by \(\overline{s}_t^{\textbf{n}}=(\overline{s}_t^{\textbf{n},1},\ldots ,\overline{s}_t^{\textbf{n},N})\) with \(\overline{s}_t^{\textbf{n},i}=(s_{t,1}^{\textbf{n},i},\ldots ,s_{t,n_i}^{\textbf{n},i})\) for \(i=1,\ldots ,N\). Similarly, the vector of actions at time t will be denoted by \(\overline{a}_t^{\textbf{n}}=(\overline{a}_t^{\textbf{n},1},\ldots ,\overline{a}_t^{\textbf{n},N})\) with \(\overline{a}_t^{\textbf{n},i}=(a_{t,1}^{\textbf{n},i},\ldots ,a_{t,n_i}^{\textbf{n},i})\) for \(i=1,\ldots ,N\). The corresponding empirical state-action distribution of i-th population will be denoted by \(e_i^\textbf{n}\).

  • When we want to distinguish between the quantities arising when strategy vector \(\overline{\pi }\) is used and those arising when \(\widehat{\pi }\) is used, an overline or a hat is added to the corresponding symbol. In particular, the empirical state-action distributions at time t under the two strategy vectors will be denoted by \(\overline{e}_{i,t}^\textbf{n}\) and \(\widehat{e}_{i,t}^\textbf{n}\), respectively, with \(\overline{\overline{e}_{t}^\textbf{n}}=(\overline{e}_{1,t}^\textbf{n},\ldots ,\overline{e}_{N,t}^\textbf{n})\) and \(\overline{\widehat{e}_{t}^\textbf{n}}=(\widehat{e}_{1,t}^\textbf{n},\ldots ,\widehat{e}_{N,t}^\textbf{n})\), while the state and the action of the first player in the first population at time t will be denoted by \(\bar{s}_{t,1}^{\textbf{n},1}\), \(\bar{a}_{t,1}^{\textbf{n},1}\) and by \(\hat{s}_{t,1}^{\textbf{n},1}\), \(\hat{a}_{t,1}^{\textbf{n},1}\), respectively.

  • For any random element \(\theta \), its distribution will be denoted by \(\mathcal {L}(\theta )\). In particular, the distributions of random elements \((\bar{s}_{t,1}^{\textbf{n},1},\bar{a}_{t,1}^{\textbf{n},1})\), \((\hat{s}_{t,1}^{\textbf{n},1},\hat{a}_{t,1}^{\textbf{n},1})\), \(\overline{e}_{i,t}^\textbf{n}\) and \(\widehat{e}_{i,t}^\textbf{n}\) will be denoted by \(\mathcal {L}\left( \bar{s}_{t,1}^{\textbf{n},1},\bar{a}_{t,1}^{\textbf{n},1}\right) \), \(\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},1},\hat{a}_{t,1}^{\textbf{n},1}\right) \), \(\mathcal {L}\left( \overline{e}_{i,t}^\textbf{n}\right) \) and \(\mathcal {L}\left( \widehat{e}_{i,t}^\textbf{n}\right) \), respectively.

  • The equilibrium state distribution in the mean-field limit at time t will be denoted by \(\mu _t^{*i}\), \(i=1,\ldots ,N\), while the equilibrium state-action distribution in the mean-field limit, \(\Pi ^i_t(\pi ^i,\mu _t^{*i})\), by \(\tau _t^{*i}\), with \(\overline{\mu _t^*}\) and \(\overline{\tau _t^*}\) standing for their vectors. Finally, the state and action of the first player in the i-th population at time t, if the first player in the first population uses policy \(\widehat{\pi _1^1}\) while the others stick to their mean-field equilibrium policies, will be denoted by \(\hat{s}_{t,1}^{i}\) and \(\hat{a}_{t,1}^{i}\), with \(\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i}\right) \) denoting their joint distribution.

In the first lemma, we adapt one of the results from Lemma 4.3 in [11] to our multi-population case.

Lemma 2

Let \(\Phi _\textbf{n}^i\in \Delta _1^i(\Delta _w(S\times A))\) and \(\delta _{\tau ^{*i}}\in \Delta _1^i(\Delta _w(S\times A))\) for \(\textbf{n}\in \mathbb {N}^N\) and \(i=1,\ldots ,N\). If \(W_1(\Phi _\textbf{n}^i,\delta _{\tau ^{*i}})\rightarrow 0\) for \(i=1,\ldots ,N\) as \(\textbf{n}\rightarrow \infty \), then

$$\begin{aligned} \mathbb {E}\left[ |F\left( \overline{\tau }_\textbf{n}\right) -F\left( \overline{\tau }\right) |\right] \rightarrow 0\quad \text{ as } \textbf{n}\rightarrow \infty \end{aligned}$$

for any \(F\in C_w\left( \Pi _{i=1}^N\Delta _w(D^i)\right) \) and any \(\Pi _{i=1}^N\Delta _w(D^i)\)-valued random elements \(\overline{\tau }_\textbf{n}=\left( {\tau }_\textbf{n}^1,\ldots ,{\tau }_\textbf{n}^N\right) \), \(\textbf{n}\in \mathbb {N}^N\), such that \(\mathcal {L}\left( {\tau }_\textbf{n}^i\right) =\Phi _\textbf{n}^i\), \(i=1,\ldots , N\), and a \(\Pi _{i=1}^N\Delta _w(D^i)\)-valued random element \(\overline{\tau }=\left( {\tau }^1,\ldots ,{\tau }^N\right) \) such that \(\mathcal {L}\left( {\tau }^i\right) =\delta _{\tau ^{*i}}\), \(i=1,\ldots ,N\).

Proof

In the proof we will inductively (with respect to N) verify a stronger result, stating that for any function F satisfying the assumptions of the lemma, the functions \(\widetilde{F}_\textbf{n}:\Delta _w(D^N)\rightarrow \mathbb {R}\), defined as

$$\begin{aligned} \widetilde{F}_\textbf{n}(\tau ^N)=\mathbb {E}\left[ F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\tau ^N)\mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1})=\Phi _\textbf{n}^{N-1}\right] \end{aligned}$$

and \(\widehat{F}:\Delta _w(D^N)\rightarrow \mathbb {R}\), defined as

$$\begin{aligned} \widehat{F}(\tau ^N)=F(\tau ^{*1},\ldots ,\tau ^{*N-1},\tau ^N) \end{aligned}$$

satisfy for any convergent sequence \(\left\{ \tau ^N_k\right\} _{k\ge 1}\subset \Delta _w(D^N)\)

$$\begin{aligned} \widetilde{F}_\textbf{n}(\tau ^N_k)\rightarrow _{\textbf{n}\rightarrow \infty ,k\rightarrow \infty } \widehat{F}(\lim _{k\rightarrow \infty }\tau ^N_k). \end{aligned}$$
(5)

We precede the main part of the proof by showing that for any \(\textbf{n}\in \mathbb {N}^N\) and any \(F\in C_w\left( \Pi _{i=1}^N\Delta _w(D^i)\right) \), \(\widetilde{F}_\textbf{n}\in C_w(\Delta _w(D^N))\).

Note that from the definition of the \(\Vert \cdot \Vert _w^*\) norm we know that for any \(\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)\)

$$\begin{aligned} \frac{|F(\overline{\tau })|}{\int _{D^N}w(s)\tau ^N(ds\times da)}\le & {} \Vert F\Vert _w^*\left( 1+\frac{\sum _{i=1}^{N-1}\int _{D^i}w(s)\tau ^i(ds\times da)}{\int _{D^N}w(s)\tau ^N(ds\times da)}\right) \\\le & {} \Vert F\Vert _w^*\left( 1+\sum _{i=1}^{N-1}\int _{D^i}w(s)\tau ^i(ds\times da)\right) , \end{aligned}$$

with the last inequality following from the fact that \(w\ge 1\). Consequently,

$$\begin{aligned} \frac{\left|\widetilde{F}_\textbf{n}(\tau ^N)\right|}{\int _{D^N}w(s)\tau ^N(ds\times da)}\le \Vert F\Vert _w^*\left( 1+\sum _{i=1}^{N-1}\mathbb {E}\left[ \int _{D^i}w(s)\tau _\textbf{n}^i(ds\times da)\mid \mathcal {L}(\tau _\textbf{n}^{i})=\Phi _\textbf{n}^i\right] \right) . \end{aligned}$$

By Lemma 4.3 in [11] we know that the RHS of the above inequality converges to

$$\begin{aligned} \Vert F\Vert _w^*\left( 1+\sum _{i=1}^{N-1}\int _{D^i}w(s)\tau ^{*i}(ds\times da)\right) <\infty \end{aligned}$$

as \(\textbf{n}\rightarrow \infty \). This however implies that there exists a number \(W^N_*\) such that for every \(\textbf{n}\in \mathbb {N}^N\),

$$\begin{aligned} \frac{\left|\widetilde{F}_\textbf{n}(\tau ^N)\right|}{\int _{D^N}w(s)\tau ^N(ds\times da)}\le W^N_*, \end{aligned}$$

which means that for each \(\textbf{n}\), \(\left\| \widetilde{F}_\textbf{n}\right\| _w^*\le W^N_*\).

To show that for each \(\textbf{n}\), \(\widetilde{F}_\textbf{n}\) is continuous, we take a convergent sequence \(\left\{ \tau ^N_k\right\} _{k\ge 1}\subset \Delta _w(D^N)\) and note that

$$\begin{aligned}{} & {} \lim _{k\rightarrow \infty }\left|\widetilde{F}_\textbf{n}(\tau ^N_k)-\widetilde{F}_\textbf{n}(\lim _{k\rightarrow \infty }\tau ^N_k)\right|\nonumber \\{} & {} \quad =\lim _{k\rightarrow \infty }\Bigg |\mathbb {E}\left[ F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\tau ^N_k)-F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\lim _{k\rightarrow \infty }\tau ^N_k)\right. \nonumber \\{} & {} \qquad \left. \mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1})=\Phi _\textbf{n}^{N-1}\right] \Bigg |\nonumber \\{} & {} \quad \le \lim _{k\rightarrow \infty } \mathbb {E}\left[ \left|F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\tau ^N_k)-F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\lim _{k\rightarrow \infty }\tau ^N_k)\right|\right. \nonumber \\{} & {} \qquad \left. \mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1})=\Phi _\textbf{n}^{N-1}\right] \end{aligned}$$
(6)

Note that by the definition of the w-topology, the sequence of integrals \(\int _{D^N}w(s)\tau ^N_k(ds\times da)\) converges to \(\int _{D^N}w(s)\lim _{k\rightarrow \infty }\tau ^N_k(ds\times da)\), hence there exists a number \(W^N>0\) such that

$$\begin{aligned} \int _{D^N}w(s)\tau ^N_k(ds\times da)\le W^N\quad \text{ for } k\ge 1 \end{aligned}$$
(7)

Next, note that by the definition of the \(\Vert \cdot \Vert _w^*\) norm,

$$\begin{aligned} \left|F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\tau ^N_k)\right|\le \left\| F\right\| _w^*\left( \sum _{i=1}^{N-1}\int _{D^i}w(s)\tau ^i_\textbf{n}(ds\times da)+\int _{D^N}w(s)\tau ^N_k(ds\times da)\right) \end{aligned}$$

The function on the RHS is the sum of a function of variables \(\tau ^i_\textbf{n}\), \(i=1,\ldots ,N-1\) with a finite integral with respect to the measure \(\Pi _{i=1}^{N-1}\Phi ^i_\textbf{n}\) by the assumption of the lemma and a bounded term independent of these variables (a function of \(\tau ^N_k\) bounded by \(\left\| F\right\| _w^*W^N\) for any \(k\ge 1\) by (7)). Hence, by the dominated convergence theorem the RHS of (6) equals zero for any \(\textbf{n}\), which implies that \(\widetilde{F}_\textbf{n}\) is continuous.

Next, we turn to the inductive proof of (5). It is obvious that it holds for \(N=1\). Suppose that (5) holds for some \(N-1\) for any function satisfying the assumptions of the lemma. We will show it is also true for N. First, note that

$$\begin{aligned}{} & {} \lim _{\textbf{n}\rightarrow \infty ,k\rightarrow \infty }\left|\widetilde{F}_\textbf{n}(\tau ^N_k)-\widehat{F}(\lim _{k\rightarrow \infty }\tau ^N_k)\right|\nonumber \\{} & {} \quad =\lim _{\textbf{n}\rightarrow \infty ,k\rightarrow \infty }\Bigg |\mathbb {E}\left[ F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\tau ^N_k)\mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1}) =\Phi _\textbf{n}^{N-1}\right] \nonumber \\{} & {} \qquad -F(\tau ^{*1},\ldots ,\tau ^{*N-1},\lim _{k\rightarrow \infty }\tau ^N_k)\Bigg |\nonumber \\{} & {} \quad \le \lim _{\textbf{n}\rightarrow \infty ,k\rightarrow \infty }\mathbb {E}\left[ \left|F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\tau ^N_k)-F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1}, \lim _{k\rightarrow \infty }\tau ^N_k)\right|\right. \nonumber \\{} & {} \qquad \left. \mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1})=\Phi _\textbf{n}^{N-1}\right] \nonumber \\{} & {} \qquad +\lim _{\textbf{n}\rightarrow \infty }\Bigg |\mathbb {E}\left[ F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\lim _{k\rightarrow \infty }\tau ^N_k) -F(\tau ^{*1},\ldots ,\tau ^{*N-1},\lim _{k\rightarrow \infty }\tau ^N_k)\right. \nonumber \\{} & {} \qquad \left. \mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1})=\Phi _\textbf{n}^{N-1}\right] \Bigg |\end{aligned}$$
(8)

To show that the first term on the RHS of (8) equals zero, we define functions \(\theta _k^N, \theta ^N:\Pi _{i=1}^{N-1}\Delta _w(D^i)\rightarrow \mathbb {R}\) as

$$\begin{aligned} \theta _k^N(\tau ^1,\ldots ,\tau ^{N-1})= & {} F(\tau ^1,\ldots ,\tau ^{N-1},\tau ^N_k),\\ \theta ^N(\tau ^1,\ldots ,\tau ^{N-1})= & {} F(\tau ^1,\ldots ,\tau ^{N-1},\lim _{k\rightarrow \infty }\tau ^N_k). \end{aligned}$$

The definition of the \(\Vert \cdot \Vert ^*_w\) norm, (7) and the fact that \(w\ge 1\) imply then that

$$\begin{aligned} \left\| \theta _k^N\right\| _w^*\le \Vert F\Vert _w^*(1+W^N) \text{ for } k\ge 1\quad \text{ and }\quad \left\| \theta ^N\right\| _w^*\le \Vert F\Vert _w^*(1+W^N). \end{aligned}$$

As \(\theta _k^N\) converges continuously to \(\theta ^N\), by Theorem 3.3 in [15], the first term on the RHS of (8) is zero.

To show that the same is true for the second term, we first rewrite it as follows:

$$\begin{aligned}{} & {} \lim _{\textbf{n}\rightarrow \infty }\Bigg |\mathbb {E}\left[ F(\tau _\textbf{n}^1,\ldots ,\tau _\textbf{n}^{N-1},\lim _{k\rightarrow \infty }\tau ^N_k)-F(\tau ^{*1}, \ldots ,\tau ^{*N-1},\lim _{k\rightarrow \infty }\tau ^N_k)\right. \\{} & {} \qquad \left. \mid \mathcal {L}(\tau _\textbf{n}^{1})=\Phi _\textbf{n}^1,\ldots ,\mathcal {L}(\tau _\textbf{n}^{N-1}) =\Phi _\textbf{n}^{N-1}\right] \Bigg |\\{} & {} \quad =\lim _{\textbf{n}\rightarrow \infty }\left|\int _{\Delta _w(S\times A)}\left( \widetilde{\left( \theta ^N\right) }_\textbf{n}(\tau _\textbf{n}^{N-1}) -\widehat{\theta ^N}(\tau ^{*N-1})\right) \Phi _\textbf{n}^{N-1}\left( d\tau _\textbf{n}^{N-1}\right) \right|\\{} & {} \qquad \le \lim _{\textbf{n}\rightarrow \infty } \left[ \int _{\Delta _w(S\times A)}\left|\widetilde{\left( \theta ^N\right) }_\textbf{n}(\tau _\textbf{n}^{N-1}) -\widetilde{\left( \theta ^N\right) }_\textbf{n}(\tau ^{*N-1})\right|\Phi _\textbf{n}^{N-1}\left( d\tau _\textbf{n}^{N-1}\right) \right. \\{} & {} \qquad \left. +\left|\widetilde{\left( \theta ^N\right) }_\textbf{n}(\tau ^{*N-1}) -\widehat{\theta ^N}(\tau ^{*N-1})\right|\right] \end{aligned}$$

The second term goes to zero by the inductive hypothesis. To show that the same is true for the first one, note that \(\theta ^N\) is a continuous function of \(N-1\) variables with a finite \(\Vert \cdot \Vert _w^*\) norm, hence, for any \(\textbf{n}\in \mathbb {N}^N\), \(\widetilde{\left( \theta ^N\right) }_\textbf{n}\in C_w(\Delta _w(D^{N-1}))\). Now we can apply Lemma 4.3 from [11] to show that the first term also goes to zero, ending the proof. \(\square \)

In the second lemma, we show that the sequences of random measures \(\widehat{e}_{i,t}^\textbf{n}\) converge in some sense to the mean-field equilibrium state-action distributions \(\tau ^{*i}_t\) as \(\textbf{n}\rightarrow \infty \).

Lemma 3

For \(i=1,\ldots ,N\) and any \(t\ge 0\),

$$\begin{aligned} \lim _{\textbf{n}\rightarrow \infty }W_1\left( \mathcal {L}\left( \widehat{e}_{i,t}^\textbf{n}\right) ,\delta _{\tau ^{*i}_t}\right) =0 \end{aligned}$$

in \(\Delta _1^i(\Delta _w(S\times A))\).

Proof

By Lemma 4.3 in [11], to prove the lemma we only need to show that for any i and t,

$$\begin{aligned} \lim _{\textbf{n}\rightarrow \infty }\mathbb {E}\left[ \int _{S\times A}f(s,a)\widehat{e}_{i,t}^\textbf{n}(ds\times da)-\int _{S\times A}f(s,a)\tau ^{*i}_t(ds\times da)\right] =0 \end{aligned}$$

for any \(f\in C_w(S\times A)\). We do it by induction on t.

Suppose \(t=0\). If \(i\ne 1\), then \(\{ (\hat{s}_{0,k}^{\textbf{n},i},\hat{a}_{0,k}^{\textbf{n},i})\}_{1\le k\le n_i}\sim \Pi _{k=1}^{n_i}\tau _0^{*i}\), so that \(\mathbb {E}\left[ \int _{S\times A}f(s,a)\widehat{e}_{i,0}^\textbf{n}(ds\times da)\right] =\int _{S\times A}f(s,a)\tau ^{*i}_0(ds\times da)\). As any \(f\in C_w(S\times A)\) is w-integrable by assumption (A1’), the claim holds in this case. Next, suppose that \(i=1\). Then

$$\begin{aligned} \int _{S\times A}f(s,a)\widehat{e}_{1,0}^\textbf{n}(ds\times da)=\frac{1}{n_1}f(\hat{s}_{0,1}^{\textbf{n},1},\hat{a}_{0,1}^{\textbf{n},1})+\frac{n_1-1}{n_1}\int _{S\times A}f(s,a)\widehat{e}_{1,0}^{\textbf{n}-\textbf{e}_1}(ds\times da) \end{aligned}$$

withFootnote 4\(\widehat{e}_{1,t}^{\textbf{n}-\textbf{e}_1}=\frac{1}{n_1-1}\sum _{k=2}^{n_1}\delta _{(\hat{s}_{t,k}^{\textbf{n},1},\hat{a}_{t,k}^{\textbf{n},1})}\) for any \(t\ge 0\). Now note that the expectation of the first term converges to zero when \(n_1\rightarrow \infty \) as

$$\begin{aligned} \mathbb {E}\left[ \frac{1}{n_1}f(\hat{s}_{0,1}^{\textbf{n},1},\hat{a}_{0,1}^{\textbf{n},1})\right] \le \frac{1}{n_1}\Vert f\Vert _wM \end{aligned}$$

by assumption (A1’). On the other hand, the expected value of the second term goes to \(\int _{S\times A}f(s,a)\tau ^{*1}_0(ds\times da)\) by the argument used for \(i\ne 1\).

Now suppose the claim holds for t and consider \(t+1\). The claim will only be proved for \(i=1\). The proof for \(i\ne 1\) goes along the same lines (except that we do not need to consider the first term on the RHS below in that case). Let us fix \(f\in C_w(S\times A)\). Then

$$\begin{aligned}{} & {} \left|\int _{S\times A}f(s,a)\widehat{e}_{1,t+1}^\textbf{n}(ds\times da)-\int _{S\times A}f(s,a)\tau ^{*1}_{t+1}(ds\times da)\right|\nonumber \\{} & {} \quad \le \frac{1}{n_1}\left|f(\hat{s}_{t+1,1}^{\textbf{n},1},\hat{a}_{t+1,1}^{\textbf{n},1})-\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,1}^{\textbf{n},1},\hat{a}_{t,1}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right|\nonumber \\{} & {} \qquad +\frac{n_1-1}{n_1}\Bigg |\int _{S\times A}f(s,a)\widehat{e}_{1,t+1}^\mathbf {n-\textbf{e}_1}(ds\times da)\nonumber \\{} & {} \qquad -\int _{S\times A}\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\widehat{e}_{t}^\textbf{n}})\widehat{e}_{1,t}^{\textbf{n}-\textbf{e}_1}(ds\times da)\Bigg |\nonumber \\{} & {} \qquad +\Bigg |\int _{S\times A}\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\widehat{e}_{t}^\textbf{n}})\widehat{e}_{1,t}^\textbf{n}(ds\times da)\nonumber \\{} & {} \qquad -\int _{S\times A}f(s,a)\tau ^{*1}_{t+1}(ds\times da)\Bigg |\end{aligned}$$
(9)

To finish the proof, we need to show that the expected values of each term on the RHS of (9) go to zero as \(\textbf{n}\rightarrow \infty \).

First, let us consider the first term.

$$\begin{aligned}{} & {} \mathbb {E}\left[ \frac{1}{n_1}\left|f(\hat{s}_{t+1,1}^{\textbf{n},1},\hat{a}_{t+1,1}^{\textbf{n},1})-\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,1}^{\textbf{n},1},\hat{a}_{t,1}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right|\right] \\{} & {} \quad \le \frac{1}{n_1}\mathbb {E}\left[ \left|f(\hat{s}_{t+1,1}^{\textbf{n},1},\hat{a}_{t+1,1}^{\textbf{n},1})\right|\right] +\frac{1}{n_1}\mathbb {E}\left[ \int _{S\times A}|f(s',a')|\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,1}^{\textbf{n},1},\hat{a}_{t,1}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right] \\{} & {} \quad \le \frac{\Vert f\Vert _w}{n_1}\mathbb {E}\left[ w(\hat{s}_{t+1,1}^{\textbf{n},1})\right] +\frac{\Vert f\Vert _w}{n_1}\mathbb {E}\left[ \int _{S}w(s')Q^1(ds'\mid \hat{s}_{t,1}^{\textbf{n},1},\hat{a}_{t,1}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right] \le 2\frac{\Vert f\Vert _w}{n_1}\alpha ^{t+1}M, \end{aligned}$$

where the last inequality follows from assumption (A1’) and a repeated application of (b) of assumption (A2’). Clearly, the last expression goes to zero as \(n_1\rightarrow \infty \).

Next, let us consider the expectation of the second term on the RHS of (9). We can write it as

$$\begin{aligned}{} & {} \frac{n_1-1}{n_1}\mathbb {E}\left[ \mathbb {E}\left[ \Bigg |\int _{S\times A}f(s,a)\widehat{e}_{1,t+1}^\mathbf {n-\textbf{e}_1}(ds\times da)\right. \right. \\{} & {} \quad -\left. \left. \int _{S\times A}\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\widehat{e}_{t}^\textbf{n}})\widehat{e}_{1,t}^{\textbf{n}-\textbf{e}_1}(ds\times da)\Bigg |\Bigg |\overline{\hat{s}_{t}^{\textbf{n}}},\overline{\hat{a}_{t}^{\textbf{n}}}\right] \right] \end{aligned}$$

The square of it can be bounded above as follows:

$$\begin{aligned}{} & {} \left( \frac{n_1-1}{n_1}\mathbb {E}\left[ \mathbb {E}\left[ \Bigg |\int _{S\times A}f(s,a) \widehat{e}_{1,t+1}^\mathbf {n-\textbf{e}_1}(ds\times da)\right. \right. \right. \\{} & {} \qquad -\left. \left. \left. \int _{S\times A}\int _{S\times A}f(s',a')\pi _{t+1}^1 (da'\mid s')Q^1(ds'\mid s,a,\overline{\widehat{e}_{t}^\textbf{n}})\widehat{e}_{1,t}^{\textbf{n} -\textbf{e}_1}(ds\times da)\Bigg |\Bigg |\overline{\hat{s}_{t}^{\textbf{n}}},\overline{\hat{a}_{t}^{\textbf{n}}}\right] \right] \right) ^2\\{} & {} \quad \le \frac{(n_1-1)^2}{n_1^2}\mathbb {E}\left[ \left( \mathbb {E}\left[ \Bigg |\int _{S\times A}f(s,a)\widehat{e}_{1,t+1}^\mathbf {n-\textbf{e}_1}(ds\times da)\right. \right. \right. \\{} & {} \qquad -\left. \left. \left. \int _{S\times A}\int _{S\times A}f(s',a') \pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\widehat{e}_{t}^\textbf{n}}) \widehat{e}_{1,t}^{\textbf{n}-\textbf{e}_1}(ds\times da)\Bigg |\Bigg |\overline{\hat{s}_{t}^{\textbf{n}}},\overline{\hat{a}_{t}^{\textbf{n}}}\right] \right) ^2\right] \\{} & {} \quad \le \frac{1}{n_1^2}\mathbb {E}\left[ \sum _{k=2}^{n_1}\left[ \int _{S\times A}f^2(s',a') \pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,k}^{\textbf{n},1},\hat{a}_{t,k}^{\textbf{n},1}, \overline{\widehat{e}_{t}^\textbf{n}})\right. \right. \\{} & {} \qquad +\left. \left. \left( \int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,k}^{\textbf{n},1},\hat{a}_{t,k}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right) ^2\right] \right] \\{} & {} \quad \le \frac{\Vert f\Vert _w^2}{n_1^2}\mathbb {E}\left[ \sum _{k=2}^{n_1} \left[ \int _{S\times A}w^2(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,k}^{\textbf{n},1},\hat{a}_{t,k}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right. \right. \\{} & {} \qquad +\left. \left. \left( \int _{S\times A}w(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid \hat{s}_{t,k}^{\textbf{n},1},\hat{a}_{t,k}^{\textbf{n},1},\overline{\widehat{e}_{t}^\textbf{n}})\right) ^2\right] \right] \\{} & {} \quad \le \frac{\Vert f\Vert _w^2}{n_1^2}\sum _{k=2}^{n_1}\mathbb {E} \left[ Bw^2(\hat{s}_{t,k}^{\textbf{n},1})+\alpha ^2w^2(\hat{s}_{t,k}^{\textbf{n},1})\right] \end{aligned}$$

with the second inequality following from Lemma 6.2 in [11], the third one from the definition of the w-norm and the last one from (b) of assumption (A2’) and (b) of assumption (A5). As by assumption (A1’) and (b) of assumption (A5), for each k and any \(\textbf{n}\), \(\mathbb {E}[ w^2(\hat{s}_{t,k}^{\textbf{n},1})]\le B^tB_0\), this implies that the expectation of the second term on the RHS of (9) also converges to zero when \(\textbf{n}\rightarrow \infty \).

We finish the proof by showing that the same is true for the third term. In order to do it, let us introduce the function \(\phi _t:\Pi _{i=1}^N\Delta _w(D^i)\rightarrow \mathbb {R}\) with the formula

$$\begin{aligned} \phi _t(\overline{\tau }):=\int _{S\times A}\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\tau })\tau ^1(ds\times da). \end{aligned}$$

Note that the third term on the RHS of (9) can be rewritten using \(\phi _t\) as

$$\begin{aligned} \left|\phi _t\left( \overline{\widehat{e}_{t}^\textbf{n}}\right) -\phi _t\left( \overline{\tau _{t}^*}\right) \right|. \end{aligned}$$
(10)

As, by the induction hypothesis, \(\lim _{\textbf{n}\rightarrow \infty }W_1\left( \mathcal {L}\left( \widehat{e}_{i,t}^\textbf{n}\right) ,\delta _{\tau ^{*i}_t}\right) =0\) for \(i=1,\ldots ,N\), by Lemma 2 it suffices to prove that \(\phi _t\in C_w\left( \Pi _{i=1}^N\Delta _w(D^i)\right) \) in order to show that the expected value of (10) goes to zero as \(\textbf{n}\rightarrow \infty \). We start by showing that \(\phi _t\) has a finite \(\Vert \cdot \Vert _w^*\) norm:

$$\begin{aligned} \left\| \phi _t\right\| _w^*= & {} \sup _{\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)}\frac{\left|\phi _t(\overline{\tau })\right|}{\sum _{i=1}^N\int _{S\times A}w(s)\tau ^i(ds\times da)}\\= & {} \sup _{\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)}\frac{\left|\int _{S\times A}\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\tau })\tau ^1(ds\times da)\right|}{\sum _{i=1}^N\int _{S\times A}w(s)\tau ^i(ds\times da)}\\\le & {} \sup _{\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)}\frac{ \int _{S\times A}\int _{S}\Vert f\Vert _ww(s')Q^1(ds'\mid s,a,\overline{\tau })\tau ^1(ds\times da)}{\sum _{i=1}^N\int _{S\times A}w(s)\tau ^i(ds\times da)}\\< & {} \sup _{\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)}\frac{ \Vert f\Vert _w\alpha \int _{S\times A}w(s)\tau ^1(ds\times da)}{\int _{S\times A}w(s)\tau ^1(ds\times da)}=\Vert f\Vert _w\alpha <\infty , \end{aligned}$$

with the penultimate inequality following from part (b) of the assumption (A2’) and the fact that \(w\ge 1\).

We next show that \(\phi _t\) is continuous. Let \(\{ \overline{\tau }_k\}_{k\ge 1}\subset \Pi _{i=1}^N\Delta _w(D^i)\) be a sequence converging to \(\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)\). Let us further define the functions \(\widetilde{\phi _t^k}:S\times A\rightarrow \mathbb {R}\) where \(k=1,2,\ldots \) and \(\widetilde{\phi _t}:S\times A\rightarrow \mathbb {R}\) by

$$\begin{aligned} \widetilde{\phi _t^k}(s,a):= & {} \int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\tau }_k)\\ \widetilde{\phi _t}(s,a):= & {} \int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\tau }). \end{aligned}$$

We will show that \(\widetilde{\phi _t^k}\) converges continuously to \(\widetilde{\phi _t}\). Let \(\{ s_k\}_{k\ge 1}\subset S\) and \(\{ a_k\}_{k\ge 1}\subset A\) be sequences converging to \(s^*\) and \(a^*\) respectively. Then

$$\begin{aligned}{} & {} \left|\widetilde{\phi _t^k}(s_k,a_k)-\widetilde{\phi _t}(s^*,a^*)\right|\\{} & {} \quad =\Bigg |\int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s_k,a_k,\overline{\tau }_k)\\{} & {} \qquad - \int _{S\times A}f(s',a')\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s^*,a^*,\overline{\tau })\Bigg |\\{} & {} \quad \le \int _{S\times A}|f(s',a')|\pi _{t+1}^1(da'\mid s')\left|Q^1(ds'\mid s_k,a_k,\overline{\tau }_k)-Q^1(ds'\mid s^*,a^*,\overline{\tau })\right|\\{} & {} \quad \le \Vert f\Vert _w\int _Sw(s')\left|Q^1(ds'\mid s_k,a_k,\overline{\tau }_k)-Q^1(ds'\mid s^*,a^*,\overline{\tau })\right|, \end{aligned}$$

but the last expression goes to zero as \(k\rightarrow \infty \) by (a) of assumption (A2’), proving that \(\widetilde{\phi _t^k}\) converges continuously to \(\widetilde{\phi _t}\). Moreover, this time by part (b) of assumption (A2’), for each k,

$$\begin{aligned} \left|\widetilde{\phi _t^k}(s,a)\right|\le & {} \int _{S\times A}|f(s',a')|\pi _{t+1}^1(da'\mid s')Q^1(ds'\mid s,a,\overline{\tau }_k)\\\le & {} \Vert f\Vert _w\int _{S}w(s')Q^1(ds'\mid s,a,\overline{\tau }_k)\le \Vert f\Vert _w\alpha w(s), \end{aligned}$$

which shows that the absolute value of each \(\widetilde{\phi _t^k}\) is bounded above by a \(\tau ^1\)-integrable function (recall that \(\tau ^1\in \Delta _w(D^1)\), so \(\int _{S\times A}w(s)\tau ^1(ds\times da)<\infty \)).

Now we can apply Theorem 3.3 in [15] to the sequence of functions \(\{\widetilde{\phi _t^k}\}_{k\ge 1}\) and the sequence of measures \(\{\tau ^1_k\}_{k\ge 1}\) obtaining

$$\begin{aligned} \phi _t(\overline{\tau }_k)=\int _{S\times A}\widetilde{\phi _t^k}(s,a)\tau ^1_k(ds\times da)\rightarrow _{k\rightarrow \infty }\int _{S\times A}\widetilde{\phi _t}(s,a)\tau ^1(ds\times da)=\phi _t(\overline{\tau }) \end{aligned}$$

which establishes the continuity of \(\phi _t\) and thus completes the proof that the expectation of the third term on the RHS of (9) goes to zero. \(\square \)

In the third lemma, we prove an auxiliary result used to show that the reward of the first player in the first population from using the strategy vector \(\widehat{\pi }\) in the n-person counterparts of the mean-field game converges to the corresponding reward in the mean-field limit.

Lemma 4

Fix any \(t\ge 0\) and \(i\in \{ 1,\ldots ,N\}\) and suppose that

$$\begin{aligned} \lim _{\textbf{n}\rightarrow \infty }\left|\int _{D^i}g_{\textbf{n}}(s,a)\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}\right) (ds\times da) -\int _{D^i}g_{\textbf{n}}(s,a)\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i}\right) (ds\times da)\right|=0 \end{aligned}$$

for any family of functions \(g_\textbf{n}\in C_w(D^i)\), \(\textbf{n}\in \mathbb {N}^N\), satisfying \(\sup _{\textbf{n}\in \mathbb {N}^N}\Vert g_\textbf{n}\Vert _w<\infty \). Moreover, suppose that the family \(\{ h_\textbf{n}:\; \textbf{n}\in \mathbb {N}^N\}\) of real-valued functions defined on \(D^i\times \Pi _{j=1}^N\Delta _w(D^j)\) satisfies the following conditions:

  1. (a)

    The family \(\{ h_\textbf{n}(s^i,a^i,\cdot ),\; (s^i,a^i)\in D^i, \textbf{n}\in \mathbb {N}^N\}\) is equicontinuous with respect to the product w-topology.

  2. (b)

    \(h_\textbf{n}(\cdot ,\cdot ,\overline{\tau })\in C_w(D^i)\) for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\) and \(\textbf{n}\in \mathbb {N}^N\).

  3. (c)

    \(\sup _{\textbf{n}\in \mathbb {N}^N}\Vert h_\textbf{n}(\cdot ,\cdot ,\overline{\tau })\Vert _w<\infty \) for \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\).

  4. (d)

    The function

    $$F^i_t(\overline{\tau }):=\sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\left|h_\textbf{n}(s,a,\overline{\tau })-h_\textbf{n}(s,a,\overline{\tau _t^*})\right|$$

    defined for \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\) is real-valued and \(\left\| F^i_t\right\| _w^*<\infty \).

Then

$$\begin{aligned}{} & {} \lim _{\textbf{n}\rightarrow \infty }\Bigg |\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) (ds\times da\times d\overline{\tau })\\{} & {} \quad -\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\Bigg |=0 \end{aligned}$$

Proof

Let us fix an arbitrary family \(\{ h_\textbf{n}:\; \textbf{n}\in \mathbb {N}^N\}\) satisfying the hypothesis of the lemma. Then we have

$$\begin{aligned}{} & {} \Bigg |\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau }) \mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}, \overline{\widehat{e}_{t}^\textbf{n}}\right) (ds\times da\times d\overline{\tau })\\{} & {} \qquad -\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau }) \mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\Bigg |\\{} & {} \quad \le \Bigg |\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}} (s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}, \overline{\widehat{e}_{t}^\textbf{n}}\right) (ds\times da\times d\overline{\tau })\\{} & {} \qquad -\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau }) \mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}, \delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\Bigg |\\{} & {} \qquad +\Bigg |\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}} (s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}, \delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\\{} & {} \qquad -\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau }) \mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\Bigg |\\ \end{aligned}$$

The second term on the RHS can be rewritten as

$$\begin{aligned} \left|\int _{D^i}h_{\textbf{n}}(s,a,\overline{\tau _t^*})\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}\right) (ds\times da) -\int _{D^i}h_{\textbf{n}}(s,a,\overline{\tau _t^*})\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i}\right) (ds\times da)\right|, \end{aligned}$$

which goes to zero as \(\textbf{n}\rightarrow \infty \) by the assumptions of the lemma. Next, we show that the same is true for the first term. Note that

$$\begin{aligned}{} & {} \Bigg |\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) (ds\times da\times d\overline{\tau })\nonumber \\{} & {} \qquad -\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\Bigg |\nonumber \\{} & {} \quad \le \mathbb {E}\left[ \mathbb {E}\left[ \left|h_{\textbf{n}}(s,a,\overline{\widehat{e}_{t}^\textbf{n}}) -h_{\textbf{n}}(s,a,\overline{\tau _t^*})\right|\mid (s,a)=\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}\right) \right] \right] \le \mathbb {E}\left[ F^i_t(\overline{\widehat{e}_{t}^\textbf{n}})\right] \end{aligned}$$
(11)

We next show that \(F^i_t\) is continuous with respect to the product w-topology. Suppose \(\{\overline{\tau }_k\}_{k\ge 1}\subset \Pi _{i=1}^N\Delta _w(D^i)\) is a sequence converging to \(\overline{\tau }\in \Pi _{i=1}^N\Delta _w(D^i)\). Then

$$\begin{aligned} \Bigg |F^i_t\left( \overline{\tau }_k\right) -F^i_t\left( \overline{\tau }\right) \Bigg |= & {} \Bigg |\sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\Bigg |h_\textbf{n}(s,a,\overline{\tau }_k)-h_\textbf{n}(s,a,\overline{\tau _t^*})\Bigg |\\{} & {} \quad -\sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\Bigg |h_\textbf{n}(s,a,\overline{\tau })-h_\textbf{n}(s,a,\overline{\tau _t^*})\Bigg |\Bigg |\\\le & {} \sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\Bigg |h_\textbf{n}(s,a,\overline{\tau }_k)-h_\textbf{n}(s,a,\overline{\tau })\Bigg |\rightarrow _{k\rightarrow \infty }0 \end{aligned}$$

by the equicontinuity of the family \(\{ h_\textbf{n}(s^i,a^i,\cdot ),\; (s^i,a^i)\in D^i, \textbf{n}\in \mathbb {N}^N\}\). Since, by the assumption of the lemma, also \(\Vert F^i_t\Vert _w^*<\infty \), this implies that \(F^i_t\in C_w\left( \Pi _{i=1}^N\Delta _w(D^i)\right) \). Hence, by Lemma 2, the RHS of (11) goes to zero as \(\textbf{n}\rightarrow \infty \), ending the proof of the lemma. \(\square \)

In the penultimate lemma, we show that the expected rewards obtained by the first player in any population in the n-person counterparts of the discounted-reward mean-field game converge to his expected reward in the mean-field game at every time t.

Lemma 5

For any \(t\ge 0\) and \(i\in \{ 1,\ldots ,N\}\) we have

$$\begin{aligned} \lim _{\textbf{n}\rightarrow \infty }\left|\mathbb {E}\left[ r^i\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) \right] -\mathbb {E}\left[ r^i\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\overline{\tau _t^*}\right) \right] \right|=0. \end{aligned}$$

Proof

We start by showing that for any family \(\{ g_\textbf{n}\}_{\textbf{n}\in \mathbb {N}^N}\subset C_w(D^i)\), satisfying \(\sup _{\textbf{n}\in \mathbb {N}^N}\Vert g_\textbf{n}\Vert _w<\infty \),

$$\begin{aligned} \lim _{\textbf{n}\rightarrow \infty }\left|\int _{D^i}g_{\textbf{n}}(s,a)\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}\right) (ds\times da) -\int _{D^i}g_{\textbf{n}}(s,a)\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i}\right) (ds\times da)\right|=0 \end{aligned}$$

We will show it by induction on t. The claim holds trivially for \(t=0\), as in this case \(\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i}\right) =\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i}\right) =\Pi _0^i(\pi ^i,\mu _0^{*i})\). Suppose now that the claim holds for t and consider \(t+1\). The first assumption of Lemma 4 is satisfied at time t by the induction hypothesis. Let us next define the family \(\{ h_\textbf{n}:\; \textbf{n}\in \mathbb {N}^N\}\) of real-valued functions on \(D^i\times \Pi _{j=1}^N\Delta _w(D^j)\) by the formula

$$\begin{aligned} h_\textbf{n}(s,a,\overline{\tau }):=\int _{D^i }g_{\textbf{n}}(s',a')\widehat{\pi _{t+1,1}^i}(da'\mid s')Q^i(ds'\mid s,a,\overline{\tau }). \end{aligned}$$

We will next show that it satisfies the assumptions of Lemma 4. Let \(L:=\sup _{\textbf{n}\in \mathbb {N}^N}\Vert g_\textbf{n}\Vert _w\). Then for any \(\overline{\tau },\overline{\eta }\in \Pi _{j=1}^N\Delta _w(D^j)\), we have

$$\begin{aligned}{} & {} \sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\left|h_\textbf{n}(s,a,\overline{\tau })-h_\textbf{n}(s,a,\overline{\eta })\right|\\{} & {} \quad \le L\sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\left\| Q^i(\cdot \mid s,a,\overline{\tau })-Q^i(\cdot \mid s,a,\overline{\eta })\right\| _w\le L\omega _{Q^i}\left( \sum _{j=1}^N\tilde{\rho }_w(\tau ^j,\eta ^j)\right) . \end{aligned}$$

As by (a) of assumption (A5), \(\omega _{Q^i}(\delta )\rightarrow 0\) when \(\delta \rightarrow 0\), this implies that the family \(\{ h_\textbf{n}(s,a,\cdot ),\; (s,a)\in D^i, \textbf{n}\in \mathbb {N}^N\}\) is equicontinuous. Moreover, the function

$$\begin{aligned}{} & {} F^i_t(\overline{\tau })=\sup _{(s,a)\in D^i,\textbf{n}\in \mathbb {N}^N}\left|h_\textbf{n}(s,a,\overline{\tau })-h_\textbf{n}(s,a,\overline{\tau _t^*})\right|\\{} & {} \quad \le L\omega _{Q^i}\left( \sum _{j=1}^N\tilde{\rho }_w(\tau ^j,\tau ^{*j})\right) \le L\Omega _Q^{\overline{\tau ^*_t}}\left( \overline{\tau }\right) , \end{aligned}$$

hence, it is real-valued and \(\left\| F^i_t\right\| _w^*\le L\left\| \Omega _Q^{\overline{\tau ^*_t}}\right\| _w^*<\infty \), again by (a) of assumption (A5).

Next, note that for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\), \(\sup _{\textbf{n}\in \mathbb {N}^N}\Vert h_\textbf{n}(\cdot ,\cdot ,\overline{\tau })\Vert _w\le \alpha L\) by (b) of assumption (A2’). Finally, we need to check that for any \(\overline{\tau }\), \(h_\textbf{n}(\cdot ,\cdot ,\overline{\tau })\) is continuous. To this end let us first define the function

$$\begin{aligned} l(s):=\int _A g_{\textbf{n}}(s,a)\widehat{\pi _{t+1,1}^i}(da\mid s). \end{aligned}$$

Clearly, l is a continuous function, given that for any sequence \(\{ s^k\}_{k\ge 1}\subset S\) converging to some \(s^*\), we have

$$\begin{aligned} l(s^k)=\int _A g_{\textbf{n}}(s^k,a)\widehat{\pi _{t+1,1}^i}(da\mid s^k)\rightarrow _{k\rightarrow \infty }\int _A g_{\textbf{n}}(s^*,a)\widehat{\pi _{t+1,1}^i}(da\mid s^*)=l(s^*) \end{aligned}$$

by Theorem 3.3 in [15], as \(g_{\textbf{n}}\) is continuous and \(\widehat{\pi _{t+1,1}^i}\) is weakly continuous by the assumption we have made about the strategy \(\widehat{\pi }\) (remember also that A is compact, so \(g_{\textbf{n}}\) is bounded on the set \(\left( \{ s^k\}_{k\ge 1}\cup \{ s^*\}\right) \times A\)). Moreover, \(\Vert l\Vert _w\le \Vert g_\textbf{n}\Vert _w\le L\).

Next, let us take a sequence \(\{ (s^k,a^k)\}_{k\ge 1}\subset D^i\) converging to some \((s^*,a^*)\). Clearly,

$$\begin{aligned} h_\textbf{n}(s^k,a^k,\overline{\tau })=\int _{S^i}l(s')Q^i(ds'\mid s^k,a^k,\overline{\tau }) \end{aligned}$$

which, again by Theorem 3.3 in [15], converges to

$$\begin{aligned} h_\textbf{n}(s^*,a^*,\overline{\tau })=\int _{S^i}l(s')Q^i(ds'\mid s^*,a^*,\overline{\tau }), \end{aligned}$$

as \(l\in C_w(S^i)\) and \(Q^i\) is weakly continuous by assumption (A2’).

As we have shown that the family \(\{ h_\textbf{n}:\; \textbf{n}\in \mathbb {N}^N\}\) satisfies all the assumptions given in Lemma 4, we can conclude as follows:

$$\begin{aligned}{} & {} \lim _{\textbf{n}\rightarrow \infty }\Bigg |\int _{D^i}g_{\textbf{n}}(s,a)\mathcal {L}\left( \hat{s}_{t+1,1}^{\textbf{n},i},\hat{a}_{t+1,1}^{\textbf{n},i}\right) (ds\times da)\\{} & {} \qquad -\int _{D^i}g_{\textbf{n}}(s,a)\mathcal {L}\left( \hat{s}_{t+1,1}^{i},\hat{a}_{t+1,1}^{i}\right) (ds\times da)\Bigg |\\{} & {} \quad =\lim _{\textbf{n}\rightarrow \infty }\Bigg |\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) (ds\times da\times d\overline{\tau })\\{} & {} \qquad -\int _{D^i\times \Pi _{j=1}^N\Delta _w(D^j)}h_{\textbf{n}}(s,a,\overline{\tau })\mathcal {L}\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\delta _{\overline{\tau _t^*}}\right) (ds\times da\times d\overline{\tau })\Bigg |=0, \end{aligned}$$

which proves the claim for \(t+1\) and completes the induction.

The last step of the proof is showing that the function \(r^i\) satisfies all the assumptions of Lemma 4 (when taking \(h_{\textbf{n}}\equiv r^i\) for \(\textbf{n}\in \mathbb {N}^N\)). Obviously, \(r^i(\cdot ,\cdot ,\overline{\tau })\in C_w(D^i)\) for any \(\overline{\tau }\in \Pi _{j=1}^N\Delta _w(D^j)\) by assumption (A1’). Then for any \(\overline{\tau },\overline{\eta }\in \Pi _{j=1}^N\Delta _w(D^j)\), we have

$$\begin{aligned} \sup _{(s,a)\in D^i}\left|r^i(s,a,\overline{\tau })-r^i(s,a,\overline{\eta })\right|\le \omega _{r^i}\left( \sum _{j=1}^N\tilde{\rho }_w(\tau ^j,\eta ^j)\right) . \end{aligned}$$

As by (a) of assumption (A5), \(\omega _{r^i}(\delta )\rightarrow 0\) when \(\delta \rightarrow 0\), this implies that the family \(\{ r^i(s,a,\cdot ),\; (s,a)\in D^i\}\) is equicontinuous. Moreover, the function

$$\begin{aligned} \widetilde{F}^i_t(\overline{\tau }):=\sup _{(s,a)\in D^i}\left|r^i(s,a,\overline{\tau })-r^i(s,a,\overline{\tau _t^*})\right|\le \omega _{r^i}\left( \sum _{j=1}^N\tilde{\rho }_w(\tau ^j,\tau ^{*j})\right) \le \Omega _r^{\overline{\tau ^*_t}}\left( \overline{\tau }\right) , \end{aligned}$$

hence, it is real-valued and \(\left\| \widetilde{F}^i_t\right\| _w^*\le \left\| \Omega _r^{\overline{\tau ^*_t}}\right\| _w^*<\infty \), again by (a) of assumption (A5). Now we can apply Lemma 4 to \(r^i\), which gives the assertion of the lemma. \(\square \)

The last lemma can be treated as a counterpart of Theorem 2.3 in [11], tailored to our model. It shows that in our considerations we can restrict attention to weakly continuous deviations from the mean-field equilibrium strategy.

Lemma 6

For any i, under the assumptions of Theorem 1,

$$\begin{aligned} \sup _{\widehat{\pi _1^i}\in \mathcal {M}^i}\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \end{aligned}$$

can be approximated arbitrarily closely using policies \(\widehat{\pi _1^i}\) such that \(\widehat{\pi _{1,t}^i}\) is weakly continuous for any time \(t\ge 0\).

Proof

Without loss of generality, we may show the result only for \(i=1\). Let us choose an arbitrary policy \(\widetilde{\pi _1^1}\in \mathcal {M}^1\). We will show that for any \(\varepsilon >0\) there exists \(\widehat{\pi _1^1}\in \mathcal {M}^1\) which is weakly continuous and satisfies

$$\begin{aligned} \mathbb {E}\left[ J^{1,1}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-1,1},\widehat{\pi _1^1}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \ge \mathbb {E}\left[ J^{1,1}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-1,1},\widetilde{\pi _1^1}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\varepsilon , \end{aligned}$$
(12)

which will complete the proof.

We start by noting that the problem faced by the first player in the first population is that of finding an optimal policy in a Markov decision process with state space \(\Pi _{j=1}^N(S^j)^{n_j}\), time-dependent transition probability

$$\begin{aligned} \widetilde{Q}_t(d\overline{x}\mid \overline{s},a_1^1)= & {} \int _{A^1(s_2^1)}\!\!\!\!\!\!\!\!\ldots \int _{A^1(s_{n_1}^1)}\!\!\!\ldots \int _{A^N(s_1^N)}\!\!\!\!\!\!\!\!\ldots \int _{A^N(s_{n_N}^N)} \Pi _{j=1}^N\Pi _{k=1}^{n_j}Q^j\left( dx^j_k\mid s_k^j,a_k^j,\overline{\tau }(\overline{s},\overline{a})\right) \\{} & {} \Pi _{k=2}^{n_1}\pi ^1_{t,k}(da_k^1\mid s_k^1)\Pi _{j=2}^N\Pi _{k=1}^{n_j}\pi ^j_{t,k}(da_k^j\mid s_k^j) \end{aligned}$$

and one-stage reward

$$\begin{aligned} \widetilde{r}_t(\overline{s},a_1^1)= & {} \int _{A^1(s_2^1)}\!\!\!\!\!\!\!\!\ldots \int _{A^1(s_{n_1}^1)}\!\!\!\ldots \int _{A^N(s_1^N)}\!\!\!\!\!\!\!\!\ldots \int _{A^N(s_{n_N}^N)}r^1\left( s_1^1,a_1^1,\overline{\tau }(\overline{s},\overline{a})\right) \\{} & {} \Pi _{k=2}^{n_1}\pi ^1_{t,k}(da_k^1\mid s_k^1)\Pi _{j=2}^N\Pi _{k=1}^{n_j}\pi ^j_{t,k}(da_k^j\mid s_k^j). \end{aligned}$$

By Lusin’s theorem (see Theorem 7.5.2 in [4]), for any \(\delta >0\), there exists a closed set \(F_0^\delta \subset S^1\) such that \(\mu _0^{*1}(S^1\setminus F_0^\delta )<\delta \) and \(\widetilde{\pi ^1_{0,1}}\) is weakly continuous on \(F_0^\delta \). As \(\Delta (A)\) is a convex subset of a locally convex vector space of finite signed measures on A, by Dugundji’s extension theorem (see Theorem 7.4 in [5]), we can extend \(\widetilde{\pi ^1_{0,1}}\) restricted to \(F_0^\delta \) continuously to \(S^1\). Let \(\widetilde{\pi ^{1,\delta }_{0,1}}\) denote this extension. We then apply the same method to \(\widetilde{\pi ^1_{1,1}}\), that is, we define the measure \(\widetilde{\mu }_1\) on \(S^1\) by the formula

$$\begin{aligned} \widetilde{\mu }_1(B):= & {} \int _{S^1}\int _{(S^1)^{n_1-1}}\ldots \int _{(S^N)^{n_N}}\int _A\widetilde{Q}_0\left( B\times (S^1)^{n_1-1}\times \Pi _{j=2}^N(S^j)^{n_j}\mid \overline{s},a_1^1\right) \\{} & {} \widetilde{\pi ^1_{0,1}}(da_1^1\mid s_1^1)\mu _0^{*1}(ds_1^1)\ldots \mu _0^{*1}(ds_{n_1}^1)\ldots \mu _0^{*N}(ds_1^N)\ldots \mu _0^{*N}(ds_{n_N}^N) \end{aligned}$$

and construct a continuous \(\widetilde{\pi ^{1,\delta }_{1,1}}\) that agrees with \(\widetilde{\pi ^1_{1,1}}\) on a closed subset \(F_1^\delta \) of \(S^1\) satisfying \(\widetilde{\mu }_1(S^1\setminus F_1^\delta )<\delta \). We continue in the same manner until time \(t^*\), constructing measures \(\widetilde{\mu }_t\) and weakly continuous stochastic kernels \(\widetilde{\pi ^{1,\delta }_{t,1}}\) for \(t=2,\ldots ,t^*\) and define \(\widehat{\pi _1^1(\delta ,t^*)}\) as a Markov strategy for the first player in the first population of the form \(\left( \widetilde{\pi ^{1,\delta }_{0,1}},\widetilde{\pi ^{1,\delta }_{1,1}},\ldots ,\widetilde{\pi ^{1,\delta }_{t^*,1}},\pi ^1_{t^*+1,1},\ldots \right) \) (remember that the kernel \(\pi ^1_{t,1}\) is weakly continuous for any t by the assumption of the theorem). Now we can conclude as follows:

$$\begin{aligned}{} & {} \left|\mathbb {E}\left[ J^{1,1}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-1,1},\widehat{\pi _1^1(\delta ,t^*)}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] - \mathbb {E}\left[ J^{1,1}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-1,1},\widetilde{\pi _1^1}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \right|\nonumber \\{} & {} \quad =\left|\mathbb {E}^{ \overline{\mu _0^*},Q_n,[\overline{\pi }_{-1,1},\widehat{\pi _1^1(\delta ,t^*)}]}\sum _{t=0}^\infty \beta ^tr^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{ \overline{\mu _0^*},Q_n,[\overline{\pi }_{-1,1},\widetilde{\pi _1^1}]}\sum _{t=0}^\infty \beta ^tr^{1,1}_n(\overline{s_t},\overline{a_t})\right|\nonumber \\{} & {} \quad \le \left|\mathbb {E}^{ \overline{\mu _0^*},Q_n,[\overline{\pi }_{-1,1},\widehat{\pi _1^1(\delta ,t^*)}]}\sum _{t=0}^{t^*}\beta ^tr^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{ \overline{\mu _0^*},Q_n,[\overline{\pi }_{-1,1},\widetilde{\pi _1^1}]}\sum _{t=0}^{t^*}\beta ^tr^{1,1}_n(\overline{s_t},\overline{a_t})\right|\nonumber \\{} & {} \qquad +\left|\mathbb {E}^{ \overline{\mu _0^*},Q_n,[\overline{\pi }_{-1,1},\widehat{\pi _1^1(\delta ,t^*)}]}\!\!\!\sum _{t=t^*+1}^\infty \beta ^tr^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{ \overline{\mu _0^*},Q_n,[\overline{\pi }_{-1,1},\widetilde{\pi _1^1}]}\!\!\!\sum _{t=t^*+1}^\infty \beta ^tr^{1,1}_n(\overline{s_t},\overline{a_t})\right|\nonumber \\ \end{aligned}$$
(13)

Note that by assumption (A1’), (b) of assumption (A2’) and the definition of \(r^{1,1}_n\), for any t we have

$$\begin{aligned} R\ge \mathbb {E}^{\overline{\mu _0^*},Q_n,\cdot }\left[ r^{1,1}_n(\overline{s_t},\overline{a_t})\right] \ge -R\gamma ^t\mathbb {E}^{\overline{\mu _0^*},Q_n,\cdot }\left[ w(s_{t,1}^1)\right] \ge -R\gamma ^t\alpha ^tM, \end{aligned}$$
(14)

regardless of the strategy used. Therefore, the second term on the RHS of (13) can be bounded above by

$$\begin{aligned} \sum _{t=t^*+1}^\infty \beta ^t\left( R+R\gamma ^t\alpha ^tM\right) =\beta ^{t^*+1}\left( \frac{R}{1-\beta }+\frac{RM\alpha ^{t^*+1}\gamma ^{t^*+1}}{1-\alpha \beta \gamma }\right) , \end{aligned}$$

which goes to 0 as \(t^*\rightarrow \infty \). Obviously, this implies that there exists a value of \(t^*\), call it \(\widehat{t^*}\), for which the second term on the RHS of (13) is smaller than \(\frac{\varepsilon }{2}\).
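For completeness, the two geometric series behind the identity above can be summed explicitly (their convergence requires \(\beta <1\) and \(\alpha \beta \gamma <1\), which is implicit in the finiteness of the bound):

$$\begin{aligned} \sum _{t=t^*+1}^\infty \beta ^tR=\frac{R\beta ^{t^*+1}}{1-\beta },\qquad \sum _{t=t^*+1}^\infty \beta ^tR\gamma ^t\alpha ^tM=\frac{RM(\alpha \beta \gamma )^{t^*+1}}{1-\alpha \beta \gamma }=\beta ^{t^*+1}\frac{RM\alpha ^{t^*+1}\gamma ^{t^*+1}}{1-\alpha \beta \gamma }. \end{aligned}$$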

Next, note that, by (14) and (b) of assumption (A2’), the first term on the RHS of (13) is bounded above by

$$\begin{aligned}{} & {} \sum _{t=0}^{t^*}\beta ^t\left|R\gamma ^t\alpha ^t\int _{S^1\setminus F_0^\delta }w(s_{0,1}^1)\mu _0^{*1}(ds_{0,1}^1) +R\delta \right|\\{} & {} \quad + \sum _{t=1}^{t^*}\beta ^t\left|R\gamma ^t\alpha ^{t-1}\int _{S^1\setminus F_1^\delta }w(s_{1,1}^1)\widetilde{\mu }_1(ds_{1,1}^1)+R\delta \right|\\{} & {} \quad +\ldots +\sum _{t=t^*}^{t^*}\beta ^t\left|R\gamma ^t\alpha ^{t-t^*}\int _{S^1\setminus F_{t^*}^\delta }w(s_{t^*,1}^1)\widetilde{\mu }_{t^*}(ds_{t^*,1}^1)+R\delta \right|\end{aligned}$$

This sum can be made arbitrarily small, say smaller than \(\frac{\varepsilon }{2}\) for \(t^*=\widehat{t^*}\), by choosing \(\delta =\delta ^*\) small enough.

This, however, implies that \(\widehat{\pi _1^1}:=\widehat{\pi _1^1(\delta ^*,\widehat{t^*})}\) satisfies (12). \(\square \)

Proof of Theorem 1

Take \(\varepsilon >0\) and note that by Lemma 6 for each population i there exists a weakly continuous policy \(\widehat{\pi _1^i}\in \mathcal {M}^i\) such that

$$\begin{aligned} \mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \ge \sup _{\widetilde{\pi _1^i}\in \mathcal {M}^i}\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widetilde{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\frac{\varepsilon }{5}. \end{aligned}$$
(15)

Next note that for any \(t^*\ge 0\) we have

$$\begin{aligned}{} & {} \left|\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\widehat{\pi _1^i},\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] \right|\\{} & {} \quad \le \sum _{t=0}^{t^*}\beta ^t\left|\mathbb {E}\left[ r^i\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) \right] -\mathbb {E}\left[ r^i\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\overline{\tau _t^*}\right) \right] \right|\\{} & {} \qquad +\left|\mathbb {E}\left[ \sum _{t=t^*+1}^\infty \beta ^t\left( r^i\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) - r^i\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\overline{\tau _t^*}\right) \right) \right] \right|\end{aligned}$$

The first \(t^*+1\) terms on the RHS go to zero as \(\textbf{n}\rightarrow \infty \) by Lemma 5, while (14) implies that the last term can be bounded above by

$$\begin{aligned} \sum _{t=t^*+1}^\infty \beta ^t\left( R+R\gamma ^t\alpha ^tM\right) =\beta ^{t^*+1}\left( \frac{R}{1-\beta }+\frac{RM\alpha ^{t^*+1}\gamma ^{t^*+1}}{1-\alpha \beta \gamma }\right) \rightarrow _{t^*\rightarrow \infty }0. \end{aligned}$$

Hence, we can fix \(t^*\) such that \(\beta ^{t^*+1}\left( \frac{R}{1-\beta }+\frac{RM\alpha ^{t^*+1}\gamma ^{t^*+1}}{1-\alpha \beta \gamma }\right) <\frac{\varepsilon }{5}\) and \(n_1^i(\varepsilon )\), ..., \(n_N^i(\varepsilon )\) such that for \(n_1\ge n_1^i(\varepsilon )\), ..., \(n_N\ge n_N^i(\varepsilon )\), we have

$$\begin{aligned} \left|\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\widehat{\pi _1^i},\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] \right|<\frac{2\varepsilon }{5}. \end{aligned}$$
(16)

Using similar reasoning, we may find \(\widehat{n_1^i}(\varepsilon )\), ..., \(\widehat{n_N^i}(\varepsilon )\) such that for \(n_1\ge \widehat{n_1^i}(\varepsilon )\), ..., \(n_N\ge \widehat{n_N^i}(\varepsilon )\),

$$\begin{aligned} \left|\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\pi _1^i,\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] \right|<\frac{2\varepsilon }{5}. \end{aligned}$$
(17)

If we take \(n_i(\varepsilon ):=\max \{ \max \{ n_i^j(\varepsilon ), j=1,\ldots ,N\}, \max \{ \widehat{n_i^j}(\varepsilon ), j=1,\ldots ,N\}\}\), \(i=1,\ldots ,N\), the definition of the Markov mean-field equilibrium, (16) and (17) imply that

$$\begin{aligned}{} & {} \mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \\{} & {} \quad \ge \mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\pi _1^i,\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] -\mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\widehat{\pi _1^i},\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] \\{} & {} \qquad -\left|\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\widehat{\pi _1^i},\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] \right|\\{} & {} \qquad -\left|\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_\beta (s_0^i,\overline{\mu }_0^*,\pi _1^i,\overline{\pi })\mid s_0^i\sim \mu _0^{*i}\right] \right|>-\frac{4\varepsilon }{5} \end{aligned}$$

for \(n_1\ge n_1(\varepsilon ),\ldots ,n_N\ge n_N(\varepsilon )\). Combining it with (15) we get that

$$\begin{aligned} \mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] > \sup _{\widetilde{\pi _1^i}\in \mathcal {M}^i}\mathbb {E}\left[ J^{1,i}_{\beta ,n}(\overline{s_0},[\overline{\pi }_{-i,1},\widetilde{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\varepsilon \end{aligned}$$

for any \(i\in \{ 1,\ldots ,N\}\) and \(n_1\ge n_1(\varepsilon ),\ldots ,n_N\ge n_N(\varepsilon )\). As all the players within each population are symmetric, this implies that the profile of strategies \(\overline{\pi }\) is an \(\varepsilon \)-equilibrium in the n-person counterpart of the discounted-payoff mean-field game in this case. \(\square \)

5.2 Results for the Total Payoff Case

In the remaining results we address the n-person counterparts of the total-payoff game.

Theorem 7

Suppose assumptions (A1”), (A2”), (A3), (A4”) and (A5) hold and suppose \(\overline{\pi }\) and \((\overline{\mu }^*_0,\overline{\mu }^*_1,\ldots )\) form a Markov mean-field equilibrium in the multi-population discrete-time mean-field game, existing by Theorem 8 in [18]. If, in addition, for each \(t\ge 0\) and \(i=1,\ldots ,N\), \(\pi ^i_t\) is weakly continuous, then for any \(\varepsilon >0\) and any \(T\ge 0\) there exist positive integers \(n_i(\varepsilon ,T)\), \(i=1,\ldots ,N\), such that the vector of strategies where each player from population i uses policy \(\pi ^i\) is an \((\varepsilon ,T)\)-Markov–Nash equilibrium in any n-person stochastic counterpart of the total-payoff mean-field game if \(n_i\ge n_i(\varepsilon ,T)\), \(i=1,\ldots ,N\).

Remark 2

As in the case of the discounted payoff, a stationary mean-field equilibrium, which exists by Theorem 5 in [18], is a special case of a Markov mean-field equilibrium with a stationarity condition imposed on the global states of the game at subsequent stages. Hence, the result provided by Theorem 7 holds in this case as well.

Before we pass to the actual proof of Theorem 7, we present an auxiliary result that can be seen as a variant of Lemma 6 for the total payoff game.

Lemma 8

For any i and any \(t_0\in \mathbb {N}\) under the assumptions of Theorem 7,

$$\begin{aligned} \sup _{\widehat{\pi _1^i}\in \mathcal {M}^i}\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \end{aligned}$$

can be approximated arbitrarily closely using policies \(\widehat{\pi _1^i}\) such that \(\widehat{\pi _{1,t}^i}\) is weakly continuous for any time \(t\ge 0\). Moreover, the approximating policies can be chosen independently of \(t_0\) as long as \(t_0\le T\) for some fixed \(T\in \mathbb {N}\).

Proof

Without loss of generality, we may consider only \(i=1\). As in the case of Lemma 6, what we need to prove is that for an arbitrary policy \(\widetilde{\pi _1^1}\in \mathcal {M}^1\) and any \(\varepsilon >0\) there exists a weakly continuous \(\widehat{\pi _1^1}\in \mathcal {M}^1\) satisfying, for any fixed \(t_0\le T\),

$$\begin{aligned} \mathbb {E}\left[ J^{1,1}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widehat{\pi _1^1}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \ge \mathbb {E}\left[ J^{1,1}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widetilde{\pi _1^1}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\varepsilon . \end{aligned}$$
(18)

The beginning of the proof is the same as for Lemma 6: we construct a Markov strategy \(\widehat{\pi _1^1(\delta ,t^*)}\) (with \(t^*>T\)) for the first player in the first population of the form \(\left( \widetilde{\pi ^{1,\delta }_{0,1}},\widetilde{\pi ^{1,\delta }_{1,1}},\ldots ,\widetilde{\pi ^{1,\delta }_{t^*,1}},\pi ^1_{t^*+1,1},\ldots \right) \), where each \(\widetilde{\pi ^{1,\delta }_{t,1}}\) is weakly continuous and agrees with \(\widetilde{\pi ^1_{t,1}}\) outside a set of measure less than \(\delta \) (with respect to \(\mu _0^{*1}\) for \(t=0\) and \(\widetilde{\mu }_t\) for \(t\ge 1\)).

Next, we define a modified (time-inhomogeneous) transition probability \(Q_n^{*t_0}\) as

$$\begin{aligned} Q_{n,t}^{*t_0}(\cdot \mid \overline{s},\overline{a}):=\left\{ \begin{array}{ll} Q_n(\cdot \mid \overline{s},\overline{a}),&{} \text{ if } s_1^1\ne s^* \text{ or } t<t_0\\ \delta _{(s^*)^n},&{} \text{ if } s_1^1=s^* \text{ and } t\ge t_0\end{array}\right. \end{aligned}$$

Let \(\left( Q_n^{*t_0}\right) ^t(\cdot \mid \overline{s},\overline{\sigma })\) denote the transition in t steps when the initial state of the n-person game is \(\overline{s}\) and the players use Markov strategy vector \(\overline{\sigma }\). It can be checked that under assumption (A4”), \(Q_n^{*t_0}\) satisfies

$$\begin{aligned} \lim _{t^*\rightarrow \infty }\!\!\!\!\!\!\sup _{\begin{array}{c} \overline{\sigma }\in \Pi _{j=1}^N\left( \mathcal {M}^j\right) ^{n_j},\\ (\overline{s}_0,\overline{a}_0,\overline{s}_1,\overline{a}_1,\ldots )\in \Pi _{t=0}^\infty \Pi _{j=1}^N(D^j)^{n_j} \end{array}}\left\| \sum _{t=t^*+1}^\infty \int _{S^1\setminus \{ s^*\}}w(x_1^{1})\alpha ^{-t}\left( Q_n^{*t_0}\right) ^t(d\overline{x}\mid \overline{s},\overline{\sigma })\right\| _w=0, \end{aligned}$$

which can shortly be written as

$$\begin{aligned} \lim _{t^*\rightarrow \infty }L_{t^*}=0 \end{aligned}$$
(19)

with \(L_{t^*}\) denoting the supremum under the limit.

Now we can proceed as follows:

$$\begin{aligned}{} & {} \left|\mathbb {E}\left[ J^{1,1}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widehat{\pi _1^1(\delta ,t^*)}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] - \mathbb {E}\left[ J^{1,1}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widetilde{\pi _1^1}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \right|\\{} & {} \quad =\left|\mathbb {E}^{\overline{\mu _0}^*,Q_n,[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widehat{\pi _1^1}(\delta ,t^*)]}\sum _{t=t_0}^{\mathcal {T}^1_1-1}r^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{\overline{\mu _0}^*,Q_n,[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widetilde{\pi _1^1}]}\sum _{t=t_0}^{\mathcal {T}^1_1-1}r^{1,1}_n(\overline{s_t},\overline{a_t}) \right|\\{} & {} \quad =\left|\mathbb {E}^{\overline{\mu _0}^*,Q_n^{*t_0},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widehat{\pi _1^1}(\delta ,t^*)]}\sum _{t=t_0}^{\infty }r^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{\overline{\mu _0}^*,Q_n^{*t_0},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widetilde{\pi _1^1}]}\sum _{t=t_0}^{\infty }r^{1,1}_n(\overline{s_t},\overline{a_t}) \right|\\{} & {} \quad \le \left|\mathbb {E}^{\overline{\mu _0}^*,Q_n^{*t_0},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widehat{\pi _1^1}(\delta ,t^*)]}\sum _{t=t_0}^{t^*}r^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{\overline{\mu _0}^*,Q_n^{*t_0},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widetilde{\pi _1^1}]}\sum _{t=t_0}^{t^*}r^{1,1}_n(\overline{s_t},\overline{a_t}) \right|\\{} & {} \qquad +\left|\mathbb {E}^{\overline{\mu _0}^*,Q_n^{*t_0},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widehat{\pi _1^1}(\delta ,t^*)]}\sum _{t=t^*+1}^{\infty }r^{1,1}_n(\overline{s_t},\overline{a_t})-\mathbb {E}^{\overline{\mu _0}^*,Q_n^{*t_0},[{}^{t_0}\overline{\pi }_{-1,1},{}^{t_0}\widetilde{\pi _1^1}]}\sum _{t=t^*+1}^{\infty }r^{1,1}_n(\overline{s_t},\overline{a_t}) \right|\end{aligned}$$

As assumptions (A1”) and (A2”) are stronger than (A1’) and (A2’), bounds given in (14) still hold here. Hence, the first term on the RHS can be bounded above by

$$\begin{aligned}{} & {} \sum _{t=\max \{ 0,t_0\}}^{t^*}\left|R\gamma ^t\alpha ^t\int _{S^1\setminus F_0^\delta }w(s_{0,1}^1)\mu _0^{*1}(ds_{0,1}^1) +R\delta \right|\\{} & {} \quad + \sum _{t=\max \{ 1,t_0\}}^{t^*}\left|R\gamma ^t\alpha ^{t-1}\int _{S^1\setminus F_1^\delta }w(s_{1,1}^1)\widetilde{\mu }_1(ds_{1,1}^1)+R\delta \right|\\{} & {} \quad +\ldots +\sum _{t=t^*}^{t^*}\left|R\gamma ^t\alpha ^{t-t^*}\int _{S^1\setminus F_{t^*}^\delta }w(s_{t^*,1}^1)\widetilde{\mu }_{t^*}(ds_{t^*,1}^1)+R\delta \right|. \end{aligned}$$

As far as the second term is concerned, (14) and the fact that \(r^{1,1}_n(\overline{s},\overline{a})\) equals zero whenever \(s_1^1=s^*\) imply that it can be bounded above by

$$\begin{aligned}{} & {} \mathbb {E}\left[ \sup _{\overline{\sigma }\in \Pi _{j=1}^N\left( \mathcal {M}^j\right) ^{n_j}}\sum _{t=t^*+1}^\infty R\gamma ^t\alpha ^t\int _{S^1\setminus \{ s^*\}}w(x_1^1)\alpha ^{-t}\left( Q_n^{*t_0}\right) ^t(d\overline{x}\mid \overline{s},\overline{\sigma })\right. \\{} & {} \quad \left. +\sup _{\overline{\sigma }\in \Pi _{j=1}^N\left( \mathcal {M}^j\right) ^{n_j}}\sum _{t=t^*+1}^\infty R\left( Q_n^{*t_0}\right) ^t\left( S^1\setminus \{ s^*\}\mid \overline{s},\overline{\sigma }\right) \mid \overline{s_0}\sim \overline{\mu }_0^*\right] \le 2RML_{t^*} \end{aligned}$$

with the last inequality following from the definition of \(L_{t^*}\), (A1”) (in particular the fact that \(\alpha \gamma <1\)) and the inequality \(w\ge 1\).

Both bounds can be made arbitrarily small by taking \(t^*\) big enough (by (19)) in the second case and \(\delta \) small enough in the first one. In particular, if both are less than \(\frac{\varepsilon }{2}\), then \(\widehat{\pi _1^1}:=\widehat{\pi _1^1(\delta ^*,\widehat{t^*})}\) satisfies (18). As the bounds do not depend on \(t_0\), the last statement of the lemma has also been proved. \(\square \)

Proof of Theorem 7

Take \(\varepsilon >0\) and \(T\in \mathbb {N}\). Note that by Lemma 8 for each population i there exists a weakly continuous policy \(\widehat{\pi _1^i}\in \mathcal {M}^i\) such that

$$\begin{aligned}{} & {} \mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \nonumber \\{} & {} \quad \ge \sup _{\widetilde{\pi _1^i}\in \mathcal {M}^i}\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widetilde{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\frac{\varepsilon }{5}. \end{aligned}$$
(20)

for any \(t_0\le T\). Next note that for any \(t^*\ge T\) we have

$$\begin{aligned}{} & {} \left|\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}\widehat{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] \right|\\{} & {} \quad \le \left|\mathbb {E}\left[ \sum _{t=t_0}^{\min \{ {\mathcal {T}^i_1-1},t^*\}} \left( r^i\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) - r^i\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\overline{\tau _t^*}\right) \right) \right] \right|\\{} & {} \qquad +\left|\mathbb {E}\left[ \sum _{t=t^*+1}^{\mathcal {T}^i_1}\left( r^i\left( \hat{s}_{t,1}^{\textbf{n},i},\hat{a}_{t,1}^{\textbf{n},i},\overline{\widehat{e}_{t}^\textbf{n}}\right) - r^i\left( \hat{s}_{t,1}^{i},\hat{a}_{t,1}^{i},\overline{\tau _t^*}\right) \right) \right] \right|\end{aligned}$$

The first term on the RHS goes to zero as \(\textbf{n}\rightarrow \infty \) by Lemma 5. As far as the second term is concerned, it can be bounded above by \(2RML_{t^*}\) in the same way as in the proof of Lemma 8. By (19) this can be made arbitrarily small by taking \(t^*\) big enough. In particular, we can find \(t^*\) such that \(2RML_{t^*}<\frac{\varepsilon }{5}\) and \(n_1^i(\varepsilon ,T)\), ..., \(n_N^i(\varepsilon ,T)\) such that for \(n_1\ge n_1^i(\varepsilon ,T)\), ..., \(n_N\ge n_N^i(\varepsilon ,T)\), for each \(t_0\le T\) we have

$$\begin{aligned} \left|\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}\widehat{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] \right|<\frac{2\varepsilon }{5}.\nonumber \\ \end{aligned}$$
(21)

Similarly we find \(\widehat{n_1^i}(\varepsilon ,T)\), ..., \(\widehat{n_N^i}(\varepsilon ,T)\) such that for \(n_1\ge \widehat{n_1^i}(\varepsilon ,T)\), ..., \(n_N\ge \widehat{n_N^i}(\varepsilon ,T)\) and any \(t_0\le T\),

$$\begin{aligned} \left|\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},{}^{t_0}\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] \right|<\frac{2\varepsilon }{5}. \end{aligned}$$
(22)

If we take \(n_i(\varepsilon ,T):=\max \{ \max \{ n_i^j(\varepsilon ,T), j=1,\ldots ,N\}, \max \{ \widehat{n_i^j}(\varepsilon ,T), j=1,\ldots ,N\}\}\), \(i=1,\ldots ,N\), the definition of the Markov mean-field equilibrium, (21) and (22) imply that

$$\begin{aligned}{} & {} \mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},{}^{t_0}\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] \\{} & {} \quad \ge \mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] -\mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}\widehat{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] \\{} & {} \qquad -\left|\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widehat{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}\widehat{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] \right|\\{} & {} \qquad -\left|\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},{}^{t_0}\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\mathbb {E}\left[ J^i_*(s_{t_0}^i,\overline{\mu }_{t_0}^*,{}^{t_0}{\pi _1^i},{}^{t_0}\overline{\pi })\mid {s_0^i}\sim \mu _0^{*i}\right] \right|>-\frac{4\varepsilon }{5} \end{aligned}$$

for \(n_1\ge n_1(\varepsilon ,T),\ldots ,n_N\ge n_N(\varepsilon ,T)\) and \(t_0\le T\). Combining it with (20) we get that

$$\begin{aligned} \mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},{}^{t_0}\overline{\pi })\mid \overline{s_0}\sim \overline{\mu }_0^*\right] > \sup _{\widetilde{\pi _1^i}\in \mathcal {M}^i}\mathbb {E}\left[ J^{1,i}_{*n}(\overline{\mu _{t_0}},[{}^{t_0}\overline{\pi }_{-i,1},{}^{t_0}\widetilde{\pi _1^i}])\mid \overline{s_0}\sim \overline{\mu }_0^*\right] -\varepsilon \end{aligned}$$

for any \(i\in \{ 1,\ldots ,N\}\), \(n_1\ge n_1(\varepsilon ,T),\ldots ,n_N\ge n_N(\varepsilon ,T)\) and \(t_0\le T\). As all the players within each population are symmetric, this implies that the profile of strategies \(\overline{\pi }\) is an \((\varepsilon ,T)\)-equilibrium in the n-person counterpart of the total-payoff mean-field game in this case. \(\square \)

6 Concluding Remarks

The paper is a continuation of our previous article [18], where we presented conditions under which multi-population discrete-time mean-field games admit Markov (or stationary) equilibria. Those results were given for two payoff criteria: the \(\beta \)-discounted payoff and the total expected payoff. In this article, we have presented theorems showing that, under rather unrestrictive assumptions, the equilibria obtained in the mean-field models are approximate equilibria in their n-person counterparts when n is large enough. All of the results are provided for both payoff criteria considered. As games with total payoff have so far only been studied in the finite-state-space case, the approximation results presented here also extend those available for total-payoff mean-field games with a single population. The article is part of ongoing research on discrete-time mean-field games with multiple populations of players. The next step should be extending the results presented in this paper to the case of the long-run average reward. This is an especially interesting case, as the standard ergodicity assumptions applied in models of this kind to show the existence of an equilibrium do not translate well to the case of multiple populations.