Appendix A: The Risk-Sensitive Maximum Principle
This appendix proves the maximum principle for the risk-sensitive optimal control problem. We note that the risk-sensitive maximum principle in [42] was established for one-dimensional Brownian motion with maximization of the Hamiltonian. Here, we extend it to the general r-dimensional Brownian motion with minimization of the Hamiltonian (but we still call it the “maximum principle”).
Consider the SDE
$$\begin{aligned} \mathrm{d}x(t)&= f(t,x,u)\mathrm{d}t + \sigma (t) \mathrm{d}B(t),~ x(0)=x_0, \end{aligned}$$
and the risk-sensitive cost function
$$\begin{aligned} J(u)&= \gamma \log {\mathbb {E}} \left[ \exp \left\{ \frac{1}{\gamma } \int _0^T l(t,x(t),u(t))\mathrm{d}t + \frac{1}{\gamma }m(x(T))\right\} \right] , \end{aligned}$$
(A.1)
where B is the r-dimensional standard Brownian motion. Let \(\{\mathcal {F}_t \}_{t \ge 0}\) be the filtration generated by B.
We first state the risk-sensitive maximum principle, which is similar to [42, Theorem 3.1]. See also [65, Theorem 3.2, Chapter 3] for the risk-neutral stochastic maximum principle.
Theorem A1
Let \((x,\bar{u})\) be an optimal pair for the risk-sensitive optimal control problem in (A.1). Then there exists a unique pair \((p,q) \in \mathcal {L}_{\mathcal {F}}^2(0,T;{\mathbb {R}}^n) \times \mathcal {L}_{\mathcal {F}}^2(0,T;{\mathbb {R}}^{n \times r})\) that solves the following BSDE:
$$\begin{aligned} \mathrm{d}p(t)&= -\left[ f_x^\top (t,x,\bar{u}) p(t) + l_x(t,x,\bar{u}) + \frac{1}{\gamma } q(t) \sigma ^\top (t) p(t) \right] \mathrm{d}t + q(t) \mathrm{d}B(t)\nonumber \\ p(T)&= m_x(x(T)), \end{aligned}$$
(A.2)
Also, the following optimality condition holds:
$$\begin{aligned} H(t,x,\bar{u},p,q) = \min _{u \in U} H(t,x,u,p,q), \end{aligned}$$
(A.3)
where the Hamiltonian H is given by
$$\begin{aligned} H(t,x,u,p,q) = p^\top f + l + {{\,\mathrm{Tr}\,}}\left( q^\top \sigma \right) + \frac{1}{\gamma }p^\top \sigma \sigma ^\top p. \end{aligned}$$
(A.4)
\(\square \)
Remark A1
Note that the term \(\frac{1}{\gamma } q(t) \sigma ^\top (t) p(t)\) in the BSDE p in (A.2) is different from that of the one-dimensional Brownian motion case in [42]. \(\square \)
We now prove Theorem A1.
Proof of Theorem A1
First, it is easy to see that the risk-sensitive optimal control problem in (A.1) can be converted into the following Mayer form:
$$\begin{aligned} J(u)&= \gamma \log {\mathbb {E}} \left[ \exp \{ \frac{1}{\gamma } (m(x(T)) + y(T))\} \right] \end{aligned}$$
(A.5)
$$\begin{aligned} \mathrm{d}x(t)&= f(t,x,u)\mathrm{d}t + \sigma (t) \mathrm{d}B(t),~ x(0)=x_0 \end{aligned}$$
(A.6)
$$\begin{aligned} \mathrm{d}y(t)&= l(t,x,u)\mathrm{d}t,~ y(0) = 0. \end{aligned}$$
(A.7)
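Indeed, integrating (A.7) gives
$$\begin{aligned} y(T) = \int _0^T l(t,x(t),u(t))\,\mathrm{d}t, \end{aligned}$$
so the exponent in (A.5) coincides with that in (A.1) and the two formulations share the same optimal pairs.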
Note that with this reformulation, we can apply the risk-neutral maximum principle in [65, Theorem 3.2, Chapter 3].
From [65, Theorem 3.2, Chapter 3], the Hamiltonian for the optimal control problem in (A.5) is given by
$$\begin{aligned} \bar{H}(t,x,u,p,q)&= \left\langle p,\begin{pmatrix} f \\ l \end{pmatrix} \right\rangle + {{\,\mathrm{Tr}\,}}(q^\top \begin{pmatrix} \sigma \\ 0 \end{pmatrix}), \end{aligned}$$
(A.8)
where p is the adjoint process satisfying
$$\begin{aligned} \mathrm{d}p(t)&= - \begin{pmatrix} f_x(t,x,\bar{u}) & 0 \\ l_x^\top (t,x,\bar{u}) & 0 \end{pmatrix}^\top p(t)\,\mathrm{d}t + q(t) \mathrm{d}B(t) \nonumber \\ p(T)&= \frac{1}{\gamma }\exp \left\{ \frac{1}{\gamma } (m(x(T)) + y(T))\right\} \begin{pmatrix} m_x(x(T)) \\ 1 \end{pmatrix}. \end{aligned}$$
(A.9)
Note that p is an \((n+1)\)-dimensional adjoint process with \(p =(p_1^\top ,p_2)^\top \), where \(p_1\) is an n-dimensional stochastic process associated with the constraint (A.6). Moreover, q is an \((n+1)\times r\)-dimensional matrix-valued stochastic process.
We define the associated value function for (A.5):
$$\begin{aligned} v(t) = \inf _{u} {\mathbb {E}} \left[ \exp \left\{ \frac{1}{\gamma } (m(x(T)) + y(T))\right\} \,\Big |\, \mathcal {F}_t \right] , \end{aligned}$$
where \(v(t) >0\) and \(v(T) = \exp \{ \frac{1}{\gamma } (m(x(T)) + y(T))\}\). Due to the relationship between the maximum principle and dynamic programming, the logarithmic transformation of the associated value function (see [29, Chapter VI], [5, 42]) leads to
$$\begin{aligned} p(t) = v_{(x,y)}(t),~~V = \gamma \log v, \end{aligned}$$
(A.10)
where p(T) satisfies the terminal condition in (A.9). The gradient of V can be written as
$$\begin{aligned} \tilde{p}(t) = \gamma \frac{p(t)}{v(t)}, \end{aligned}$$
(A.11)
where \(\tilde{p} = (\bar{p}^\top , \tilde{p}_2)^\top \in {\mathbb {R}}^{n+1}\), in which \(\bar{p}\) is an n-dimensional backward stochastic process.
We now obtain the expression of \(\tilde{p}\) in (A.11). Under the non-degeneracy assumption (stated in (A4) in Sect. 2), \(v\) is the smooth value function of the optimal control problem in (A.5), as mentioned in [65, Chapter 4] and [42]. Note also that the problem in (A.5) has no running cost. Then, in view of the proof of [65, Theorem 4.1, Chapter 5] and [42], and by using the Itô formula, we have
$$\begin{aligned} \mathrm{d}v(t)&= p_1^\top (t) \sigma (t) \mathrm{d}B(t). \end{aligned}$$
(A.12)
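To sketch this step, write \(v = v(t,x,y)\) for the smooth value function (with arguments along the optimal trajectory suppressed). The Itô formula gives
$$\begin{aligned} \mathrm{d}v(t) = \left[ v_t + v_x^\top f + v_y\, l + \frac{1}{2}{{\,\mathrm{Tr}\,}}\left( \sigma \sigma ^\top v_{xx}\right) \right] \mathrm{d}t + v_x^\top \sigma (t) \mathrm{d}B(t), \end{aligned}$$
and the drift vanishes along the optimal pair by the dynamic programming equation for \(v\), which has no running cost. Since \(p_1(t) = v_x(t)\) by (A.10), this yields (A.12).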
By using the Itô formula again with (A.12), and from (A.11), we have
$$\begin{aligned} \mathrm{d}\frac{1}{v(t)}&= - \frac{1}{\gamma v(t)} \bar{p}^\top (t) \sigma (t) \mathrm{d}B(t) + \frac{1}{\gamma ^2 v(t)} \bar{p}^\top (t) \sigma (t) \sigma ^\top (t) \bar{p}(t) \mathrm{d}t. \end{aligned}$$
(A.13)
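For completeness, the Itô formula applied to \(1/v\) together with (A.12) gives
$$\begin{aligned} \mathrm{d}\frac{1}{v(t)}&= -\frac{1}{v^2(t)}\,\mathrm{d}v(t) + \frac{1}{v^3(t)}\,(\mathrm{d}v(t))^2 \nonumber \\&= -\frac{1}{v^2(t)}\, p_1^\top (t)\sigma (t) \mathrm{d}B(t) + \frac{1}{v^3(t)}\, p_1^\top (t)\sigma (t)\sigma ^\top (t) p_1(t) \mathrm{d}t, \end{aligned}$$
and substituting \(p_1(t) = \frac{v(t)}{\gamma }\bar{p}(t)\) from (A.11) yields (A.13).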
By using the Itô formula for (A.11) with (A.13) and (A.9), we have
$$\begin{aligned} \mathrm{d}\tilde{p}(t)&= \mathrm{d} \left( \gamma \frac{p(t)}{v(t)}\right) \nonumber \\&= - \begin{pmatrix} f_x(t,x,\bar{u}) & 0 \\ l_x^\top (t,x,\bar{u}) & 0 \end{pmatrix}^\top \tilde{p}(t)\mathrm{d}t - \frac{1}{\gamma } \tilde{q}(t) \sigma ^\top (t) \bar{p}(t) \mathrm{d}t + \tilde{q}(t) \mathrm{d}B(t), \end{aligned}$$
(A.14)
where
$$\begin{aligned} \tilde{p}(T) = \begin{pmatrix} m_x(x(T)) \\ 1 \end{pmatrix}, \end{aligned}$$
and \(\tilde{q}\) is an \((n+1) \times r\) dimensional stochastic process:
$$\begin{aligned} \tilde{q}(t)&= \frac{\gamma q(t)}{v(t)} - \frac{1}{\gamma } \tilde{p}(t) \bar{p}^\top (t) \sigma (t). \end{aligned}$$
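In more detail, the Itô product rule applied to \(\tilde{p} = \gamma p \cdot \frac{1}{v}\) gives
$$\begin{aligned} \mathrm{d}\tilde{p}(t) = \gamma \left[ \frac{1}{v(t)}\,\mathrm{d}p(t) + p(t)\, \mathrm{d}\frac{1}{v(t)} + \mathrm{d}p(t)\, \mathrm{d}\frac{1}{v(t)} \right] , \end{aligned}$$
where the \(\mathrm{d}B(t)\)-terms combine into \(\tilde{q}(t)\mathrm{d}B(t)\) with \(\tilde{q}\) as above, and the drift contributions from \(p(t)\,\mathrm{d}\frac{1}{v(t)}\) and the cross-variation \(\mathrm{d}p(t)\,\mathrm{d}\frac{1}{v(t)}\), rewritten via (A.11), combine into the term \(-\frac{1}{\gamma }\tilde{q}(t)\sigma ^\top (t)\bar{p}(t)\mathrm{d}t\) in (A.14).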
In view of the value function transformation in (A.10), it is easy to see that \(\tilde{p}_2(t) = 1\) for all \(t \in [0,T]\), and \(\tilde{q}(t)\) satisfies
$$\begin{aligned} \tilde{q}(t)&= \frac{\gamma q(t)}{v(t)} - \frac{1}{\gamma } \tilde{p}(t) \bar{p}^\top (t) \sigma (t) =: \begin{pmatrix} \bar{q}(t) \\ \tilde{q}_2(t) \end{pmatrix}, \end{aligned}$$
(A.15)
where \(\bar{q}(t)\) is an \((n \times r)\)-dimensional stochastic process, whereas \(\tilde{q}_2(t)\) is a \((1 \times r)\)-dimensional stochastic process with \(\tilde{q}_2(t) = 0\) a.s. for all \(t \in [0,T]\).
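To see how the Hamiltonian in (A.4) arises from (A.8), write \(p = (p_1^\top ,p_2)^\top \) as above and let \(q_1\) denote the first \(n\) rows of \(q\). From (A.11), (A.15) and \(\tilde{p}_2(t) = 1\), we have \(p_1 = \frac{v}{\gamma }\bar{p}\), \(p_2 = \frac{v}{\gamma }\) and \(q_1 = \frac{v}{\gamma }\left( \bar{q} + \frac{1}{\gamma }\bar{p}\bar{p}^\top \sigma \right) \), so that
$$\begin{aligned} \bar{H}(t,x,u,p,q)&= p_1^\top f + p_2\, l + {{\,\mathrm{Tr}\,}}\left( q_1^\top \sigma \right) = \frac{v}{\gamma } \left[ \bar{p}^\top f + l + {{\,\mathrm{Tr}\,}}\left( \bar{q}^\top \sigma \right) + \frac{1}{\gamma }\bar{p}^\top \sigma \sigma ^\top \bar{p} \right] = \frac{v}{\gamma }\, H(t,x,u,\bar{p},\bar{q}). \end{aligned}$$
Since \(v(t) > 0\), minimizing \(\bar{H}\) over \(u \in U\) is equivalent to minimizing \(H\), which gives the optimality condition (A.3).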
We note that \(\tilde{p} = (\bar{p}^\top ,\tilde{p}_2)^\top \) and \(\bar{p}(T) = m_x(x(T))\). Then, expanding (A.14) together with (A.15), we can easily show that \(\bar{p}\) satisfies the backward SDE in (A.2). Moreover, by substituting the relationships (A.11) and (A.15) into the Hamiltonian in (A.8), as sketched above, one arrives at the Hamiltonian of the risk-sensitive optimal control problem in (A.4), together with the optimality condition in (A.3). Since our derivation can be reversed, this completes the proof of the risk-sensitive maximum principle for the r-dimensional Brownian motion. \(\square \)
Appendix B: Proof of Theorem 1
First, we note that from the risk-sensitive maximum principle in Theorem A1 in Appendix A, with the optimal control \(\bar{u}\), there exists a unique solution of the FBSDE in (7). Then under (A2)–(A5), by applying the Four Step Scheme introduced in [25] (see also [53] and [44, Chapter 4] for the Four Step Scheme under strong regularity assumptions), the BSDE for p can be expressed in terms of x as follows:
$$\begin{aligned} \theta (t,x(t)) = p(t),~ \theta (T,x) = p(T) \end{aligned}$$
(B.1)
almost surely for \(t \in [0,T]\) [25, Corollary 1.5]. In fact, in view of the Four Step Scheme and the Itô formula, one can show that \(\theta (t,x)\) is a classical solution of a quasi-linear parabolic partial differential equation with the terminal condition \(\theta (T,x) = p(T) = m_x(x(T),\mu (T))\) [25, (\(\hbox {E}^\prime \))], [44, Chapter 4] and [53]. Also, from [25, Corollary 1.5], we have
$$\begin{aligned} |\theta (t,x_1) - \theta (t,x_2) | \le c |x_1 - x_2 |,~ \forall x_1,x_2 \in {\mathbb {R}}^n, \end{aligned}$$
(B.2)
for some constant \(c \ge 0\). Hence, with (B.1), the SDE for x in (7) can be written as follows:
$$\begin{aligned} \mathrm{d}x(t)&=f(t,x,\mu ,w(t,x,\mu ,\theta (t,x)))\mathrm{d}t + \sigma (t) \mathrm{d}B(t), \end{aligned}$$
(B.3)
where \(x(0) = x_0\). Note that (B.3) is now decoupled from the BSDE for \(p\) in (7).
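For orientation, the quasi-linear parabolic PDE mentioned above can be sketched as follows (suppressing the mean-field and control arguments of \(f\) and \(l\), and writing \(\theta _x\) for the Jacobian of \(\theta \) in \(x\)): applying the Itô formula to \(p(t) = \theta (t,x(t))\) and matching the martingale and drift parts with the BSDE for \(p\) in (7) suggests \(q(t) = \theta _x(t,x(t))\sigma (t)\) and
$$\begin{aligned} \theta _t + \theta _x f + \frac{1}{2}\sum _{j,k=1}^n \left( \sigma \sigma ^\top \right) _{jk} \partial ^2_{x_j x_k}\theta + f_x^\top \theta + l_x + \frac{1}{\gamma }\,\theta _x \sigma \sigma ^\top \theta = 0,~~ \theta (T,x) = m_x(x,\mu (T)). \end{aligned}$$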
We now use Schauder’s fixed-point theorem to complete the proof. Its statement is given as follows: Let \(X\) be a nonempty closed and bounded convex subset of a normed space \(S\). Let \(T\) be a continuous mapping of \(X\) into a compact subset \(K \subset X\). Then \(T\) has a fixed point [58, Theorem 4.1.1].
We first note that the 1-Wasserstein metric on \(\mathcal {P}_1(\mathcal {C}([0,T];{\mathbb {R}}^n))\) is equivalent to the Kantorovich–Rubinstein distance [16, Theorem 5.5]
$$\begin{aligned} W_1(\mu ^*(t),\mu ^\prime (t))&= \sup \left\{ \int _{\mathcal {C}([0,T];{\mathbb {R}}^n)} f(x) \mathrm{d}\mu ^*(t,x) - \int _{\mathcal {C}([0,T];{\mathbb {R}}^n)} f(x) \mathrm{d}\mu ^\prime (t,x) \right\} , \end{aligned}$$
where \(\mu ^*,\mu ^\prime \in \mathcal {P}_1(\mathcal {C}([0,T];{\mathbb {R}}^n))\), and the supremum is taken over the set of all 1-Lipschitz continuous maps f. Indeed, it can be seen that the 1-Wasserstein metric is induced by the Kantorovich–Rubinstein norm [57], which, together with the fact that \(\mathcal {C}([0,T];{\mathbb {R}}^n)\) is a normed space with the norm \(|\cdot |_{\infty } := \sup _{0 \le t \le T} |\cdot |\) [43], implies that \(\mathcal {P}_1(\mathcal {C}([0,T];{\mathbb {R}}^n))\) is a normed space [15].
We define the following set for \(c > 0\):
$$\begin{aligned} \mathcal {E}=\{\mu \in \mathcal {P}_4(\mathcal {C}\left( [0,T];{\mathbb {R}}^n\right) )~:~M_4(\mu ) \le c\}, \end{aligned}$$
where we have the inclusion \(\mathcal {E} \subset \mathcal {P}_2(\mathcal {C}([0,T];{\mathbb {R}}^n)) \subset \mathcal {P}_1(\mathcal {C}([0,T];{\mathbb {R}}^n)) \). Then, it is easy to check that \(\mathcal {E}\) is bounded, convex, and closed with respect to the 1-Wasserstein metric. The latter follows from the fact that for any sequence of measures \(\mu _k \in \mathcal {E}\), \(k \ge 1\), converging to \(\mu \) in the 1-Wasserstein metric, we have \(W_4(\mu _k,\mu ) \le W_1(\mu _k,\mu )\) for \(k \ge 1\) [16, Section 5], which implies \(\mu \in \mathcal {E}\). In the proof below, the constant \(c\) may vary from line to line.
Now, note that \(\bar{u} \in {\mathcal {U}}\) is the optimal control that minimizes the Hamiltonian in (8). Then with (A2), (A5) and (B.2), the standard estimate of the SDEs in [65, Theorem 6.3, Chapter 1] implies that there exists a constant c, depending on \(x_0\), \(\beta \) and T, such that \({\mathbb {E}}[\sup _{0 \le t \le T} |x(t)|^4] \le c\). Hence, by considering the mapping \(\varPsi \) on \(\mathcal {E}\), and noticing that \(\mathcal {E} \subset \mathcal {P}_1(\mathcal {C}([0,T];{\mathbb {R}}^n))\), we have \(\varPsi : \mathcal {E} \rightarrow \mathcal {E}\), i.e., \(\varPsi \mu \in \mathcal {E}\), for any \(\mu \in \mathcal {E}\).
To prove compactness of \(\varPsi (\mathcal {E})\), we show tightness of a sequence of measures \(\varPsi \mu _k\) with respect to the 1-Wasserstein metric, where \(\mu _k \in \mathcal {E}\), \(k \ge 1\). Note that the initial condition of the SDE is not random and \(\sigma \) is uniformly bounded in \(t \in [0,T]\) from (A4). Then, from [65, Theorem 6.3, Chapter 1], for any \(\delta > 0\) and \(s \in [t,t+\delta ]\), \({\mathbb {E}}[|x(s) - x(t)|^2] \le c\), where c depends on the initial condition of the SDE and \(\delta \). Hence, in view of [14, Theorem 7.3] and [14, Corollary, page 83], \(\{\varPsi \mu _k\}\) is tight, which implies that \(\varPsi (\mathcal {E})\) is relatively compact with respect to the 1-Wasserstein metric [14, Theorem 5.1].
It remains to show that \(\varPsi \) is continuous on \(\mathcal {E}\) with respect to the 1-Wasserstein metric. That is, for every \(\epsilon > 0\), there exists \(\eta > 0\) such that for \(\mu ^*,\mu ^\prime \in \mathcal {E}\), \(W_1(\mu ^*,\mu ^\prime ) < \eta \) implies \(W_1(\varPsi \mu ^*,\varPsi \mu ^\prime )< \epsilon \). Note that here \(\mu ^*\) need not be a fixed point of \(\varPsi \). Let \(x^*\) and \(x^\prime \) be generated by the two SDEs corresponding to \(\mu ^*\) and \(\mu ^\prime \), respectively. From the definition of \(W_1\), for any \(\mu ^*,\mu ^\prime \in \mathcal {E} \subset \mathcal {P}_2(\mathcal {C}([0,T];{\mathbb {R}}^n)) \subset \mathcal {P}_1(\mathcal {C}([0,T];{\mathbb {R}}^n))\),
$$\begin{aligned} W_1(\varPsi \mu ^*,\varPsi \mu ^\prime ) \le {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x^*(t) - x^\prime (t)| \right] . \end{aligned}$$
(B.4)
From Gronwall’s lemma, (A2), (A5) and (B.2) and by following the proof in [19, Proposition 3.8], there exists a constant \(c > 0\) such that \({\mathbb {E}} [ \sup _{0 \le t \le T} |x^*(t) - x^\prime (t)|^2 ] \le c (\int _0^T W_2^2(\mu ^*(t),\mu ^\prime (t)) \mathrm{d}t )^{1/2}\). Then by using Jensen’s inequality and the fact that \(W_2(\mu ^*(t),\mu ^\prime (t)) \le W_1(\mu ^*(t),\mu ^\prime (t))\) [16, Section 5], we have
$$\begin{aligned} {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x^*(t) - x^\prime (t) | \right] \le c \left( \int _0^T W_1^2(\mu ^*(t),\mu ^\prime (t)) \mathrm{d}t \right) ^{1/4}. \end{aligned}$$
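In more detail, the exponent \(1/4\) comes from combining Jensen’s inequality with the \(L^2\)-estimate above:
$$\begin{aligned} {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x^*(t) - x^\prime (t) | \right] \le \left( {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x^*(t) - x^\prime (t) |^2 \right] \right) ^{1/2} \le c \left( \int _0^T W_2^2(\mu ^*(t),\mu ^\prime (t)) \mathrm{d}t \right) ^{1/4}, \end{aligned}$$
after which \(W_2(\mu ^*(t),\mu ^\prime (t)) \le W_1(\mu ^*(t),\mu ^\prime (t))\) gives the displayed bound.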
This, together with (B.4), implies continuity of \(\varPsi \) on \(\mathcal {E}\) with respect to the 1-Wasserstein metric. This completes the proof of the theorem. \(\square \)
Appendix C: Proof of Theorem 2
To prove Theorem 2, we first need the following lemma:
Lemma C1
There exists a constant \(c>0\), dependent on n, \(M_{5+n}< \infty \) and T, such that
$$\begin{aligned} {\mathbb {E}}\left[ W_2^2\left( \nu _N^*(t),\mu ^*(t)\right) \right] \le \frac{c}{N^{2/(n+4)}},~ \forall t \in [0,T]. \end{aligned}$$
Moreover, \(W_2(\nu _N^*(t),\mu ^*(t)) \rightarrow 0\) as \(N \rightarrow \infty \) almost surely for all \(t \in [0,T]\).
A proof of this lemma can be found in [57, Theorem 10.2.1] and [59, Proposition 5.1]; see also [31, 38]. In fact, the proof relies on Gronwall’s lemma together with the Lipschitz property, the strong law of large numbers for the empirical distribution, and the exchangeability of the stochastic processes \(x_i^*\). The second part of Lemma C1 follows from [57, page 323].
We now proceed with the proof of Theorem 2.
Proof of Theorem 2
Since the players are symmetric, that is, the players are invariant under arbitrary permutations, we only need to consider the case when \(i=1\). In the proof below, the constant c can vary from line to line.
We note that \(x_1^*\) defined in (10) is the SDE for player \(i=1\) with the optimal distributed control \(u_1^*\) given in (11). As mentioned, \(x_1^*\) is decoupled from the other players, since \(f\) does not depend on the mean field by (A6), which implies that \(x_1^*\) is statistically independent of the other players. Furthermore, we note that \(x_1\) is the SDE of player \(i=1\) with an arbitrary control \(u_1 \in {\mathcal {U}}_{\mathcal {F}}^1\). Then it should be clear from the definitions of \(x_1\) and \(x_1^*\) that \(x_1\) is identical to \(x_1^*\) when \(u_1 = u_1^*\).
In the proof below, note that the empirical distribution \(\nu _N^*\) in (12) is obtained when the N players are under the optimal distributed control in (11). Then in view of Lemma C1, we have for \(t \in [0,T]\),
$$\begin{aligned} {\mathbb {E}} \left[ W_2^2(\nu _N^*(t),\mu ^*(t)) \right] = O\left( \frac{1}{N^{2/(n+4)}} \right) . \end{aligned}$$
We also note that
$$\begin{aligned} \nu _N(t) = \frac{1}{N}\delta _{x_1(t)} + \frac{1}{N}\sum _{i=2}^N \delta _{x_i^*(t)}, \end{aligned}$$
which is the empirical distribution when \(x_1\) is under an arbitrary control \(u_1\), while the other players use the optimal distributed controls in (11).
Due to the boundedness of \(f\) and \(\sigma \) in \(t\), one can show, by using the Itô isometry, that there exists a constant \(c > 0\) (dependent on \(\beta \) and \(T\)) such that
$$\begin{aligned} {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x_1(t)|^2 \right] \le c + c {\mathbb {E}} \left[ \int _0^T |u_1(t)|^2 \mathrm{d}t \right] . \end{aligned}$$
(C.1)
Since \(u_i^*\) satisfies \({\mathbb {E}}[\int _0^T |u_i^*(t)|^2 \mathrm{d}t] < \infty \), \(2 \le i \le N\), and \(f\) and \(\sigma \) are bounded, from (A2), (A5), (B.2) and [65, Theorem 6.3, Chapter 1], we can show the estimate \({\mathbb {E}} [ \sup _{0 \le t \le T} |x_i^*(t)|^2 ] \le c\) for \(2 \le i \le N\). This, together with (C.1) and the Itô isometry, leads to the following inequality:
$$\begin{aligned}&\frac{1}{N}\left( {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x_1(t)|^2 \right] + \sum _{i=2}^N {\mathbb {E}} \left[ \sup _{0 \le t \le T} |x_i^*(t)|^2 \right] \right) \le c + \frac{c}{N}{\mathbb {E}} \left[ \int _0^T |u_1(t)|^2 \mathrm{d}t \right] , \end{aligned}$$
(C.2)
which is also bounded since we have \(u_1 \in {\mathcal {U}}_{\mathcal {F}}^1\) with \({\mathbb {E}}[\int _0^T |u_1(t)|^2 \mathrm{d}t ] < \infty \).
Consider the following inequality:
$$\begin{aligned}&{\mathbb {E}} \left[ W_2^2(\nu _N(t),\mu ^*(t)) \right] \end{aligned}$$
(C.3)
$$\begin{aligned}&\quad \le c {\mathbb {E}} \left[ W_2^2 \left( \nu _N(t),\frac{1}{N-1} \sum _{i=2}^N \delta _{x_i^*(t)} \right) \right] \end{aligned}$$
(C.4)
$$\begin{aligned}&\qquad + c {\mathbb {E}} \left[ W_2^2 \left( \frac{1}{N-1} \sum _{i=2}^N \delta _{x_i^*(t)},\mu ^*(t) \right) \right] . \end{aligned}$$
(C.5)
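Here the constant \(c\) may be taken to be \(2\): for any intermediate measure \(\rho \), the triangle inequality for \(W_2\) and the elementary bound \((a+b)^2 \le 2a^2 + 2b^2\) give
$$\begin{aligned} W_2^2(\nu _N(t),\mu ^*(t)) \le 2\, W_2^2(\nu _N(t),\rho ) + 2\, W_2^2(\rho ,\mu ^*(t)), \end{aligned}$$
applied here with \(\rho = \frac{1}{N-1}\sum _{i=2}^N \delta _{x_i^*(t)}\).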
We now bound \({\mathbb {E}} [ W_2^2(\nu _N(t),\mu ^*(t)) ]\) in (C.3) in terms of \(N\) for \(t \in [0,T]\).
First, from Lemma C1, we have
$$\begin{aligned} {\mathbb {E}} \left[ W_2^2 \left( \frac{1}{N-1} \sum _{i=2}^N \delta _{x_i^*(t)},\mu ^*(t) \right) \right] \le \frac{c}{N^{2/(n+4)}}, \end{aligned}$$
(C.6)
which is the bound for (C.5).
Also, for (C.4), by the definition of \(W_2\), we have
$$\begin{aligned}&{\mathbb {E}} \left[ W_2^2 \left( \nu _N(t),\frac{1}{N-1} \sum _{i=2}^N \delta _{x_i^*(t)} \right) \right] \le \frac{c}{N(N-1)} \sum _{i=2}^N {\mathbb {E}} \left[ |x_1(t) - x_i^*(t)|^2 \right] \le \frac{c}{N}, \end{aligned}$$
(C.7)
where the last inequality follows from (C.1) and (C.2). Hence, for (C.3), in view of (C.6) and (C.7) (note that \(1/N \le 1/N^{2/(n+4)}\) for \(n \ge 1\)), we have
$$\begin{aligned} {\mathbb {E}} \left[ W_2^2(\nu _N(t),\mu ^*(t)) \right] \le \frac{c}{N^{2/(n+4)}},~ t \in [0,T]. \end{aligned}$$
(C.8)
By applying Jensen’s inequality, we have (see also [29, Chapter VI])
$$\begin{aligned} J_1^N\left( u^{N*}\right)&\ge {\mathbb {E}} \left[ \int _0^T l(t,x_1^*(t),\nu _N^*(t),u_1^*(t))\mathrm{d}t + m(x_1^*(T),\nu _N^*(T)) \right] =: L_1^N\left( u^{N*}\right) \\ \bar{J}_1\left( u_1^*,\mu ^*\right)&\ge {\mathbb {E}} \left[ \int _0^T l(t,x_1^*(t),\mu ^*(t),u_1^*(t))\mathrm{d}t + m(x_1^*(T),\mu ^*(T)) \right] =: \bar{L}_1\left( u_1^*,\mu ^*\right) . \end{aligned}$$
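Both inequalities follow from Jensen’s inequality applied to the convex exponential: for any integrable random variable \(X\) and \(\gamma > 0\),
$$\begin{aligned} \gamma \log {\mathbb {E}} \left[ \exp \left\{ \frac{1}{\gamma } X\right\} \right] \ge \gamma \log \exp \left\{ \frac{1}{\gamma } {\mathbb {E}}[X]\right\} = {\mathbb {E}}[X], \end{aligned}$$
applied with \(X = \int _0^T l\, \mathrm{d}t + m\) evaluated along the corresponding state and measure arguments.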
Note that \(L_1^N\) and \(\bar{L}_1\) are risk-neutral cost functions. Then, there exists a constant c, dependent on T and the Lipschitz constant \(\beta \) in (A2), such that
$$\begin{aligned} \left| J_1^N\left( u^{N*}\right) - \bar{J}_1\left( u_1^*,\mu ^*\right) \right|&\le c \left| L_1^N\left( u^{N*}\right) - \bar{L}_1\left( u_1^*,\mu ^*\right) \right| . \end{aligned}$$
(C.9)
Therefore, by using the Cauchy–Schwarz inequality, the Lipschitz properties of l and m in (A2), and the fact that \(u_1^* \in {\mathcal {U}}_{\mathcal {F}^1}^1\), we can show that
$$\begin{aligned}&\left| J_1^N\left( u^{N*}\right) - \bar{J}_1\left( u_1^*,\mu ^*\right) \right| \\&\quad \le c {\mathbb {E}} \left[ W_2^2(\mu ^*(T),\nu _N^*(T)) \right] ^{1/2} + c \int _0^T {\mathbb {E}} \left[ W_2^2(\mu ^*(t),\nu _N^*(t)) \right] ^{1/2} \mathrm{d}t\le \frac{c}{N^{1/(n+4)}}, \end{aligned}$$
where the first inequality follows from (C.9) and (A2), and the second inequality is due to Lemma C1. In the first inequality, we have used the fact that \({\mathbb {E}} \left[ W_2(\mu ^*(t),\nu _N^*(t)) \right] \le {\mathbb {E}} \left[ W_2^2(\mu ^*(t),\nu _N^*(t)) \right] ^{1/2}\) due to Jensen’s inequality.
In view of the above inequality, we have
$$\begin{aligned}&\left| J_1^N\left( u^{N*}\right) - \bar{J}_1\left( u_1^*,\mu ^*\right) \right| = O\left( \frac{1}{N^{1/(n+4)}} \right) , \end{aligned}$$
(C.10)
which shows that for any i, \(1 \le i \le N\),
$$\begin{aligned} J_i^N\left( u^{N*}\right) - \frac{c}{N^{1/(n+4)}} \le \bar{J}_i\left( u_i^*,\mu ^*\right) . \end{aligned}$$
This implies that for sufficiently large N, the cost difference between \(J_i^N(u^{N*})\) and \(\bar{J}_i(u_i^*,\mu ^*)\) is negligible as a consequence of Lemma C1. This result can also be explained by the law of large numbers for the empirical distribution of \(x_i^*\), \(1 \le i \le N\), due to Lemma C1.
Furthermore, by reasoning similar to that in (C.9) and (C.10), and due to the empirical estimate obtained in (C.8), we have \(J_1^N(u_1,u_2^*,\ldots ,u_N^*) \ge \bar{J}_1(u_1,\mu ^*) - \frac{c}{N^{1/(n+4)}} \ge \bar{J}_1(u_1^*,\mu ^*) - \frac{c}{N^{1/(n+4)}}\). Note that the second inequality follows from step (i) and (15), since \(\bar{J}_1(u_1^*,\mu ^*) \le \bar{J}_1(u_1,\mu ^*)\) for \(u_1 \in {\mathcal {U}}_{\mathcal {F}}^1\). This, together with (C.10), implies that the set of optimal distributed controls \(u^{N*}=\{u_1^*,\ldots ,u_N^*\}\), where \(u_i^*\) is given in (11), constitutes an \(\epsilon _N\)-Nash equilibrium, with \(\epsilon _N \rightarrow 0\) as \(N \rightarrow \infty \) at the rate \(O(1/N^{1/(n+4)})\). This completes the proof of the theorem. \(\square \)
Appendix D: Lemma for Sect. 5
We have the following lemma for Sect. 5, which is a modified version of Lemma C1 in Appendix C.
Lemma D1
Suppose that the conditions in Theorem 3 hold. Then the following estimate holds: for \(t \in [0,T]\),
$$\begin{aligned} {\mathbb {E}}\left[ W_2^2\left( \nu _N^*(t),\mu ^*(t)\right) \right] = O\left( \frac{1}{N^{2/(n+4)}} + \sup _{ k \in \mathcal {K}} | \pi _k^N - \pi _k|^2 \right) . \end{aligned}$$
Moreover, \(W_2(\nu _N^*(t),\mu ^*(t)) \rightarrow 0\) as \(N \rightarrow \infty \) almost surely for all \(t \in [0,T]\).
Proof
First, observe that the empirical distribution \(\nu _N^*\) of \(x_i\), \( 1 \le i \le N\), under the individual optimal controls and the associated fixed point satisfies the following relationship:
$$\begin{aligned} \nu _N^*(t)&= \frac{1}{N}\sum _{i=1}^N \delta _{x_i(t)} = \frac{1}{N} \sum _{k=1}^K \sum _{i \in \mathcal {N}_k} \delta _{x_i(t)} = \frac{1}{N} \sum _{k=1}^K N_k \bar{\nu }_k^{N,*}(t) = \sum _{k=1}^K \pi _k^N \bar{\nu }_k^{N,*}(t), \end{aligned}$$
where \(\bar{\nu }_k^{N,*}(t) = \frac{1}{N_k} \sum _{i \in \mathcal {N}_k} \delta _{x_i(t)}\). Note that \(\bar{\nu }_k^{N,*}(t) \rightarrow \mu _k^*(t)\) almost surely as \(N_k \rightarrow \infty \) for each \(k \in \mathcal {K}\).
Since \(W_2\) is a distance, we have
$$\begin{aligned} {\mathbb {E}} \left[ W_2^2(\nu _N^*(t),\mu ^*(t)) \right]&\le c {\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k^N \bar{\nu }_k^{N,*}(t),\sum _{k=1}^K \pi _k^N \mu _k^*(t) \right) \right] \end{aligned}$$
(D.1)
$$\begin{aligned}&\quad + c {\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k^N \mu _k^*(t),\sum _{k=1}^K \pi _k \mu _k^*(t) \right) \right] . \end{aligned}$$
(D.2)
For (D.2), we first show that there exists a constant c, independent of N, such that
$$\begin{aligned}&{\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k^N \mu _k^*(t),\sum _{k=1}^K \pi _k \mu _k^*(t) \right) \right] \le c \sup _{ k \in \mathcal {K}} | \pi _k^N - \pi _k|^2. \end{aligned}$$
(D.3)
By (18), we have \(\sum _{k=1}^K \pi _k^N \mu _k^*(t) \rightarrow \sum _{k=1}^K \pi _k \mu _k^*(t)\) as \(N \rightarrow \infty \), which is equivalent to saying that \(W_2 (\sum _{k=1}^K \pi _k^N \mu _k^*(t),\sum _{k=1}^K \pi _k \mu _k^*(t) ) \rightarrow 0\) as \(N \rightarrow \infty \) [57, Chapter 10.2]. This implies (D.3).
We now consider (D.1). It satisfies the following inequality:
$$\begin{aligned}&{\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k^N \bar{\nu }_k^{N,*}(t),\sum _{k=1}^K \pi _k^N \mu _k^*(t) \right) \right] \le {\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k^N \bar{\nu }_k^{N,*}(t),\sum _{k=1}^K \pi _k \bar{\nu }_k^{N,*}(t) \right) \right] \nonumber \\&\qquad + {\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k \bar{\nu }_k^{N,*}(t),\sum _{k=1}^K \pi _k \mu _k^*(t) \right) \right] + {\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k \mu _k^*(t),\sum _{k=1}^K \pi _k^N \mu _k^*(t) \right) \right] . \end{aligned}$$
(D.4)
In view of the assumption in (18) and (D.3), the first and last terms in (D.4) are bounded above by \(c \sup _{ k \in \mathcal {K}} | \pi _k^N - \pi _k|^2\). For the second term in (D.4), from Lemma C1 in Appendix C, we have
$$\begin{aligned}&{\mathbb {E}} \left[ W_2^2 \left( \sum _{k=1}^K \pi _k \bar{\nu }_k^{N,*}(t),\sum _{k=1}^K \pi _k \mu _k^*(t) \right) \right] = O\left( \frac{1}{N^{2/(n+4)}} \right) ,~ \forall t \in [0,T]. \end{aligned}$$
This yields the desired result, thus completing the proof. \(\square \)