Theoretical solutions for the benchmark examples
In this appendix, the solutions of the following benchmark problems are presented for the linear-quadratic models given by (15).
- A.1 Non-asymptotic Mean Field Game,
- A.2 Asymptotic Mean Field Game,
- A.3 Stationary Mean Field Game,
- A.4 Non-asymptotic Mean Field Control,
- A.5 Asymptotic Mean Field Control,
- A.6 Stationary Mean Field Control.
In particular, we check that the relations (3) and (4) are satisfied. The explicit formulas for the optimal controls (AMFG and AMFC) are used as benchmarks for our algorithm.
1.1 Solution for non-asymptotic MFG
We present the solution for the following MFG problem:
1. Fix \({\varvec{m}}=(m_t)_{t\ge 0} \subset {\mathbb {R}}\) and solve the stochastic control problem:
$$\begin{aligned} \min _{\varvec{ \alpha }}J^{{\varvec{m}}}(\varvec{ \alpha })&=\min _{\varvec{ \alpha }} {\mathbb {E}}\left[ \int _0^{\infty } e^{-\beta t}f(X^{\varvec{ \alpha }}_t,\alpha _t,m_t)dt \right] \\&=\min _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _0^{+\infty }e^{-\beta t }\left( \frac{1}{2}\alpha _t^2 + c_1 \left( X_t^{\varvec{ \alpha }}- c_2 m_t \right) ^2 + c_3 \left( X_t^{\varvec{ \alpha }}- c_4 \right) ^2+c_5 m_t^2 \right) \hbox {d}t \right] , \\ \text {subject to }&\\ dX^{\varvec{ \alpha }}_t&=\alpha _t \hbox {d}t +\sigma dW_t, \\ X^{\varvec{ \alpha }}_0&\sim \mu _0. \end{aligned}$$
2. Find the fixed point, \(\varvec{{{\hat{m}}}}=({{\hat{m}}}_t)_{t\ge 0}\), such that \({\mathbb {E}}\left[ X_t^{\varvec{ {{\hat{\alpha }}}}}\right] ={{\hat{m}}}_t\) for all \( t\ge 0\).
This problem can be solved by two equivalent approaches: PDE and FBSDEs. Both approaches start by solving the problem on a finite horizon T. Then, the solution to the infinite horizon problem is obtained by letting T go to infinity. Let \(V^{{\varvec{m}}^T,T}(t,x)\) be the optimal value function for the finite horizon problem conditioned on \(X_0=x\), i.e.,
$$\begin{aligned} V^{{\varvec{m}}^T,T}(t,x)= & {} \inf _{\varvec{ \alpha }}J^{{\varvec{m}},x}(\varvec{ \alpha })\\&=\inf _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _t^{T}e^{-\beta s }f(X_s^{\varvec{\alpha }},\alpha _s,m^T_s) ds\Big |X_0^{\varvec{ \alpha }}=x\right] , \quad V^{{\varvec{m}}^T,T}(T,x)=0. \end{aligned}$$
where \({\varvec{m}}^T=\{ m_t^T \}_{0 \le t \le T}\subset {\mathbb {R}}.\) Let us consider the following ansatz with its derivatives
$$\begin{aligned} \begin{aligned} V^{{\varvec{m}}^T,T}(t,x)&= \varGamma _2^T (t) x^2 + \varGamma _1^T (t) x + \varGamma _0^T (t), \\ \partial _t V^{{\varvec{m}}^T,T}(t,x)&= {\dot{\varGamma }}_2^T (t) x^2 + {\dot{\varGamma }}_1^T (t) x + {\dot{\varGamma }}_0^T (t), \\ \partial _x V^{{\varvec{m}}^T,T}(t,x)&= 2 \varGamma _2^T (t) x + \varGamma _1^T (t) , \\ \partial _{xx}V^{{\varvec{m}}^T,T}(t,x)&=2 \varGamma _2^T (t), \end{aligned} \end{aligned}$$
(17)
Then, the HJB equation for the value function reads:
$$\begin{aligned}&\partial _t V^{{\varvec{m}}^T,T} - \beta V^{{\varvec{m}}^T,T} + \inf _{\alpha } \{{\mathcal {A}}^X V^{{\varvec{m}}^T,T} + f(x,\alpha ,m^T)\}\\&\qquad \qquad =\partial _t V^{{\varvec{m}}^T,T} - \beta V^{{\varvec{m}}^T,T} + \inf _{\alpha } \left\{ \alpha \partial _x V^{{\varvec{m}}^T,T}\right. \\&\qquad \qquad \left. \quad +\frac{1}{2}\sigma ^2 \partial _{xx} V^{{\varvec{m}}^T,T} + \frac{1}{2}\alpha ^2 + c_1 (x- c_2 m^T)^2 + c_3 (x - c_4)^2 +c_5 (m^T)^2\right\} \\&\qquad \qquad =\partial _t V^{{\varvec{m}}^T,T} - \beta V^{{\varvec{m}}^T,T} + \left\{ - {\partial _x V^{{\varvec{m}}^T,T}}^2 +\frac{1}{2}\sigma ^2 \partial _{xx}V^{{\varvec{m}}^T,T} \right. \\&\qquad \qquad \left. \quad + \frac{1}{2}{\partial _x V^{{\varvec{m}}^T,T}}^2 + c_1 (x- c_2 m^T)^2+ c_3 (x - c_4 )^2+ c_5 (m^T)^2\right\} \\&\qquad \qquad =\partial _t V^{{\varvec{m}}^T,T} - \beta V^{{\varvec{m}}^T,T} - \frac{1}{2}{\partial _x V^{{\varvec{m}}^T,T}}^2\\&\qquad \qquad \quad +\frac{1}{2}\sigma ^2 \partial _{xx} V^{{\varvec{m}}^T,T} + c_1 (x-c_2 m^T)^2 + c_3 (x - c_4)^2 + c_5 (m^T)^2= 0, \end{aligned}$$
where in the third line we evaluated the infimum at \({{\hat{\alpha }}}^{T}= -\partial _x V^{{\varvec{m}}^T,T}\). The following system of ODEs is obtained by replacing the ansatz and its derivatives in the HJB equation:
$$\begin{aligned} {\left\{ \begin{array}{ll} {{{\dot{\varGamma }}}}_2^T -2({{ \varGamma }^T_2})^2 - \beta { \varGamma }^T_2 + c_1 + c_3 =0, \quad &{}{ \varGamma }_2^T(T) = 0, \\ {{{\dot{\varGamma }}}}^T_1 = (2 { \varGamma }^T_2 + \beta ) { \varGamma }^T_1 + 2 c_1 c_2 m^T + 2 c_3 c_4, \quad &{}{ \varGamma }^T_1(T)=0, \\ {{{\dot{\varGamma }}}}^T_0 = \beta { \varGamma }^T_0 + \frac{1}{2}({{ \varGamma }^T_1})^2\\ - \sigma ^2 { \varGamma }^T_2 -c_3 {c_4}^2 - (c_1 {c_2}^2 +c_5) ({m^T})^2 , \quad &{}{ \varGamma }^T_0(T) = 0,\\ \dot{m}^T = - 2 {\varGamma }^T_2 m^T - { \varGamma }^T_1, \quad &{}m^T(0)= {\mathbb {E}}\left[ \mu _0\right] =m_0,\\ \end{array}\right. } \end{aligned}$$
(18)
where the last equation is obtained by considering the expectation of \(X_t^{\varvec{\alpha }}\) after replacing \({{\hat{\alpha }}}^{T} = -\partial _x V^{{\varvec{m}}^T,T} = - (2\varGamma ^T_2 x + \varGamma ^T_1)\). The first equation is a Riccati equation. In particular, the solution \(\varGamma ^T_2\) converges to \({{\hat{\varGamma }}}_2=\frac{-\beta + \sqrt{\beta ^2+8(c_1 + c_3)}}{4}\) as T goes to infinity. The second and fourth ODEs are coupled, and they can be written in matrix notation as
$$\begin{aligned} \frac{d}{dt}\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} = K^T_t \begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} + \begin{pmatrix} 0 \\ 2 c_3 c_4 \end{pmatrix}, \quad K^T_t := \begin{bmatrix} - 2 \varGamma _2^T(t) &{} -1 \\ 2 c_1 c_2 &{} 2 \varGamma _2^T(t) + \beta \end{bmatrix}. \end{aligned}$$
(19)
We start by solving the homogeneous equation, i.e.,
$$\begin{aligned} \frac{d}{dt}\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} = K^T_t \begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix}. \end{aligned}$$
(20)
We introduce the propagator \(P^T\), i.e.,
$$\begin{aligned} {\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix}} = P^T_t \begin{pmatrix} m^T(0) \\ \varGamma _1^T(0) \end{pmatrix}. \end{aligned}$$
(21)
By differentiating \(\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix}\) in (21) and expressing the initial conditions in terms of the inverse of \(P^T\), we obtain
$$\begin{aligned} \frac{d}{dt}\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} = \dot{P}^T_t \begin{pmatrix} m^T(0) \\ \varGamma _1^T(0) \end{pmatrix} = \dot{P}^T_t ({P^T_t})^{-1} \begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix}. \end{aligned}$$
(22)
By comparing the last system with (20), we obtain
$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{P^T_t} &{}= K^T_t P^T_t\\ P^T_0 &{}= \mathbb {I}_2 \end{array}\right. } \end{aligned}$$
(23)
where \(\mathbb {I}_2\) is the identity matrix in dimension 2. The solution is given by \(P^T_t=e^{\int _0^t K^T_s ds } :=e^{L^T_t}.\) In particular, the exponent is equal to
$$\begin{aligned} L^T_t= \int _0^t K^T_sds = \begin{bmatrix} - 2 \int _0^t \varGamma _2^T(s)ds &{} -t \\ 2 c_1 c_2 t &{} 2 \int _0^t \varGamma _2^T(s)ds + \beta t \end{bmatrix}= \begin{bmatrix} g_t^T &{} d_t \\ b_t &{} a_t^T \end{bmatrix}. \end{aligned}$$
(24)
We evaluate the exponential \(P^T_t= e^{L^T_t}\) by using the Taylor expansion and diagonalizing the matrix \(L^T_t\). The eigenvalues and eigenvectors of \(L^T_t\) are given by
$$\begin{aligned}&\lambda ^T_{1\backslash 2,t} :=\frac{a_t^T+g_t^T \pm \sqrt{(a_t^T-g_t^T)^2 + 4b_t d_t}}{2},\nonumber \\&\quad v^T_{1,t}:=\begin{pmatrix} d_t \\ \lambda ^T_{1,t} - g_t^T \end{pmatrix}, \quad v^T_{2,t}:=\begin{pmatrix} d_t \\ \lambda ^T_{2,t} - g^T_t \end{pmatrix}. \end{aligned}$$
(25)
\(P^T_t\) is then obtained as
$$\begin{aligned} P^T_t= & {} \begin{pmatrix} p^T_t(1,1) &{}\quad p^T_t(1,2) \\ p^T_t(2,1) &{}\quad p^T_t(2,2) \end{pmatrix}\nonumber \\= & {} e^{L^T_t}= \sum _{k=0}^{\infty } \begin{bmatrix} v^T_{1,t}&v^T_{2,t} \end{bmatrix} \frac{\begin{pmatrix} \lambda ^T_{1,t} &{} 0 \\ 0 &{} \lambda ^T_{2,t} \end{pmatrix}^k}{k! }\begin{bmatrix} v^T_{1,t}&v^T_{2,t} \end{bmatrix}^{-1} \nonumber \\:= & {} S^T_t \sum _{k=0}^{\infty } \frac{{D^T_t}^k}{k!} ({S^T_t})^{-1}\nonumber \\= & {} S^T_t \begin{pmatrix} e^{\lambda ^T_{1,t}} &{} 0 \\ 0 &{} e^{\lambda ^T_{2,t}} \end{pmatrix}({S^T_t})^{-1}\nonumber \\= & {} \frac{1}{d_t(\lambda ^T_{2,t} - \lambda ^T_{1,t})}\nonumber \\&\quad \begin{pmatrix} d_t e^{\lambda ^T_{1,t}}(\lambda ^T_{2,t} - g^T_t)+ d_t e^{\lambda ^T_{2,t}}(g^T_t-\lambda ^T_{1,t}) &{} d_t^2(e^{\lambda ^T_{2,t}}-e^{\lambda ^T_{1,t}}) \\ (\lambda ^T_{1,t} - g^T_t) (\lambda ^T_{2,t} - g^T_t) (e^{\lambda ^T_{1,t}}-e^{\lambda ^T_{2,t}}) &{} d_t e^{\lambda ^T_{2,t}}(\lambda ^T_{2,t} - g^T_t) + d_t e^{\lambda ^T_{1,t}}(g^T_t-\lambda ^T_{1,t}) \end{pmatrix}.\nonumber \\ \end{aligned}$$
(26)
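The closed-form exponential in (26) can be checked numerically. The sketch below uses illustrative entries for \(L=\begin{bmatrix} g &{} d \\ b &{} a \end{bmatrix}\) (the numerical values are our assumptions, chosen so that the eigenvalues are real and distinct) and compares the eigendecomposition formula against a truncated Taylor series of \(e^{L}\):

```python
import numpy as np

# Illustrative entries for L = [[g, d], [b, a]]; these numbers are assumptions,
# chosen so that (a - g)^2 + 4*b*d > 0, i.e., real distinct eigenvalues.
g, d, b, a = -0.8, -1.0, 0.6, 1.3
L = np.array([[g, d], [b, a]])

disc = np.sqrt((a - g) ** 2 + 4 * b * d)
lam1, lam2 = (a + g + disc) / 2, (a + g - disc) / 2

# Closed-form exponential from (26); the expression is symmetric in lam1 <-> lam2.
P = np.array([
    [d * np.exp(lam1) * (lam2 - g) + d * np.exp(lam2) * (g - lam1),
     d ** 2 * (np.exp(lam2) - np.exp(lam1))],
    [(lam1 - g) * (lam2 - g) * (np.exp(lam1) - np.exp(lam2)),
     d * np.exp(lam2) * (lam2 - g) + d * np.exp(lam1) * (g - lam1)],
]) / (d * (lam2 - lam1))

# Reference: truncated Taylor series sum_k L^k / k!
E, term = np.eye(2), np.eye(2)
for k in range(1, 40):
    term = term @ L / k
    E = E + term

assert np.allclose(P, E)
```

The agreement of the two computations illustrates that (26) is indeed the matrix exponential, regardless of which root is labeled \(\lambda _1\).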
In order to solve the non-homogeneous case, we introduce an extra term \(\begin{pmatrix} h_1^T \\ h_2^T \end{pmatrix}\), i.e.,
$$\begin{aligned} {\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix}} = P^T_t \begin{pmatrix} h^T_1 \\ h^T_2 \end{pmatrix}. \end{aligned}$$
(27)
By differentiating \( {\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix}}\) in (27), we obtain
$$\begin{aligned} \frac{d}{dt}\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} = \dot{P}^T_t \begin{pmatrix} h^T_1 \\ h^T_2 \end{pmatrix} + P^T_t \begin{pmatrix} \dot{h}^T_1 \\ \dot{h}^T_2 \end{pmatrix} = K^T_t P^T_t \begin{pmatrix} h^T_1 \\ h^T_2 \end{pmatrix} + P^T_t \begin{pmatrix} \dot{h}^T_1 \\ \dot{h}^T_2 \end{pmatrix}. \end{aligned}$$
(28)
By comparing (19) with (28), we obtain
$$\begin{aligned} P^T_t \begin{pmatrix} \dot{h}^T_1 \\ \dot{h}^T_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 c_3 c_4 \end{pmatrix}, \quad \text {i.e., } \quad \begin{pmatrix} \dot{h}^T_1 \\ \dot{h}^T_2 \end{pmatrix} = ({P^T_t})^{-1} \begin{pmatrix} 0 \\ 2 c_3 c_4 \end{pmatrix} = \frac{2 c_3 c_4}{|P^T_t|} \begin{pmatrix} -p^T_t(1,2) \\ p^T_t(1,1) \end{pmatrix}. \end{aligned}$$
(29)
By integration, we obtain
$$\begin{aligned} \begin{aligned} h_1^T(t)&=h_1^T(0)-2c_3c_4\int _0^t \frac{p_s^T(1,2)}{|P_s^T|}ds,\\ h_2^T(t)&=h_2^T(0)+2c_3c_4\int _0^t \frac{p_s^T(1,1)}{|P_s^T|}ds, \end{aligned} \end{aligned}$$
(30)
where \(h_1^T(0)=m_0\) and \(h_2^T(0)=\varGamma _1^T(0)\).
We use the terminal condition \(\varGamma _1^T(T)=0\) to obtain an evaluation of \(h_2^T(0)=\varGamma _1^T(0)\) in terms of \(P^T_T\) and \(m_0\), i.e.,
$$\begin{aligned} \begin{aligned} \varGamma _1^T(T)&=p^T_T(2,1)h^T_1(T) + p^T_T(2,2) h^T_2(T)=0, \\ \varGamma _1^T(T)&=p^T_T(2,1)\\&\qquad \left( m_0-2c_3c_4\int _0^T \frac{p_s^T(1,2)}{|P_s^T|}ds \right) \\&\qquad + p^T_T(2,2) \left( \varGamma _1^T(0)+2c_3c_4\int _0^T \frac{p_s^T(1,1)}{|P_s^T|}ds \right) =0, \\ \varGamma _1^T(0)&= -\frac{p^T_T(2,1)}{p^T_T(2,2)} \left( m_0-2c_3c_4\int _0^T \frac{p_s^T(1,2)}{|P_s^T|}ds \right) -2c_3c_4\int _0^T \frac{p_s^T(1,1)}{|P_s^T|}ds . \end{aligned} \end{aligned}$$
(31)
In order to evaluate the limit of \(\varGamma _1^T(0)\) as T goes to infinity, we analyze the different terms separately. First, we evaluate the following limit:
$$\begin{aligned} \lim _{T\rightarrow \infty }\frac{1}{T}\int _0^T \varGamma _2^T(s)ds =\lim _{T\rightarrow \infty } \varGamma _2^T(s_1)= {{\hat{\varGamma }}}_2, \quad s_1 \in [0,T] , \end{aligned}$$
(32)
where we applied the mean value theorem for integrals and \({{\hat{\varGamma }}}_2=\frac{-\beta +\sqrt{\beta ^2+8(c_1+c_3)}}{4}\) is the limit of the solution of the Riccati equation obtained previously, i.e., \({{\hat{\varGamma }}}_2=\lim _{T\rightarrow \infty } \varGamma _2^T(s).\) We recall that
$$\begin{aligned}\lambda ^T_{2,T}-\lambda ^T_{1,T}=\sqrt{(a^T_T-g^T_T)^2+4b^T_T d_T}= T \sqrt{\left( \frac{4}{T}\int _0^T \varGamma ^T_2(s)ds + \beta \right) ^2 - 8 c_1 c_2 }>0\end{aligned}$$
which goes to infinity as T goes to \(\infty \), provided the term under the square root is positive. We observe that
$$\begin{aligned} \begin{aligned} {\hat{g}}_t&:=\lim _{T\rightarrow \infty }g^T_t = \lim _{T\rightarrow \infty }- 2 \int _0^t \varGamma _2^T(s)ds = - 2 {{\hat{\varGamma }}}_2 t :=g t,\\ b_t&=2c_1c_2 t,\\ {\hat{a}}_t&:=\lim _{T\rightarrow \infty }a^T_t = \lim _{T\rightarrow \infty } 2 \int _0^t \varGamma _2^T(s)ds + \beta t= 2 {{\hat{\varGamma }}}_2 t + \beta t ,\\ d_t&= - t,\\ {{\hat{\lambda }}}_{1\backslash 2,t}&:=\lim _{T\rightarrow \infty }\lambda ^T_{1\backslash 2,t} = \frac{{\hat{a}}_t+{\hat{g}}_t \pm \sqrt{({\hat{a}}_t-{\hat{g}}_t)^2 + 4b_t d_t}}{2} \\&= t \frac{\beta \pm \sqrt{(4 {{\hat{\varGamma }}}_2 + \beta )^2-8c_1c_2}}{2}:=t \lambda _{1\backslash 2} , \\ {\hat{P}}_t&:=\lim _{T\rightarrow \infty }P^T_t \\&=\frac{1}{d_t({{\hat{\lambda }}}_{2,t} - {{\hat{\lambda }}}_{1,t})}\\&\quad \begin{pmatrix} d_t e^{{{\hat{\lambda }}}_{1,t}}({{\hat{\lambda }}}_{2,t} - {\hat{g}}_t)+ d_t e^{{{\hat{\lambda }}}_{2,t}}({\hat{g}}_t-{{\hat{\lambda }}}_{1,t}) &{} d_t^2(e^{{{\hat{\lambda }}}_{2,t}}-e^{{{\hat{\lambda }}}_{1,t}}) \\ ({{\hat{\lambda }}}_{1,t} - {\hat{g}}_t) ({{\hat{\lambda }}}_{2,t} - {\hat{g}}_t) (e^{{{\hat{\lambda }}}_{1,t}}-e^{{{\hat{\lambda }}}_{2,t}}) &{} d_t e^{{{\hat{\lambda }}}_{2,t}}({{\hat{\lambda }}}_{2,t} - {\hat{g}}_t) + d_t e^{{{\hat{\lambda }}}_{1,t}}({\hat{g}}_t-{{\hat{\lambda }}}_{1,t}) \end{pmatrix}. \end{aligned} \end{aligned}$$
(33)
To evaluate \({{\hat{\varGamma }}}_1(0)=\lim _{T\rightarrow \infty } \varGamma ^T_1(0)\), we study the limit of the remaining terms:
$$\begin{aligned} \lim _{T\mapsto \infty }-\frac{p^T_T(2,1)}{p^T_T(2,2)}= & {} \lim _{T\mapsto \infty }\frac{(\lambda ^T_{1,T} - g^T_T) (\lambda ^T_{2,T} - g^T_T) (e^{\lambda ^T_{2,T}}-e^{\lambda ^T_{1,T}})}{ d_T e^{\lambda ^T_{2,T}}(\lambda ^T_{2,T} - g^T_T) + d_T e^{\lambda ^T_{1,T}}(g^T_T-\lambda ^T_{1,T})}\nonumber \\= & {} \lim _{T\mapsto \infty }\frac{1}{\frac{d_T}{(\lambda ^T_{1,T}-g^T_T)(1-e^{\lambda ^T_{1,T}-\lambda ^T_{2,T}})}+\frac{d_T}{(\lambda ^T_{2,T}-g^T_T)(1-e^{\lambda ^T_{2,T}-\lambda ^T_{1,T}})}}\nonumber \\= & {} -(\lambda _1 -g)\nonumber \\= & {} -(\lambda _1 + 2 {{\hat{\varGamma }}}_2),\nonumber \\ \lim _{T\mapsto \infty }\int _0^T \frac{p_s^T(1,2)}{|P_s^T|}ds= & {} \lim _{T\mapsto \infty }\int _0^T \frac{d_s(e^{\lambda ^T_{2,s}}-e^{\lambda ^T_{1,s}})}{(\lambda ^T_{2,s}-\lambda ^T_{1,s})(e^{\lambda ^T_{1,s}+\lambda ^T_{2,s}})}ds \nonumber \\= & {} \frac{1}{\lambda _2-\lambda _1}\left( \frac{1}{\lambda _2}-\frac{1}{\lambda _1} \right) \nonumber \\ \lim _{T\mapsto \infty }\int _0^T \frac{p_s^T(1,1)}{|P_s^T|}ds= & {} \lim _{T\mapsto \infty } \int _0^T\frac{1}{e^{\lambda ^T_{1,s}+\lambda ^T_{2,s}}} \left( e^{\lambda ^T_{1,s}}\frac{\lambda _{2,s}^T-g_s^T}{\lambda ^T_{2,s}-\lambda ^T_{1,s}} +e^{\lambda ^T_{2,s}} \frac{g_s^T-\lambda _{1,s}^T}{\lambda ^T_{2,s}-\lambda ^T_{1,s}} \right) ds\nonumber \\= & {} \frac{\lambda _2-g}{\lambda _2(\lambda _2-\lambda _1)}+\frac{g-\lambda _1}{\lambda _1(\lambda _2-\lambda _1)}. \end{aligned}$$
(34)
Finally, the value of \({{\hat{\varGamma }}}_1(0)\) is given by
$$\begin{aligned} {{\hat{\varGamma }}}_1(0) = - (\lambda _1 - g) m_0 -2\frac{c_3c_4}{\lambda _2}. \end{aligned}$$
(35)
Given \({{\hat{\varGamma }}}_1(0)\), we evaluate the limit as T goes to \(\infty \) of (30), i.e.,
$$\begin{aligned} \begin{aligned} h_1(t):=\lim _{T\mapsto \infty } h_1^T(t)&=m_0- 2c_3c_4 \lim _{T\mapsto \infty } \int _0^t \frac{ p_s^T(1,2)}{|P_s^T|}ds \\&=m_0+ 2 \frac{c_3 c_4}{\lambda _2 - \lambda _1} \left( \frac{1}{\lambda _2}e^{-t \lambda _2}-\frac{1}{\lambda _1}e^{-t \lambda _1}+\frac{1}{\lambda _1}-\frac{1}{\lambda _2} \right) ,\\ h_2(t):=\lim _{T\mapsto \infty } h_2^T(t)&=\lim _{T\mapsto \infty } \left( \varGamma _1^T(0)+2c_3c_4 \int _0^t \frac{p_s^T(1,1)}{|P_s^T|} \hbox {d}s \right) \\&= {{\hat{\varGamma }}}_1(0)+ 2 \frac{c_3 c_4}{\lambda _2 - \lambda _1} \left( \frac{\lambda _2-g}{\lambda _2}(1-e^{-t\lambda _2})+\frac{g-\lambda _1}{\lambda _1}(1-e^{-t\lambda _1}) \right) . \end{aligned} \end{aligned}$$
(36)
We can conclude that
$$\begin{aligned} \begin{aligned} {\hat{m}}_t&=\lim _{T\rightarrow \infty }m^T_t\\&= {\hat{p}}_t(1,1) h_1(t) + {\hat{p}}_t(1,2)h_2(t)\\&= \left( m_0+2 \frac{c_3c_4}{\lambda _2-\lambda _1}\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2} \right) \right) e^{t\lambda _{1}}+2 \frac{c_3c_4}{\lambda _2-\lambda _1}\left( \frac{1}{\lambda _2}-\frac{1}{\lambda _1} \right) ,\\ {{\hat{\varGamma }}}_1(t)&=\lim _{T\rightarrow \infty }\varGamma _1^T(t)\\&= {\hat{p}}_t(2,1) h_1(t) + {\hat{p}}_t(2,2)h_2(t)\\&=m_0 (g-\lambda _1) e^{t\lambda _{1}}+2\frac{c_3c_4}{\lambda _2-\lambda _1}\left( \frac{\lambda _2-g}{\lambda _2}-\frac{\lambda _1-g}{\lambda _1} \right) .\\ \end{aligned} \end{aligned}$$
(37)
Finally, the third ODE in (18) can be solved by plugging in the solutions of the previous ones and integrating. Since our interest is in the evolution of the mean and the control function, we omit these calculations, but we recall that:
$$\begin{aligned} {{\hat{\alpha }}}_t=-(2{{\hat{\varGamma }}}_2 x+{{\hat{\varGamma }}}_1(t)), \quad {{\hat{\varGamma }}}_2=\frac{-\beta + \sqrt{\beta ^2+8(c_1 + c_3)}}{4}, \end{aligned}$$
(38)
and we observe that
$$\begin{aligned} \lim _{t\rightarrow \infty }{{\hat{\alpha }}}_t=-(2{{\hat{\varGamma }}}_2 x+{{\hat{\varGamma }}}_1), \quad {{\hat{\varGamma }}}_1=-\frac{4c_1c_2{{\hat{\varGamma }}}_2}{\lambda _2} =\frac{c_3c_4{{\hat{\varGamma }}}_2}{2(c_1+c_3-c_1c_2)}. \end{aligned}$$
(39)
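The limit \({{\hat{\varGamma }}}_2\) of the Riccati equation in (18) can be checked numerically by integrating the equation backward from the terminal condition \(\varGamma _2^T(T)=0\). A minimal sketch with an explicit Euler scheme; the parameter values are illustrative assumptions, not values used elsewhere in the paper:

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the paper's experiments).
beta, c1, c3 = 0.5, 1.0, 0.8
Gamma2_hat = (-beta + np.sqrt(beta ** 2 + 8 * (c1 + c3))) / 4

def gamma2_at_zero(T, n=100_000):
    """Explicit Euler, backward in time, for the Riccati ODE of (18):
    dGamma2/dt = 2*Gamma2^2 + beta*Gamma2 - (c1 + c3), with Gamma2(T) = 0."""
    dt, g = T / n, 0.0
    for _ in range(n):
        g -= dt * (2 * g ** 2 + beta * g - (c1 + c3))
    return g

# For large T, Gamma2^T(0) is close to the stable stationary point Gamma2_hat.
assert abs(gamma2_at_zero(30.0) - Gamma2_hat) < 1e-6
```

Backward in time, \({{\hat{\varGamma }}}_2\) is the attracting root of \(2\varGamma ^2+\beta \varGamma -(c_1+c_3)=0\), which is why the horizon length barely matters once \(T\) is moderately large.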
1.2 Solution for asymptotic MFG
The asymptotic version of the problem presented above is given by:
1. Fix \(m \in {\mathbb {R}}\) and solve the stochastic control problem:
$$\begin{aligned} \min _{\varvec{ \alpha }}J^{m}(\varvec{ \alpha })&=\min _{\varvec{ \alpha }} {\mathbb {E}}\left[ \int _0^{\infty } e^{-\beta t}f(X^{\varvec{ \alpha }}_t,\alpha _t,m)\hbox {d}t \right] \\&=\min _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _0^{\infty }e^{-\beta t }\left( \frac{1}{2}\alpha _t^2 + c_1 \left( X_t^{\varvec{ \alpha }}-c_2 m \right) ^2 + c_3 \left( X_t^{\varvec{ \alpha }}-c_4 \right) ^2 + c_5 m^2 \right) \hbox {d}t\right] , \\ \text {subject to: }&\quad dX^{\varvec{ \alpha }}_t=\alpha _t \hbox {d}t +\sigma dW_t, \quad X^{\varvec{ \alpha }}_0\sim \mu _0. \end{aligned}$$
2. Find the fixed point, \({{\hat{m}}}\), such that \({{\hat{m}}} = \lim _{t \rightarrow +\infty } {\mathbb {E}}\left[ X^{{{\hat{\alpha }}},{{\hat{m}}}}_t\right] \).
Let \(V^m(x)\) be the optimal value function given \(m \in {\mathbb {R}}\) and conditioned on \(X_0=x\), i.e.,
$$\begin{aligned} V^m(x)= & {} \inf _{\varvec{ \alpha }}J^{m,x}(\varvec{ \alpha })\\= & {} \inf _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _0^{+\infty }e^{-\beta t }\left( \frac{1}{2}\alpha _t^2 + c_1 \left( X_t^{\varvec{ \alpha }}-c_2 m \right) ^2 + c_3 \left( X_t^{\varvec{ \alpha }}-c_4 \right) ^2 + c_5 m^2 \right) \Big |X_0^{\varvec{ \alpha }}=x\right] . \end{aligned}$$
We consider the following ansatz with its derivatives with respect to x:
$$\begin{aligned} V^m(x)&=\varGamma _2 x^2 + \varGamma _1 x +\varGamma _0, \\ \dot{V}^m(x)&= 2\varGamma _2 x + \varGamma _1, \\ {\ddot{V}}^m(x)&=2\varGamma _2. \end{aligned}$$
Let us consider the HJB equation
$$\begin{aligned}&\beta V^m(x) - \inf _{\alpha } \{{\mathcal {A}}^X V^m(x) + f(x,\alpha ,m)\}\\&\quad =\beta V^m(x) - \inf _{\alpha } \left\{ \alpha \dot{V}(x) +\frac{1}{2}\sigma ^2 {\ddot{V}}^m(x) + \frac{1}{2}\alpha ^2 + c_1 (x- c_2 m)^2 \right. \\&\left. \qquad + c_3 (x - c_4)^2 +c_5 m^2\right\} \\&\quad =\beta V^m(x) - \left\{ - ({\dot{V}^m})^2(x) +\frac{1}{2}\sigma ^2 {\ddot{V}}^m(x) + \frac{1}{2}({\dot{V}^m})^2(x)\right. \\&\left. + c_1 (x- c_2 m)^2+ c_3 (x - c_4 )^2+ c_5 m^2\right\} \\&\quad =\beta V^m(x) + \frac{1}{2}({\dot{V}^m})^2(x)\\&\quad -\frac{1}{2}\sigma ^2 {\ddot{V}}^m(x) - c_1 (x-c_2 m)^2 - c_3 (x - c_4)^2 - c_5 m^2= 0, \end{aligned}$$
where in the third line we evaluated the infimum at \({{\hat{\alpha }}}(x)= -\dot{V}^m(x)\). Replacing the ansatz and its derivatives in the HJB equation, it follows that
$$\begin{aligned}&\left( \beta \varGamma _2 + 2 \varGamma _2^2 - c_1 - c_3 \right) x^2 +(\beta \varGamma _1 +2\varGamma _2\varGamma _1+2c_1c_2 m +2c_3 c_4 )x +\beta \varGamma _0\\&\qquad +\frac{1}{2}\varGamma _1^2-\sigma ^2 \varGamma _2 -( c_1{c_2}^2+c_5) m^2 - c_3 {c_4}^2=0. \end{aligned}$$
An easy computation gives the values
$$\begin{aligned} \varGamma _2&=\frac{-\beta + \sqrt{\beta ^2 +8 (c_1+c_3)}}{4},\\ \varGamma _1&=- \frac{ 2c_1c_2m+2c_3 c_4}{\beta + 2\varGamma _2},\\ \varGamma _0&=\frac{ c_5 m^2 + c_3 {c_4}^2+ c_1 {c_2}^2 m^2 +\sigma ^2 \varGamma _2 -\frac{1}{2}\varGamma _1^2 }{\beta }. \end{aligned}$$
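These values can be verified symbolically by substituting them back into the coefficients of \(x^2\), \(x\), and the constant term of the equation above, which must all vanish. A minimal sketch using sympy (the symbol names are our choice):

```python
import sympy as sp

beta, c1, c2, c3, c4, c5, m, sigma = sp.symbols(
    'beta c1 c2 c3 c4 c5 m sigma', positive=True)

G2 = (-beta + sp.sqrt(beta ** 2 + 8 * (c1 + c3))) / 4
G1 = -(2 * c1 * c2 * m + 2 * c3 * c4) / (beta + 2 * G2)
G0 = (c5 * m ** 2 + c3 * c4 ** 2 + c1 * c2 ** 2 * m ** 2
      + sigma ** 2 * G2 - sp.Rational(1, 2) * G1 ** 2) / beta

# Coefficients of x^2, x and 1 in the HJB residual after inserting the ansatz.
coeff_x2 = beta * G2 + 2 * G2 ** 2 - c1 - c3
coeff_x1 = beta * G1 + 2 * G2 * G1 + 2 * c1 * c2 * m + 2 * c3 * c4
coeff_x0 = (beta * G0 + sp.Rational(1, 2) * G1 ** 2 - sigma ** 2 * G2
            - (c1 * c2 ** 2 + c5) * m ** 2 - c3 * c4 ** 2)

assert sp.simplify(coeff_x2) == 0
assert sp.simplify(coeff_x1) == 0
assert sp.simplify(coeff_x0) == 0
```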
By plugging the control \({{\hat{\alpha }}}(x)=-(2\varGamma _2x+\varGamma _1)\) into the dynamics of \(X_t\) and taking the expected value, we obtain an ODE for \({ m_t}\)
$$\begin{aligned} \dot{m}_t= -(2\varGamma _2 m_t+\varGamma _1). \end{aligned}$$
(40)
The solution of (40) is used to derive m as follows
$$\begin{aligned} \begin{aligned} m&=\lim _{t\mapsto \infty } m_t =\lim _{t\mapsto \infty } -\frac{\varGamma _1}{2\varGamma _2} + \left( m_0 + \frac{\varGamma _1}{2\varGamma _2} \right) e^{-2 \varGamma _2 t} =-\frac{\varGamma _1}{2\varGamma _2} =\frac{2c_1 c_2 m+2 c_3 c_4}{2 \varGamma _2 (\beta + 2\varGamma _2)},\\ m&= \frac{c_3 c_4}{\varGamma _2 (\beta + 2\varGamma _2) -c_1 c_2 }. \end{aligned} \end{aligned}$$
(41)
To summarize, we derived that \({{\hat{\alpha }}}(x)=-(2\varGamma _2x+\varGamma _1)\) with \(\varGamma _2={{\hat{\varGamma }}}_2\) and \(\varGamma _1={{\hat{\varGamma }}}_1\) obtained in (39). In other words, we have checked that
$$\begin{aligned} \lim _{t\rightarrow \infty }{{\hat{\alpha }}}_t^{MFG}(x) = {{\hat{\alpha }}}^{AMFG}(x), \quad \forall x, \end{aligned}$$
that is the first part of (3) for this LQ MFG.
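The fixed point in (41) can also be reached by the naive iteration \(m \leftarrow -\varGamma _1(m)/(2\varGamma _2)\): freeze the mean, compute the induced optimal control, and update the mean. A sketch with illustrative parameters (our assumptions; \(c_1c_2\) is taken small enough that the map is a contraction):

```python
import numpy as np

# Illustrative parameters (assumptions, not from the paper).
beta, c1, c2, c3, c4 = 0.5, 1.0, 0.3, 0.8, 1.2
G2 = (-beta + np.sqrt(beta ** 2 + 8 * (c1 + c3))) / 4

m = 0.0
for _ in range(200):
    G1 = -(2 * c1 * c2 * m + 2 * c3 * c4) / (beta + 2 * G2)  # Gamma_1 for frozen m
    m = -G1 / (2 * G2)          # long-run mean induced by alpha-hat(x) = -(2*G2*x + G1)

# Closed form from (41).
m_closed = c3 * c4 / (G2 * (beta + 2 * G2) - c1 * c2)
assert abs(m - m_closed) < 1e-10
```

The contraction factor of the map is \(c_1c_2/(\varGamma _2(\beta +2\varGamma _2))\), so the iteration converges geometrically whenever this ratio is below one.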
1.3 Solution for stationary MFG
The only difference with the derivation above in the case of asymptotic MFG is that \(m_t\) should be a constant which, from (40), should satisfy \(2\varGamma _2 m+\varGamma _1=0\). Therefore, m takes the same value as in (41), and we deduce
$$\begin{aligned} {{\hat{\alpha }}}^{SMFG}(x) = {{\hat{\alpha }}}^{AMFG}(x), \quad \forall x, \end{aligned}$$
that is the second part of (3) for this LQ MFG.
1.4 Solution for non-asymptotic MFC
We present the solution for the following non-asymptotic MFC problem
$$\begin{aligned} \min _{\varvec{ \alpha }}J(\varvec{ \alpha })&=\min _{\varvec{ \alpha }} {\mathbb {E}}\left[ \int _0^{\infty } e^{-\beta t}f(X^{\varvec{ \alpha }}_t,\alpha _t,{\mathbb {E}}\left[ X_t^{\varvec{\alpha }}\right] )\hbox {d}t \right] \\&=\min _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _0^{+\infty }e^{-\beta t }\left( \frac{1}{2}\alpha _t^2 + c_1 \left( X_t^{\varvec{ \alpha }}-c_2 {\mathbb {E}}\left[ X_t^{\varvec{ \alpha }}\right] \right) ^2 + c_3 \left( X_t^{\varvec{ \alpha }}-c_4 \right) ^2 + c_5 {\mathbb {E}}\left[ X_t^{\varvec{ \alpha }}\right] ^2 \right) \hbox {d}t\right] ,\\ \text {subject to: }&\quad dX^{\varvec{ \alpha }}_t=\alpha _t \hbox {d}t +\sigma dW_t ,\quad X^{\varvec{ \alpha }}_0\sim \mu _0. \end{aligned}$$
Note that here the mean \({\mathbb {E}}\left[ X_t^{\varvec{ \alpha }}\right] \) of the population changes instantaneously when \(\varvec{ \alpha }\) changes.
This problem can be solved by two equivalent approaches: PDE and FBSDEs. Both approaches start by solving the problem on a finite horizon T. Then, the solution to the infinite horizon problem is obtained by letting T go to infinity. Let \(V^T(t,x)\) be the optimal value function for the finite horizon problem conditioned on \(X_0=x\), i.e.,
$$\begin{aligned} V^T(t,x)= & {} \inf _{\varvec{ \alpha }}J^{\varvec{m^\alpha },x}(\varvec{ \alpha })\\= & {} \inf _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _t^{T}e^{-\beta s }f(X_s^{\varvec{\alpha }},\alpha _s,m_s^{\varvec{\alpha }})ds\Big |X_0^{\varvec{ \alpha }}=x\right] , \quad V^T(T,x)=0. \end{aligned}$$
Let us consider the following ansatz with its derivatives
$$\begin{aligned} \begin{aligned} V^T(t,x)&= \varGamma _2^T (t) x^2 + \varGamma _1^T (t) x + \varGamma _0^T (t), \quad V^T(T,x)=0,\\ \partial _t V^T(t,x)&= {\dot{\varGamma }}_2^T (t) x^2 + {\dot{\varGamma }}_1^T (t) x + {\dot{\varGamma }}_0^T (t), \\ \partial _x V^T(t,x)&= 2 \varGamma _2^T (t) x + \varGamma _1^T (t) , \\ \partial _{xx} V^T(t,x)&=2 \varGamma _2^T (t), \end{aligned} \end{aligned}$$
(42)
Starting from the MFC-HJB equation (4.12) given in [4], we extend it to our setting as follows
$$\begin{aligned}&\beta V^T -V_t^T - H\left( t,x,\varvec{\mu },\alpha \right) - \int _{{\mathbb {R}}} \frac{\delta H}{\delta \mu }\left( t,h,\varvec{\mu },-\partial _x V^T \right) (x)\mu _t(h)dh=0, \end{aligned}$$
where \(m_t=\int _{{\mathbb {R}}} y \mu _t(dy)\) and \(\alpha ^*=-\partial _x V^T\). We have:
$$\begin{aligned}&H\left( t,x,\varvec{\mu },\alpha \right) \\&\quad :=\inf _{\alpha }\left\{ {\mathcal {A}}^X V^T + f\left( t, x,\alpha ,\varvec{\mu } \right) \right\} \\&\quad =\inf _{\alpha }\left\{ \alpha \partial _x V^T + \frac{1}{2}\sigma ^2 \partial _{xx}V^T+\frac{1}{2}\alpha ^2 +c_1 (x-c_2 m_t)^2 + c_3 (x-c_4)^2 + c_5 {m_t}^2 \right\} \\&\quad =-\frac{1}{2}(\partial _x V^T)^2+ \frac{1}{2}\sigma ^2 \partial _{xx}V^T+c_1 (x-c_2 m_t)^2 + c_3 (x-c_4)^2 + c_5 {m_t}^2, \\&\frac{\delta H\left( t,h,\varvec{\mu },\alpha \right) }{\delta \mu } (x)\\&\quad =\frac{\delta }{\delta \mu }\left( c_1 (h-c_2 m_t)^2 + c_5 {m_t}^2 \right) (x)\\&\quad =\frac{\delta }{\delta \mu }\left( c_1 \left( h- c_2 \int _{{\mathbb {R}}} y \mu _t(dy) \right) ^2 +c_5 \left( \int _{{\mathbb {R}}} y \mu _t(dy) \right) ^2 \right) (x)\\&\quad =-2c_1 c_2 x\left( h-c_2\int _{{\mathbb {R}}} y \mu _t(dy)) \right) +2c_5x\int _{{\mathbb {R}}} y \mu _t(dy)\\&\quad =- 2c_1 c_2 x(h -c_2 m_t) +2c_5xm_t, \end{aligned}$$
$$\begin{aligned} {\int _{{\mathbb {R}}} \frac{\delta H}{\delta \mu }\left( t,h,\varvec{\mu },-\partial _x V^T \right) (x)\mu _t(h)dh}&= - 2c_1 c_2 x(m_t - c_2 m_t) +2c_5xm_t, \end{aligned}$$
and finally
$$\begin{aligned}&\beta V^T -\partial _t V^T +\frac{1}{2}(\partial _x V^T)^2- \frac{1}{2}\sigma ^2 \partial _{xx} V^T- c_1 (x- c_2 m_t)^2 - c_3 (x- c_4)^2 \\&\qquad - c_5 {m_t}^2 + 2 c_1 c_2 x(m_t - c_2 m_t) - 2c_5xm_t=0 . \end{aligned}$$
The following system of ODEs is obtained by replacing the ansatz and its derivatives in the MFC-HJB:
$$\begin{aligned} {\left\{ \begin{array}{ll} {{{\dot{\varGamma }}}}_2^T -2({{ \varGamma }^T_2})^2 - \beta { \varGamma }^T_2 + c_1 + c_3 =0, \quad &{}{ \varGamma }_2^T(T) = 0, \\ {{{\dot{\varGamma }}}}^T_1 = (2 { \varGamma }^T_2 + \beta ) { \varGamma }^T_1\\ + (2 c_1 c_2 ( 2 - c_2) -2c_5)m_t^T + 2 c_3 c_4, \quad &{}{ \varGamma }^T_1(T)=0, \\ {{{\dot{\varGamma }}}}^T_0 = \beta { \varGamma }^T_0 + \frac{1}{2}({{ \varGamma }^T_1})^2 - \sigma ^2 { \varGamma }^T_2\\ -c_3 {c_4}^2 - (c_1 {c_2}^2 +c_5) ({m^T_t})^2 , \quad &{}{ \varGamma }^T_0(T) = 0,\\ {\dot{m}}_t^T = - 2 {\varGamma }^T_2 m^T - { \varGamma }^T_1, \quad &{}m^T(0)= {\mathbb {E}}\left[ X^{\varvec{\alpha }}_0\right] =m_0,\\ \end{array}\right. } \end{aligned}$$
(43)
where the last equation is obtained by considering the expectation of \(X_t^{\varvec{\alpha }}\) after replacing \(\alpha ^*(x) = -\partial _x V^T(x) = - (2\varGamma ^T_2 x + \varGamma ^T_1)\). The first equation is a Riccati equation. In particular, the solution \(\varGamma ^T_2\) converges to \(\varGamma ^*_2=\frac{-\beta + \sqrt{\beta ^2+8(c_1 + c_3)}}{4}\) as T goes to infinity. The second and fourth ODEs are coupled, and they can be written in matrix notation as
$$\begin{aligned} \frac{d}{dt}\begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} = \begin{bmatrix} - 2 \varGamma _2^T(t) &{} -1 \\ 2(c_1 c_2 (2-c_2) - c_5) &{} 2 \varGamma _2^T(t) + \beta \end{bmatrix} \begin{pmatrix} m^T \\ \varGamma _1^T \end{pmatrix} + \begin{pmatrix} 0 \\ 2 c_3 c_4 \end{pmatrix}. \end{aligned}$$
(44)
By similar calculations to the non-asymptotic MFG case, the following solutions can be obtained
$$\begin{aligned} \begin{aligned} m_t^*&=\lim _{T\rightarrow \infty }m^T_t = p^*_t(1,1) h_1(t) + p^*_t(1,2)h_2(t)\\&= \left( m_0+2 \frac{c_3c_4}{\lambda _2-\lambda _1}\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2} \right) \right) e^{t\lambda _{1}}+2 \frac{c_3c_4}{\lambda _2-\lambda _1}\left( \frac{1}{\lambda _2}-\frac{1}{\lambda _1} \right) ,\\ \varGamma _1^*(t)&=\lim _{T\rightarrow \infty }\varGamma _1^T(t) = p^*_t(2,1) h_1(t) + p^*_t(2,2)h_2(t)\\&=m_0 (g-\lambda _1) e^{t\lambda _{1}}+2\frac{c_3c_4}{\lambda _2-\lambda _1}\left( \frac{\lambda _2-g}{\lambda _2}-\frac{\lambda _1-g}{\lambda _1} \right) ,\\ \end{aligned} \end{aligned}$$
(45)
where
$$\begin{aligned} \begin{aligned} g&:=- 2 \varGamma _2^* ,\\ b&:=2(c_1c_2 (2-c_2) - c_5) ,\\ a&:=2 \varGamma _2^* + \beta ,\\ d&:=- 1,\\ \lambda _{1\backslash 2}&:=\frac{a+g \pm \sqrt{(a-g)^2 + 4b d}}{2} = \frac{\beta \pm \sqrt{(4 \varGamma _2^* + \beta )^2-8(c_1c_2 (2-c_2) - c_5)}}{2} . \\ \end{aligned} \end{aligned}$$
(46)
As in the MFG case, the third ODE in (43) can be solved by plugging in the solutions of the previous ones and integrating. Since our interest is in the evolution of the mean and the control function, we omit the calculation for this ODE.
1.5 Solution for asymptotic MFC
The asymptotic version of the problem presented above is given by:
$$\begin{aligned} \min _{\varvec{ \alpha }}J(\varvec{ \alpha })&=\inf _{\varvec{ \alpha }} {\mathbb {E}}\left[ \int _0^{\infty } e^{-\beta t}f(X^{\varvec{ \alpha }}_t,\alpha _t,m^{\varvec{\alpha }})\hbox {d}t \right] \\&=\inf _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _0^{+\infty }e^{-\beta t }\left( \frac{1}{2}\alpha _t^2 + c_1 \left( X_t^{\varvec{ \alpha }}-c_2 m^{\varvec{ \alpha }} \right) ^2 + c_3 \left( X_t^{\varvec{ \alpha }}- c_4 \right) ^2 +c_5 (m^{\varvec{ \alpha }})^2 \right) \hbox {d}t\right] ,\\ \text {subject to: }&\quad dX^{\varvec{ \alpha }}_t=\alpha _t \hbox {d}t +\sigma dW_t ,\quad X^{\varvec{ \alpha }}_0\sim \mu _0, \end{aligned}$$
where \(m^{\varvec{ \alpha }} = \lim _{t \rightarrow +\infty } {\mathbb {E}}\left[ X^{\alpha }_t\right] .\)
Let V(x) be the optimal value function conditioned on \(X_0=x\), i.e.,
$$\begin{aligned} V(x)= & {} \inf _{\varvec{ \alpha }}J^{x}(\varvec{ \alpha })\\= & {} \inf _{\varvec{ \alpha }}{\mathbb {E}}\left[ \int _0^{+\infty }e^{-\beta t }\left( \frac{1}{2}\alpha _t^2 + c_1 \left( X_t^{\varvec{ \alpha }}-c_2 m^{\varvec{ \alpha }} \right) ^2 + c_3 \left( X_t^{\varvec{ \alpha }}-c_4 \right) ^2 + c_5 (m^{\varvec{ \alpha }})^2 \right) \hbox {d}t\Big |X_0^{\varvec{ \alpha }}=x\right] . \end{aligned}$$
We consider the following ansatz with its derivative
$$\begin{aligned} V(x)&=\varGamma _2 x^2 + \varGamma _1 x + \varGamma _0, \\ \dot{V}(x)&= 2\varGamma _2 x + \varGamma _1, \\ {\ddot{V}}(x)&=2\varGamma _2. \\ \end{aligned}$$
Starting from the MFC-HJB equation (4.12) given in [4], we extend it to the asymptotic case as follows
$$\begin{aligned}&\beta V(x) - H\left( x,\mu ^{\varvec{ \alpha }},\alpha \right) - \int _{{\mathbb {R}}} \frac{\delta H}{\delta \mu }\left( h,\mu ^{\varvec{ \alpha }},-\dot{V}(h) \right) (x)\mu ^{\varvec{ \alpha }}(h)dh=0, \end{aligned}$$
where \(m^{\varvec{ \alpha }}=\int _{{\mathbb {R}}} y \mu ^{\varvec{ \alpha }}(dy)\). We have:
$$\begin{aligned}&H\left( x,\mu ^{\varvec{ \alpha }},\alpha \right) \\&\quad :=\inf _{\alpha }\left\{ {\mathcal {A}}^X V(x) + f\left( x,\alpha ,\mu ^{\varvec{ \alpha }} \right) \right\} \\&\quad =\inf _{\alpha }\left\{ \alpha \dot{V}(x) + \frac{1}{2}\sigma ^2 \ddot{V}(x)+\frac{1}{2}\alpha ^2 +c_1 (x-c_2 m^{\varvec{ \alpha }})^2 + c_3 (x-c_4)^2 + c_5 (m^{\varvec{ \alpha }})^2 \right\} \\&\quad =-\frac{1}{2}\dot{V}(x)^2+ \frac{1}{2}\sigma ^2 {\ddot{V}}(x)+c_1 (x-c_2 m^{\varvec{ \alpha }})^2 + c_3 (x-c_4)^2 + c_5 (m^{\varvec{ \alpha }})^2, \\&\frac{\delta H\left( h,\mu ^{\varvec{ \alpha }},\alpha \right) }{\delta \mu } (x)\\&\quad =\frac{\delta }{\delta \mu }\left( c_1 (h-c_2 m^{\varvec{ \alpha }})^2 + c_5 (m^{\varvec{ \alpha }})^2 \right) (x)\\&\quad =\frac{\delta }{\delta \mu }\left( c_1 \left( h- c_2 \int _{{\mathbb {R}}} y \mu ^{\varvec{ \alpha }}(dy) \right) ^2 +c_5 \left( \int _{{\mathbb {R}}} y \mu ^{\varvec{ \alpha }}(dy) \right) ^2 \right) (x)\\&\quad =-2c_1 c_2 x\left( h-c_2\int _{{\mathbb {R}}} y \mu ^{\varvec{ \alpha }}(dy)) \right) +2c_5x\int _{{\mathbb {R}}} y \mu ^{\varvec{ \alpha }}(dy)\\&\quad =- 2c_1 c_2 x(h -c_2 m^{\varvec{ \alpha }}) +2c_5xm^{\varvec{ \alpha }}, \end{aligned}$$
$$\begin{aligned}&{ \int _{{\mathbb {R}}} \frac{\delta H}{\delta \mu }\left( h,\mu ^{\varvec{ \alpha }},-\dot{V}(h) \right) (x)\mu ^{\varvec{ \alpha }}(h)dh}= - 2c_1 c_2 x(m^{\varvec{ \alpha }} - c_2 m^{\varvec{ \alpha }}) +2c_5xm^{\varvec{ \alpha }}, \end{aligned}$$
and finally, the HJB equation becomes:
$$\begin{aligned}&\beta V(x) +\frac{1}{2}\dot{V}(x)^2- \frac{1}{2}\sigma ^2 {\ddot{V}}(x)- c_1 (x- c_2 m^{\varvec{ \alpha }})^2 - c_3 (x- c_4)^2 \\&\qquad - c_5 (m^{\varvec{ \alpha }})^2 + 2 c_1 c_2 x(m^{\varvec{ \alpha }} - c_2 m^{\varvec{ \alpha }}) - 2c_5xm^{\varvec{ \alpha }}=0 . \end{aligned}$$
A system of equations is obtained by replacing the ansatz and its derivatives in the MFC-HJB equation and cancelling the terms in \(x^2\), in \(x\), and the constant term:
$$\begin{aligned} \begin{aligned} \left( \beta \varGamma _2 + 2 \varGamma _2^2 - c_1 - c_3 \right) x^2&+\left( \beta \varGamma _1 +2\varGamma _2\varGamma _1+2 c_1 c_ 2 m^{\varvec{ \alpha }} (2-c_2) +2c_3 c_4 - 2 c_5 m^{\varvec{ \alpha }} \right) x \\&+\beta \varGamma _0+\frac{1}{2}\varGamma _1^2-\sigma ^2 \varGamma _2 -( c_1 {c_2}^2+c_5) (m^{\varvec{ \alpha }})^2 - c_3 {c_4}^2=0. \end{aligned} \end{aligned}$$
An easy computation gives the values
$$\begin{aligned} \varGamma _2&=\frac{-\beta + \sqrt{\beta ^2 +8 (c_1+c_3)}}{4},\\ \varGamma _1&= \frac{2 c_5 m^{\varvec{ \alpha }} -2c_1 c_2m^{\varvec{ \alpha }}(2-c_2)-2c_3 c_4}{\beta + 2\varGamma _2},\\ \varGamma _0&=\frac{ c_5 (m^{\varvec{ \alpha }})^2 + c_3 {c_4}^2+ c_1 {c_2}^2 (m^{\varvec{ \alpha }})^2 +\sigma ^2 \varGamma _2 -\frac{1}{2}\varGamma _1^2 }{\beta }. \end{aligned}$$
By plugging the control \(\alpha ^*(x)=-(2\varGamma _2x+\varGamma _1)\) into the dynamics of \(X^{\varvec{\alpha }}_t\) and taking the expected value, we obtain an ODE for \(m^{\varvec{ \alpha }}_t\)
$$\begin{aligned} \dot{m}_t^{\varvec{ \alpha }}= -(2\varGamma _2 m_t^{\varvec{ \alpha }}+\varGamma _1). \end{aligned}$$
(47)
The solution of (47) is used to derive m as follows
$$\begin{aligned} \begin{aligned} m^{\varvec{ \alpha }}&=\lim _{t\rightarrow \infty } m_t^{\varvec{ \alpha }} =\lim _{t\rightarrow \infty } \left( -\frac{\varGamma _1}{2\varGamma _2} + \left( m_0 + \frac{\varGamma _1}{2\varGamma _2} \right) e^{-2 \varGamma _2 t}\right) \\&=-\frac{\varGamma _1}{2\varGamma _2} =-\frac{2c_5 m^{\varvec{ \alpha }} -2c_1 c_2m^{\varvec{ \alpha }}(2-c_2)-2c_3 c_4}{2 \varGamma _2 (\beta + 2\varGamma _2)},\\ m^{\varvec{ \alpha }}&= \frac{c_3 c_4}{\varGamma _2 (\beta + 2\varGamma _2)+ c_5 -c_1 c_2 (2-c_2)}. \end{aligned} \end{aligned}$$
(48)
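The closed-form coefficients above can be checked numerically. The sketch below uses illustrative parameter values for \(\beta ,\sigma ,c_1,\dots ,c_5\) (assumptions for this example, not values from the paper) and verifies that \(\varGamma _2\) solves its quadratic, that the constant-term equation for \(\varGamma _0\) holds, and that \(m^{\varvec{\alpha }}\) in (48) is indeed the fixed point \(-\varGamma _1/(2\varGamma _2)\) of the mean ODE (47).

```python
import math

# Illustrative (assumed) model parameters.
beta, sigma = 1.0, 0.5
c1, c2, c3, c4, c5 = 0.25, 1.5, 0.5, 0.6, 1.0

# Gamma_2: positive root of  beta*G2 + 2*G2^2 - (c1 + c3) = 0.
G2 = (-beta + math.sqrt(beta**2 + 8 * (c1 + c3))) / 4

# Limit mean (48).
m = c3 * c4 / (G2 * (beta + 2 * G2) + c5 - c1 * c2 * (2 - c2))

# Gamma_1 from the x-coefficient, evaluated at the limit mean m.
G1 = (2 * c5 * m - 2 * c1 * c2 * m * (2 - c2) - 2 * c3 * c4) / (beta + 2 * G2)

# Gamma_0 from the constant term.
G0 = (c5 * m**2 + c3 * c4**2 + c1 * c2**2 * m**2 + sigma**2 * G2 - 0.5 * G1**2) / beta

# Consistency checks against the system obtained from the ansatz.
assert abs(beta * G2 + 2 * G2**2 - c1 - c3) < 1e-12
assert abs(beta * G0 + 0.5 * G1**2 - sigma**2 * G2
           - (c1 * c2**2 + c5) * m**2 - c3 * c4**2) < 1e-12
assert abs(m + G1 / (2 * G2)) < 1e-12   # m is the fixed point of (47)
```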
We remark that the values of \(m_t^{\varvec{ \alpha }}\) and \(\varGamma _1(t)\) obtained in the non-asymptotic case converge to \(m^{\varvec{\alpha }}\) and \(\varGamma _1\), respectively, as t goes to \(\infty \). Therefore, we have obtained that
$$\begin{aligned} \lim _{t\rightarrow \infty }\alpha _t^{*MFC}(x) = \alpha ^{*AMFC}(x), \quad \forall x, \end{aligned}$$
which is the first part of (4) for this LQ MFC problem.
1.6 Solution for stationary MFC
The only difference from the derivation above for the asymptotic MFC is that \(m^\alpha _t\) should be a constant which, by (47), must satisfy \(2\varGamma _2 m^\alpha +\varGamma _1=0\). Therefore, \(m^\alpha \) takes the same value as in (48), and we deduce
$$\begin{aligned} \alpha ^{*SMFC}(x) = \alpha ^{*AMFC}(x), \quad \forall x, \end{aligned}$$
which is the second part of (4) for this LQ MFC problem.
Lipschitz property of the two-scale operators
1.1 Generic setting
We modify the original operators using the softmin operator on \({\mathbb {R}}^{|{\mathcal {A}}|}\) defined as:
$$\begin{aligned} {{\,\mathrm{soft-min}\,}}(z) = \left( \frac{e^{-z_i}}{\sum _j e^{-z_j}}\right) _{i=1,\dots ,|{\mathcal {A}}|} \in \varDelta ^{|{\mathcal {A}}|}, \qquad z \in {\mathbb {R}}^{|{\mathcal {A}}|}. \end{aligned}$$
Intuitively, it gives a probability distribution on the indices \(i=1,\dots ,|{\mathcal {A}}|\) which puts more weight on indices whose corresponding values are closer to the minimum. In particular, the indices in \(\{i=1,\dots ,|{\mathcal {A}}| : z_i = \min _j z_j\}\) receive equal weight, and this weight is the largest among the components of \({{\,\mathrm{soft-min}\,}}(z)\). We recall that the function \({{\,\mathrm{soft-min}\,}}\) is Lipschitz continuous for the 2-norm. Denoting by \(L_s\) its Lipschitz constant, this means that
$$\begin{aligned} \Vert {{\,\mathrm{soft-min}\,}}(z) - {{\,\mathrm{soft-min}\,}}(z')\Vert _2 \le L_s \Vert z - z'\Vert _2, \qquad z, z' \in {\mathbb {R}}^{|{\mathcal {A}}|}. \end{aligned}$$
Moreover, since \(|{\mathcal {A}}|\) is finite, all the norms on \({\mathbb {R}}^{|{\mathcal {A}}|}\) are equivalent, so there exists a positive constant \(c_{2,\infty }\) such that
$$\begin{aligned} \Vert {{\,\mathrm{soft-min}\,}}(z) - {{\,\mathrm{soft-min}\,}}(z')\Vert _\infty \le L_s c_{2,\infty } \Vert z - z'\Vert _\infty , \qquad z, z' \in {\mathbb {R}}^{|{\mathcal {A}}|}. \end{aligned}$$
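As a quick sanity check of these properties, here is a small numerical sketch (the implementation of \({{\,\mathrm{soft-min}\,}}\) below is ours, written from the definition above): it verifies that the output lies in the simplex and that tied minimizers receive the same, largest weight.

```python
import numpy as np

def soft_min(z):
    # soft-min(z)_i = exp(-z_i) / sum_j exp(-z_j); shift by min(z) for stability
    w = np.exp(-(z - z.min()))
    return w / w.sum()

z = np.array([0.3, -1.2, 2.0, -1.2, 0.0])   # minimum attained at indices 1 and 3
p = soft_min(z)
assert abs(p.sum() - 1.0) < 1e-12           # a point of the simplex
assert p.argmax() in (1, 3)                 # minimizers get the largest weight
assert abs(p[1] - p[3]) < 1e-15             # tied minimizers get equal weight
```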
To alleviate the notation, we will write \(Q(x) := (Q(x,a))_{a \in {\mathcal {A}}}\) for any \(Q \in {\mathbb {R}}^{|{\mathcal {X}}| \times |{\mathcal {A}}|}\). We also introduce a more general version \({\underline{p}}\) of the transition kernel p, which can take as an input a probability over actions instead of a single action: for \(x,x' \in {\mathcal {X}}, \nu \in \varDelta ^{|{\mathcal {A}}|}, \mu \in \varDelta ^{|{\mathcal {X}}|}\),
$$\begin{aligned} {\underline{p}}(x'|x,\nu ,\mu ) = \sum _{a} \nu (a) p(x'|x,a,\mu ). \end{aligned}$$
Intuitively, this is the probability for an agent at x to move to \(x'\) when the population distribution is \(\mu \) and the agent picks a random action following the distribution \(\nu \).
We now consider the following iterative procedure, which is a slight modification of (9a)–(9b). Here again, both variables (Q and \(\mu \)) are updated at each iteration but with different rates. Starting from an initial guess \((Q_0, \mu _0) \in {\mathbb {R}}^{|{\mathcal {X}}| \times |{\mathcal {A}}|} \times \varDelta ^{|{\mathcal {X}}|}\), define iteratively for \(k=0,1,\dots \):
$$\begin{aligned} Q_{k+1} = Q_k + \rho ^Q_k \, {\mathcal {T}}(Q_k, \mu _k), \qquad \mu _{k+1} = \mu _k + \rho ^\mu _k \, \underline{{\mathcal {P}}}(Q_k, \mu _k), \end{aligned}$$
where
$$\begin{aligned} {\left\{ \begin{array}{ll} {\mathcal {T}}(Q, \mu )(x,a) = f(x, a, \mu ) + \gamma \sum _{x'} p(x' | x,a,\mu ) \min _{a'}Q(x',a')\\ - Q(x,a), \qquad (x,a) \in {\mathcal {X}} \times {\mathcal {A}}, \\ \underline{{\mathcal {P}}}(Q, \mu )(x) = (\mu {\underline{P}}^{Q,\mu })(x) - \mu (x), \qquad x \in {\mathcal {X}}, \end{array}\right. } \end{aligned}$$
with
$$\begin{aligned}&{\underline{P}}^{Q,\mu }(x, x') = {\underline{p}}(x' | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu ),\\&\qquad \hbox { and } \qquad (\mu {\underline{P}}^{Q,\mu })(x) = \sum _{x_0} \mu (x_0) {\underline{P}}^{Q,\mu }(x_0,x), \end{aligned}$$
is the transition matrix when the population distribution is \(\mu \) and the agent uses an approximately optimal randomized control according to the soft-min of Q.
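To make the two-timescale structure concrete, here is a minimal sketch of the iteration with the operators \({\mathcal {T}}\) and \(\underline{{\mathcal {P}}}\) on a toy MDP. The cost f, the kernel p (here independent of \(\mu \)), the discount \(\gamma \), and the learning rates `rho_Q`, `rho_mu` are all illustrative assumptions, not the paper's benchmark values.

```python
import numpy as np

X, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
f = rng.random((X, A))                        # toy cost, independent of mu
p = rng.random((X, A, X))
p /= p.sum(axis=-1, keepdims=True)            # p[x, a, x'] is a transition kernel

def soft_min(z):
    w = np.exp(-(z - z.min()))
    return w / w.sum()

def T(Q, mu):
    # T(Q, mu)(x,a) = f + gamma * sum_x' p(x'|x,a) * min_a' Q(x',a') - Q(x,a)
    return f + gamma * p @ Q.min(axis=1) - Q

def P_bar(Q, mu):
    # P_bar(Q, mu)(x') = (mu P^{Q,mu})(x') - mu(x') with soft-min randomized control
    P = np.array([soft_min(Q[x]) @ p[x] for x in range(X)])   # P[x, x']
    return mu @ P - mu

Q, mu = np.zeros((X, A)), np.full(X, 1.0 / X)
rho_Q, rho_mu = 0.5, 0.05                     # two different rates
for _ in range(2000):
    Q, mu = Q + rho_Q * T(Q, mu), mu + rho_mu * P_bar(Q, mu)

assert abs(mu.sum() - 1.0) < 1e-8             # mu remains a probability distribution
assert np.abs(T(Q, mu)).max() < 1e-6          # Q approximately solves its Bellman equation
assert np.abs(P_bar(Q, mu)).max() < 1e-4      # mu approximately stationary under P^{Q,mu}
```

The \(\mu \)-update is a convex combination \((1-\rho ^\mu )\mu + \rho ^\mu \mu {\underline{P}}^{Q,\mu }\), which is why the iterates stay in the simplex.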
Lemma 1
Assume that f is Lipschitz continuous with respect to \(\mu \) and that \({\underline{p}}\) is Lipschitz continuous with respect to \(\nu \) and \(\mu \). Then,
-
the operator \({\mathcal {T}}\) is Lipschitz continuous w.r.t. \(\mu \) (with a Lipschitz constant possibly depending on \(\Vert Q\Vert _\infty )\), and Lipschitz continuous in Q (uniformly in \(\mu \));
-
the operator \(\underline{{\mathcal {P}}}\) is Lipschitz continuous in both variables.
If p is independent of \(\mu \), then both \({\mathcal {T}}\) and \(\underline{{\mathcal {P}}}\) are Lipschitz continuous.
Proof
Let us denote by \(L_p\) and \(L_f\) the Lipschitz constants of p and f, respectively. Let \((Q,\mu ),(Q',\mu ') \in {\mathbb {R}}^{|{\mathcal {X}}| \times |{\mathcal {A}}|} \times \varDelta ^{|{\mathcal {X}}|}\). We first consider \({\mathcal {T}}\). We have
$$\begin{aligned}&\Vert {\mathcal {T}}(Q,\mu ) - {\mathcal {T}}(Q',\mu )\Vert _{\infty } \\&\quad \le \gamma \sum _{x'} \max _{x,a} p(x' | x,a,\mu ) \left| \min _{a'}Q(x',a') - \min _{a'}Q'(x',a')\right| + \left\| Q - Q'\right\| _{\infty } \\&\quad \le (\gamma + 1) \left\| Q - Q'\right\| _{\infty }. \end{aligned}$$
Moreover,
$$\begin{aligned} \Vert {\mathcal {T}}(Q,\mu ) - {\mathcal {T}}(Q,\mu ')\Vert _{\infty }&\le \max _{x,a}|f(x, a, \mu ) - f(x, a, \mu ')| \\&\quad + \gamma \max _{x,a}\sum _{x'} |p(x' | x,a,\mu ) - p(x' | x,a,\mu ')| \, |\min _{a'}Q(x',a')| \\&\le (L_f + \gamma L_p \Vert Q\Vert _\infty )|{\mathcal {X}}| \Vert \mu - \mu '\Vert _{\infty }, \end{aligned}$$
where \(L_f\) and \(L_p\) are, respectively, the Lipschitz constants of f and p with respect to \(\mu \). If p is independent of \(\mu \), we obtain
$$\begin{aligned} \Vert {\mathcal {T}}(Q,\mu ) - {\mathcal {T}}(Q,\mu ')\Vert _{\infty }&\le L_f \Vert \mu - \mu '\Vert _{\infty }. \end{aligned}$$
We then show that the operator \(\underline{{\mathcal {P}}}\) is Lipschitz continuous. We have
$$\begin{aligned}&\Vert \underline{{\mathcal {P}}}(Q, \mu ) - \underline{{\mathcal {P}}}(Q, \mu ')\Vert _{\infty } \\&\quad \le \Vert \mu {\underline{P}}^{Q,\mu } - \mu ' {\underline{P}}^{Q,\mu '}\Vert _{\infty } + \Vert \mu - \mu '\Vert _{\infty } \\&\quad \le \left\| \sum _x \Big ({\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu )\mu (x) - {\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu ')\mu '(x)\Big ) \right\| _{\infty } \\&\qquad + \Vert \mu - \mu '\Vert _{\infty }. \end{aligned}$$
For the first term, we note that, for every \(x \in {\mathcal {X}}\),
$$\begin{aligned}&\left\| \Big ({\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu )\mu (x) - {\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu ')\mu '(x)\Big )\right\| _{\infty } \\&\quad \le \left\| \Big ({\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu ) - {\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu ') \Big )\mu (x)\right\| _{\infty } \\&\qquad + \left\| {\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu ')\Big (\mu (x) - \mu '(x)\Big )\right\| _{\infty } \\&\quad \le (L_p + 1) \left\| \mu - \mu '\right\| _{\infty }, \end{aligned}$$
where we used the fact that discrete probability measures are non-negative and bounded by 1.
Moreover, we have
$$\begin{aligned} \Vert \underline{{\mathcal {P}}}(Q, \mu ) - \underline{{\mathcal {P}}}(Q', \mu )\Vert _{\infty }&\le \Vert \mu ({\underline{P}}^{Q,\mu } - {\underline{P}}^{Q',\mu })\Vert _{\infty } \\&\le \sum _x \Vert {\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q(x), \mu )\\&- {\underline{p}}(\cdot | x, {{\,\mathrm{soft-min}\,}}Q'(x), \mu )\Vert _{\infty } \\&\le \sum _x L_p \Vert {{\,\mathrm{soft-min}\,}}Q(x) - {{\,\mathrm{soft-min}\,}}Q'(x)\Vert _{\infty } \\&\le |{\mathcal {X}}| \, L_p \, L_s \, c_{2,\infty } \, \Vert Q - Q'\Vert _{\infty }, \end{aligned}$$
which concludes the proof. \(\square \)
1.2 Application to a discrete model for the LQ problem
Recall that the continuous linear-quadratic model we consider is defined by (15). Here, we propose a finite-space MDP which approximates the dynamics of a typical agent in this continuous LQ model. We consider that the action space is given by \(\mathcal {A} = \{ a_0=-1, a_1 = -1+\varDelta _{.}, \dots , a_{N_{\mathcal {A}}-1}=1-\varDelta _{.}, a_{N_{\mathcal {A}}}=1 \}\) and the state space by \(\mathcal {X} = \{ x_0=x_c-2, x_1=x_c-2+\varDelta _{.}, \dots , x_{N_{\mathcal {X}}-1}=x_c+2-\varDelta _{.}, x_{N_{\mathcal {X}}}=x_c+2 \}\), where \(x_c\) is the center of the state space. The step size for the discretization of the spaces \(\mathcal {X}\) and \(\mathcal {A}\) is given by \(\varDelta _{.} = \sqrt{\varDelta t} = 10^{-1} \).
Consider the transition probability:
$$\begin{aligned} p(x,x',a,\mu )= & {} {\mathbb {P}}(Z^{x+a, \varDelta t} \in [x'-\varDelta _{.}/2, x'+\varDelta _{.}/2]) \\= & {} \varPhi _{x+a, \sigma ^2 \varDelta t}(x'+\varDelta _{.}/2) - \varPhi _{x+a, \sigma ^2 \varDelta t}(x'-\varDelta _{.}/2), \end{aligned}$$
where \(Z \sim {\mathcal {N}}(x+a,\sigma ^2 \varDelta t)\) and \(\varPhi _{x+a,\sigma ^2 \varDelta t}\) is the cumulative distribution function of the \({\mathcal {N}}(x+a,\sigma ^2 \varDelta t)\) distribution. Moreover, consider that the one-step cost function is given by \(f(x,a,\mu ) \varDelta t\) with
$$\begin{aligned} f(x,a,\mu )= & {} \frac{1}{2}a^2 + c_1 \left( x- c_2 \sum _{\xi \in S} \xi \, \mu (\xi )\right) ^2 + c_3 \left( x- c_4 \right) ^2 \\&+ c_5 \left( \sum _{\xi \in S} \xi \, \mu (\xi )\right) ^2, \end{aligned}$$
and the drift is given by \(b(x,a,\mu ) = a\). For simplicity, we write \({{\bar{\mu }}} = \sum _{\xi \in S} \xi \, \mu (\xi )\) for the mean of \(\mu \).
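The discretized kernel can be sketched numerically as follows; the values of \(\sigma \), \(\varDelta t\), and the grid bounds below are illustrative assumptions matching the construction above. The check confirms that the Gaussian cell probabilities sum to one over a grid wide enough to contain essentially all the mass of \({\mathcal {N}}(x+a,\sigma ^2 \varDelta t)\).

```python
import math

sigma, dt, Delta = 1.0, 0.01, 0.1            # illustrative values; Delta = sqrt(dt)

def Phi(y, mean, var):
    # CDF of the normal distribution N(mean, var)
    return 0.5 * (1.0 + math.erf((y - mean) / math.sqrt(2.0 * var)))

def p(x_next, x, a):
    # probability that N(x + a, sigma^2 dt) lands in the cell of width Delta around x_next
    mean, var = x + a, sigma**2 * dt
    return Phi(x_next + Delta / 2, mean, var) - Phi(x_next - Delta / 2, mean, var)

# Grid of width 4 around x_c = 0, as in the state-space discretization.
grid = [-2.0 + k * Delta for k in range(41)]
total = sum(p(xp, 0.0, 0.0) for xp in grid)
assert abs(total - 1.0) < 1e-6               # the cells capture essentially all the mass
```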
Lemma 2
In this model, f is Lipschitz continuous with respect to \(\mu \), and \({\underline{p}}\) is Lipschitz continuous with respect to \(\nu \) and \(\mu \).
Proof
We start with f. For the \(\mu \) component, we have:
$$\begin{aligned} |f(x,a,\mu ) - f(x,a,\mu ')|&\le c \left| \left( x- c_2 {{\bar{\mu }}}\right) ^2 - \left( x- c_2 {{\bar{\mu }}}'\right) ^2\right| + c \left| {{\bar{\mu }}}^2 - ({{\bar{\mu }}}')^2\right| \\&\le c \left| {{\bar{\mu }}} - {{\bar{\mu }}}'\right| \left| 2x - c_2({{\bar{\mu }}} + {{\bar{\mu }}}')\right| + c \left| {{\bar{\mu }}} - {{\bar{\mu }}}'\right| \left| {{\bar{\mu }}} + {{\bar{\mu }}}'\right| \\&\le c \left| {{\bar{\mu }}} - {{\bar{\mu }}}'\right| = c \left| \sum _{\xi \in S} \xi \left( \mu (\xi ) - \mu '(\xi )\right) \right| \\&\le c \max _{\xi \in S}|\xi | \, |S| \, \Vert \mu - \mu '\Vert _\infty , \end{aligned}$$
where \(c>0\) is a constant depending only on the parameters of the model and whose value may change from line to line.
Then, we consider \({\underline{p}}\). It is independent of \(\mu \) in this model. For the action component, we have:
$$\begin{aligned}&|{\underline{p}}(x,x',\nu ,\mu ) - {\underline{p}}(x,x',\nu ',\mu )| \\&\quad = \left| \sum _{a}\nu (a)\Big (\varPhi _{x+a, \sigma ^2 \varDelta t}(x'+\varDelta _{.}/2) - \varPhi _{x+a, \sigma ^2 \varDelta t}(x'-\varDelta _{.}/2) \Big ) \right. \\&\left. \qquad - \sum _{a'}\nu '(a')\Big (\varPhi _{x+a', \sigma ^2 \varDelta t}(x'+\varDelta _{.}/2) - \varPhi _{x+a', \sigma ^2 \varDelta t}(x'-\varDelta _{.}/2)\Big )\right| \\&\quad \le \left| \sum _{a} \left( \nu (a) - \nu '(a)\right) \varPhi _{x+a, \sigma ^2 \varDelta t}(x'+\varDelta _{.}/2) \right| \\&\qquad + \left| \sum _{a} \left( \nu (a) - \nu '(a)\right) \varPhi _{x+a, \sigma ^2 \varDelta t}(x'-\varDelta _{.}/2) \right| \\&\quad \le \int _{-\infty }^{x'+\varDelta _{.}/2} \frac{1}{\sigma \sqrt{2\pi \varDelta t }} \left| \sum _{a}(\nu (a) - \nu '(a))e^{-\frac{(y-(x+a))^2}{2\sigma ^2 \varDelta t}} \right| dy \\&\qquad + \int _{-\infty }^{x'-\varDelta _{.}/2} \frac{1}{\sigma \sqrt{2\pi \varDelta t }} \left| \sum _{a}(\nu (a) - \nu '(a)) e^{-\frac{(y-(x+a))^2}{2\sigma ^2 \varDelta t}} \right| dy \\&\quad \le c \Vert \nu - \nu '\Vert _\infty , \end{aligned}$$
where c is a constant depending only on the model (and in particular on the state space, the action space and \(\varDelta t\)). \(\square \)
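The \(\nu \)-Lipschitz bound can also be observed numerically. The sketch below (action grid, \(\sigma \), \(\varDelta t\), and test points are illustrative assumptions) checks the elementary estimate \(|{\underline{p}}(x,x',\nu ,\mu ) - {\underline{p}}(x,x',\nu ',\mu )| \le \sum _a |\nu (a)-\nu '(a)|\), which holds because each CDF difference lies in [0, 1].

```python
import math
import random

sigma, dt, D = 1.0, 0.01, 0.1                 # illustrative values
actions = [-1.0, 0.0, 1.0]                    # small illustrative action grid

def Phi(y, mean):
    return 0.5 * (1.0 + math.erf((y - mean) / (sigma * math.sqrt(2.0 * dt))))

def cell(x_next, x, a):
    # CDF difference over the cell of width D around x_next; lies in [0, 1]
    return Phi(x_next + D / 2, x + a) - Phi(x_next - D / 2, x + a)

def p_bar(x_next, x, nu):
    # underline-p: average of the cell probabilities under the action law nu
    return sum(n * cell(x_next, x, a) for n, a in zip(nu, actions))

random.seed(0)
for _ in range(100):
    w = [random.random() for _ in actions]
    nu = [v / sum(w) for v in w]
    w2 = [random.random() for _ in actions]
    nu2 = [v / sum(w2) for v in w2]
    lhs = abs(p_bar(0.3, 0.0, nu) - p_bar(0.3, 0.0, nu2))
    assert lhs <= sum(abs(a - b) for a, b in zip(nu, nu2)) + 1e-12
```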
The Bellman equation for the optimal Q function in the asymptotic MFC framework
In this appendix, we provide the derivation of the Bellman equation (8) for the modified Q-function presented in Sect. 3.3.
Let \(\mathcal {X}\) and \(\mathcal {A}\) be discrete and finite state and action spaces. Let \(V^\alpha : \mathcal {X} \rightarrow {\mathbb {R}} \) and \(Q^\alpha : \mathcal {X} \times \mathcal {A} \rightarrow {\mathbb {R}} \) be the value function relative to the policy \(\alpha \) and the corresponding modified Q-function, defined as follows
$$\begin{aligned} V^{\alpha }(x)&= {\mathbb {E}} \left[ \sum _{n=0}^{\infty } \gamma ^n f(X_{n},\alpha (X_{n}),\mu ^{\alpha })\,\Big \vert \, X_{0} = x \right] , \\ Q^{\alpha }(x,a)&= f(x,a,\mu ^{{\tilde{\alpha }}}) + {\mathbb {E}} \left[ \sum _{n=1}^{\infty } \gamma ^n f(X_{n},\alpha (X_{n}),\mu ^{\alpha })\,\Big \vert \, X_{0} = x, A_{0} = a \right] , \end{aligned}$$
where
$$\begin{aligned} \mu ^{\alpha }= \lim _{n\mapsto \infty } \mathcal {L}(X^{\alpha }_{n}) \quad \text {and} \quad {\tilde{\alpha }}(s) = {\left\{ \begin{array}{ll} \alpha (s), &{}\quad \forall s\ne x,\\ a, &{}\quad \text {if } s= x.\\ \end{array}\right. } \end{aligned}$$
Theorem 2
The optimal \(Q^*(x,a)=\min _\alpha Q^\alpha (x,a)\) satisfies the Bellman equation
$$\begin{aligned} Q^*(x,a) = f(x, a, {\tilde{\mu }}^*) + \gamma \sum _{x' \in {\mathcal {X}}} p(x' | x, a, {\tilde{\mu }}^*) \min _{a'} Q^*(x',a'), \qquad (x,a) \in {\mathcal {X}} \times {\mathcal {A}}, \end{aligned}$$
(52)
where the optimal control \(\alpha ^*\) is given by \(\alpha ^*(x)={{\,\mathrm{arg\,min}\,}}_a Q^*(x,a)\), the modification \({{\tilde{\alpha }}}^*(x)\) is based on the pair (x, a) and \({\tilde{\mu }}^*:=\mu ^{{\tilde{\alpha }}^*}\).
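The structure of (52) can be illustrated on a toy example: once the population distribution is frozen at some \({\tilde{\mu }}^*\), \(Q^*\) is the fixed point of a standard Bellman operator. In the sketch below, the small MDP (f, p) is an illustrative assumption and the distribution enters only through the (fixed) cost.

```python
import numpy as np

X, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(2)
f = rng.random((X, A))                        # f(., ., mu*) with mu* frozen
p = rng.random((X, A, X))
p /= p.sum(axis=-1, keepdims=True)            # p(x'|x, a, mu*) with mu* frozen

# Value iteration for the frozen-mu Bellman operator.
Q = np.zeros((X, A))
for _ in range(500):
    Q = f + gamma * p @ Q.min(axis=1)

# Fixed-point relation (52): Q = f + gamma * sum_x' p * min_a' Q(x', a').
residual = np.abs(Q - (f + gamma * p @ Q.min(axis=1))).max()
assert residual < 1e-10

alpha_star = Q.argmin(axis=1)                 # greedy control alpha*(x) = arg min_a Q*(x, a)
assert alpha_star.shape == (X,)
```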
Remark 3
The population distribution \({\tilde{\mu }}^*\) based on the modification of \(\alpha ^*\) given the pair \((x,\alpha ^*(x))\) is equal to \({\mu }^*\). Indeed, \({\tilde{\alpha }}^*\) is equal to \(\alpha ^*\) itself, i.e.,
$$\begin{aligned} {\tilde{\alpha }}^*(s) = {\left\{ \begin{array}{ll} \alpha ^*(s), &{}\quad \forall s\ne x,\\ \alpha ^*(s), &{}\quad \text {if } s= x.\\ \end{array}\right. } \end{aligned}$$
Remark 4
The term \(\min _{a'} Q^*(x',a')\) does not depend on \({\tilde{\mu }}^*\), i.e.,
where step \(\square \) is due to Remark 3. It follows that (52) depends on \({\tilde{\mu }}^*\) only through the cost due to the first step.
In order to prove Theorem 2, the following results are required.
Theorem 3
The Bellman equation for \(Q^{\alpha }\) is given by
$$\begin{aligned} Q^{\alpha }(x,a) = f(x,a,\mu ^{{\tilde{\alpha }}}) +\gamma {\mathbb {E}} \left[ Q^{\alpha }(X_{1},\alpha (X_{1}))\,\Big \vert \, X_{0} = x, A_{0} = a \right] , \end{aligned}$$
(53)
Lemma 3
The value function relative to the policy \(\alpha \) equals the corresponding Q-function evaluated at the pair \((x,\alpha (x))\), i.e.,
$$\begin{aligned} V^{\alpha }(x) = Q^{\alpha }(x,\alpha (x)). \end{aligned}$$
(54)
Theorem 4
(Policy improvement) Let \({\tilde{\alpha }}\) be a policy derived from \(\alpha \),
$$\begin{aligned} \begin{aligned} {\tilde{\alpha }}(s)&= {\left\{ \begin{array}{ll} \alpha (s), \quad &{}\text {for } s\ne x,\\ a, \quad &{}\text {for } s= x.\end{array}\right. } \end{aligned} \end{aligned}$$
such that
$$\begin{aligned} Q^{\alpha }(x,{\tilde{\alpha }}(x)) > V^{\alpha }(x). \end{aligned}$$
(55)
Then,
$$\begin{aligned} V^{{\tilde{\alpha }}}(x') > V^{\alpha }(x') \quad \forall x' \in {\mathcal {X}}. \end{aligned}$$
(56)
Theorem 5
Let \(V^*:\mathcal {X} \mapsto \mathcal {R}\) be defined as \(V^*(x)=\max _{\alpha } V^{\alpha }(x)\). Then,
$$\begin{aligned} V^*(x)= \max _a \max _{\alpha } Q^{\alpha }(x,a), \end{aligned}$$
(57)
Proof (Theorem 3)
$$\begin{aligned} \begin{aligned}&Q^{\alpha }(x,a) \\&\quad = f(x,a,\mu ^{{\tilde{\alpha }}}) + \\&\quad \quad +\gamma {\mathbb {E}} \left[ {\mathbb {E}} \left[ \sum _{n=1}^{\infty } \gamma ^{n-1} f(X_{n},\alpha (X_{n}),\mu ^{\alpha })\,\Big \vert \, X_{0} = x, A_{0} = \alpha (x), X_{1} \right] \,\Big \vert \, X_{0} = x, A_{0} = a \right] \\&\quad = f(x,a,\mu ^{{\tilde{\alpha }}}) + \gamma {\mathbb {E}} \left[ {\mathbb {E}} \left[ \sum _{n=1}^{\infty } \gamma ^{n-1} f(X_{n},\alpha (X_{n}),\mu ^{\alpha })\,\Big \vert \, X_{1} \right] \,\Big \vert \, X_{0} = x, A_{0} = a\right] \\&\quad = f(x,a,\mu ^{{\tilde{\alpha }}}) +\\&\quad \quad + \gamma {\mathbb {E}} \left[ f(X_{1},\alpha (X_{1}),\mu ^{\alpha }) + \gamma {\mathbb {E}} \left[ \sum _{n=2}^{\infty } \gamma ^{n-2} f(X_{n},\alpha (X_{n}),\mu ^{\alpha })\,\Big \vert \, X_{1} \right] \,\Big \vert \, X_{0} = x, A_{0} = a \right] \\&\quad = f(x,a,\mu ^{{\tilde{\alpha }}}) +\gamma {\mathbb {E}} \left[ Q^{\alpha }(X_{1},\alpha (X_{1}))\,\Big \vert \, X_{0} = x, A_{0} = a \right] , \end{aligned} \end{aligned}$$
\(\square \)
Proof (Lemma 3)
$$\begin{aligned} V^{\alpha }(x) = {\mathbb {E}} \left[ \sum _{n=0}^{\infty } \gamma ^n f(X_{n},\alpha (X_{n}),\mu ^{\alpha })\,\Big \vert \, X_{0} = x \right] = Q^{\alpha }(x,\alpha (x)), \end{aligned}$$
where we used that the modification of \(\alpha \) given the pair \((x,\alpha (x))\) is equal to \(\alpha \) itself and consequently \(\mu ^{\alpha }=\mu ^{{{\tilde{\alpha }}}}\). \(\square \)
Proof (Theorem 4)
Step 1 Show that \(V^{\alpha }(x) < V^{{\tilde{\alpha }}}(x) \).
We observe that
Considering the limit as \(k\rightarrow \infty \), it follows that
$$\begin{aligned} V^{\alpha }(x) <{\mathbb {E}} \left[ \sum _{n=0}^{\infty } \gamma ^n f(X_{n}, {\tilde{\alpha }}(X_{n}),\mu ^{{\tilde{\alpha }}}) \,\Big \vert \, X_{0}=x \right] = V^{{\tilde{\alpha }}}(x) \end{aligned}$$
Step 2 Show that \(V^{\alpha }(x') < V^{{\tilde{\alpha }}}(x') \quad \forall x' \in {\mathcal {X}}\setminus \{x\}\).
Let us define \(\tau _x = \min \{ n : X_{n} = x \}\). Then,
We start by analyzing the first term, observing that \(X_{n} \ne x\) and \(\alpha (X_{n}) = {\tilde{\alpha }}(X_{n})\) for all \(n\le \tau _x - 1\). Then,
$$\begin{aligned} T_1 = {\mathbb {E}} \left[ \sum _{n=0}^{\tau _x-1} \gamma ^n f(X_{n},{\tilde{\alpha }}(X_{n}),\mu ^{{\tilde{\alpha }}} )\,\Big \vert \, X_{0} = x' \right] \end{aligned}$$
The analysis of the term \(T_2\) is based on the tower property (TP), the Markov property (MP) and Step 1 (S1). It follows that
Combining the analyses of \(T_1\) and \(T_2\), it follows that
$$\begin{aligned} \begin{aligned} V^{\alpha }(x')&= T_1 + T_2 \\&< {\mathbb {E}} \left[ \sum _{n=0}^{\tau _x-1} \gamma ^n f(X_{n},{\tilde{\alpha }}(X_{n}),\mu ^{{\tilde{\alpha }}} )\,\Big \vert \, X_{0} = x' \right] +{\mathbb {E}}\left[ \gamma ^{\tau _x} V^{{\tilde{\alpha }}}(X_{\tau _x}) \,\Big \vert \, X_{0} = x' \right] \\&= {\mathbb {E}} \left[ \sum _{n=0}^{\tau _x-1} \gamma ^n f(X_{n},{\tilde{\alpha }}(X_{n}),\mu ^{{\tilde{\alpha }}} )+\gamma ^{\tau _x}\sum _{n=\tau _x}^{\infty } \gamma ^{n-\tau _x} f(X_{n},{\tilde{\alpha }}(X_{n}),\mu ^{{\tilde{\alpha }}})\,\Big \vert \, X_{0} = x' \right] \\&={\mathbb {E}} \left[ \sum _{n=0}^{\infty } \gamma ^n f(X_{n},{\tilde{\alpha }}(X_{n}),\mu ^{{\tilde{\alpha }}} )\,\Big \vert \, X_{0} = x' \right] \\&=V^{{\tilde{\alpha }}}(x') \end{aligned} \end{aligned}$$
\(\square \)
Proof (Theorem 5)
Let \({\mathcal {X}}=\{ x_1, \dots , x_n\}\) and \({\mathcal {A}}=\{ a_0, \dots , a_m\}\) be the state and action spaces.
Step 1 Let \(\alpha ^0\) be an initial policy and define \(\alpha ^1\) as follows
$$\begin{aligned} \alpha ^1(x) = {\left\{ \begin{array}{ll} \arg \max _a Q^{\alpha ^0}(x,a), \quad &{}\text { if } x = x_1,\\ \alpha ^0(x), \quad &{}\text { o.w. } \end{array}\right. } \end{aligned}$$
Then,
Step 2 Consider \(\alpha ^2\) defined as follows
$$\begin{aligned} \begin{aligned} \alpha ^2(x)&= {\left\{ \begin{array}{ll} \arg \max _a Q^{\alpha ^1}(x,a), \quad &{}\text { if } x = x_2,\\ \alpha ^1(x), \quad &{}\text { o.w. } \end{array}\right. } \\&={\left\{ \begin{array}{ll} \arg \max _a Q^{\alpha ^1}(x,a), \quad &{}\text { if } x = x_2,\\ \arg \max _a Q^{\alpha ^0}(x,a), \quad &{}\text { if } x = x_1,\\ \alpha ^0(x), \quad &{}\text { o.w. } \end{array}\right. } \end{aligned} \end{aligned}$$
Then,
Step \({\varvec{n}}\) Consider \(\alpha ^{n}\) defined as follows
$$\begin{aligned} \begin{aligned} \alpha ^n(x)&= {\left\{ \begin{array}{ll} \arg \max _a Q^{\alpha ^{n-1}}(x,a), \quad &{}\text { if } x = x_n,\\ \alpha ^{n-1}(x), \quad &{}\text { o.w. } \end{array}\right. } \\&=\arg \max _a Q^{\alpha ^{k-1}}(x,a), \quad \quad \quad \text { if } x = x_k, \text { for } k =1,\dots , n, \end{aligned} \end{aligned}$$
Then,
Step \({\varvec{N}}\) Since the state and action spaces are finite, the policy can be improved only a finite number of times. In other words, \(\exists N>0\) such that
$$\begin{aligned} \alpha ^{N}(x) = \arg \max _a Q^{\alpha ^N}(x,a), \quad \forall x \in {\mathcal {X}} \end{aligned}$$
and
$$\begin{aligned} V^{\alpha ^N}(x) = Q^{\alpha ^N}(x,\alpha ^N(x)) = \max _a Q^{\alpha ^N}(x,a), \quad \forall x \in {\mathcal {X}}. \end{aligned}$$
Can \(\alpha ^N\) still be suboptimal? No, by extending Bellman and Dreyfus's optimality theorem (1962) [3]. \(\square \)
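The finite improvement argument above can be sketched on a toy maximization MDP (all numbers below are illustrative assumptions): repeatedly replacing the policy by the greedy policy with respect to its own Q-function strictly improves it, hence terminates, and the terminal policy \(\alpha ^N\) satisfies \(\alpha ^N(x) = \arg \max _a Q^{\alpha ^N}(x,a)\).

```python
import numpy as np

X, A, gamma = 4, 3, 0.9
rng = np.random.default_rng(3)
f = rng.random((X, A))                        # rewards (maximization, as in Theorem 5)
p = rng.random((X, A, X))
p /= p.sum(axis=-1, keepdims=True)

def Q_of(policy):
    # Exact policy evaluation: solve V = f_pi + gamma * P_pi V, then Q = f + gamma * p V.
    f_pi = f[np.arange(X), policy]
    P_pi = p[np.arange(X), policy]
    V = np.linalg.solve(np.eye(X) - gamma * P_pi, f_pi)
    return f + gamma * p @ V

policy = np.zeros(X, dtype=int)
for _ in range(100):                          # strict improvement => finitely many steps
    greedy = Q_of(policy).argmax(axis=1)
    if np.array_equal(greedy, policy):
        break
    policy = greedy

# The terminal policy is greedy with respect to its own Q-function.
assert np.array_equal(policy, Q_of(policy).argmax(axis=1))
```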
Proof (Theorem (2))
where the last step is due to what is shown in the proof of equation (57), i.e., the same policy \(\alpha ^*\) optimizes \(V^{\alpha }\) and \(Q^{\alpha }\). \(\square \)