
Semi-Markov control models for systems of large populations of interacting objects with possible unbounded costs: a mean field approach

  • Original Research
  • Annals of Operations Research

Abstract

This paper concerns optimal control problems for stochastic systems composed of a large number N (\(N\sim \infty \)) of interacting objects (e.g., particles, agents, data) that evolve among a finite or countable set of classes or categories according to a semi-Markov process. Such systems are modeled by a control model \(\mathcal{S}\mathcal{M}_{N}\) whose states are vectors whose components are the proportions of objects in each class. Because N is very large, solving the control problem directly is practically infeasible. In this setting, we apply a mean field approach, which consists of letting \(N\rightarrow \infty \) (the mean field limit). We thereby obtain the mean field control model \(\mathcal{S}\mathcal{M}\), independent of N, which is easier to study than \(\mathcal{S}\mathcal{M}_{N}.\) Our main objective is to show that a policy \(\pi _{*}\) that is optimal in \(\mathcal{S}\mathcal{M}\) under a discounted criterion performs well in \(\mathcal{S}\mathcal{M}_{N}.\) Specifically, we prove that \(\pi _{*}\) is nearly discounted optimal in \(\mathcal{S}\mathcal{M}_{N}\) asymptotically as \(N\rightarrow \infty .\)


References

  • Acciaio, B., Backhoff-Veraguas, J., & Carmona, R. (2019). Extended mean field control problems: Stochastic maximum principle and transport perspective. SIAM Journal on Control and Optimization, 57(6), 3666–3693.

  • Rami, M. A., Moore, J. B., & Zhou, X. Y. (2002). Indefinite stochastic linear quadratic control and generalized differential Riccati equation. SIAM Journal on Control and Optimization, 40(4), 1296–1311.

  • Bensoussan, A., Frehse, J., & Yam, P. (2013). Mean field games and mean field type control theory (Vol. 101). Springer.

  • Elliott, R., Li, X., & Ni, Y.-H. (2013). Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica, 49(11), 3222–3233.

  • Gast, N., & Gaujal, B. (2011). A mean field approach for optimization in discrete time. Discrete Event Dynamic Systems, 21(1), 63–101.

  • Gast, N., Gaujal, B., & Le Boudec, J. (2012). Mean field for Markov decision processes: From discrete to continuous optimization. IEEE Transactions on Automatic Control, 57(9), 2266–2280.

  • Hafayed, M. (2013). A mean-field necessary and sufficient conditions for optimal singular stochastic control. Communications in Mathematics and Statistics, 1(4), 417–435.

  • Higuera-Chan, C., Jasso-Fuentes, H., & Minjárez-Sosa, J. (2016). Discrete-time control for systems of interacting objects with unknown random disturbance distributions: A mean field approach. Applied Mathematics & Optimization, 74(1), 197–227.

  • Higuera-Chan, C., Jasso-Fuentes, H., & Minjárez-Sosa, J. (2017). Control systems of interacting objects modeled as a game against nature under a mean field approach. Journal of Dynamics & Games, 4(1), 59.

  • Le Boudec, J., McDonald, D., & Mundinger, J. (2007). A generic mean field convergence result for systems of interacting objects. In: Fourth International Conference on the Quantitative Evaluation of Systems (QEST 2007), pp. 3–18. IEEE.

  • Peyrard, N., & Sabbadin, R. (2006). Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes. In: Proceedings of the 2006 Conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29–September 1, 2006, Riva del Garda, Italy, pp. 595–599.

  • Song, T., & Liu, B. (2021). Discrete-time mean-field stochastic linear-quadratic optimal control problem with finite horizon. Asian Journal of Control, 23(2), 979–989.

  • Higuera-Chan, C. G. (2021). Approximation and mean field control of systems of large populations. In D. Hernandez-Hernandez, F. Leonardi, R. H. Mena, & J. C. Pardo Millan (Eds.), Advances in probability and mathematical statistics. Springer.

  • Martínez-Manzanares, M., & Minjárez-Sosa, J. (2021). A mean field absorbing control model for interacting objects systems. Discrete Event Dynamic Systems, 31(3), 349–372.

  • Dynkin, E., & Yushkevich, A. (1979). Controlled Markov processes (Vol. 235). Springer.

  • Hernández-Lerma, O., & Lasserre, J. B. (2012). Discrete-time Markov control processes: Basic optimality criteria (Vol. 30). Springer.

  • Hernández-Lerma, O. (2012). Adaptive Markov control processes (Vol. 79). Springer.

  • Luque-Vásquez, F., & Minjárez-Sosa, J. (2005). Semi-Markov control processes with unknown holding times distribution under a discounted criterion. Mathematical Methods of Operations Research, 61(3), 455–468.

  • Luque-Vásquez, F., Minjárez-Sosa, J., & Carmen Rosas-Rosas, L. (2011). Semi-Markov control models with partially known holding times distribution: discounted and average criteria. Acta Applicandae Mathematicae, 114(3), 135–156.

  • Luque-Vásquez, F., & Minjárez-Sosa, J. A. (2014). A note on the \(\sigma \)-compactness of sets of probability measures on metric spaces. Statistics & Probability Letters, 84, 212–214.

  • Parthasarathy, K. (1967). Probability measures on metric spaces. Academic Press.

  • Hernández-Lerma, O., & Lasserre, J. B. (2012). Further topics on discrete-time Markov control processes (Vol. 42). Springer.

  • Puterman, M. (2014). Markov decision processes: Discrete stochastic dynamic programming. Wiley.

  • Ash, R. (1972). Probability and real analysis. Wiley.

Funding

Work partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT - MÉXICO) under grant Ciencia Frontera 87787.

Author information

Corresponding author

Correspondence to J. Adolfo Minjárez-Sosa.

Ethics declarations

Conflicts of interest

Author M. Elena Martínez-Manzanares declares that she has no conflict of interest. Author J. Adolfo Minjárez-Sosa declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs


1.1 Proof of Proposition 6

Fix \(k\in [\mathcal {K}]_0\), \(\mathcal {K}\in \mathbb {N}\) and \(\pi =\{f_\ell \}\in \Pi _M\), an arbitrary policy. Let \(\{a_t\}\in A\) be the sequence of controls corresponding to the application of \(\pi \), and \({\widetilde{M}^N(k)\in \mathbb {P}_N(S)}\) be the initial condition of the process \(\{M^N(t)\}\). Set

$$\begin{aligned} B_{ij,n}^{N}(t):=\mathbbm {1}_{\{\Delta _{ij}(a_{t})\}}(w_{n}^i(t)),\quad i,j\in S, n\in [N], \end{aligned}$$

where \(w_{n}^i(t)\) are i.i.d. uniform random variables on [0, 1] (see (4)–(7)). Notice that, for each \(t\in \mathbb {N}_0\), \(\{B_{ij,n}^{N}(t)\}_{ij,n}\) is a set of i.i.d. Bernoulli random variables with mean

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }[B_{ij,n}^{N}(t)\vert a_{t}^{\pi , N}=a]=K_{ij}(a) \quad i,j\in S. \end{aligned}$$

For \(\varepsilon \in \mathcal {E}\), the Hoeffding inequality implies, for each \(i,j\in S\) and \(t\in \mathbb {N}_0\),

$$\begin{aligned} P_{\widetilde{M}^N(k)}^{\pi }\bigg [\bigg \vert \sum _{n=1}^{NM_{i}^{N}(t)}B_{ij,n}^{N}(t)-NM_{i}^{N}(t)K_{ij}(a_{t})\bigg \vert <N\varepsilon _{ij}\bigg ]> 1-2e^{-2N\varepsilon _{ij}^2}. \end{aligned}$$
(52)
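As a rough numerical companion to the Hoeffding bound above, the following Python sketch evaluates the failure probability bound \(2e^{-2N\varepsilon ^2}\) and the smallest population size that pushes it below a target level. The function names and the sample values of \(\varepsilon \) and the target level are illustrative assumptions, not part of the paper.

```python
import math


def hoeffding_failure_bound(n_objects: int, eps: float) -> float:
    """Bound 2*exp(-2*N*eps^2) on the probability of a deviation of at least N*eps."""
    return 2.0 * math.exp(-2.0 * n_objects * eps ** 2)


def objects_needed(eps: float, delta: float) -> int:
    """Smallest N such that 2*exp(-2*N*eps^2) <= delta (hypothetical helper)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # Illustrative values only: tolerance eps = 0.05, failure level delta = 0.01.
    n = objects_needed(0.05, 0.01)
    print(n, hoeffding_failure_bound(n, 0.05))
```

The bound depends on N only through the product \(N\varepsilon ^2\), so halving the tolerance requires roughly four times as many objects for the same confidence.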

Set

$$\begin{aligned} \Omega _{ij}=\bigg \{\omega \in \Omega ': \bigg \vert \sum _{n=1}^{NM_{i}^{N}(t)}B_{ij,n}^{N}(t)-NM_{i}^{N}(t)K_{ij}(a_{t})\bigg \vert <N\varepsilon _{ij}\bigg \}\subset \Omega ', \end{aligned}$$

and consider \(\bar{\Omega }=\cap _{i,j\in S}\Omega _{ij}\). Now, let \(\{\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\}_{t\in \mathbb {N}_0}\) be the sequence defined as

$$\begin{aligned} \mu _0(\varepsilon ,\Theta _\mathcal {K}^N):=\Theta _\mathcal {K}^N\text {; }\mu _t(\varepsilon ,\Theta _\mathcal {K}^N):=\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\sum _{d=0}^{t-1}R^{d}+\Theta _\mathcal {K}^N R^t, \end{aligned}$$

with \(R=\sup _{a\in A}\sup _{j\in S}\#(S_j(a))\) and \(\Theta _\mathcal {K}^N\) as defined in (31). We now proceed by induction proving that, in \(\bar{\Omega }\), the following holds

$$\begin{aligned} \vert \vert M^N(t)-m(t)\vert \vert _{\infty }\le \mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\text { }\forall t\in \mathbb {N}_0. \end{aligned}$$
(53)
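The sequence \(\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\) defined above satisfies the one-step recursion \(\mu _{t+1}=\vert \vert \varepsilon \vert \vert _{\mathcal {E}}+R\mu _t\), which is what drives the induction. A minimal Python sketch, with purely illustrative values for \(\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\), \(\Theta _\mathcal {K}^N\) and R, compares the closed form with the recursion:

```python
def mu_closed(t: int, eps_norm: float, theta: float, R: int) -> float:
    """Closed form: eps_norm * sum_{d=0}^{t-1} R^d + theta * R^t."""
    return eps_norm * sum(R ** d for d in range(t)) + theta * R ** t


def mu_recursive(t: int, eps_norm: float, theta: float, R: int) -> float:
    """Same sequence via mu_0 = theta and mu_{s+1} = eps_norm + R * mu_s."""
    value = theta
    for _ in range(t):
        value = eps_norm + R * value
    return value


if __name__ == "__main__":
    # Illustrative values only: eps_norm = 0.01, theta = 0.001, R = 2.
    for t in range(6):
        print(t, mu_closed(t, 0.01, 0.001, 2), mu_recursive(t, 0.01, 0.001, 2))
```

When R is larger than 1 the sequence grows geometrically in t, which is why the bound is only useful on a fixed finite horizon with \(\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\) and \(\Theta _\mathcal {K}^N\) taken small.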

For \(t=0\), we have that \(\vert \vert M^N(0)-m(0)\vert \vert _{\infty }=\vert \vert \widetilde{M}^N(k)-\widetilde{m}(k)\vert \vert _{\infty }\le \Theta _\mathcal {K}^N\). Assume that \(\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\le \mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\) for a particular \(t\in \mathbb {N}\). Then, from (4), (8), (32) and (53), it follows that, for each \(j\in S\),

$$\begin{aligned} \vert M_{j}^N(t+1)-m_j(t+1)\vert&=\bigg \vert \sum _{i=1}^{\infty }\frac{1}{N}\bigg [\sum _{n=1}^{NM_i^N(t)}B_{ij,n}^{N}(t)-Nm_i(t)K_{ij}(a_t)\bigg ]\bigg \vert \\&\le \sum _{i=1}^{\infty }\frac{1}{N}\bigg \vert \sum _{n=1}^{NM_i^N(t)}B_{ij,n}^{N}(t)-Nm_i(t)K_{ij}(a_t)\bigg \vert \\&\le \sum _{i=1}^{\infty }\frac{1}{N}\bigg \vert \sum _{n=1}^{NM_i^N(t)}B_{ij,n}^{N}(t)-NM_i^N(t)K_{ij}(a_t)\bigg \vert \\&+\sum _{i=1}^{\infty }\vert M_i^N(t)-m_i(t)\vert K_{ij}(a_t)\\&\le \sum _{i=1}^{\infty }\varepsilon _{ij}+\#(S_j(a_t))\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\\&\le \sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N). \end{aligned}$$

That is, \(\vert M_j^N(t+1)-m_j(t+1)\vert \le \sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\) \(\forall t\in \mathbb {N}_0\) over \(\bar{\Omega }\). Hence,

$$\begin{aligned} \vert \vert M^N(t+1)-m(t+1)\vert \vert _{\infty }\le & {} \sup _{j\in S}\sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\\= & {} \vert \vert \varepsilon \vert \vert _{\mathcal {E}}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\\= & {} \vert \vert \varepsilon \vert \vert _{\mathcal {E}}+R\left( \vert \vert \varepsilon \vert \vert _{\mathcal {E}}\sum _{d=0}^{t-1}R^d+\Theta _\mathcal {K}^N R^t\right) \\= & {} \vert \vert \varepsilon \vert \vert _{\mathcal {E}}\sum _{d=0}^{t}R^d+\Theta _\mathcal {K}^N R^{t+1}\\= & {} \mu _{t+1}(\varepsilon ,\Theta _\mathcal {K}^N), \end{aligned}$$

which proves (53). Furthermore, notice that \(\{\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\}_{t\in \mathbb {N}_0}\) is an increasing sequence in t. This implies that, for \(T\in \mathbb {N}\), \(\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\le \mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\) \(\forall t\le T\). Then, from (52), (53) and an induction argument over T, we have

$$\begin{aligned} P_{\widetilde{M}^N(k)}^{\pi }\left[ \sup _{t\in [T]_0}\vert \vert M^N(t+1)-m(t+1)\vert \vert _{\infty }>\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right] \le 2Te^{-2N\varepsilon _{ij}^2}\text { }\forall i,j\in S. \end{aligned}$$

Setting \(C=\lambda =2\), we prove (33).

Finally, (34) follows from similar arguments as presented in Higuera-Chan (2021). \(\square \)
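To make the convergence statement of Proposition 6 concrete, the following Python sketch simulates N objects moving among three classes and compares the empirical proportions \(M^N(t)\) with the deterministic mean field trajectory \(m(t+1)=m(t)K\). The 3-class kernel, the single fixed control, and the initial configuration are hypothetical choices made only for illustration; the paper's model allows countably many classes and state-dependent controls.

```python
import random

# Hypothetical transition kernel K[i][j] under one fixed control (rows sum to 1).
K = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
]


def next_class(i: int, w: float) -> int:
    """Send a uniform draw w in [0, 1) to class j via consecutive intervals of length K[i][j]."""
    acc = 0.0
    for j, p in enumerate(K[i]):
        acc += p
        if w < acc:
            return j
    return len(K[i]) - 1  # guard against floating-point rounding


def mean_field_step(m):
    """Deterministic limit dynamics: m_j(t+1) = sum_i m_i(t) * K[i][j]."""
    return [sum(m[i] * K[i][j] for i in range(len(m))) for j in range(len(m))]


def max_deviation(n_objects: int, horizon: int, seed: int = 0) -> float:
    """Largest sup-norm gap between M^N(t) and m(t) over t = 1, ..., horizon."""
    rng = random.Random(seed)
    counts = [n_objects, 0, 0]  # all objects start in class 0
    m = [1.0, 0.0, 0.0]
    worst = 0.0
    for _step in range(horizon):
        new_counts = [0] * len(K)
        for i, c in enumerate(counts):
            for _obj in range(c):
                new_counts[next_class(i, rng.random())] += 1
        counts = new_counts
        m = mean_field_step(m)
        gap = max(abs(c / n_objects - mj) for c, mj in zip(counts, m))
        worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, max_deviation(n, horizon=10))
```

Running the script for increasing N gives a feel for how quickly the empirical proportions approach the mean field trajectory on a fixed horizon; the exact figures depend on the seed and on the assumed kernel.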

1.2 Proof of Theorem 8

Lemma 9

Let Assumptions 1 and 3 hold. For all \(t\in \mathbb {N}_0\) and \(\pi \in \Pi _{M}\), it follows that

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\vert \le \sigma ^{t-1}L_{G}t\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) \end{aligned}$$
(54)

for all \(i,j\in S\), where \(\sigma \) is defined in (17).

Proof

We proceed by induction. For \(t=0\), (14) and (40) imply (54). Assume (54) is valid for \(t\in \mathbb {N}\). That is,

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\left| \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\right| \le \sigma ^{t-1}L_{G}t\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) . \end{aligned}$$

Notice

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\vert \Gamma _{\alpha }^{N,t+1}-\Gamma _{\alpha }^{t+1}\vert= & {} E_{\widetilde{M}^N(k)}^{\pi }\left| \Gamma _{\alpha }^{N,t}\beta _{\alpha }(M^{N}(t))-\Gamma _{\alpha }^{t}\beta _{\alpha }(m(t))\right| \\\le & {} E_{\widetilde{M}^N(k)}^{\pi }\Gamma _{\alpha }^{N,t}\left| \beta _{\alpha }(M^{N}(t))-\beta _{\alpha }(m(t))\right| \\{} & {} +E_{\widetilde{M}^N(k)}^{\pi }\beta _{\alpha }(m(t))\left| \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\right| . \end{aligned}$$

Now, from (18), Assumption 2 (g), and (34), it follows:

$$\begin{aligned}{} & {} \hspace{-0.9cm}E_{\widetilde{M}^N(k)}^{\pi }\vert \Gamma _{\alpha }^{N,t+1}-\Gamma _{\alpha }^{t+1}\vert \\{} & {} \hspace{-0.8cm}\le \sigma ^{t}L_{G}E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \vert M^{N}(t)-m(t)\vert \vert _{\infty }\right] +\sigma E_{\widetilde{M}^N(k)}^{\pi }\left| \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\right| \\{} & {} \hspace{-0.8cm}\le \sigma ^{t}L_{G}\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) +\sigma \sigma ^{t-1}L_{G}t\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) \\{} & {} \hspace{-0.8cm}=\sigma ^{t}L_{G}(t+1)\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) . \end{aligned}$$

Hence, we conclude (54). \(\square \)
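The induction in Lemma 9 amounts to solving a linear recursion. Writing \(D\) for the common factor \(CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\), the error bound \(e_t\) satisfies \(e_0=0\) and \(e_{t+1}=\sigma ^{t}L_{G}D+\sigma e_t\), whose solution is \(e_t=\sigma ^{t-1}L_{G}tD\). The Python sketch below checks this numerically; the values of \(\sigma \), \(L_G\) and \(D\) are illustrative assumptions.

```python
def lemma9_closed(t: int, sigma: float, L_G: float, D: float) -> float:
    """Closed-form bound sigma^(t-1) * L_G * t * D (equal to 0 at t = 0)."""
    return t * sigma ** (t - 1) * L_G * D


def lemma9_recursive(t: int, sigma: float, L_G: float, D: float) -> float:
    """Same bound via e_0 = 0 and e_{s+1} = sigma^s * L_G * D + sigma * e_s."""
    e = 0.0
    for s in range(t):
        e = sigma ** s * L_G * D + sigma * e
    return e


if __name__ == "__main__":
    # Illustrative values only: sigma = 0.9, L_G = 2.0, D = 0.01.
    for t in range(6):
        print(t, lemma9_closed(t, 0.9, 2.0, 0.01), lemma9_recursive(t, 0.9, 2.0, 0.01))
```

Since \(\sigma <1\) in the discounted setting, the factor \(t\sigma ^{t-1}\) stays bounded in t, so the whole bound is controlled by D.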

Proposition 10

Under Assumptions 1, 2 and 3, we have, for \(\pi \in \Pi _M\),

$$\begin{aligned} \limsup _{N\rightarrow \infty } \Delta (t,N,k)=0,\text { }t,k\in \mathbb {N}_0, \end{aligned}$$
(55)

where

$$\begin{aligned} \Delta (t,N,k):=E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\right] . \end{aligned}$$

Proof

Observe that the right-hand side of (54) converges to zero as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\) [see Remark 3 and Proposition 6]. This implies \(E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert \right] \rightarrow 0\) as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\), which in turn yields

$$\begin{aligned} \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert \xrightarrow {P_{\widetilde{M}^N(k)}^{\pi }} 0 \end{aligned}$$
(56)

as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\).

On the other hand, the sequence \(\{\vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\}\) is uniformly integrable. This follows from (56), (Ash (1972), Lemma 7.6.9, p. 301) and the fact that

$$\begin{aligned} \sup _{t\in \mathbb {N}_0}E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\right] ^{p}\le & {} 2^p\sup _{t\in \mathbb {N}_0}E_{\widetilde{M}^N(k)}^{\pi }\left[ W^p(M^N(t))\right] \\\le & {} 2^p\left( 1+\frac{\bar{b}}{1-\bar{\rho }}\right) W^p(\widetilde{M}^N(k))<\infty , \end{aligned}$$

where the first inequality is due to (14) and (40) (see (17) and (18)), while the last inequality comes from (26). The proof is completed by applying (Ash (1972), Theorem 7.5.2, p. 295) once we show

$$\begin{aligned} \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\xrightarrow {P_{\widetilde{M}^N(k)}^{\pi }} 0 \end{aligned}$$

as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\), but this follows from (25), (56) and the relations

$$\begin{aligned}{} & {} P_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))>\eta \right] \\{} & {} \hspace{3cm}\le P_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert>\frac{\eta }{\ell }\right] +P_{\widetilde{M}^N(k)}^{\pi }\left[ W(M^N(t))>\ell \right] \\{} & {} \hspace{3cm}\le P_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert >\frac{\eta }{\ell }\right] +\frac{E_{\widetilde{M}^N(k)}^{\pi }\left[ W(M^N(t))\right] }{\ell } \end{aligned}$$

where \(\eta \) and \(\ell \) are arbitrary positive numbers. \(\square \)

For \(\pi \in \Pi _{M}\), \(m\in \mathbb {P}_{N}(S)\subset \mathbb {P}(S)\) and \( T\in \mathbb {N}\), we define the following expected costs

$$\begin{aligned} V_{T}^{N}(\pi ,m):=E_{m}^{\pi }\left[ \sum _{t=0}^{T-1}\Gamma _{\alpha }^{N,t}r(M^{N}(t))\right] \end{aligned}$$

and

$$\begin{aligned} v_{T}(\pi ,m):=\sum _{t=0}^{T-1}\Gamma _{\alpha }^{t}r(m(t)). \end{aligned}$$

Proposition 11

Under Assumptions 1, 2 and 3, for each \(m\in \mathbb {P}_N(S)\), \(\varepsilon \in \mathcal {E}\), \({T,\mathcal {K}\in \mathbb {N}}\), \(t\in [T]_0\) and \(k\in [\mathcal {K}]_0\) we have

$$\begin{aligned} E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V_T^N(\pi ,\widetilde{M}^N(k))-v_T(\pi ,\widetilde{m}(k))\vert \right]\le & {} \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,k)\nonumber \\{} & {} \hspace{-6.5cm}+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \text { }\forall i,j\in S. \end{aligned}$$
(57)

Proof

Fix \(\pi \in \Pi _M\) and \(T\in \mathbb {N}\). From (21) and Proposition 6 it follows

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\vert r(M^N(t))-r(m(t))\vert&\le E_{\widetilde{M}^N(k)}^{\pi }\left( L_r\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\right) \nonumber \\&\le L_r E_{\widetilde{M}^N(k)}^{\pi }\left( \max _{t\in [T]_0}\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\right) \nonumber \\&\le L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) , \end{aligned}$$
(58)

for all \(i,j\in S\). Thus, for each \(\pi \in \Pi _M\), we have

$$\begin{aligned}{} & {} \vert V_T^{N}(\pi , \widetilde{M}^N(k))-v_T(\pi , \widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1.5cm}= \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{T-1} \Gamma ^{N,t}_{\alpha }r(M^N(t))-\sum _{t=0}^{T-1}\Gamma ^{t}_{\alpha } r(m(t))\right\} \Big \vert \nonumber \\{} & {} \hspace{1.5cm}\le \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{T-1} \Gamma ^{N,t}_{\alpha }r(M^N(t))-\sum _{t=0}^{T-1}\Gamma ^{t}_{\alpha }r(M^N(t))\right\} \Big \vert \nonumber \\{} & {} \hspace{1.5cm}+ \Big \vert E_{\widetilde{M}^N(k)}^{\pi } \left\{ \sum _{t=0}^{T-1}\Gamma ^{t}_{\alpha }r(M^N(t))-\sum _{t=0}^{T-1} \Gamma ^{t}_{\alpha }r(m(t))\right\} \Big \vert \nonumber \\{} & {} \hspace{1.5cm}\le \vert \vert r\vert \vert _{W}E_{\widetilde{M}^N(k)}^{\pi } \left\{ \sum _{t=0}^{T-1}\Big \vert \Gamma ^{N,t}_{\alpha }-\Gamma ^{t}_{\alpha }\Big \vert W(M^N(t))\right\} \nonumber \\{} & {} \hspace{1.5cm}+ \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{T-1}\sigma ^{t} (r(M^N(t))-r(m(t)))\right\} \Big \vert \text { (see }(18),(40)) \nonumber \\{} & {} \hspace{1.5cm}\le \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,k)\nonumber \\{} & {} \hspace{1.5cm}+ \left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \text { (see }(58)). \end{aligned}$$
(59)

These relations yield (57).

\(\square \)

Proposition 12

Under Assumptions 1, 2 and 3, for each \(m\in \mathbb {P}_N(S)\), \(\varepsilon \in \mathcal {E}\), \(T,\mathcal {K}\in \mathbb {N}\), \(t\in [T]_0\) and \(k\in [\mathcal {K}]_0\) we have

$$\begin{aligned}{} & {} \hspace{-1.5cm}E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V^N(\pi , \widetilde{M}^N(k))-V_{T}^{N}(\pi , \widetilde{M}^N(k))\vert \right] \nonumber \\{} & {} \hspace{3cm}\le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }; \end{aligned}$$
(60)

and

$$\begin{aligned} \vert v(\pi ,\widetilde{m}(k))-v_T(\pi ,\widetilde{m}(k))\vert \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }. \end{aligned}$$
(61)

Proof

Fix \(\pi \in \Pi _M\). Relation (60) is a consequence of the following inequalities:

$$\begin{aligned}{} & {} {\vert V^N(\pi ,\widetilde{M}^N(k))-V_{T}^{N}(\pi , \widetilde{M}^N(k))\vert }\nonumber \\{} & {} \qquad \le \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{\infty }\Gamma _{\alpha }^{N,t}r(M^N(t))-\sum _{t=0}^{T-1}\Gamma _{\alpha }^{N,t}r(M^N(t))\right\} \Big \vert \nonumber \\{} & {} \qquad \le \sum _{t=T}^{\infty }E_{\widetilde{M}^N(k)}^{\pi }\sigma ^{t}\vert r(M^N(t))\vert \frac{W(M^N(t))}{W(M^N(t))}\text { (see (18))}\nonumber \\{} & {} \qquad \le \vert \vert r\vert \vert _W\sum _{t=T}^{\infty }\sigma ^{t}E_{\widetilde{M}^N(k)}^{\pi } W(M^N(t))\nonumber \\{} & {} \qquad \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) W(\widetilde{M}^N(k))\sum _{t=T}^{\infty }\sigma ^{t}\nonumber \text { (see (27))}\\{} & {} \qquad \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) W(\widetilde{M}^N(k))\frac{\sigma ^T}{1-\sigma }. \end{aligned}$$
(62)

Taking the supremum over \(\pi \in \Pi _M\) and expectation \(E_{m}^{\varphi }\), it follows that

$$\begin{aligned}{} & {} {E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V^N(\pi , \widetilde{M}^N(k))-V_{T}^{N}(\pi , \widetilde{M}^N(k))\vert \right] }\nonumber \\{} & {} \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) \frac{\sigma ^T}{1-\sigma } E_{m}^{\varphi }\left[ W(\widetilde{M}^N(k))\right] \\{} & {} \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }\text { (see (27))} \end{aligned}$$

which proves (60).

By applying similar arguments, together with (39), we obtain (61). \(\square \)
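The truncation bounds of Proposition 12 rest on the geometric tail \(\sum _{t=T}^{\infty }\sigma ^{t}=\sigma ^T/(1-\sigma )\). As a small illustration, the Python sketch below computes this tail and the smallest horizon T for which a term of the form \(c\,\sigma ^T/(1-\sigma )\) falls below a tolerance. Here \(c\) stands in for the constant \(\vert \vert r\vert \vert _W(1+b/(1-\rho ))^2W(m)\); its value, like \(\sigma \) and the tolerance, is an illustrative assumption.

```python
def geometric_tail(sigma: float, T: int) -> float:
    """Value of sum_{t >= T} sigma^t for 0 <= sigma < 1."""
    return sigma ** T / (1.0 - sigma)


def horizon_for_tolerance(sigma: float, c: float, tol: float) -> int:
    """Smallest T with c * sigma^T / (1 - sigma) <= tol (hypothetical helper)."""
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must lie in [0, 1)")
    T = 0
    while c * geometric_tail(sigma, T) > tol:
        T += 1
    return T


if __name__ == "__main__":
    # Illustrative values only: sigma = 0.9, c = 1.0, tol = 1e-3.
    print(horizon_for_tolerance(0.9, 1.0, 1e-3))
```

The required horizon grows only logarithmically in the inverse tolerance, but it grows quickly as \(\sigma \) approaches 1.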

1.2.1 Proof of Theorem 8 (a)

Let \(\pi _{*}^{N}=\{f_{*}^{N}\}\in \Pi _M^N\) be a stationary optimal policy for the model \(\mathcal{S}\mathcal{M}_N\) and \(\tilde{f}\in \mathbb {F}\) be an arbitrary selector. Fix \(\bar{\pi }=\{\bar{f}\}\in \Pi _M\), where \(\bar{f}:\mathbb {P}(S)\rightarrow A\) is defined as

$$\begin{aligned} \bar{f}(m) = f_{*}^{N}(m)\mathbbm {1}_{\mathbb {P}_N}(m)+\tilde{f}(m)\mathbbm {1}_{[\mathbb {P}_N]^c}(m). \end{aligned}$$

Recall that \(\widetilde{M}^N(k)\) and \(\widetilde{m}(k)\) are the trajectories generated by the policy \(\varphi \in \Pi _M\) (see Subsection 5.1). Due to (3) and the sufficiency of Markov policies, given an initial configuration \(m\in \mathbb {P}_N(S)\), it follows that

$$\begin{aligned} \inf _{\pi \in \Pi ^N_M}V^N(\pi , m)=V_{*}^{N}(m)=V^N(\pi _{*}^N,m)=V^N(\bar{\pi },m)=\inf _{\pi \in \Pi _M}V^N(\pi ,m), \end{aligned}$$

for all \(m\in \mathbb {P}_N(S)\subset \mathbb {P}(S)\). This implies

$$\begin{aligned}{} & {} \hspace{-1cm}\vert V_{*}^N(\widetilde{M}^N(k))-v_{*}(\widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1cm}=\vert \inf _{\pi \in \Pi _M^N}V^N(\pi , \widetilde{M}^N(k))-\inf _{\pi \in \Pi _M}v(\pi , \widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1cm}=\vert \inf _{\pi \in \Pi _M}V^N(\pi , \widetilde{M}^N(k))-\inf _{\pi \in \Pi _M}v(\pi , \widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1cm}\le \sup _{\pi \in \Pi _M}\vert V^N(\pi , \widetilde{M}^N(k))-v(\pi , \widetilde{m}(k))\vert \text { } k\in \mathbb {N}_0. \end{aligned}$$
(63)

Thus, for each \(m\in \mathbb {P}_N(S)\), \(t\in [T]_0\) and \(0\le k\le \mathcal {K}\),

$$\begin{aligned}{} & {} {E_{m}^{\varphi }\vert V_{*}^N(\widetilde{M}^N(k))-v_{*}(\widetilde{m}(k))\vert }\nonumber \\{} & {} \le E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V^N(\pi ,\widetilde{M}^N(k))-v(\pi ,\widetilde{m}(k))\vert \right] \nonumber \\{} & {} \le E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\bigg \{\vert V^N(\pi ,\widetilde{M}^N(k))-V_{T}^N(\pi , \widetilde{M}^N(k))\vert \nonumber \\{} & {} +\vert V_{T}^N(\pi , \widetilde{M}^N(k))-v_T(\pi ,\widetilde{m}(k))\vert +\vert v_T(\pi ,\widetilde{m}(k))-v(\pi ,\widetilde{m}(k))\vert \bigg \}\bigg ]\nonumber \\{} & {} \le E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\vert V^N(\pi ,\widetilde{M}^N(k))-V_{T}^N(\pi , \widetilde{M}^N(k))\vert \bigg ]\nonumber \\{} & {} +E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\vert V_{T}^N(\pi , \widetilde{M}^N(k))-v_T(\pi ,\widetilde{m}(k))\vert \bigg ]\nonumber \\{} & {} +E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\vert v_T(\pi ,\widetilde{m}(k))-v(\pi ,\widetilde{m}(k))\vert \bigg ]\nonumber \\{} & {} \le \vert \vert r\vert \vert _{W}T \max _{t\in [T-1]_0}\Delta (t,N,k)+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+ \mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \nonumber \\{} & {} +2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }, \end{aligned}$$
(64)

where the last inequality is due to Proposition 11 and Proposition 12. Therefore, Theorem 8 (a) holds.
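The final bound in the proof of Theorem 8 (a) has three parts: a term driven by \(\Delta (t,N,k)\), a term driven by the state approximation error, and a truncation tail. The Python sketch below simply evaluates that expression so the trade-off between T and N can be explored. Every numeric constant is a hypothetical placeholder, and \(\max _t\Delta (t,N,k)\) is passed in as a given number because the proof only shows that it vanishes as \(N\rightarrow \infty \).

```python
import math


def mu_T(T, eps_norm, theta, R):
    """mu_T = eps_norm * sum_{d=0}^{T-1} R^d + theta * R^T."""
    return eps_norm * sum(R ** d for d in range(T)) + theta * R ** T


def value_gap_bound(T, N, eps, eps_norm, theta, R, max_delta,
                    r_norm, L_r, C, lam, sigma, b, rho, W_m):
    """Evaluate the three-term bound on E|V*^N - v*| for assumed constants."""
    # Term 1: ||r||_W * T * max_t Delta(t, N, k), with max_delta supplied by the caller.
    discount_term = r_norm * T * max_delta
    # Term 2: L_r * (C*T*exp(-lam*N*eps^2) + mu_T) * (1 - sigma^T) / (1 - sigma).
    state_term = (L_r * (C * T * math.exp(-lam * N * eps ** 2)
                         + mu_T(T, eps_norm, theta, R))
                  * (1.0 - sigma ** T) / (1.0 - sigma))
    # Term 3: 2 * ||r||_W * (1 + b/(1 - rho))^2 * W(m) * sigma^T / (1 - sigma).
    tail_term = (2.0 * r_norm * (1.0 + b / (1.0 - rho)) ** 2 * W_m
                 * sigma ** T / (1.0 - sigma))
    return discount_term + state_term + tail_term


if __name__ == "__main__":
    # Purely illustrative constants; none of them come from the paper.
    params = dict(T=5, eps=0.05, eps_norm=1e-4, theta=1e-4, R=2, max_delta=1e-3,
                  r_norm=1.0, L_r=1.0, C=2.0, lam=2.0, sigma=0.9, b=1.0, rho=0.5, W_m=1.0)
    for N in (10 ** 3, 10 ** 4, 10 ** 6):
        print(N, value_gap_bound(N=N, **params))
```

A larger T shrinks the tail term but inflates \(\mu _T\) when R exceeds 1, which is why the argument fixes T first and only then lets \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\rightarrow 0\).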

1.2.2 Proof of Theorem 8 (b)

Let \(\{\widetilde{M}^N(k)\}\) and \(\{\widetilde{m}(k)\}\) be the trajectories corresponding to the application of the policy \(\pi _*=\{f_*\}\) with initial condition \(\widetilde{M}^N(0)=\widetilde{m}(0)=m\in \mathbb {P}_N(S)\). Observe that from Theorem 8 (a)

$$\begin{aligned}{} & {} { E_{m}^{\varphi } \vert V_{*}^{N}(\widetilde{M}^N(k)) -v_{*}(\widetilde{m}(k))\vert }\nonumber \\{} & {} \le \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,k)+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \nonumber \\{} & {} +2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }. \end{aligned}$$
(65)

Combining this fact with the Markov property [see, e.g., Hernández-Lerma and Lasserre (2012)] we obtain

$$\begin{aligned}{} & {} {\Big \vert \int _{\mathbb {R}^N}(\beta _{\alpha }(m)V_{*}^N[H^N(m,f_*,w)]-\beta _{\alpha }(m)v_{*}[H(m,f_*)])\theta (dw)\Big \vert }\nonumber \\{} & {} \le \beta _{\alpha }(m)\int _{\mathbb {R}^N}\Big \vert V_{*}^N[H^N(m,f_*,w)]-v_{*}[H(m,f_*)]\Big \vert \theta (dw)\nonumber \\{} & {} =E_{m}^{\pi _*}\Big \vert V_{*}^N(\widetilde{M}^N(1))-v_{*}(\widetilde{m}(1))\Big \vert \nonumber \\{} & {} \le \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,1)+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \nonumber \\{} & {} +2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }, \end{aligned}$$
(66)

where (66) is due to (16) and (65).

Now, let

$$\begin{aligned} \Phi (m,f)=r(m)+\beta _{\alpha }(m) v_{*}[H(m,f)]-v_*(m), \end{aligned}$$

be the discrepancy function of the mean field control model. It is easy to see from (41) that \(\Phi (m,f_*)=0\). Then, considering the discrepancy function \(\Phi ^N\) defined in (46) we get

$$\begin{aligned}{} & {} \hspace{-0.5cm}\Phi ^N(m,f_*)=\Big \vert \Phi ^N(m,f_*)-\Phi (m,f_*)\Big \vert \nonumber \\= & {} \Big \vert r(m)+\beta _{\alpha }(m)\int _{\mathbb {R}^N}V_{*}^N[H^N(m,f_*,w)]\theta (dw)-V_{*}^{N}(m)\nonumber \\{} & {} -\left( r(m)+\beta _{\alpha }(m) v_{*}[H(m,f_*)]-v_*(m)\right) \Big \vert \nonumber \\\le & {} \vert V_{*}^N(m)-v_{*}(m)\vert \nonumber \\{} & {} +\Big \vert \int _{\mathbb {R}^N}(\beta _{\alpha }(m)V_{*}^N[H^N(m,f_*,w)]-\beta _{\alpha }(m)v_{*}[H(m,f_*)])\theta (dw)\Big \vert .\nonumber \end{aligned}$$
(67)

Finally, Theorem 8 (a) with \(k=0\) and (66) imply

$$\begin{aligned} \Phi ^N(m,f_*)\le & {} \vert \vert r\vert \vert _{W}T\left( \max _{t\in [T-1]_0}\Delta (t,N,1)+\max _{t\in [T-1]_0}\Delta (t,N,0)\right) \\{} & {} \hspace{-1.5cm}+2\Bigg (\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \\{} & {} \hspace{-0.5cm}+2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }\Bigg ). \end{aligned}$$

Because T is arbitrary, letting \(\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\rightarrow 0\) and \(N\rightarrow \infty \), and considering Remark 3, we obtain (45). \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Martínez-Manzanares, M.E., Minjárez-Sosa, J.A. Semi-Markov control models for systems of large populations of interacting objects with possible unbounded costs: a mean field approach. Ann Oper Res (2024). https://doi.org/10.1007/s10479-024-05937-2


  • DOI: https://doi.org/10.1007/s10479-024-05937-2
