Semi-Markov control models for systems of large populations of interacting objects with possible unbounded costs: a mean field approach

Martínez-Manzanares, M. Elena; Minjárez-Sosa, J. Adolfo

doi:10.1007/s10479-024-05937-2

Original Research
Published: 08 April 2024

Annals of Operations Research (2024)

M. Elena Martínez-Manzanares &
J. Adolfo Minjárez-Sosa (ORCID: orcid.org/0000-0002-3453-4508)

Abstract

This paper deals with optimal control problems for stochastic systems composed of a large number N ($N\sim \infty $) of interacting objects (e.g., particles, agents, data) evolving among a finite or countable set of classes or categories according to a semi-Markov process. Such systems are modeled by a control model $\mathcal{S}\mathcal{M}_{N}$ whose states are vectors whose components are the proportions of objects in each class. Because N is so large, obtaining a solution of the control problem is practically infeasible. In this setting, we apply a mean field approach, which consists of letting $N\rightarrow \infty $ (the mean field limit). We thus obtain the mean field control model $\mathcal{S}\mathcal{M}$, independent of N, which is easier to study than $\mathcal{S}\mathcal{M}_{N}.$ Our main objective is to show that a policy $\pi _{*}$ that is optimal in $\mathcal{S}\mathcal{M}$ under a discounted criterion performs well in $\mathcal{S}\mathcal{M}_{N}.$ Specifically, we prove that $\pi _{*}$ is nearly discounted optimal in $\mathcal{S}\mathcal{M}_{N}$ asymptotically as $N\rightarrow \infty .$

References

Acciaio, B., Backhoff-Veraguas, J., & Carmona, R. (2019). Extended mean field control problems: Stochastic maximum principle and transport perspective. SIAM Journal on Control and Optimization, 57(6), 3666–3693.
Rami, M. A., Moore, J. B., & Zhou, X. Y. (2002). Indefinite stochastic linear quadratic control and generalized differential Riccati equation. SIAM Journal on Control and Optimization, 40(4), 1296–1311.
Bensoussan, A., Frehse, J., & Yam, P. (2013). Mean field games and mean field type control theory (Vol. 101). Springer.
Elliott, R., Li, X., & Ni, Y.-H. (2013). Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica, 49(11), 3222–3233.
Gast, N., & Gaujal, B. (2011). A mean field approach for optimization in discrete time. Discrete Event Dynamic Systems, 21(1), 63–101.
Gast, N., Gaujal, B., & Le Boudec, J. (2012). Mean field for Markov decision processes: From discrete to continuous optimization. IEEE Transactions on Automatic Control, 57(9), 2266–2280.
Hafayed, M. (2013). A mean-field necessary and sufficient conditions for optimal singular stochastic control. Communications in Mathematics and Statistics, 1(4), 417–435.
Higuera-Chan, C., Jasso-Fuentes, H., & Minjárez-Sosa, J. (2016). Discrete-time control for systems of interacting objects with unknown random disturbance distributions: A mean field approach. Applied Mathematics & Optimization, 74(1), 197–227.
Higuera-Chan, C., Jasso-Fuentes, H., & Minjárez-Sosa, J. (2017). Control systems of interacting objects modeled as a game against nature under a mean field approach. Journal of Dynamics & Games, 4(1), 59.
Le Boudec, J., McDonald, D., & Mundinger, J. (2007). A generic mean field convergence result for systems of interacting objects. In: Fourth International Conference on the Quantitative Evaluation of Systems (QEST 2007), pp. 3–18. IEEE.
Peyrard, N., & Sabbadin, R. (2006). Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes. In: Proceedings of the 2006 Conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29–September 1, 2006, Riva del Garda, Italy, pp. 595–599.
Song, T., & Liu, B. (2021). Discrete-time mean-field stochastic linear-quadratic optimal control problem with finite horizon. Asian Journal of Control, 23(2), 979–989.
Higuera-Chan, C. G. (2021). Approximation and mean field control of systems of large populations. In D. Hernandez-Hernandez, F. Leonardi, R. H. Mena, & J. C. Pardo Millan (Eds.), Advances in probability and mathematical statistics. Springer.
Martínez-Manzanares, M., & Minjárez-Sosa, J. (2021). A mean field absorbing control model for interacting objects systems. Discrete Event Dynamic Systems, 31(3), 349–372.
Dynkin, E., & Yushkevich, A. (1979). Controlled Markov processes (Vol. 235). Springer.
Hernández-Lerma, O., & Lasserre, J. B. (2012). Discrete-time Markov control processes: Basic optimality criteria (Vol. 30). Springer.
Hernández-Lerma, O. (2012). Adaptive Markov control processes (Vol. 79). Springer.
Luque-Vásquez, F., & Minjárez-Sosa, J. (2005). Semi-Markov control processes with unknown holding times distribution under a discounted criterion. Mathematical Methods of Operations Research, 61(3), 455–468.
Luque-Vásquez, F., Minjárez-Sosa, J., & Carmen Rosas-Rosas, L. (2011). Semi-Markov control models with partially known holding times distribution: discounted and average criteria. Acta Applicandae Mathematicae, 114(3), 135–156.
Luque-Vásquez, F., & Minjárez-Sosa, J. A. (2014). A note on the $\sigma $-compactness of sets of probability measures on metric spaces. Statistics & Probability Letters, 84, 212–214.
Parthasarathy, K. (1967). Probability measures on metric spaces. Academic Press.
Hernández-Lerma, O., & Lasserre, J. B. (2012). Further topics on discrete-time Markov control processes (Vol. 42). Springer.
Puterman, M. (2014). Markov decision processes: Discrete stochastic dynamic programming. Wiley.
Ash, R. (1972). Probability and real analysis. Wiley.

Funding

Work partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT - MÉXICO) under grant Ciencia Frontera 87787.

Author information

Both authors contributed equally to this work.

Authors and Affiliations

Departamento de Matemáticas, Universidad de Sonora, Rosales s/n, 83000, Hermosillo, Sonora, Mexico
M. Elena Martínez-Manzanares & J. Adolfo Minjárez-Sosa

Corresponding author

Correspondence to J. Adolfo Minjárez-Sosa.

Ethics declarations

Conflicts of interest

Author M. Elena Martínez-Manzanares declares that she has no conflict of interest. Author J. Adolfo Minjárez-Sosa declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs

1.1 Proof of Proposition 6

Fix $k\in [\mathcal {K}]_0$, $\mathcal {K}\in \mathbb {N}$ and $\pi =\{f_\ell \}\in \Pi _M$, an arbitrary policy. Let $\{a_t\}\in A$ be the sequence of controls corresponding to the application of $\pi $, and ${\widetilde{M}^N(k)\in \mathbb {P}_N(S)}$ be the initial condition of the process $\{M^N(t)\}$. Set

$$\begin{aligned} B_{ij,n}^{N}(t):=\mathbbm {1}_{\{\Delta _{ij}(a_{t})\}}(w_{n}^i(t)),\quad i,j\in S, n\in [N], \end{aligned}$$

where $w_{n}^i(t)$ are i.i.d. uniform random variables on [0, 1] (see (4)–(7)). Notice that, for each $t\in \mathbb {N}_0$, $\{B_{ij,n}^{N}(t)\}_{ij,n}$ is a set of i.i.d. Bernoulli random variables with mean

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }[B_{ij,n}^{N}(t)\vert a_{t}^{\pi , N}=a]=K_{ij}(a) \quad i,j\in S. \end{aligned}$$

For $\varepsilon \in \mathcal {E}$, the Hoeffding inequality implies, for each $i,j\in S$ and $t\in \mathbb {N}_0$,

$$\begin{aligned} P_{\widetilde{M}^N(k)}^{\pi }\bigg [\bigg \vert \sum _{n=1}^{NM_{i}^{N}(t)}B_{ij,n}^{N}(t)-NM_{i}^{N}(t)K_{ij}(a_{t})\bigg \vert <N\varepsilon _{ij}\bigg ]> 1-2e^{-2N\varepsilon _{ij}^2}. \end{aligned}$$

(52)
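The concentration step in (52) can be checked numerically. The following is a minimal sketch, not part of the original proof: it draws the Bernoulli indicators for a single pair of classes and compares the observed frequency of large deviations with $2e^{-2N\varepsilon ^2}$. The function names, the parameter values, and the shortcut of modelling each indicator as `w < k_ij` with `w` uniform on [0, 1] are illustrative assumptions. Since at most N objects sit in class i, Hoeffding's inequality with deviation $N\varepsilon $ gives a bound no larger than $2e^{-2N\varepsilon ^2}$.

```python
import math
import random


def hoeffding_bound(n_objects: int, eps: float) -> float:
    """Upper bound 2*exp(-2*N*eps^2) on the probability of a large deviation."""
    return 2.0 * math.exp(-2.0 * n_objects * eps ** 2)


def empirical_violation_rate(n_objects: int, frac_in_class: float, k_ij: float,
                             eps: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which |sum_n B_n - n_i * k_ij| >= N * eps.

    n_i = N * M_i^N(t) objects sit in class i; each moves to class j with
    probability k_ij (hypothetical stand-in for K_ij(a_t)).
    """
    rng = random.Random(seed)
    n_i = int(n_objects * frac_in_class)
    violations = 0
    for _ in range(trials):
        # Each indicator is 1 when the uniform draw falls below k_ij.
        moved = sum(1 for _ in range(n_i) if rng.random() < k_ij)
        if abs(moved - n_i * k_ij) >= n_objects * eps:
            violations += 1
    return violations / trials


N, eps = 200, 0.1
rate = empirical_violation_rate(N, frac_in_class=0.5, k_ij=0.3, eps=eps, trials=2000)
bound = hoeffding_bound(N, eps)
print(f"empirical rate = {rate:.4f}, Hoeffding bound = {bound:.4f}")
```

The bound is usually far from tight for moderate N; what matters for the proof is only its exponential decay in N.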

Set

$$\begin{aligned} \Omega _{ij}=\bigg \{\omega \in \Omega ': \bigg \vert \sum _{n=1}^{NM_{i}^{N}(t)}B_{ij,n}^{N}(t)-NM_{i}^{N}(t)K_{ij}(a_{t})\bigg \vert <N\varepsilon _{ij}\bigg \}\subset \Omega ', \end{aligned}$$

and consider $\bar{\Omega }=\cap _{i,j\in S}\Omega _{ij}$. Now, let $\{\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\}_{t\in \mathbb {N}_0}$ be the sequence defined as

$$\begin{aligned} \mu _0(\varepsilon ,\Theta _\mathcal {K}^N):=\Theta _\mathcal {K}^N\text {; }\mu _t(\varepsilon ,\Theta _\mathcal {K}^N):=\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\sum _{d=0}^{t-1}R^{d}+\Theta _\mathcal {K}^N R^t, \end{aligned}$$

with $R=\sup _{a\in A}\sup _{j\in S}\#(S_j(a))$ and $\Theta _\mathcal {K}^N$ as defined in (31). We now proceed by induction proving that, in $\bar{\Omega }$, the following holds

$$\begin{aligned} \vert \vert M^N(t)-m(t)\vert \vert _{\infty }\le \mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\text { }\forall t\in \mathbb {N}_0. \end{aligned}$$

(53)

For $t=0$, we have that $\vert \vert M^N(0)-m(0)\vert \vert _{\infty }=\vert \vert \widetilde{M}^N(k)-\widetilde{m}(k)\vert \vert _{\infty }\le \Theta _\mathcal {K}^N$. Assume that $\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\le \mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)$ for a particular $t\in \mathbb {N}$. Then, from (4), (8), (32) and (53), it follows for each $j\in S$

$$\begin{aligned} \vert M_{j}^N(t+1)-m_j(t+1)\vert&=\bigg \vert \sum _{i=1}^{\infty }\frac{1}{N}\bigg [\sum _{n=1}^{NM_i^N(t)}B_{ij,n}^{N}(t)-Nm_i(t)K_{ij}(a_t)\bigg ]\bigg \vert \\&\le \sum _{i=1}^{\infty }\frac{1}{N}\bigg \vert \sum _{n=1}^{NM_i^N(t)}B_{ij,n}^{N}(t)-Nm_i(t)K_{ij}(a_t)\bigg \vert \\&\le \sum _{i=1}^{\infty }\frac{1}{N}\bigg \vert \sum _{n=1}^{NM_i^N(t)}B_{ij,n}^{N}(t)-NM_i^N(t)K_{ij}(a_t)\bigg \vert \\&+\sum _{i=1}^{\infty }\vert M_i^N(t)-m_i(t)\vert K_{ij}(a_t)\\&\le \sum _{i=1}^{\infty }\varepsilon _{ij}+\#(S_j(a_t))\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\\&\le \sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N). \end{aligned}$$

That is, $\vert M_j^N(t+1)-m_j(t+1)\vert \le \sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)$ $\forall t\in \mathbb {N}_0$ over $\bar{\Omega }$. Hence,

$$\begin{aligned} \vert \vert M^N(t+1)-m(t+1)\vert \vert _{\infty }\le & {} \sup _{j\in S}\sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\\= & {} \vert \vert \varepsilon \vert \vert _{\mathcal {E}}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\\= & {} \vert \vert \varepsilon \vert \vert _{\mathcal {E}}+R\left( \vert \vert \varepsilon \vert \vert _{\mathcal {E}}\sum _{d=0}^{t-1}R^d+\Theta _\mathcal {K}^N R^t\right) \\= & {} \vert \vert \varepsilon \vert \vert _{\mathcal {E}}\sum _{d=0}^{t}R^d+\Theta _\mathcal {K}^N R^{t+1}\\= & {} \mu _{t+1}(\varepsilon ,\Theta _\mathcal {K}^N), \end{aligned}$$

which proves (53). Furthermore, notice that $\{\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\}_{t\in \mathbb {N}_0}$ is an increasing sequence in t. This implies that, for $T\in \mathbb {N}$, $\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\le \mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)$ $\forall t\le T$. Then, from (52), (53) and an induction argument over T, we have

$$\begin{aligned} P_{\widetilde{M}^N(k)}^{\pi }\left[ \sup _{t\in [T]_0}\vert \vert M^N(t+1)-m(t+1)\vert \vert _{\infty }>\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right] < 2Te^{-2N\varepsilon _{ij}^2}\text { }\forall i,j\in S. \end{aligned}$$

Setting $C=\lambda =2$, we prove (33).

Finally, (34) follows from similar arguments as presented in Higuera-Chan (2021). $\square $
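The induction step of the proof above rests on the recursion $\mu _{t+1}=\vert \vert \varepsilon \vert \vert _{\mathcal {E}}+R\mu _t$ with $\mu _0=\Theta _\mathcal {K}^N$. As a small sanity check, the sketch below compares that recursion with the closed form used to define $\mu _t$. The function names and the numerical values of `eps_norm`, `theta` and `R` are hypothetical and chosen only for illustration.

```python
def mu_recursive(t: int, eps_norm: float, theta: float, R: float) -> float:
    """mu_0 = theta and mu_{s+1} = eps_norm + R * mu_s, iterated t times."""
    mu = theta
    for _ in range(t):
        mu = eps_norm + R * mu
    return mu


def mu_closed_form(t: int, eps_norm: float, theta: float, R: float) -> float:
    """mu_t = eps_norm * sum_{d=0}^{t-1} R^d + theta * R^t."""
    return eps_norm * sum(R ** d for d in range(t)) + theta * R ** t


# Hypothetical values: ||eps|| = 0.01, Theta = 0.05, R = 2.
values = [mu_closed_form(t, 0.01, 0.05, 2) for t in range(6)]
for t, value in enumerate(values):
    print(t, value, mu_recursive(t, 0.01, 0.05, 2))
```

For $R\ge 1$ the values are nondecreasing in t, which is the monotonicity used to replace $\mu _t$ by $\mu _T$ for $t\le T$. The sketch also makes visible that, for fixed T, $\mu _T$ is small only when both $\vert \vert \varepsilon \vert \vert _{\mathcal {E}}$ and $\Theta _\mathcal {K}^N$ are small.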

1.2 Proof of Theorem 8

Lemma 9

Let Assumptions 1 and 3 hold. For all $t\in \mathbb {N}_0$ and $\pi \in \Pi _{M}$ it follows that

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\vert \le \sigma ^{t-1}L_{G}t\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) \end{aligned}$$

(54)

for all $i,j\in S$, where $\sigma $ is defined in (17).

Proof

We proceed by induction. For $t=0$, (14) and (40) imply (54). Assume (54) is valid for $t\in \mathbb {N}$. That is,

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\left| \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\right| \le \sigma ^{t-1}L_{G}t\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) . \end{aligned}$$

Notice

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\vert \Gamma _{\alpha }^{N,t+1}-\Gamma _{\alpha }^{t+1}\vert= & {} E_{\widetilde{M}^N(k)}^{\pi }\left| \Gamma _{\alpha }^{N,t}\beta _{\alpha }(M^{N}(t))-\Gamma _{\alpha }^{t}\beta _{\alpha }(m(t))\right| \\\le & {} E_{\widetilde{M}^N(k)}^{\pi }\Gamma _{\alpha }^{N,t}\left| \beta _{\alpha }(M^{N}(t))-\beta _{\alpha }(m(t))\right| \\{} & {} +E_{\widetilde{M}^N(k)}^{\pi }\beta _{\alpha }(m(t))\left| \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\right| . \end{aligned}$$

Now, from (18), Assumption 2 (g), and (34), it follows:

$$\begin{aligned}{} & {} \hspace{-0.9cm}E_{\widetilde{M}^N(k)}^{\pi }\vert \Gamma _{\alpha }^{N,t+1}-\Gamma _{\alpha }^{t+1}\vert \\{} & {} \hspace{-0.8cm}\le \sigma ^{t}L_{G}E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \vert M^{N}(t)-m(t)\vert \vert _{\infty }\right] +\sigma E_{\widetilde{M}^N(k)}^{\pi }\left| \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^{t}\right| \\{} & {} \hspace{-0.8cm}\le \sigma ^{t}L_{G}\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) +\sigma \sigma ^{t-1}L_{G}t\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) \\{} & {} \hspace{-0.8cm}=\sigma ^{t}L_{G}(t+1)\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _T(\varepsilon ,\Theta _\mathcal {K}^N)\right) . \end{aligned}$$

Hence, we conclude (54). $\square $
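The induction in the proof of Lemma 9 is an instance of a telescoping estimate for products: if two sequences of factors lie in $[0,\sigma ]$ and differ by at most $\delta $ term by term, then their t-fold products differ by at most $t\sigma ^{t-1}\delta $. The sketch below checks this deterministic inequality on random factors. It assumes, as the use of (18) in the proof suggests, that each discount factor $\beta _{\alpha }$ is bounded by $\sigma $; the names and numbers are illustrative, and $\delta $ plays the role of $L_G$ times the distance between $M^N(t)$ and $m(t)$.

```python
import random
from typing import Sequence


def product(xs: Sequence[float]) -> float:
    """Product of the entries of xs (1.0 for an empty sequence)."""
    out = 1.0
    for x in xs:
        out *= x
    return out


def telescoping_bound(t: int, sigma: float, delta: float) -> float:
    """Bound t * sigma^(t-1) * delta on |prod(a) - prod(b)|."""
    return t * sigma ** (t - 1) * delta


rng = random.Random(1)
sigma, delta, t = 0.9, 0.01, 8
# Factors a_i in [0, sigma - delta]; b_i = a_i + perturbation of size <= delta,
# so that every b_i also stays in [0, sigma].
a = [rng.uniform(0.0, sigma - delta) for _ in range(t)]
b = [min(sigma, x + rng.uniform(0.0, delta)) for x in a]
gap = abs(product(a) - product(b))
print(gap, telescoping_bound(t, sigma, delta))
```

The factor t in the bound reflects that each of the t factors contributes one perturbation term to the telescoping sum.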

Proposition 10

Under Assumptions 1, 2 and 3, we have, for $\pi \in \Pi _M$,

$$\begin{aligned} \limsup _{N\rightarrow \infty } \Delta (t,N,k)=0,\text { }t,k\in \mathbb {N}_0, \end{aligned}$$

(55)

where

$$\begin{aligned} \Delta (t,N,k):=E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\right] . \end{aligned}$$

Proof

Observe that the right-hand side of (54) converges to zero as $N\rightarrow \infty $ and $\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0$ [see Remark 3 and Proposition 6]. This implies $E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert \right] \rightarrow 0$ as $N\rightarrow \infty $ and $\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0$, which in turn yields

$$\begin{aligned} \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert \xrightarrow {P_{\widetilde{M}^N(k)}^{\pi }} 0 \end{aligned}$$

(56)

as $N\rightarrow \infty $ and $\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0$.

On the other hand, the sequence $\{\vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\}$ is uniformly integrable. This follows from (56), (Ash (1972), Lemma 7.6.9, p. 301) and the fact

$$\begin{aligned} \sup _{t\in \mathbb {N}_0}E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\right] ^{p}\le & {} 2^p\sup _{t\in \mathbb {N}_0}E_{\widetilde{M}^N(k)}^{\pi }\left[ W^p(M^N(t))\right] \\\le & {} 2^p\left( 1+\frac{\bar{b}}{1-\bar{\rho }}\right) W^p(\widetilde{M}^N(k))<\infty , \end{aligned}$$

where the first inequality is due to (14) and (40) (see (17) and (18)), while the last inequality comes from (26). The proof is completed by applying (Ash (1972), Theorem 7.5.2, p. 295) whenever we show

$$\begin{aligned} \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\xrightarrow {P_{\widetilde{M}^N(k)}^{\pi }} 0 \end{aligned}$$

as $N\rightarrow \infty $ and $\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0$, but this follows from (25), (56) and the relations

$$\begin{aligned}{} & {} P_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))>\eta \right] \\{} & {} \hspace{3cm}\le P_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert>\frac{\eta }{\ell }\right] +P_{\widetilde{M}^N(k)}^{\pi }\left[ W(M^N(t))>\ell \right] \\{} & {} \hspace{3cm}\le P_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert >\frac{\eta }{\ell }\right] +\frac{E_{\widetilde{M}^N(k)}^{\pi }\left[ W(M^N(t))\right] }{\ell } \end{aligned}$$

where $\eta $ and $\ell $ are arbitrary positive numbers. $\square $

For $\pi \in \Pi _{M}$, $m\in \mathbb {P}_{N}(S)\subset \mathbb {P}(S)$ and $ T\in \mathbb {N}$, we define the following expected costs

$$\begin{aligned} V_{T}^{N}(\pi ,m):=E_{m}^{\pi }\left[ \sum _{t=0}^{T-1}\Gamma _{\alpha }^{N,t}r(M^{N}(t))\right] \end{aligned}$$

and

$$\begin{aligned} v_{T}(\pi ,m):=\sum _{t=0}^{T-1}\Gamma _{\alpha }^{t}r(m(t)). \end{aligned}$$
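To make the definition of $v_T$ concrete, the following sketch evaluates the finite-horizon mean field cost for a toy model with two classes and a single fixed action. Everything specific here is an assumption made for illustration: the kernel `K`, the cost `r`, the discount function `beta`, and the normalization $\Gamma _{\alpha }^{0}=1$. Only the update rules $m_j(t+1)=\sum _i m_i(t)K_{ij}(a_t)$ and $\Gamma _{\alpha }^{t+1}=\Gamma _{\alpha }^{t}\beta _{\alpha }(m(t))$ are taken from the proofs in this appendix.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_field_step(m: Sequence[float], kernel: Sequence[Sequence[float]]) -> Vector:
    """Deterministic update m_j(t+1) = sum_i m_i(t) * K_ij for a fixed action."""
    n = len(m)
    return [sum(m[i] * kernel[i][j] for i in range(n)) for j in range(n)]


def finite_horizon_cost(m0: Sequence[float],
                        kernel: Sequence[Sequence[float]],
                        cost: Callable[[Sequence[float]], float],
                        discount: Callable[[Sequence[float]], float],
                        horizon: int) -> float:
    """v_T = sum_{t=0}^{T-1} Gamma^t * r(m(t)), with Gamma^0 = 1 (assumed)."""
    m, gamma, total = list(m0), 1.0, 0.0
    for _ in range(horizon):
        total += gamma * cost(m)
        gamma *= discount(m)            # Gamma^{t+1} = Gamma^t * beta(m(t))
        m = mean_field_step(m, kernel)  # m(t+1) from m(t)
    return total


# Hypothetical two-class example.
K = [[0.9, 0.1], [0.2, 0.8]]
r = lambda m: m[1]                  # cost: proportion of objects in class 2
beta = lambda m: 0.9 - 0.1 * m[1]   # state-dependent discount factor below 1
v1 = finite_horizon_cost([1.0, 0.0], K, r, beta, 1)
v2 = finite_horizon_cost([1.0, 0.0], K, r, beta, 2)
print(v1, v2)
```

Because the mean field trajectory is deterministic, $v_T$ needs no expectation. The N-object cost $V_T^N$ would instead require averaging over simulated trajectories of $M^N(t)$.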

Proposition 11

Under Assumptions 1, 2 and 3, for each $m\in \mathbb {P}_N(S)$, $\varepsilon \in \mathcal {E}$, ${T,\mathcal {K}\in \mathbb {N}}$, $t\in [T]_0$ and $k\in [\mathcal {K}]_0$ we have

$$\begin{aligned} E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V_T^N(\pi ,\widetilde{M}^N(k))-v_T(\pi ,\widetilde{m}(k))\vert \right]\le & {} \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,k)\nonumber \\{} & {} \hspace{-6.5cm}+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \text { }\forall i,j\in S. \end{aligned}$$

(57)

Proof

Fix $\pi \in \Pi _M$ and $T\in \mathbb {N}$. From (21) and Proposition 6 it follows

$$\begin{aligned} E_{\widetilde{M}^N(k)}^{\pi }\vert r(M^N(t))-r(m(t))\vert&\le E_{\widetilde{M}^N(k)}^{\pi }\left( L_r\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\right) \nonumber \\&\le L_r E_{\widetilde{M}^N(k)}^{\pi }\left( \max _{t\in [T]_0}\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\right) \nonumber \\&\le L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) , \end{aligned}$$

(58)

for all $i,j\in S$. Thus, for each $\pi \in \Pi _M$, we have

$$\begin{aligned}{} & {} \vert V_T^N(\pi , \widetilde{M}^N(k))-v_T(\pi , \widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1.5cm}= \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{T-1} \Gamma ^{N,t}_{\alpha }r(M^N(t))-\sum _{t=0}^{T-1}\Gamma ^{t}_{\alpha } r(m(t))\right\} \Big \vert \nonumber \\{} & {} \hspace{1.5cm}\le \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{T-1} \Gamma ^{N,t}_{\alpha }r(M^N(t))-\sum _{t=0}^{T-1}\Gamma ^{t}_{\alpha }r(M^N(t))\right\} \Big \vert \nonumber \\{} & {} \hspace{1.5cm}+ \Big \vert E_{\widetilde{M}^N(k)}^{\pi } \left\{ \sum _{t=0}^{T-1}\Gamma ^{t}_{\alpha }r(M^N(t))-\sum _{t=0}^{T-1} \Gamma ^{t}_{\alpha }r(m(t))\right\} \Big \vert \nonumber \\{} & {} \hspace{1.5cm}\le \vert \vert r\vert \vert _{W}E_{\widetilde{M}^N(k)}^{\pi } \left\{ \sum _{t=0}^{T-1}\Big \vert \Gamma ^{N,t}_{\alpha }-\Gamma ^{t}_{\alpha }\Big \vert W(M^N(t))\right\} \nonumber \\{} & {} \hspace{1.5cm}+ \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{T-1}\sigma ^{t} (r(M^N(t))-r(m(t)))\right\} \Big \vert \text { (see }(18),(40)) \nonumber \\{} & {} \hspace{1.5cm}\le \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,k)\nonumber \\{} & {} \hspace{1.5cm}+ \left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \text { (see }(58)). \end{aligned}$$

(59)

These relations yield (57).

$\square $

Proposition 12

Under Assumptions 1, 2 and 3, for each $m\in \mathbb {P}_N(S)$, $\varepsilon \in \mathcal {E}$, $T,\mathcal {K}\in \mathbb {N}$, $t\in [T]_0$ and $k\in [\mathcal {K}]_0$ we have

$$\begin{aligned}{} & {} \hspace{-1.5cm}E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V^N(\pi , \widetilde{M}^N(k))-V_{T}^{N}(\pi , \widetilde{M}^N(k))\vert \right] \nonumber \\{} & {} \hspace{3cm}\le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }; \end{aligned}$$

(60)

and

$$\begin{aligned} \vert v(\pi ,\widetilde{m}(k))-v_T(\pi ,\widetilde{m}(k))\vert \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }. \end{aligned}$$

(61)

Proof

Fix $\pi \in \Pi _M$. Relation (60) is a consequence of the following inequalities:

$$\begin{aligned}{} & {} {\vert V^N(\pi ,\widetilde{M}^N(k))-V_{T}^{N}(\pi , \widetilde{M}^N(k))\vert }\nonumber \\{} & {} \qquad \le \Big \vert E_{\widetilde{M}^N(k)}^{\pi }\left\{ \sum _{t=0}^{\infty }\Gamma _{\alpha }^{N,t}r(M^N(t))-\sum _{t=0}^{T-1}\Gamma _{\alpha }^{N,t}r(M^N(t))\right\} \Big \vert \nonumber \\{} & {} \qquad \le \sum _{t=T}^{\infty }E_{\widetilde{M}^N(k)}^{\pi }\sigma ^{t}\vert r(M^N(t))\vert \frac{W(M^N(t))}{W(M^N(t))}\text { (see (18))}\nonumber \\{} & {} \qquad \le \vert \vert r\vert \vert _W\sum _{t=T}^{\infty }\sigma ^{t}E_{\widetilde{M}^N(k)}^{\pi } W(M^N(t))\nonumber \\{} & {} \qquad \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) W(\widetilde{M}^N(k))\sum _{t=T}^{\infty }\sigma ^{t}\nonumber \text { (see (27))}\\{} & {} \qquad \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) W(\widetilde{M}^N(k))\frac{\sigma ^T}{1-\sigma }. \end{aligned}$$

(62)

Taking the supremum over $\pi \in \Pi _M$ and expectation $E_{m}^{\varphi }$, it follows that

$$\begin{aligned}{} & {} {E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V^N(\pi , \widetilde{M}^N(k))-V_{T}^{N}(\pi , \widetilde{M}^N(k))\vert \right] }\nonumber \\{} & {} \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) \frac{\sigma ^T}{1-\sigma } E_{m}^{\varphi }\left[ W(\widetilde{M}^N(k))\right] \\{} & {} \le \vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }\text { (see (27))} \end{aligned}$$

which proves (60).

By applying similar arguments, together with (39), we obtain (61). $\square $
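The truncation bounds in Proposition 12 decay geometrically in T, so the horizon needed for a given accuracy can be read off directly. The sketch below finds the smallest T for which $c\,\sigma ^T/(1-\sigma )$ falls below a tolerance, where `c` is a hypothetical stand-in for the constant $\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)$ and $0<\sigma <1$ is assumed, as the geometric series requires. The values of `sigma`, `c` and `tol` are illustrative only.

```python
def tail_bound(T: int, sigma: float, c: float) -> float:
    """Truncation bound c * sigma^T / (1 - sigma)."""
    return c * sigma ** T / (1.0 - sigma)


def smallest_horizon(sigma: float, c: float, tol: float) -> int:
    """Smallest T >= 0 such that tail_bound(T, sigma, c) <= tol."""
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie strictly between 0 and 1")
    if tol <= 0.0:
        raise ValueError("tol must be positive")
    T = 0
    while tail_bound(T, sigma, c) > tol:
        T += 1
    return T


# Hypothetical values: sigma = 0.9, c = 1, tolerance 1e-3.
T_star = smallest_horizon(0.9, 1.0, 1e-3)
print(T_star, tail_bound(T_star, 0.9, 1.0))
```

Once T is fixed this way, the remaining terms in the final estimate depend on N and $\varepsilon $ only, which is how the proof of Theorem 8 separates the horizon truncation from the mean field approximation error.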

1.2.1 Proof of Theorem 8 (a)

Let $\pi _{*}^{N}=\{f_{*}^{N}\}\in \Pi _M^N$ be a stationary optimal policy for the model $\mathcal{S}\mathcal{M}_N$ and $\tilde{f}\in \mathbb {F}$ be an arbitrary selector. Fix $\bar{\pi }=\{\bar{f}\}\in \Pi _M$, where $\bar{f}:\mathbb {P}(S)\rightarrow A$ is defined as

$$\begin{aligned} \bar{f}(m) = f_{*}^{N}(m)\mathbbm {1}_{\mathbb {P}_N}(m)+\tilde{f}(m)\mathbbm {1}_{[\mathbb {P}_N]^c}(m). \end{aligned}$$

Recall that $\widetilde{M}^N(k)$ and $\widetilde{m}(k)$ are the trajectories generated by the policy $\varphi \in \Pi _M$ (see Subsection 5.1). Due to (3) and the sufficiency of Markov policies, given an initial configuration $m\in \mathbb {P}_N(S)$, it follows that

$$\begin{aligned} \inf _{\pi \in \Pi ^N_M}V^N(\pi , m)=V_{*}^{N}(m)=V^N(\pi _{*}^N,m)=V^N(\bar{\pi },m)=\inf _{\pi \in \Pi _M}V^N(\pi ,m), \end{aligned}$$

for all $m\in \mathbb {P}_N(S)\subset \mathbb {P}(S)$. This implies

$$\begin{aligned}{} & {} \hspace{-1cm}\vert V_{*}^N(\widetilde{M}^N(k))-v_{*}(\widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1cm}=\vert \inf _{\pi \in \Pi _M^N}V^N(\pi , \widetilde{M}^N(k))-\inf _{\pi \in \Pi _M}v(\pi , \widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1cm}=\vert \inf _{\pi \in \Pi _M}V^N(\pi , \widetilde{M}^N(k))-\inf _{\pi \in \Pi _M}v(\pi , \widetilde{m}(k))\vert \nonumber \\{} & {} \hspace{1cm}\le \sup _{\pi \in \Pi _M}\vert V^N(\pi , \widetilde{M}^N(k))-v(\pi , \widetilde{m}(k))\vert \text { } k\in \mathbb {N}_0. \end{aligned}$$

(63)

Thus, for each $m\in \mathbb {P}_N(S)$, $t\in [T]_0$ and $0\le k\le \mathcal {K}$,

$$\begin{aligned}{} & {} {E_{m}^{\varphi }\vert V_{*}^N(\widetilde{M}^N(k))-v_{*}(\widetilde{m}(k))\vert }\nonumber \\{} & {} \le E_{m}^{\varphi }\left[ \sup _{\pi \in \Pi _M}\vert V^N(\pi ,\widetilde{M}^N(k))-v(\pi ,\widetilde{m}(k))\vert \right] \nonumber \\{} & {} \le E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\bigg \{\vert V^N(\pi ,\widetilde{M}^N(k))-V_{T}^N(\pi , \widetilde{M}^N(k))\vert \nonumber \\{} & {} +\vert V_{T}^N(\pi , \widetilde{M}^N(k))-v_T(\pi ,\widetilde{m}(k))\vert +\vert v_T(\pi ,\widetilde{m}(k))-v(\pi ,\widetilde{m}(k))\vert \bigg \}\bigg ]\nonumber \\{} & {} \le E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\vert V^N(\pi ,\widetilde{M}^N(k))-V_{T}^N(\pi , \widetilde{M}^N(k))\vert \bigg ]\nonumber \\{} & {} +E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\vert V_{T}^N(\pi , \widetilde{M}^N(k))-v_T(\pi ,\widetilde{m}(k))\vert \bigg ]\nonumber \\{} & {} +E_{m}^{\varphi }\bigg [\sup _{\pi \in \Pi _M}\vert v_T(\pi ,\widetilde{m}(k))-v(\pi ,\widetilde{m}(k))\vert \bigg ]\nonumber \\{} & {} \le \vert \vert r\vert \vert _{W}T \max _{t\in [T-1]_0}\Delta (t,N,k)+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+ \mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \nonumber \\{} & {} +2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }, \end{aligned}$$

(64)

where the last inequality is due to Proposition 11 and Proposition 12. Therefore, Theorem 8 (a) holds.

1.2.2 Proof of Theorem 8 (b)

Let $\{\widetilde{M}^N(k)\}$ and $\{\widetilde{m}(k)\}$ be the trajectories corresponding to the application of the policy $\pi _*=\{f_*\}$ with initial condition $\widetilde{M}^N(0)=\widetilde{m}(0)=m\in \mathbb {P}_N(S)$. Observe that from Theorem 8 (a)

$$\begin{aligned}{} & {} { E_{m}^{\varphi } \vert V_{*}^{N}(\widetilde{M}^N(k)) -v_{*}(\widetilde{m}(k))\vert }\nonumber \\{} & {} \le \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,k)+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \nonumber \\{} & {} +2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }. \end{aligned}$$

(65)

Combining this fact with the Markov property [see, e.g., Hernández-Lerma and Lasserre (2012)] we obtain

$$\begin{aligned}{} & {} {\Big \vert \int _{\mathbb {R}^N}(\beta _{\alpha }(m)V_{*}^N[H^N(m,f_*,w)]-\beta _{\alpha }(m)v_{*}[H(m,f_*)])\theta (dw)\Big \vert }\nonumber \\{} & {} \le \beta _{\alpha }(m)\int _{\mathbb {R}^N}\Big \vert V_{*}^N[H^N(m,f_*,w)]-v_{*}[H(m,f_*)]\Big \vert \theta (dw)\nonumber \\{} & {} =E_{m}^{\pi _*}\Big \vert V_{*}^N(\widetilde{M}^N(1))-v_{*}(\widetilde{m}(1))\Big \vert \nonumber \\{} & {} \le \vert \vert r\vert \vert _{W}T\max _{t\in [T-1]_0}\Delta (t,N,1)+\left( L_r\left( CTe^{-\lambda N\varepsilon _{ij}^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \nonumber \\{} & {} +2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }, \end{aligned}$$

(66)

where (66) is due to (16) and (65).

Now, let

$$\begin{aligned} \Phi (m,f)=r(m)+\beta _{\alpha }(m) v_{*}[H(m,f)]-v_*(m), \end{aligned}$$

be the discrepancy function of the mean field control model. It is easy to see from (41) that $\Phi (m,f_*)=0$. Then, considering the discrepancy function $\Phi ^N$ defined in (46) we get

$$\begin{aligned}{} & {} \hspace{-0.5cm}\Phi ^N(m,f_*)=\Big \vert \Phi ^N(m,f_*)-\Phi (m,f_*)\Big \vert \nonumber \\= & {} \Big \vert r(m)+\beta _{\alpha }(m)\int _{\mathbb {R}^N}V_{*}^N[H^N(m,f_*,w)]\theta (dw)-V_{*}^{N}(m)\nonumber \\{} & {} -\left( r(m)+\beta _{\alpha }(m) v_{*}[H(m,f_*)]-v_*(m)\right) \Big \vert \nonumber \\\le & {} \vert V_{*}^N(m)-v_{*}(m)\vert \nonumber \\{} & {} +\Big \vert \int _{\mathbb {R}^N}(\beta _{\alpha }(m)V_{*}^N[H^N(m,f_*,w)]-\beta _{\alpha }(m)v_{*}[H(m,f_*)])\theta (dw)\Big \vert .\nonumber \end{aligned}$$

(67)

Finally, Theorem 8 (a) with $k=0$ and (66) imply

$$\begin{aligned} \Phi ^N(m,f_*)\le & {} \vert \vert r\vert \vert _{W}T\left( \max _{t\in [T-1]_0}\Delta (t,N,1)+\max _{t\in [T-1]_0}\Delta (t,N,0)\right) \\{} & {} \hspace{-1.5cm}+2\Bigg (\left( L_r\left( CTe^{-\lambda N\varepsilon _{i\ell }^2}+\mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\right) \right) \left( \frac{1-\sigma ^T}{1-\sigma }\right) \\{} & {} \hspace{-0.5cm}+2\vert \vert r\vert \vert _W\left( 1+\frac{b}{1-\rho }\right) ^2W(m)\frac{\sigma ^T}{1-\sigma }\Bigg ). \end{aligned}$$

Because T is arbitrary, letting $\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\rightarrow 0$ and $N\rightarrow \infty $, and considering Remark 3, we obtain (45). $\square $

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Martínez-Manzanares, M.E., Minjárez-Sosa, J.A. Semi-Markov control models for systems of large populations of interacting objects with possible unbounded costs: a mean field approach. Ann Oper Res (2024). https://doi.org/10.1007/s10479-024-05937-2

Received: 26 May 2023
Accepted: 05 March 2024
Published: 08 April 2024
DOI: https://doi.org/10.1007/s10479-024-05937-2
