Abstract
This paper studies optimal control problems for stochastic systems composed of a large number N (\(N\sim \infty \)) of interacting objects (e.g., particles, agents, or data) evolving among a finite or countable set of classes or categories according to a semi-Markov process. Such systems are modeled by a control model \(\mathcal{S}\mathcal{M}_{N}\) whose states are vectors whose components are the proportions of objects in each class. Because N is so large, solving the control problem directly is practically infeasible. In this setting, we apply a mean field approach, which consists of letting \(N\rightarrow \infty \) (the mean field limit). We thereby obtain the mean field control model \(\mathcal{S}\mathcal{M}\), independent of N, which is easier to study than \(\mathcal{S}\mathcal{M}_{N}.\) Our main objective is to show that a policy \(\pi _{*}\) that is optimal in \(\mathcal{S}\mathcal{M}\) under a discounted criterion performs well in \(\mathcal{S}\mathcal{M}_{N}.\) Specifically, we prove that \(\pi _{*}\) is nearly discounted optimal in \(\mathcal{S}\mathcal{M}_{N}\) asymptotically as \(N\rightarrow \infty .\)
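As a rough illustration of the mean field idea described in the abstract, the following Python sketch simulates N objects moving among three classes and compares the vector of empirical proportions with a deterministic limit recursion. Everything specific here is an assumption made for illustration: the three-class kernel `kernel`, the initial configuration, and the absence of both controls and random holding times, so this is a plain Markov toy rather than the semi-Markov control model \(\mathcal{S}\mathcal{M}_{N}\) of the paper.

```python
import random


def kernel(m):
    """Hypothetical transition matrix over 3 classes; each row sums to 1.

    The chance of leaving class 0 grows with the proportion in class 1,
    so objects interact only through the proportion vector m.
    """
    a = 0.2 + 0.5 * m[1]
    return [
        [1.0 - a, a, 0.0],
        [0.0, 0.7, 0.3],
        [0.4, 0.0, 0.6],
    ]


def draw(row, u):
    """Inverse-CDF draw of the next class from one kernel row and a uniform u."""
    acc = 0.0
    for j, p in enumerate(row):
        acc += p
        if u < acc:
            return j
    # Round-off guard: fall back to the last class with positive probability.
    return max(j for j, p in enumerate(row) if p > 0)


def mean_field_step(m):
    """Deterministic limit dynamics m(t+1) = m(t) K(m(t))."""
    K = kernel(m)
    return [sum(m[i] * K[i][j] for i in range(3)) for j in range(3)]


def simulate(n_objects, horizon, m0, seed=0):
    """Return the largest sup-norm gap between proportions and the limit."""
    rng = random.Random(seed)
    counts = [round(n_objects * p) for p in m0]
    counts[0] += n_objects - sum(counts)  # make the counts sum to n_objects
    states = [i for i, c in enumerate(counts) for _ in range(c)]
    M = [c / n_objects for c in counts]   # empirical proportions
    m = list(M)                           # mean field state, same start
    worst = 0.0
    for _ in range(horizon):
        K = kernel(M)
        # One independent uniform per object and step drives each transition.
        states = [draw(K[s], rng.random()) for s in states]
        M = [states.count(j) / n_objects for j in range(3)]
        m = mean_field_step(m)
        worst = max(worst, max(abs(x - y) for x, y in zip(M, m)))
    return worst
```

Calling `simulate(n, 10, [0.5, 0.3, 0.2])` for increasing `n` gives a feel for how the gap between the N-object system and its limit behaves in this toy setting.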
References
Acciaio, B., Backhoff-Veraguas, J., & Carmona, R. (2019). Extended mean field control problems: Stochastic maximum principle and transport perspective. SIAM Journal on Control and Optimization, 57(6), 3666–3693.
Rami, M. A., Moore, J. B., & Zhou, X. Y. (2002). Indefinite stochastic linear quadratic control and generalized differential Riccati equation. SIAM Journal on Control and Optimization, 40(4), 1296–1311.
Bensoussan, A., Frehse, J., & Yam, P. (2013). Mean field games and mean field type control theory (Vol. 101). Springer.
Elliott, R., Li, X., & Ni, Y.-H. (2013). Discrete time mean-field stochastic linear-quadratic optimal control problems. Automatica, 49(11), 3222–3233.
Gast, N., & Gaujal, B. (2011). A mean field approach for optimization in discrete time. Discrete Event Dynamic Systems, 21(1), 63–101.
Gast, N., Gaujal, B., & Le Boudec, J. (2012). Mean field for Markov decision processes: From discrete to continuous optimization. IEEE Transactions on Automatic Control, 57(9), 2266–2280.
Hafayed, M. (2013). A mean-field necessary and sufficient conditions for optimal singular stochastic control. Communications in Mathematics and Statistics, 1(4), 417–435.
Higuera-Chan, C., Jasso-Fuentes, H., & Minjárez-Sosa, J. (2016). Discrete-time control for systems of interacting objects with unknown random disturbance distributions: A mean field approach. Applied Mathematics & Optimization, 74(1), 197–227.
Higuera-Chan, C., Jasso-Fuentes, H., & Minjárez-Sosa, J. (2017). Control systems of interacting objects modeled as a game against nature under a mean field approach. Journal of Dynamics & Games, 4(1), 59.
Le Boudec, J., McDonald, D., & Mundinger, J. (2007). A generic mean field convergence result for systems of interacting objects. In: Fourth International Conference on the Quantitative Evaluation of Systems (QEST 2007), pp. 3–18. IEEE.
Peyrard, N., & Sabbadin, R. (2006). Mean field approximation of the policy iteration algorithm for graph-based Markov decision processes. In: Proceedings of the 2006 Conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29–September 1, 2006, Riva del Garda, Italy, pp. 595–599.
Song, T., & Liu, B. (2021). Discrete-time mean-field stochastic linear-quadratic optimal control problem with finite horizon. Asian Journal of Control, 23(2), 979–989.
Higuera-Chan, C. G. (2021). Approximation and mean field control of systems of large populations. In D. Hernandez-Hernandez, F. Leonardi, R. H. Mena, & J. C. Pardo Millan (Eds.), Advances in probability and mathematical statistics. Springer.
Martínez-Manzanares, M., & Minjárez-Sosa, J. (2021). A mean field absorbing control model for interacting objects systems. Discrete Event Dynamic Systems, 31(3), 349–372.
Dynkin, E., & Yushkevich, A. (1979). Controlled Markov processes (Vol. 235). Springer.
Hernández-Lerma, O., & Lasserre, J. B. (2012). Discrete-time Markov control processes: Basic optimality criteria (Vol. 30). Springer.
Hernández-Lerma, O. (2012). Adaptive Markov control processes (Vol. 79). Springer.
Luque-Vásquez, F., & Minjárez-Sosa, J. (2005). Semi-Markov control processes with unknown holding times distribution under a discounted criterion. Mathematical Methods of Operations Research, 61(3), 455–468.
Luque-Vásquez, F., Minjárez-Sosa, J., & Carmen Rosas-Rosas, L. (2011). Semi-Markov control models with partially known holding times distribution: discounted and average criteria. Acta Applicandae Mathematicae, 114(3), 135–156.
Luque-Vásquez, F., & Minjárez-Sosa, J. A. (2014). A note on the \(\sigma \)-compactness of sets of probability measures on metric spaces. Statistics & Probability Letters, 84, 212–214.
Parthasarathy, K. (1967). Probability measures on metric spaces. Academic Press.
Hernández-Lerma, O., & Lasserre, J. B. (2012). Further topics on discrete-time Markov control processes (Vol. 42). Springer.
Puterman, M. (2014). Markov decision processes: Discrete stochastic dynamic programming. Wiley.
Ash, R. (1972). Probability and real analysis. Wiley.
Funding
Work partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT - MÉXICO) under grant Ciencia Frontera 87787.
Ethics declarations
Conflicts of interest
Author M. Elena Martínez-Manzanares declares that she has no conflict of interest. Author J. Adolfo Minjárez-Sosa declares that he has no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Appendix: Proofs
1.1 Proof of Proposition 6
Fix \(k\in [\mathcal {K}]_0\), \(\mathcal {K}\in \mathbb {N}\) and \(\pi =\{f_\ell \}\in \Pi _M\), an arbitrary policy. Let \(\{a_t\}\in A\) be the sequence of controls corresponding to the application of \(\pi \), and \({\widetilde{M}^N(k)\in \mathbb {P}_N(S)}\) be the initial condition of the process \(\{M^N(t)\}\). Set
where \(w_{n}^i(t)\) are i.i.d. uniform random variables on [0, 1] (see (4)–(7)). Notice that, for each \(t\in \mathbb {N}_0\), \(\{B_{ij,n}^{N}(t)\}_{ij,n}\) is a set of i.i.d. Bernoulli random variables with mean
For \(\varepsilon \in \mathcal {E}\), the Hoeffding inequality implies for each i, j \(\in S\) and \(t\in \mathbb {N}_0\)
Set
and consider \(\bar{\Omega }=\cap _{i,j\in S}\Omega _{ij}\). Now, let \(\{\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\}_{t\in \mathbb {N}_0}\) be the sequence defined as
with \(R=\sup _{a\in A}\sup _{j\in S}\#(S_j(a))\) and \(\Theta _\mathcal {K}^N\) as defined in (31). We now proceed by induction proving that, in \(\bar{\Omega }\), the following holds
For \(t=0\), we have that \(\vert \vert M^N(0)-m(0)\vert \vert _{\infty }=\vert \vert \widetilde{M}^N(k)-\widetilde{m}(k)\vert \vert _{\infty }\le \Theta _\mathcal {K}^N\). Assume that \(\vert \vert M^N(t)-m(t)\vert \vert _{\infty }\le \mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\) for a particular \(t\in \mathbb {N}\). Then, from (4), (8), (32) and (54), it follows for each \(j\in S\)
That is, \(\vert M_j^N(t+1)-m_j(t+1)\vert \le \sum _{i=1}^{\infty }\varepsilon _{ij}+R\mu _t(\varepsilon ,\Theta _\mathcal {K}^N)\) \(\forall t\in \mathbb {N}_0\) over \(\bar{\Omega }\). Hence,
which proves (54). Furthermore, notice that \(\{\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\}_{t\in \mathbb {N}_0}\) is an increasing sequence in t. This implies that, for \(T\in \mathbb {N}\), \(\mu _{t}(\varepsilon ,\Theta _\mathcal {K}^N)\le \mu _{T}(\varepsilon ,\Theta _\mathcal {K}^N)\) \(\forall t\le T\). Then, from (53), (54) and an induction argument over T, we have
Setting \(C=\lambda =2\), we prove (33).
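The constants \(C=\lambda =2\) coincide with those of the classical two-sided Hoeffding inequality for the mean of n i.i.d. Bernoulli variables, \(P(\vert \bar{B}-p\vert \ge \varepsilon )\le 2e^{-2n\varepsilon ^{2}}\). The sketch below is only a numerical sanity check of that classical bound; the function names, sample sizes and seed are illustrative assumptions and do not reproduce the exact form of (33).

```python
import math
import random


def hoeffding_bound(n, eps):
    """Two-sided Hoeffding bound 2 * exp(-2 * n * eps^2) for [0, 1]-valued variables."""
    return 2.0 * math.exp(-2.0 * n * eps * eps)


def empirical_tail(n, p, eps, trials=2000, seed=1):
    """Monte Carlo estimate of P(|mean of n Bernoulli(p) draws - p| >= eps)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each Bernoulli(p) draw is the indicator that a uniform falls below p.
        mean = sum(rng.random() < p for _ in range(n)) / n
        if abs(mean - p) >= eps:
            hits += 1
    return hits / trials
```

For example, `empirical_tail(200, 0.3, 0.1)` can be compared with `hoeffding_bound(200, 0.1)`; the bound is typically far from tight, which is harmless for the convergence argument since only its exponential decay in n is used.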
Finally, (34) follows from arguments similar to those presented in Higuera-Chan (2021). \(\square \)
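The induction step in the proof above bounds the error at time \(t+1\) by a constant term plus \(R\) times the bound at time t. The sketch below iterates a recursion of that shape and checks it against its closed form. It assumes, purely for illustration, that the sequence has the form \(\mu _0=\theta \), \(\mu _{t+1}=e+R\mu _t\), where the scalar `e` stands in for a bound on \(\sum _{i}\varepsilon _{ij}\) and `theta` for \(\Theta _\mathcal {K}^N\); the names `mu_sequence` and `mu_closed_form` are hypothetical.

```python
def mu_sequence(theta, e, R, T):
    """Iterate mu_0 = theta, mu_{t+1} = e + R * mu_t for t = 0, ..., T - 1."""
    mu = [theta]
    for _ in range(T):
        mu.append(e + R * mu[-1])
    return mu


def mu_closed_form(theta, e, R, t):
    """Closed form of the same affine recursion (geometric sum)."""
    if R == 1:
        return theta + t * e
    return R ** t * theta + e * (R ** t - 1) / (R - 1)
```

Under these assumptions, with \(R\ge 1\) and nonnegative `e` and `theta`, the sequence is nondecreasing in t, and for fixed T it tends to zero as both `e` and `theta` do, which is the mechanism the proof relies on.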
1.2 Proof of Theorem 8
Lemma 9
Let Assumptions 1 and 3 hold. For all \(t\in \mathbb {N}_0\) and \(\pi \in \Pi _{M}\) it follows that
for all \(i,j\in S\), and \(\sigma \) is defined in (17).
Proof
We proceed by induction. For \(t=0\), (14) and (40) imply (55). Assume (55) is valid for \(t\in \mathbb {N}\). That is,
Notice
Now, from (18), Assumption 2 (g), and (34), it follows:
Hence, we conclude (55). \(\square \)
Proposition 10
Under Assumptions 1, 2 and 3, we have, for \(\pi \in \Pi _M\),
where
Proof
Observe that the right-hand side of (55) converges to zero as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\) [see Remark 3 and Proposition 6]. This implies \(E_{\widetilde{M}^N(k)}^{\pi }\left[ \vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert \right] \rightarrow 0\) as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\) which in turn yields
as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\).
On the other hand, the sequence \(\{\vert \Gamma _{\alpha }^{N,t}-\Gamma _{\alpha }^t\vert W(M^N(t))\}\) is uniformly integrable. This follows from (57), (Ash (1972), Lemma 7.6.9, p. 301) and the fact that
where the first inequality is due to (14) and (40) (see (17) and (18)), while the last inequality comes from (26). The proof is completed by applying (Ash (1972), Theorem 7.5.2, p. 295) whenever we show
as \(N\rightarrow \infty \) and \(\vert \vert \varepsilon \vert \vert _\mathcal {E}\rightarrow 0\), but this follows from (25), (57) and the relations
where \(\eta \) and \(\ell \) are arbitrary positive numbers. \(\square \)
For \(\pi \in \Pi _{M}\), \(m\in \mathbb {P}_{N}(S)\subset \mathbb {P}(S)\) and \(T\in \mathbb {N}\), we define the following expected costs
and
Proposition 11
Under Assumptions 1, 2 and 3, for each \(m\in \mathbb {P}_N(S)\), \(\varepsilon \in \mathcal {E}\), \({T,\mathcal {K}\in \mathbb {N}}\), \(t\in [T]_0\) and \(k\in [\mathcal {K}]_0\) we have
Proof
Fix \(\pi \in \Pi _M\) and \(T\in \mathbb {N}\). From (21) and Proposition 6 it follows
for all \(i,j\in S\). Thus, for each \(\pi \in \Pi _M\), we have
These relations yield (58). \(\square \)
Proposition 12
Under Assumptions 1, 2 and 3, for each \(m\in \mathbb {P}_N(S)\), \(\varepsilon \in \mathcal {E}\), \(T,\mathcal {K}\in \mathbb {N}\), \(t\in [T]_0\) and \(k\in [\mathcal {K}]_0\) we have
and
Proof
Fix \(\pi \in \Pi _M\). Relation (61) is a consequence of the following inequalities:
Taking the supremum over \(\pi \in \Pi _M\) and expectation \(E_{m}^{\varphi }\), it follows that
which proves (61).
By applying similar arguments, together with (39) we obtain (62). \(\square \)
1.2.1 Proof of Theorem 8 (a)
Let \(\pi _{*}^{N}=\{f_{*}^{N}\}\in \Pi _M^N\) be a stationary optimal policy for the model \(\mathcal{S}\mathcal{M}_N\) and \(\tilde{f}\in \mathbb {F}\) be an arbitrary selector. Fix \(\bar{\pi }=\{\bar{f}\}\in \Pi _M\), where \(\bar{f}:\mathbb {P}(S)\rightarrow A\) is defined as
Recall that \(\widetilde{M}(k)\) and \(\widetilde{m}(k)\) are the trajectories generated by the policy \(\varphi \in \Pi _M\) (see Subsection 5.1). Due to (3) and the sufficiency of Markov policies, given an initial configuration \(m\in \mathbb {P}_N(S)\), it follows that
for all \(m\in \mathbb {P}_N(S)\subset \mathbb {P}(S)\). This implies
Thus, for each \(m\in \mathbb {P}_N(S)\), \(t\in [T]_0\) and \(0\le k\le \mathcal {K}\),
where the last inequality is due to Proposition 11 and Proposition 12. Therefore, Theorem 8 (a) holds.
1.2.2 Proof of Theorem 8 (b)
Let \(\{\widetilde{M}^N(k)\}\) and \(\{\widetilde{m}(k)\}\) be the trajectories corresponding to the application of the policy \(\pi _*=\{f_*\}\) with initial condition \(\widetilde{M}^N(0)=\widetilde{m}(0)=m\in \mathbb {P}_N(S)\). Observe that from Theorem 8 (a)
Combining this fact with the Markov property [see, e.g., Hernández-Lerma and Lasserre (2012)] we obtain
where (67) is due to (16) and (66).
Now, let
be the discrepancy function of the mean field control model. It is easy to see from (41) that \(\Phi (m,f_*)=0\). Then, considering the discrepancy function \(\Phi ^N\) defined in (46) we get
Finally, Theorem 8 (a) with \(k=0\) and (67) imply
Because T is arbitrary, letting \(\vert \vert \varepsilon \vert \vert _{\mathcal {E}}\rightarrow 0\) and \(N\rightarrow \infty \), and considering Remark 3, we obtain (45). \(\square \)
About this article
Cite this article
Martínez-Manzanares, M.E., Minjárez-Sosa, J.A. Semi-Markov control models for systems of large populations of interacting objects with possible unbounded costs: a mean field approach. Ann Oper Res (2024). https://doi.org/10.1007/s10479-024-05937-2
DOI: https://doi.org/10.1007/s10479-024-05937-2