Abstract
In this paper we consider a singularly perturbed Markov decision process (MDP) with the limiting average cost criterion. We assume that the underlying process is composed of n separate irreducible processes, and that the small perturbation is such that it “unites” these processes into a single irreducible process. We formulate the underlying control problem for the singularly perturbed MDP, and call it the “limit Markov control problem” (limit MCP). We prove the validity of the “limit control principle”, which states that an optimal solution to the perturbed MDP can be approximated by an optimal solution of the limit MCP for any sufficiently small perturbation. We also demonstrate that the limit Markov control problem is equivalent to a suitably constructed nonlinear program in the space of long-run state-action frequencies. This approach combines the solutions of the original separated irreducible MDPs with the stationary distribution of a certain “aggregated MDP” and creates a framework for future algorithmic approaches.
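To make the aggregation idea concrete, the following is a minimal sketch for the uncontrolled case only, that is, one fixed stationary policy; it is not the paper's limit MCP or its nonlinear program. It assumes a transition matrix of the form P(eps) = P0 + eps*D, where P0 is block diagonal with two irreducible blocks. All numbers (A1, A2, D) and all function names are hypothetical and chosen for illustration. The sketch compares the stationary distribution of the perturbed chain with the product of the block stationary distributions and the stationary distribution of a two-state aggregated chain.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def invariant(q):
    """Probability vector x with x q = 0, where the rows of q sum to zero."""
    n = len(q)
    a = [[q[j][i] for j in range(n)] for i in range(n)]  # transpose of q
    a[-1] = [1.0] * n  # replace one balance equation by sum(x) = 1
    return solve_linear(a, [0.0] * (n - 1) + [1.0])


def stationary(p):
    """Stationary distribution of an irreducible transition matrix p."""
    n = len(p)
    return invariant([[p[i][j] - (i == j) for j in range(n)] for i in range(n)])


# Hypothetical data: two irreducible blocks and a coupling matrix D
# whose rows sum to zero and whose off-block entries are nonnegative.
A1 = [[0.5, 0.5], [0.2, 0.8]]
A2 = [[0.9, 0.1], [0.6, 0.4]]
D = [[-1, 0, 1, 0], [0, -1, 0, 1], [1, 0, -1, 0], [0, 2, 0, -2]]
BLOCKS = [[0, 1], [2, 3]]


def perturbed(eps):
    """Transition matrix P0 + eps * D of the united (irreducible) chain."""
    p0 = [A1[0] + [0, 0], A1[1] + [0, 0], [0, 0] + A2[0], [0, 0] + A2[1]]
    return [[p0[i][j] + eps * D[i][j] for j in range(4)] for i in range(4)]


def limit_distribution():
    """Block stationary distributions weighted by the aggregated chain."""
    mus = [stationary(A1), stationary(A2)]
    # Aggregated generator: average rate of moving from block i to block j.
    g = [[sum(mus[i][k] * sum(D[s][t] for t in BLOCKS[j])
              for k, s in enumerate(BLOCKS[i]))
          for j in range(2)] for i in range(2)]
    theta = invariant(g)
    return [theta[i] * m for i in range(2) for m in mus[i]]


if __name__ == "__main__":
    lim = limit_distribution()
    for eps in (1e-1, 1e-2, 1e-4):
        pi = stationary(perturbed(eps))
        print(eps, max(abs(x - y) for x, y in zip(pi, lim)))
```

In this toy setting the gap between the two distributions shrinks as eps decreases, which is the uncontrolled analogue of approximating the perturbed problem by a limit problem built from the separated processes and an aggregated chain.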
Additional information
On leave from Main College of Planning and Statistics, Warsaw, Poland.
Supported in part by the AFOSR and the NSF under the grant ECS-8704954.
Cite this article
Bielecki, T.R., Filar, J.A. Singularly perturbed Markov control problem: Limiting average cost. Ann Oper Res 28, 153–168 (1991). https://doi.org/10.1007/BF02055579
DOI: https://doi.org/10.1007/BF02055579