Abstract
We examine a finite-horizon Markov decision process that admits an unbounded Bellman function. The optimality equation is analyzed, and necessary and sufficient conditions for optimal Markov A-strategies are obtained. Optimal synthesis under an entropy criterion is also considered.
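For a finite-horizon process, the optimality (Bellman) equation is solved by backward induction from the terminal stage. The sketch below is a minimal illustration on a toy finite MDP; all states, actions, rewards, and transition probabilities are hypothetical, and the bounded toy setting deliberately does not capture the unbounded Bellman functions studied in the paper.

```python
def backward_induction(P, r, T):
    """Finite-horizon backward induction on a toy finite MDP (illustrative only).

    P[a][s][s2] -- probability of moving from state s to s2 under action a
    r[a][s]     -- one-step reward for action a in state s
    T           -- horizon (number of decision stages)
    Returns the stage-0 value function and a stage-dependent Markov policy.
    """
    n = len(r[0])                 # number of states
    A = len(r)                    # number of actions
    V = [0.0] * n                 # terminal condition: V_T == 0
    policy = []                   # policy[t][s] = optimal action at stage t

    for _ in range(T):
        # One-step lookahead: q[s][a] = r(a, s) + E[V(next state)]
        q = [[r[a][s] + sum(P[a][s][s2] * V[s2] for s2 in range(n))
              for a in range(A)] for s in range(n)]
        pi = [max(range(A), key=lambda a: q[s][a]) for s in range(n)]
        V = [q[s][pi[s]] for s in range(n)]
        policy.insert(0, pi)      # earlier stages go to the front
    return V, policy


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action example: action 0 stays put, action 1 switches states.
    P = [[[1, 0], [0, 1]],        # action 0: stay
         [[0, 1], [1, 0]]]        # action 1: switch
    r = [[0, 2], [1, 0]]          # staying in state 1 pays 2; switching from state 0 pays 1
    V, pol = backward_induction(P, r, 2)
    print(V, pol)                 # -> [3.0, 4.0] [[1, 0], [1, 0]]
```

The recursion runs the horizon backwards: at each stage the maximizing action defines the Markov strategy, and the resulting value function feeds the preceding stage.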
Additional information
Translated from Kibernetika, No. 3, pp. 82–90, May–June, 1991.
Cite this article
Piunovskii, A.B., Khametov, V.M. New exactly solvable examples for controlled discrete-time Markov chains. Cybern Syst Anal 27, 420–433 (1991). https://doi.org/10.1007/BF01068323