Abstract
This brief paper presents a policy-improvement method for generating a feasible stochastic policy \(\tilde{\pi}\) from a given feasible stochastic base policy \(\pi\) such that \(\tilde{\pi}\) improves upon every feasible policy “induced” from \(\pi\), in the setting of infinite-horizon constrained discounted controlled Markov chains (CMCs). From this improvement method, a policy-iteration heuristic for approximately solving constrained discounted CMCs is developed.
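The abstract does not spell out the heuristic itself. As a rough illustration of the general idea only (not the paper's construction, which works with stochastic policies induced from a base policy), the sketch below runs a policy-iteration-style loop over deterministic policies in a toy constrained discounted MDP, accepting an action switch only when it keeps the discounted-cost constraint satisfied and strictly raises the discounted reward. The model data, the cost bound, and the one-swap improvement rule are all invented for illustration.

```python
# Toy sketch of a constrained policy-iteration heuristic. All model data
# (transitions P, rewards R, costs C, COST_BOUND) are made-up examples.

GAMMA = 0.9        # discount factor
COST_BOUND = 3.0   # constraint on the discounted cost from state 0

# Two states, two actions: P[s][a] is a distribution over next states.
P = {0: {0: [0.8, 0.2], 1: [0.2, 0.8]},
     1: {0: [0.5, 0.5], 1: [0.9, 0.1]}}
R = {0: {0: 1.0, 1: 2.0}, 1: {0: 0.0, 1: 1.5}}   # rewards r(s, a)
C = {0: {0: 0.1, 1: 0.6}, 1: {0: 0.2, 1: 0.5}}   # costs c(s, a)
STATES, ACTIONS = [0, 1], [0, 1]

def evaluate(policy, payoff, iters=2000):
    """Discounted value of a deterministic policy under a payoff table."""
    v = [0.0] * len(STATES)
    for _ in range(iters):
        v = [payoff[s][policy[s]]
             + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
             for s in STATES]
    return v

def heuristic_policy_iteration(policy):
    """Repeatedly switch one state's action if the switch keeps the
    discounted-cost constraint satisfied and strictly raises the reward."""
    while True:
        best, best_val = policy, evaluate(policy, R)[0]
        for s in STATES:
            for a in ACTIONS:
                if a == policy[s]:
                    continue
                cand = dict(policy)
                cand[s] = a
                if evaluate(cand, C)[0] > COST_BOUND:
                    continue                      # candidate is infeasible
                val = evaluate(cand, R)[0]
                if val > best_val + 1e-9:
                    best, best_val = cand, val    # feasible and strictly better
        if best is policy:                        # no feasible improvement left
            return policy
        policy = best

feasible_base = {0: 0, 1: 0}                      # assumed cost-feasible start
improved = heuristic_policy_iteration(feasible_base)
```

Because each accepted switch strictly increases the reward value over a finite policy set, the loop terminates; feasibility is preserved by construction, since infeasible candidates are never accepted.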
Cite this article
Chang, H.S. A policy iteration heuristic for constrained discounted controlled Markov Chains. Optim Lett 6, 1573–1577 (2012). https://doi.org/10.1007/s11590-011-0338-7