Abstract
We consider the long run average or ‘ergodic’ control of a discrete time Markov process subject to a probabilistic constraint, namely a bound on the exit rate from a bounded subset of the state space. This is a natural counterpart of the probabilistic constraints commonly imposed in finite horizon control problems. Using a recent characterization by Anantharam and the first author of the risk-sensitive reward as the value of an average cost ergodic control problem, this problem is mapped to a constrained ergodic control problem that seeks to maximize an ergodic reward subject to a constraint on another ergodic reward. However, unlike classical constrained ergodic reward/cost problems, this problem has some non-classical features due to a non-standard coupling between the primary ergodic reward and the one that gets constrained. This renders the problem inaccessible to standard solution methodologies. A brief discussion of possible ways out is included.
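For orientation, a schematic version of the two formulations involved is sketched below. The notation (controlled chain $X_n$ with actions $U_n$, running reward $r$, bounded set $A$ with exit time $\tau_A$, constraint level $\delta$, and running cost $c$ in the risk-sensitive functional) is not fixed in the abstract and is used here purely for illustration. The constrained problem is, schematically,
\[
\text{maximize}\quad \liminf_{N\to\infty}\frac{1}{N}\,E\Big[\sum_{m=0}^{N-1} r(X_m,U_m)\Big]
\quad\text{subject to}\quad
\limsup_{N\to\infty}\,-\frac{1}{N}\log P\big(\tau_A > N\big)\;\le\;\delta,
\]
where $\tau_A := \min\{n \ge 0 : X_n \notin A\}$, so the constraint bounds the exponential rate at which the process exits the bounded set $A$. The exit-rate functional is of risk-sensitive type, and a variational formula of the kind established by Anantharam and Borkar expresses such a value in a Donsker–Varadhan-like form, schematically
\[
\lim_{N\to\infty}\frac{1}{N}\log E\Big[\exp\Big(\sum_{m=0}^{N-1} c(X_m)\Big)\Big]
\;=\;\sup_{\mu}\Big\{\int c\,d\mu \;-\; \mathcal{I}(\mu)\Big\},
\]
with the supremum over suitable ergodic occupation measures $\mu$ and $\mathcal{I}$ an entropy-like penalty. It is this identification that recasts the exit-rate constraint as a second ergodic reward, coupled in a non-standard way to the primary one.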
References
Anantharam, V., Borkar, V. S.: A variational formula for risk-sensitive reward. SIAM Journal on Control and Optimization. 55(2), 961-988 (2017).
Andrieu, L., Cohen, J., Vázquez-Abad, F. J.: Gradient-based simulation optimization under probability constraints. European Journal of Operational Research. 212(2), 345-351 (2011).
Borkar, V. S.: Topics in Controlled Markov Chains. Pitman Research Notes in Mathematics No. 240, Longman Scientific and Technical, Harlow, UK (1991).
Borkar, V. S.: Probability Theory: An Advanced Course. Springer Verlag, New York (1995).
Borkar, V. S.: Convex analytic methods in Markov decision processes. In: Feinberg E. A., Shwartz A. (eds.) Handbook of Markov Decision Processes, pp. 347-375. Kluwer Academic Publishers, Norwell, Mass. (2002).
Borkar, V. S.: Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency, New Delhi, and Cambridge University Press, Cambridge, UK (2008).
Danskin, J. M.: Theory of max-min, with applications. SIAM Journal on Applied Mathematics. 14(4), 641-664 (1966).
Hernández-Lerma, O., Lasserre, J. B.: Policy iteration for average cost Markov control processes on Borel spaces. Acta Applicandae Mathematicae. 47, 125-154 (1997).
Kang, B., Filar, J. A.: Time consistent dynamic risk measures. Mathematical Methods of Operations Research. 63(1), 169-186 (2006).
Krein, M. G., Rutman, M. A.: Linear operators leaving invariant a cone in Banach spaces. Uspekhi Mat. Nauk. 3(1), 3-95 (1948).
Meyn, S. P.: The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Transactions on Automatic Control. 42(12), 1663-1680 (1997).
Milgrom, P., Segal, I.: Envelope theorems for arbitrary choice sets. Econometrica. 70(2), 583-601 (2002).
Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Hoboken, NJ (1994).
© 2019 Springer Nature Switzerland AG
Cite this chapter
Borkar, V.S., Filar, J.A. (2019). Postponing Collapse: Ergodic Control with a Probabilistic Constraint. In: Yin, G., Zhang, Q. (eds) Modeling, Stochastic Control, Optimization, and Applications. The IMA Volumes in Mathematics and its Applications, vol 164. Springer, Cham. https://doi.org/10.1007/978-3-030-25498-8_3
DOI: https://doi.org/10.1007/978-3-030-25498-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-25497-1
Online ISBN: 978-3-030-25498-8
eBook Packages: Mathematics and Statistics (R0)