Abstract
This paper deals with discrete-time Markov decision processes (MDPs) with Borel state and action spaces, under the total expected discounted cost optimality criterion. We assume that the discount factor is not constant: it may depend on the state and action, and it can even take the extreme values zero or one. We propose sufficient conditions on the data of the model that ensure the existence of optimal control policies and allow the optimal value function to be characterized as a solution to the dynamic programming equation. As a particular case of these MDPs with varying discount factor, we study MDPs with stopping, as well as the corresponding optimal stopping times and contact set. We show applications to switching MDP models and, in particular, we study a pollution accumulation problem.
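As a rough illustration of the dynamic programming equation mentioned in the abstract, the following is a minimal sketch for a finite toy model. It is not the paper's construction, which covers Borel spaces. It assumes the equation takes the form V(x) = min_a [ c(x,a) + α(x,a) Σ_y P(y|x,a) V(y) ], where α(x,a) is the state-action dependent discount factor. The names `value_iteration`, `cost`, `alpha`, and `P`, and the two-state data, are hypothetical and chosen only for this example.

```python
import numpy as np

def value_iteration(cost, alpha, P, tol=1e-10, max_iter=10_000):
    """Iterate v <- min_a [ cost + alpha * (P v) ] on a finite model.

    cost, alpha: arrays of shape (S, A); P: array of shape (S, A, S).
    Assumption: the iteration converges for the given data. This is not
    guaranteed in general when alpha can equal one, so iterations are capped.
    """
    v = np.zeros(cost.shape[0])
    for _ in range(max_iter):
        # P @ v has shape (S, A): expected next-state value for each (x, a).
        v_new = (cost + alpha * (P @ v)).min(axis=1)
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    greedy = (cost + alpha * (P @ v)).argmin(axis=1)
    return v, greedy

# Hypothetical toy data: state 0 is "running", state 1 is an absorbing rest state.
# Action 0 = continue (cost 1, discount 0.5); action 1 = stop (cost 3, discount 0).
cost = np.array([[1.0, 3.0], [0.0, 0.0]])
alpha = np.array([[0.5, 0.0], [0.0, 0.0]])
P = np.zeros((2, 2, 2))
P[0, 0] = [1.0, 0.0]  # continuing keeps the system in state 0
P[0, 1] = [0.0, 1.0]  # stopping moves it to the rest state
P[1, :] = [0.0, 1.0]  # the rest state is absorbing under both actions

v, policy = value_iteration(cost, alpha, P)
print(v, policy)
```

In this toy model a discount factor of zero plays the role of stopping: once the stop action is taken, no future cost is counted. Here continuing forever costs 1 / (1 - 0.5) = 2, which is below the stopping cost of 3, so the greedy action in state 0 is to continue.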
Additional information
Supported by Grant MTM2016-75497-P from the Spanish Ministerio de Economía y Competitividad.
About this article
Cite this article
Jasso-Fuentes, H., Menaldi, JL. & Prieto-Rumeau, T. Discrete-time control with non-constant discount factor. Math Meth Oper Res 92, 377–399 (2020). https://doi.org/10.1007/s00186-020-00716-8
DOI: https://doi.org/10.1007/s00186-020-00716-8