Abstract
We consider continuous-time Markov decision processes in Polish spaces. The performance of a control policy is measured by the expected discounted reward criterion associated with state-dependent discount factors. All underlying Markov processes are determined by given transition rates, which are allowed to be unbounded, and the reward rates may have neither upper nor lower bounds. Using the dynamic programming approach, we establish the discounted reward optimality equation (DROE) and the existence and uniqueness of its solutions. Under suitable conditions, we also obtain a discounted optimal stationary policy that is optimal in the class of all randomized stationary policies. Moreover, when the transition rates are uniformly bounded, we provide an algorithm to compute (or at least approximate) the discounted reward optimal value function as well as a discounted optimal stationary policy. Finally, we illustrate our results with an example; in particular, we derive an explicit and exact solution to the DROE and an explicit expression for a discounted optimal stationary policy for this example.
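The bounded-rates case admits a concrete computation via uniformization: with a constant m bounding the total exit rates, the DROE α(x)V(x) = sup_a [r(x,a) + ∫ V(y) q(dy|x,a)] can be rewritten as a fixed-point equation whose iteration map is a contraction with modulus m/(m + min_x α(x)) < 1. The following is a minimal numerical sketch on a hypothetical finite model (the two-state rates, rewards, and discount factors below are illustrative, not taken from the paper):

```python
import numpy as np

# Illustrative finite CTMDP (not from the paper): 2 states, 2 actions.
# q[x, a, y]: transition rates; each row q[x, a, :] sums to zero.
q = np.array([[[-1.0,  1.0],
               [-2.0,  2.0]],
              [[ 3.0, -3.0],
               [ 1.0, -1.0]]])
r = np.array([[1.0, 2.0],      # r[x, a]: reward rates
              [0.5, 3.0]])
alpha = np.array([0.5, 1.0])   # alpha[x]: state-dependent discount factors

# Uniformization constant: m >= sup_{x,a} (-q(x | x, a)).
m = np.max(-q[np.arange(2), :, np.arange(2)])

# Value iteration on the uniformized DROE:
#   V(x) = max_a [ r(x,a) + m V(x) + sum_y q(y | x, a) V(y) ] / (alpha(x) + m),
# a sup-norm contraction with modulus m / (m + min_x alpha(x)) < 1.
V = np.zeros(2)
while True:
    Q = (r + m * V[:, None] + q @ V) / (alpha[:, None] + m)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy stationary deterministic policy
print(V, policy)
```

At the fixed point, V satisfies the original DROE, α(x)V(x) = max_a [r(x,a) + Σ_y q(y|x,a)V(y)], since the term m·V(x) is constant in a and cancels across both sides.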
Acknowledgements
Research supported by NSFC, GDUPS, and GPK-LCS.
Cite this article
Ye, L., Guo, X. Continuous-Time Markov Decision Processes with State-Dependent Discount Factors. Acta Appl Math 121, 5–27 (2012). https://doi.org/10.1007/s10440-012-9669-3
Keywords
- Continuous-time Markov decision processes
- State-dependent discount factor
- Dynamic programming
- Explicit and exact solution
- Explicit expression