Abstract
This paper studies continuous-time Markov decision processes under the criterion of expected discounted total rewards, where the state space is countable, the reward rate function is extended real-valued, and the discount rate is a real number. Under necessary conditions for the model to be well defined, the state space is partitioned into three subsets, on which the optimal value function is positive infinity, negative infinity, or finite, respectively. Correspondingly, the model is reduced to three submodels by generalizing policies and eliminating some worst actions. For the submodel with finite optimal value, the validity of the optimality equation is shown and some of its properties are obtained.
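The discounted criterion above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes a toy finite-state CTMDP (the states, transition-rate matrices, reward rates, and discount rate are all hypothetical) and computes the optimal discounted value by uniformization, which reduces the continuous-time problem to an equivalent discrete-time one with discount factor Λ/(Λ+α). In the paper's setting the state space is countable and rewards may be unbounded, so the value can be infinite on part of the state space; in this bounded toy example it is finite everywhere.

```python
import numpy as np

# Hypothetical toy data: two states, two actions.
alpha = 0.1                        # discount rate
states = [0, 1]
actions = [0, 1]
# Transition-rate matrices Q_a: off-diagonals >= 0, rows sum to 0.
Q = {0: np.array([[-1.0, 1.0], [2.0, -2.0]]),
     1: np.array([[-3.0, 3.0], [0.5, -0.5]])}
# Reward rates r(s, a), indexed as r[a][s].
r = {0: np.array([1.0, -1.0]),
     1: np.array([2.0, 0.0])}

# Uniformization constant: Lambda >= every total exit rate -Q_a(s, s).
Lam = max(-Q[a][s, s] for a in actions for s in states)
beta = Lam / (Lam + alpha)         # discrete-time discount factor

# Value iteration on the uniformized discrete-time MDP:
# V(s) = max_a [ r(s,a)/(Lam+alpha) + beta * sum_{s'} P_a(s,s') V(s') ],
# where P_a = I + Q_a / Lam is a stochastic matrix.
V = np.zeros(len(states))
for _ in range(1000):
    V_new = np.empty_like(V)
    for s in states:
        vals = []
        for a in actions:
            P_row = np.eye(len(states))[s] + Q[a][s] / Lam
            vals.append(r[a][s] / (Lam + alpha) + beta * (P_row @ V))
        V_new[s] = max(vals)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

Since `beta < 1`, the iteration is a contraction and converges to the unique fixed point of the optimality equation for this bounded example.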
This research was supported by the National Natural Science Foundation of China, by the Institute of Applied Mathematics, Academia Sinica, and by a Grant-in-Aid for Scientific Research (No. 13650440), Japan.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Hu, Q., Liu, J., Yue, W. (2003). Continuous Time Markov Decision Processes with Expected Discounted Total Rewards. In: Sloot, P.M.A., Abramson, D., Bogdanov, A.V., Gorbachev, Y.E., Dongarra, J.J., Zomaya, A.Y. (eds) Computational Science — ICCS 2003. ICCS 2003. Lecture Notes in Computer Science, vol 2658. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44862-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40195-7
Online ISBN: 978-3-540-44862-4