Abstract
In this paper, infinite-horizon Markovian decision programming with recursive reward functions is discussed. We show that Bellman's principle of optimality applies to our model. Then a necessary and sufficient condition for a policy to be optimal is given. For the stationary case, an iterative algorithm for finding a stationary optimal policy is designed. The algorithm generalizes Howard's [7] and Iwamoto's [3] algorithms.
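For orientation, the classical special case that the abstract refers to, Howard's policy iteration [7] for a finite discounted model with additive rewards, can be sketched as follows. This is not the paper's algorithm for recursive reward functions. The names `P`, `r`, `beta`, `evaluate`, and `policy_iteration` are illustrative assumptions, and policy evaluation is approximated here by successive approximation rather than by solving the linear system exactly.

```python
def evaluate(policy, P, r, beta, tol=1e-10):
    """Approximate the discounted value of a stationary policy by iteration."""
    n = len(policy)
    v = [0.0] * n
    while True:
        new_v = [
            r[s][policy[s]]
            + beta * sum(P[s][policy[s]][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def policy_iteration(P, r, beta):
    """Alternate policy evaluation and improvement until the policy is stable.

    P[s][a][t]: probability of moving from state s to state t under action a.
    r[s][a]:    one-step reward for action a in state s.
    beta:       discount factor, assumed to satisfy 0 <= beta < 1.
    """
    n = len(P)
    policy = [0] * n  # arbitrary initial stationary policy
    while True:
        v = evaluate(policy, P, r, beta)
        improved = list(policy)
        for s in range(n):
            def q(a):
                # One-step lookahead value of action a in state s.
                return r[s][a] + beta * sum(P[s][a][t] * v[t] for t in range(n))

            best = max(range(len(P[s])), key=q)
            # Switch only on a strict improvement, so ties cannot cause cycling.
            if q(best) > q(policy[s]) + 1e-9:
                improved[s] = best
        if improved == policy:
            return policy, v
        policy = improved


# Hypothetical two-state, two-action example (made-up data).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
r = [[1.0, 0.0], [2.0, 0.0]]
policy, v = policy_iteration(P, r, beta=0.9)
```

In this made-up example the stable policy moves from state 0 to state 1 and then stays there, since the larger per-step reward in state 1 outweighs the one-step delay at discount factor 0.9.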
References
[1] N. Furukawa and S. Iwamoto, Markovian decision processes with recursive reward functions, Bull. Math. Statist. 15, 3–4(1973)79–91.
[2] N. Furukawa and S. Iwamoto, Correction to “Markovian decision processes with recursive reward functions”, Bull. Math. Statist. 16, 1–2(1974)127.
[3] S. Iwamoto, Discrete dynamic programming with a recursive additive system, Bull. Math. Statist. 16, 1–2(1974)49–66.
[4] N. Furukawa and S. Iwamoto, Dynamic programming on recursive reward systems, Bull. Math. Statist. 17, 1–2(1976)103–126.
[5] Dong Zeqing and Liu Ke, Structure of optimal policies for discounted Markovian decision programming, J. Math. Res. Exposition 6, 3(1986)125–134, in Chinese.
[6] D. Blackwell, Discrete dynamic programming, Ann. Math. Statist. 33(1962)719–726.
[7] R.A. Howard, Dynamic Programming and Markov Processes (Wiley, New York, 1960).
[8] Dong Zeqing, Lecture on Markovian decision programming, Institute of Applied Mathematics, Academia Sinica, Beijing, Mimeograph (1985), in Chinese.
[9] Dong Zeqing and Zhang Sheng, On the properties of ε(≥0) optimal policies in the discounted unbounded return model, Acta Math. Appl. Sinica (English Series) 3, 1(1987)15–25.
[10] Dong Zeqing and Liu Ke, Structure of optimal policies for discounted semi-Markov decision programming with unbounded rewards, Sci. Sinica (Ser. A), 4(1986)337–349.
Additional information
This research was supported by the National Natural Science Foundation of China.
Cite this article
Liu, J., Liu, K. On Markovian decision programming with recursive reward functions. Ann Oper Res 24, 145–164 (1990). https://doi.org/10.1007/BF02216820