On the properties of ε(≥0) optimal policies in discounted unbounded return model
This paper investigates the properties of ε(≥0)-optimal policies in the model of . It is shown that if $\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)$ is a β-discounted optimal policy, then $(\pi_0^*, \pi_1^*, \ldots, \pi_n^*)^\infty$ is also a β-discounted optimal policy for all $n \ge 0$. Under some condition we prove that the stochastic stationary policy $\pi_n^{*\infty}$ corresponding to the decision rule $\pi_n^*$ is also optimal for the same discount factor β. We also show that each β-optimal stochastic stationary policy $\pi_0^{*\infty}$ can be decomposed into several decision rules whose corresponding stationary policies are each β-optimal; conversely, a proper convex combination of these decision rules coincides with the original $\pi_0^*$. We further prove that for any (ε, β)-optimal policy $\pi^* = (\pi_0^*, \pi_1^*, \ldots, \pi_n^*, \pi_{n+1}^*, \ldots)$, the policy $(\pi_0^*, \pi_1^*, \ldots, \pi_{n-1}^*)^\infty$ is $((1-\beta^n)^{-1}\varepsilon, \beta)$-optimal for $n > 0$. Finally, we note that the results on convex combinations and decompositions of optimal policies of § 4 in  can be extended to our case.
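The convex-combination statement can be checked numerically on a toy example. The sketch below is an illustration only, under assumptions that are not taken from the paper: a finite two-state, two-action model with bounded rewards (the paper treats unbounded returns), β = 0.5, and hypothetical transition and reward numbers chosen so that two deterministic decision rules `f` and `g` are both optimal. The helper names `evaluate`, `optimal_value`, and `mix` are likewise made up for this example.

```python
BETA = 0.5  # discount factor (assumed for this toy example)

# Hypothetical toy model: P[s][a][t] = transition probability, R[s][a] = reward.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to state 1
     [[1.0, 0.0], [1.0, 0.0]]]   # state 1: both actions return to state 0
R = [[1.0, 0.0],
     [3.0, 3.0]]
STATES = range(2)
ACTIONS = range(2)


def q_value(s, a, v):
    """One-step return of action a in state s, followed by value v."""
    return R[s][a] + BETA * sum(P[s][a][t] * v[t] for t in STATES)


def evaluate(rule, sweeps=200):
    """Discounted value of the stationary policy that always uses `rule`.

    rule[s][a] is the probability of choosing action a in state s.
    """
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [sum(rule[s][a] * q_value(s, a, v) for a in ACTIONS) for s in STATES]
    return v


def optimal_value(sweeps=200):
    """Optimal discounted value by value iteration."""
    v = [0.0, 0.0]
    for _ in range(sweeps):
        v = [max(q_value(s, a, v) for a in ACTIONS) for s in STATES]
    return v


def mix(f, g, lam):
    """Convex combination lam*f + (1-lam)*g of two decision rules."""
    return [[lam * f[s][a] + (1 - lam) * g[s][a] for a in ACTIONS] for s in STATES]


f = [[1.0, 0.0], [1.0, 0.0]]  # deterministic rule: action 0 in state 0
g = [[0.0, 1.0], [1.0, 0.0]]  # deterministic rule: action 1 in state 0

if __name__ == "__main__":
    print("optimal value:", optimal_value())
    for lam in (0.0, 0.3, 1.0):
        print("lambda =", lam, "value:", evaluate(mix(f, g, lam)))
```

In this toy model both actions attain the maximum in state 0, so `f`, `g`, and every mixture of them yield the same value vector as value iteration, approximately (2, 4). This only illustrates the finite, bounded case and says nothing about the conditions required in the unbounded model.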
Keywords: Decision Rule, Stationary Policy, Optimal Policy, Convex Combination, Math Application
- Luo Handong, Liu Jiwei, Xia Zhihao, Properties of ε-optimal Policies for Discounted Model (Chinese), J. Huazhong (Central China) University of Science and Technology, 14: 4 (1986).
- Guo Shizhen, Optimal Policies of Discounted Markovian Decision Programming (Chinese), Math. in Economics, 1 (1984), 109–120.
- Dong Zeqing, Lecture of Markovian Decision Programming (Chinese), Institute of Applied Mathematics, Academia Sinica, Beijing, Mimeograph, 1981 (Reprint, 1982).
- Dong Zeqing; Liu Ke, Structure of Optimal Policies for Discounted Markovian Decision Programming (Chinese), J. Math. Res. Exposition, 6: 3 (1986); Letter, Kexue Tongbao, 30: 1 (1985).
- Chitgopekar, S. S., Denumerable state Markovian sequential control processes: On randomizations of optimal policies, Naval Res. Logist. Quart., 22 (1975), 567–573.
- Harrison, J. M., Discrete dynamic programming with unbounded rewards, Ann. Math. Statist., 43 (1972), 636–644.
- Wal, J. van der, On stationary strategies in countable state total reward Markov decision processes, Math. Opers. Res., 9 (1984), 290–300.