Acta Mathematicae Applicatae Sinica, Volume 3, Issue 1, pp 15–25

On the properties of ε(≥0) optimal policies in discounted unbounded return model

  • Dong Zeqing 
  • Zhang Sheng 


This paper investigates the properties of ε(≥0)-optimal policies in the model of [2]. It is shown that if π* = (π₀*, π₁*, …, πₙ*, πₙ₊₁*, …) is a β-discounted optimal policy, then (π₀*, π₁*, …, πₙ*) is also β-discounted optimal for all n ≥ 0. Under a certain condition we prove that the stochastic stationary policy (πₙ*)^∞ corresponding to the decision rule πₙ* is also optimal for the same discount factor β. We also show that every β-optimal stochastic stationary policy (π₀*)^∞ can be decomposed into several decision rules whose corresponding stationary policies are each β-optimal; conversely, a proper convex combination of these decision rules coincides with the original π₀*. We further prove that for any (ε, β)-optimal policy, say π* = (π₀*, π₁*, …, πₙ*, πₙ₊₁*, …), the policy (π₀*, π₁*, …, πₙ₋₁*) is ((1 − βⁿ)⁻¹ε, β)-optimal for n > 0. Finally, we note that the results on convex combinations and decompositions of optimal policies in § 4 of [1] can be extended to our case.
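The β-discounted criterion underlying these results can be illustrated concretely. The sketch below is a minimal, hypothetical example, not taken from the paper: the two-state MDP, its rewards, and the decision rule are all invented for illustration. It evaluates the β-discounted return of a stationary policy π^∞ by iterating the policy-evaluation fixed-point equation v = r_π + β·P_π·v.

```python
# Minimal sketch (illustrative MDP, not from the paper): evaluate the
# beta-discounted return of a stationary policy pi^inf in a finite MDP.

BETA = 0.9  # discount factor beta in (0, 1)

# transition[s][a] = list of (next_state, probability)
transition = {
    0: {"stay": [(0, 1.0)], "go": [(1, 1.0)]},
    1: {"stay": [(1, 1.0)], "go": [(0, 1.0)]},
}
# reward[s][a] = immediate expected reward
reward = {0: {"stay": 1.0, "go": 0.0}, 1: {"stay": 2.0, "go": 0.0}}

def policy_value(pi, beta=BETA, iters=2000):
    """Iterate v <- r_pi + beta * P_pi v to its fixed point
    (policy evaluation; the contraction has modulus beta)."""
    v = {s: 0.0 for s in transition}
    for _ in range(iters):
        v = {s: reward[s][pi[s]]
                + beta * sum(p * v[t] for t, p in transition[s][pi[s]])
             for s in transition}
    return v

pi_star = {0: "go", 1: "stay"}   # a candidate stationary decision rule
v = policy_value(pi_star)
# From state 1, "stay" forever yields 2/(1 - beta) = 20;
# from state 0, "go" then "stay" yields 0 + beta * 20 = 18.
```

Replacing `pi_star` by another decision rule and comparing the resulting value functions is exactly the kind of comparison the optimality statements above are about, here reduced to a two-state toy case.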






References

[1] Luo Handong, Liu Jiwei, Xia Zhihao, Properties of ε-optimal Policies for Discounted Model (Chinese), J. Huazhong (Central China) University of Science and Technology, 14:4 (1986).
[2] Guo Shizhen, Optimal Policies of Discounted Markovian Decision Programming (Chinese), Math. in Economics, 1 (1984), 109–120.
[3] Dong Zeqing, Lecture of Markovian Decision Programming (Chinese), Institute of Applied Mathematics, Academia Sinica, Beijing, Mimeograph, 1981 (Reprint, 1982).
[4] Dong Zeqing, Liu Ke, Structure of Optimal Policies for Discounted Markovian Decision Programming (Chinese), J. Math. Res. Exposition, 6:3 (1986); Letter, Kexue Tongbao, 30:1 (1985).
[5] Chitgopekar, S. S., Denumerable State Markovian Sequential Control Processes: On Randomizations of Optimal Policies, Naval Res. Logist. Quart., 22 (1975), 567–573.
[6] Harrison, J. M., Discrete Dynamic Programming with Unbounded Rewards, Ann. Math. Statist., 43 (1972), 636–644.
[7] Wal, J. van der, On Stationary Strategies in Countable State Total Reward Markov Decision Processes, Math. Oper. Res., 9 (1984), 290–300.

Copyright information

© Science Press, Beijing, China and Allerton Press, Inc. New York, U.S.A. 1987

Authors and Affiliations

  • Dong Zeqing, Institute of Applied Mathematics, Academia Sinica, China
  • Zhang Sheng, Yunnan University, China
