Notes on equivalent stationary policies in Markov decision processes with total rewards

Published in Mathematical Methods of Operations Research

Abstract

We construct examples of Markov decision processes for which, for a given initial state and a given nonstationary transient policy, there is no equivalent (randomized) stationary policy, i.e. no stationary policy whose occupation measure equals that of the given policy. We also investigate the relation between the existence of equivalent stationary policies in special models and the existence of equivalent strategies within various classes of nonstationary policies in general models.
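
The notion of equivalence used here is equality of occupation measures: for a transient policy, the occupation measure records, for every state-action pair, the expected total number of times the process occupies that state and chooses that action. As a minimal sketch of this quantity (assuming a finite transient model; the example data, the helper occupation_measure, and the policy pi below are hypothetical illustrations, not taken from the paper), the occupation measure of a randomized stationary policy can be computed by solving one linear system:

    import numpy as np

    # Hypothetical finite transient MDP (illustration only, not from the paper).
    # P[a, s, t] = probability of moving from state s to state t under action a.
    # Row sums are < 1: the missing mass is absorption, so every policy is transient.
    P = np.array([
        [[0.3, 0.4],    # action 0
         [0.2, 0.5]],
        [[0.1, 0.6],    # action 1
         [0.4, 0.2]],
    ])

    def occupation_measure(P, pi, s0):
        """nu(s, a): expected total number of times the process is in state s
        and chooses action a, starting at s0, under stationary policy pi[s, a]."""
        n_actions, n_states, _ = P.shape
        # State transition matrix induced by the randomized stationary policy.
        P_pi = np.einsum('sa,ast->st', pi, P)
        # Expected visit counts mu satisfy mu = e_{s0} + P_pi^T mu; transience
        # (spectral radius of P_pi below 1) makes I - P_pi^T invertible.
        e = np.zeros(n_states)
        e[s0] = 1.0
        mu = np.linalg.solve(np.eye(n_states) - P_pi.T, e)
        return mu[:, None] * pi    # nu(s, a) = mu(s) * pi(a | s)

    # A randomized stationary policy: pi[s, a] = probability of action a in state s.
    pi = np.array([[0.5, 0.5],
                   [1.0, 0.0]])
    print(occupation_measure(P, pi, s0=0))

For a stationary policy the occupation measure thus factors through the expected visit counts; the paper's counterexamples exhibit nonstationary transient policies whose occupation measures cannot be matched by any stationary policy, randomized or not.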

About this article

Cite this article

Feinberg, E.A., Sonin, I.M. Notes on equivalent stationary policies in Markov decision processes with total rewards. Mathematical Methods of Operations Research 44, 205–221 (1996). https://doi.org/10.1007/BF01194331
