Abstract
We construct examples of Markov Decision Processes for which, for a given initial state and a given nonstationary transient policy, there is no equivalent (randomized) stationary policy, i.e. there is no stationary policy whose occupation measure equals the occupation measure of the given policy. We also investigate the relation between the existence of equivalent stationary policies in special models and the existence of equivalent strategies in various classes of nonstationary policies in general models.
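For a stationary randomized policy in a transient model, the occupation measure of a state-action pair is the expected total number of times that pair is used, and it can be computed by solving a linear system. The following is a minimal numerical sketch, assuming a hypothetical two-state transient model with two actions (the transition matrices, policy, and initial distribution are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical toy transient MDP: states {0, 1} and actions {0, 1}.
# P[a][x, y] = probability of moving from state x to state y under action a.
# Rows sum to less than 1: the remaining mass leaves the system,
# which makes the model transient and the occupation measure finite.
P = {
    0: np.array([[0.2, 0.5],
                 [0.1, 0.3]]),
    1: np.array([[0.4, 0.1],
                 [0.3, 0.2]]),
}
pi = np.array([[0.7, 0.3],   # pi[x, a]: stationary randomized policy
               [0.5, 0.5]])
mu0 = np.array([1.0, 0.0])   # initial distribution: start in state 0

# Transition matrix on the transient states induced by pi.
P_pi = sum(pi[:, a][:, None] * P[a] for a in P)

# State occupation measure m satisfies m = mu0 + m P_pi,
# i.e. m = mu0 (I - P_pi)^{-1}; transience makes the inverse exist.
m = np.linalg.solve((np.eye(2) - P_pi).T, mu0)

# State-action occupation measure of the stationary policy.
occ = m[:, None] * pi
print(occ)        # occ[x, a] = expected number of uses of pair (x, a)
print(occ.sum())  # finite total, since the model is transient
```

The paper's counterexamples show that the occupation measure of a given nonstationary transient policy need not be reproducible by any choice of `pi` in such a computation.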
Feinberg, E.A., Sonin, I.M. Notes on equivalent stationary policies in Markov decision processes with total rewards. Mathematical Methods of Operations Research 44, 205–221 (1996). https://doi.org/10.1007/BF01194331