Notes on equivalent stationary policies in Markov decision processes with total rewards

Published in Mathematical Methods of Operations Research

Abstract

We construct examples of Markov decision processes for which, for a given initial state and a given nonstationary transient policy, there is no equivalent (randomized) stationary policy, i.e. no stationary policy whose occupation measure equals that of the given policy. We also investigate the relation between the existence of equivalent stationary policies in special models and the existence of equivalent strategies within various classes of nonstationary policies in general models.
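
The notion of equivalence used here is equality of occupation measures: for a transient policy, the occupation measure records, for every state-action pair, the expected total number of times the process occupies that state and chooses that action. As a minimal sketch of this quantity (assuming a finite transient model; the example data, the helper occupation_measure, and the policy pi below are hypothetical illustrations, not taken from the paper), the occupation measure of a randomized stationary policy can be computed by solving one linear system:

    import numpy as np

    # Hypothetical finite transient MDP (illustration only, not from the paper).
    # P[a, s, t] = probability of moving from state s to state t under action a.
    # Row sums are < 1: the missing mass is absorption, so every policy is transient.
    P = np.array([
        [[0.3, 0.4],    # action 0
         [0.2, 0.5]],
        [[0.1, 0.6],    # action 1
         [0.4, 0.2]],
    ])

    def occupation_measure(P, pi, s0):
        """nu(s, a): expected total number of times the process is in state s
        and chooses action a, starting at s0, under stationary policy pi[s, a]."""
        n_actions, n_states, _ = P.shape
        # State transition matrix induced by the randomized stationary policy.
        P_pi = np.einsum('sa,ast->st', pi, P)
        # Expected visit counts mu satisfy mu = e_{s0} + P_pi^T mu; transience
        # (spectral radius of P_pi below 1) makes I - P_pi^T invertible.
        e = np.zeros(n_states)
        e[s0] = 1.0
        mu = np.linalg.solve(np.eye(n_states) - P_pi.T, e)
        return mu[:, None] * pi    # nu(s, a) = mu(s) * pi(a | s)

    # A randomized stationary policy: pi[s, a] = probability of action a in state s.
    pi = np.array([[0.5, 0.5],
                   [1.0, 0.0]])
    print(occupation_measure(P, pi, s0=0))

For a stationary policy the occupation measure thus factors through the expected visit counts; the paper's counterexamples exhibit nonstationary transient policies whose occupation measures cannot be matched by any stationary policy, randomized or not.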

About this article

Cite this article

Feinberg, E.A., Sonin, I.M. Notes on equivalent stationary policies in Markov decision processes with total rewards. Mathematical Methods of Operations Research 44, 205–221 (1996). https://doi.org/10.1007/BF01194331
