Abstract
The purpose of the paper is to give a survey of methods, partly derived by the author in joint work with other researchers, for the problem of constructing ε-optimal strategies for partially observable MDPs. The methods basically consist in transforming the problem into one of approximation: starting from the original problem, a sequence of approximating problems is constructed such that:
- (i) For each approximating problem an optimal strategy can actually be computed.
- (ii) Given ε > 0, there exists an approximating problem such that the optimal strategy for the latter is ε-optimal for the original problem.
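The two-step scheme above can be illustrated on a toy problem. The sketch below is hypothetical and not the paper's construction: it uses a fully observed continuous-state control problem (state space [0, 1], two actions shifting the state by ±0.1, reward favouring the centre of the interval) in place of a partially observed one. The approximating problems are finite MDPs obtained by grid discretization; each is solved exactly by value iteration, matching condition (i), and refining the grid until the optimal value stabilises within ε plays the role of condition (ii). All names and parameters here are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's method): a sequence of finite-grid MDPs
# approximating a continuous-state control problem on [0, 1].

def solve_grid_mdp(n, gamma=0.9, iters=400):
    """Exact value iteration on an n-point grid over [0, 1].

    Dynamics: the two actions shift the state by +/-0.1 (clipped to [0, 1]);
    the successor is projected to the nearest grid point.  The reward
    -(s - 0.5)^2 favours staying near the centre of the interval.
    """
    states = [i / (n - 1) for i in range(n)]
    actions = [-0.1, 0.1]

    def nearest(s):
        return min(range(n), key=lambda i: abs(states[i] - s))

    # Precompute the (deterministic) transition table once.
    nxt = [[nearest(min(1.0, max(0.0, s + a))) for a in actions]
           for s in states]
    r = [-(s - 0.5) ** 2 for s in states]
    v = [0.0] * n
    for _ in range(iters):
        # Condition (i): each finite approximating MDP is solved exactly.
        v = [r[i] + gamma * max(v[j] for j in nxt[i]) for i in range(n)]
    return v

def value_at_half(n):
    """Optimal value at the initial state s = 0.5 on the n-point grid."""
    v = solve_grid_mdp(n)
    return v[min(range(n), key=lambda i: abs(i / (n - 1) - 0.5))]

# A stand-in for condition (ii): refine the grid until the optimal value
# stabilises within eps; the strategy of that grid MDP is then taken as
# near-optimal for the original problem.
eps = 1e-3
vals = [value_at_half(n) for n in (11, 21, 41, 101)]
stable = max(vals) - min(vals) < eps
```

The design choice of aligned grids (spacings 0.1, 0.05, 0.025, 0.01) keeps the ±0.1 shifts exactly representable, so the value estimates agree across refinements; with misaligned grids the projection error would decay only gradually as the grid is refined.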
Runggaldier, W.J. On the construction of ε-optimal strategies in partially observed MDPs. Ann Oper Res 28, 81–95 (1991). https://doi.org/10.1007/BF02055576