Abstract
The purpose of the paper is to give a survey of methods, partly derived by the author in joint work with other researchers, for the problem of constructing ε-optimal strategies for partially observable MDPs. The methods basically consist in transforming the problem into one of approximation: starting from the original problem, a sequence of approximating problems is constructed such that:
- (i) For each approximating problem an optimal strategy can actually be computed.
- (ii) Given ε > 0, there exists an approximating problem such that the optimal strategy for the latter is ε-optimal for the original problem.
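The two-step scheme above can be illustrated on a toy problem. The sketch below is hypothetical and not the paper's construction: it uses a fully observed continuous-state control problem (state space [0, 1], two actions shifting the state by ±0.1, reward favouring the centre of the interval) in place of a partially observed one. The approximating problems are finite MDPs obtained by grid discretization; each is solved exactly by value iteration, matching condition (i), and refining the grid until the optimal value stabilises within ε plays the role of condition (ii). All names and parameters here are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's method): a sequence of finite-grid MDPs
# approximating a continuous-state control problem on [0, 1].

def solve_grid_mdp(n, gamma=0.9, iters=400):
    """Exact value iteration on an n-point grid over [0, 1].

    Dynamics: the two actions shift the state by +/-0.1 (clipped to [0, 1]);
    the successor is projected to the nearest grid point.  The reward
    -(s - 0.5)^2 favours staying near the centre of the interval.
    """
    states = [i / (n - 1) for i in range(n)]
    actions = [-0.1, 0.1]

    def nearest(s):
        return min(range(n), key=lambda i: abs(states[i] - s))

    # Precompute the (deterministic) transition table once.
    nxt = [[nearest(min(1.0, max(0.0, s + a))) for a in actions]
           for s in states]
    r = [-(s - 0.5) ** 2 for s in states]
    v = [0.0] * n
    for _ in range(iters):
        # Condition (i): each finite approximating MDP is solved exactly.
        v = [r[i] + gamma * max(v[j] for j in nxt[i]) for i in range(n)]
    return v

def value_at_half(n):
    """Optimal value at the initial state s = 0.5 on the n-point grid."""
    v = solve_grid_mdp(n)
    return v[min(range(n), key=lambda i: abs(i / (n - 1) - 0.5))]

# A stand-in for condition (ii): refine the grid until the optimal value
# stabilises within eps; the strategy of that grid MDP is then taken as
# near-optimal for the original problem.
eps = 1e-3
vals = [value_at_half(n) for n in (11, 21, 41, 101)]
stable = max(vals) - min(vals) < eps
```

The design choice of aligned grids (spacings 0.1, 0.05, 0.025, 0.01) keeps the ±0.1 shifts exactly representable, so the value estimates agree across refinements; with misaligned grids the projection error would decay only gradually as the grid is refined.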
Runggaldier, W.J. On the construction of ε-optimal strategies in partially observed MDPs. Ann Oper Res 28, 81–95 (1991). https://doi.org/10.1007/BF02055576