Abstract
We consider several applications of two-state, finite-action, infinite-horizon, discrete-time Markov decision processes with partial observations, for two special cases of observation quality, and show that in each of these cases the optimal cost function is piecewise linear. This in turn allows us to obtain either explicit formulas or simplified algorithms to compute the optimal cost function and the associated optimal control policy. Several examples are presented.
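The piecewise linearity asserted in the abstract can be made concrete with a small sketch: for a two-state partially observed model, the belief state is a single number p = Prob(state = 1), and exact value iteration keeps the cost-to-go as a minimum of linear functions of p ("alpha-vectors"), so it stays piecewise linear at every step. The model data below (transition, observation, and cost numbers) are purely illustrative assumptions, not taken from the paper; the backup itself is the standard Sondik-style dynamic-programming step.

```python
import itertools

# Hypothetical two-state replacement-type model (all numbers illustrative).
# States: 0 = "good", 1 = "bad". Actions: 0 = continue, 1 = repair.
P = {0: [[0.9, 0.1], [0.0, 1.0]],   # transition matrices P[a][s][s']
     1: [[1.0, 0.0], [1.0, 0.0]]}  # repair resets the state to "good"
Q = {0: [[0.8, 0.2], [0.3, 0.7]],   # observation matrices Q[a][s'][o]
     1: [[0.8, 0.2], [0.3, 0.7]]}
c = {0: [0.0, 2.0], 1: [1.5, 1.5]}  # immediate cost c[a][s]
beta = 0.9                          # discount factor

def prune(vectors):
    """Keep only alpha-vectors on the lower envelope over p in [0, 1].
    v is kept iff some p satisfies (1-p)v0 + p*v1 <= (1-p)w0 + p*w1 for all w."""
    vectors = sorted(set(vectors))
    keep = []
    for i, v in enumerate(vectors):
        lo, hi, ok = 0.0, 1.0, True
        for j, w in enumerate(vectors):
            if j == i:
                continue
            d0, d1 = v[0] - w[0], v[1] - w[1]
            slope = d1 - d0          # need d0 + p*slope <= 0
            if abs(slope) < 1e-12:
                ok = d0 <= 1e-9
            elif slope > 0:
                hi = min(hi, -d0 / slope)
            else:
                lo = max(lo, -d0 / slope)
            if not ok or lo > hi + 1e-9:
                ok = False
                break
        if ok:
            keep.append(v)
    return keep

def backup(alphas):
    """One exact DP step: a minimum of linear functions maps to another
    minimum of linear functions, so the cost stays piecewise linear."""
    new = []
    for a in (0, 1):
        # pick one current alpha-vector per observation o in {0, 1}
        for choice in itertools.product(alphas, repeat=2):
            vec = []
            for s in (0, 1):
                g = c[a][s]
                for sp in (0, 1):
                    for o in (0, 1):
                        g += beta * P[a][s][sp] * Q[a][sp][o] * choice[o][sp]
                vec.append(g)
            new.append(tuple(vec))
    return prune(new)

alphas = [(0.0, 0.0)]
for _ in range(12):
    alphas = backup(alphas)

def V(p):
    """Approximate optimal cost at belief p = Prob(state = 1)."""
    return min((1 - p) * a0 + p * a1 for a0, a1 in alphas)
```

The pruning step is what keeps the representation finite: only vectors that attain the minimum somewhere on the belief interval are retained, which is exactly the sense in which the special cases studied in the paper admit explicit or simplified computation.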
Additional information
Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, in part by the Advanced Technology Program of the State of Texas, and in part by the DoD Joint Services Electronics Program through the Air Force Office of Scientific Research (AFSC) Contract F49620-86-C-0045.
Cite this article
Sernik, E.L., Marcus, S.I. On the computation of the optimal cost function for discrete time Markov models with partial observations. Ann Oper Res 29, 471–511 (1991). https://doi.org/10.1007/BF02283611