
On the computation of the optimal cost function for discrete time Markov models with partial observations

  • Partially Observable Markov Decision Processes
  • Published in: Annals of Operations Research

Abstract

We consider several applications of two-state, finite-action, infinite-horizon, discrete-time Markov decision processes with partial observations, for two special cases of observation quality, and show that in each of these cases the optimal cost function is piecewise linear. This in turn allows us to obtain either explicit formulas or simplified algorithms to compute the optimal cost function and the associated optimal control policy. Several examples are presented.
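The piecewise-linearity property referred to above goes back to Sondik's alpha-vector representation: for a two-state problem the belief state is a single number p, and each stage of value iteration maps a finite set of linear functions of p into another finite set, so the value function is the minimum of finitely many lines. The sketch below illustrates this with a finite-horizon backup for a small machine-replacement model; the transition, observation, and cost numbers (and the grid-based pruning) are invented for illustration and are not taken from the paper.

```python
import itertools

import numpy as np

# Hypothetical two-state replacement model: state 0 = "good", state 1 = "bad".
beta = 0.9                                  # discount factor
P = {0: np.array([[0.9, 0.1],               # P[a][s, s']: operate -> may decay
                  [0.0, 1.0]]),
     1: np.array([[1.0, 0.0],               # replace -> back to good
                  [1.0, 0.0]])}
Q = {0: np.array([[0.8, 0.2],               # Q[a][s', o]: noisy inspection
                  [0.3, 0.7]]),
     1: np.array([[0.8, 0.2],
                  [0.3, 0.7]])}
c = {0: np.array([0.0, 2.0]),               # c[a][s]: one-stage costs
     1: np.array([1.0, 1.0])}

GRID = np.linspace(0.0, 1.0, 101)           # belief grid, used only to prune


def prune(vecs):
    """Discard vectors that are never the minimizer on the belief grid."""
    keep = {min(range(len(vecs)), key=lambda i: np.dot(vecs[i], [1 - p, p]))
            for p in GRID}
    return [vecs[i] for i in sorted(keep)]


def backup(alphas):
    """One exact DP backup of the alpha-vector set representing
    V(p) = min over vectors v of <v, (1-p, p)>."""
    new = []
    for a in (0, 1):
        # g[o][j](s) = sum_{s'} P[a][s, s'] Q[a][s', o] alphas[j][s']
        g = [[P[a] @ (Q[a][:, o] * v) for v in alphas] for o in (0, 1)]
        # one candidate vector per choice of a successor vector per observation
        for choice in itertools.product(*g):
            new.append(c[a] + beta * sum(choice))
    return prune(new)


alphas = [np.zeros(2)]
for _ in range(6):                          # six stages of value iteration
    alphas = backup(alphas)


def V(p):
    """Six-stage optimal expected cost at belief p = Prob(bad state)."""
    return min(float(np.dot(v, [1.0 - p, p])) for v in alphas)


# V is the minimum of finitely many linear functions of p, hence
# piecewise linear and concave on [0, 1].
print(len(alphas), [round(V(p), 3) for p in (0.0, 0.25, 0.5, 0.75, 1.0)])
```

Because the value function is fully described by the surviving alpha-vectors, the optimal policy is a threshold-type rule determined by the breakpoints where the minimizing line changes; the explicit formulas in the paper exploit exactly this finite description.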




Additional information

Research supported in part by the Air Force Office of Scientific Research under Grant AFOSR-86-0029, in part by the National Science Foundation under Grant ECS-8617860, in part by the Advanced Technology Program of the State of Texas, and in part by the DoD Joint Services Electronics Program through the Air Force Office of Scientific Research (AFSC) Contract F49620-86-C-0045.


Cite this article

Sernik, E.L., Marcus, S.I. On the computation of the optimal cost function for discrete time Markov models with partial observations. Ann Oper Res 29, 471–511 (1991). https://doi.org/10.1007/BF02283611
