Analysis of Markov Decision Processes Under Parameter Uncertainty

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10497)

Abstract

Markov Decision Processes (MDPs) are a popular decision model for stochastic systems. Introducing uncertainty in the transition probability distributions by giving upper and lower bounds for the transition probabilities yields the model of Bounded-Parameter MDPs (BMDPs), which captures many practical situations with limited knowledge about a system or its environment. In this paper, the class of BMDPs is extended to Bounded-Parameter Semi-Markov Decision Processes (BSMDPs). The main focus of the paper is the introduction and numerical comparison of different algorithms to compute optimal policies for BMDPs and BSMDPs; specifically, we introduce and compare variants of value and policy iteration.

The paper provides an empirical comparison of these numerical algorithms for BMDPs and BSMDPs, with an emphasis on the required solution time.
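To make the BMDP setting concrete, the following is a minimal sketch (not the authors' implementation) of pessimistic interval value iteration: transition probabilities are only known to lie in intervals, and at each backup the worst-case distribution within those intervals is selected by greedily pushing probability mass toward low-value successor states. The function names, the `gamma`/`eps` parameters, and the greedy mass-allocation routine are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def worst_case_dist(lo, hi, values):
    """Within the interval constraints lo[s] <= p[s] <= hi[s], sum(p) = 1,
    pick the transition distribution minimizing the expected value
    (pessimistic choice). Greedy allocation: start every state at its
    lower bound, then hand the remaining mass to the cheapest states."""
    order = np.argsort(values)          # lowest-valued successors first
    p = lo.astype(float).copy()
    slack = 1.0 - p.sum()               # mass still to be distributed
    for s in order:
        add = min(hi[s] - lo[s], slack)
        p[s] += add
        slack -= add
        if slack <= 0:
            break
    return p

def interval_value_iteration(P_lo, P_hi, R, gamma=0.9, eps=1e-8):
    """Pessimistic (robust) value iteration for a bounded-parameter MDP.
    P_lo, P_hi: transition bounds of shape (A, S, S); R: rewards (A, S).
    Returns the robust value vector and a greedy policy."""
    A, S, _ = P_lo.shape
    V = np.zeros(S)
    while True:
        Q = np.empty((A, S))
        for a in range(A):
            for s in range(S):
                p = worst_case_dist(P_lo[a, s], P_hi[a, s], V)
                Q[a, s] = R[a, s] + gamma * p @ V
        V_new = Q.max(axis=0)           # agent maximizes against worst case
        if np.max(np.abs(V_new - V)) < eps * (1 - gamma) / (2 * gamma):
            return V_new, Q.argmax(axis=0)
        V = V_new
```

When the lower and upper bounds coincide, the inner minimization is trivial and the procedure reduces to ordinary value iteration; the interval case only changes which distribution is plugged into each Bellman backup.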

Keywords

(Bounded-Parameter) (Semi-)Markov Decision Process · Discounted reward · Average reward · Value iteration · Policy iteration


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, TU Dortmund, Dortmund, Germany
