Pure Stationary Optimal Strategies in Markov Decision Processes

  • Hugo Gimbert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4393)

Abstract

Markov decision processes (MDPs) are controllable discrete event systems with stochastic transitions. The performance of an MDP is evaluated by a payoff function, and the controller of the MDP seeks to optimize that performance by using optimal strategies.

There are various ways of measuring performance, i.e. various classes of payoff functions. For example, average performance can be evaluated by a mean-payoff function, peak performance by a limsup payoff function, and the parity payoff function can be used to encode logical specifications.
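To make the three payoff functions concrete, here is a minimal sketch that evaluates them on an ultimately periodic play, i.e. a finite prefix followed by a cycle repeated forever. The function names, the list-based encoding of a play, and the parity convention (payoff 1 when the highest priority seen infinitely often is odd, 0 otherwise) are assumptions made for illustration; they are not fixed by the abstract.

```python
from fractions import Fraction

# A play is encoded (hypothetically) as a finite prefix of rewards or
# priorities followed by a non-empty cycle that repeats forever.

def mean_payoff(prefix, cycle):
    """Long-run average reward: the finite prefix has no influence."""
    return Fraction(sum(cycle), len(cycle))

def limsup_payoff(prefix, cycle):
    """Largest reward seen infinitely often, i.e. the maximum on the cycle."""
    return max(cycle)

def parity_payoff(prefix, cycle):
    """Assumed convention: 1 if the highest priority seen infinitely
    often is odd, 0 if it is even."""
    return max(cycle) % 2

prefix = [5, 0]
cycle = [1, 2, 3]

print(mean_payoff(prefix, cycle))    # average of the cycle
print(limsup_payoff(prefix, cycle))  # maximum of the cycle
print(parity_payoff(prefix, cycle))  # parity of the maximum of the cycle
```

None of the three functions reads `prefix`, which is exactly the prefix-independence that the abstract goes on to require.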

Surprisingly, all the MDPs equipped with mean, limsup or parity payoff functions share a common non-trivial property: they admit pure stationary optimal strategies.

In this paper, we introduce the class of prefix-independent and submixing payoff functions, and we prove that any MDP equipped with such a payoff function admits pure stationary optimal strategies.
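The abstract does not spell out the two properties. As a sketch of how they are usually formulated, with the notation assumed here rather than taken from the abstract ($f$ is the payoff function, $p$ a finite word, $u$ an infinite play, and $u_0, v_0, u_1, v_1, \ldots$ finite non-empty words):

```latex
% Prefix-independence: removing or adding a finite prefix does not change the payoff.
f(p\,u) = f(u)

% Submixing: shuffling two plays never yields more than the better of the two.
f(u_0 v_0 u_1 v_1 \cdots) \;\le\; \max\bigl\{\, f(u_0 u_1 u_2 \cdots),\; f(v_0 v_1 v_2 \cdots) \,\bigr\}
```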

This result unifies and simplifies several existing proofs. Moreover, it is a key tool for generating new examples of MDPs with pure stationary optimal strategies.

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Hugo Gimbert (1)

  1. LIX, Ecole Polytechnique, France
