Efficient Policies for Stationary Possibilistic Markov Decision Processes

  • Nahla Ben Amor
  • Zeineb El Khalfi
  • Hélène Fargier
  • Régis Sabbadin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10369)

Abstract

Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve problems of sequential decision under qualitative uncertainty. Although the model is appealing for its ability to handle qualitative problems, it suffers from the drowning effect that is inherent to possibilistic decision theory. The present paper proposes to escape the drowning effect by extending the lexicographic preference relations defined in [6] for non-sequential decision problems to stationary possibilistic MDPs, and provides a value iteration algorithm that computes policies optimal for these new criteria.
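
As a rough illustration of the drowning effect mentioned above, the following minimal Python sketch compares two hypothetical vectors of qualitative levels, first by their minimum and then by a plain leximin ordering. The vectors, the numeric encoding of the scale, and the function names are assumptions made for this example only. The lexicographic criteria of [6] and the value iteration algorithm of the paper are more elaborate and are not reproduced here.

```python
from typing import Sequence, Tuple


def min_value(levels: Sequence[float]) -> float:
    """Pessimistic aggregate: keep only the worst level of the vector."""
    return min(levels)


def leximin_key(levels: Sequence[float]) -> Tuple[float, ...]:
    """Sort levels in ascending order so that Python's tuple comparison
    performs a leximin comparison (worst level first, then the next one)."""
    return tuple(sorted(levels))


def leximin_prefers(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is strictly preferred to `b` under leximin.
    Assumes both vectors have the same length."""
    return leximin_key(a) > leximin_key(b)


# Hypothetical level vectors on a [0, 1] scale (illustrative values only).
a = (0.2, 0.9)
b = (0.2, 0.3)

# Both vectors share the same worst level, so the min criterion ties them.
tied_under_min = min_value(a) == min_value(b)

# Leximin breaks the tie by looking at the second-worst level.
a_strictly_better = leximin_prefers(a, b)
```

Here the min criterion cannot separate `a` from `b`, because the shared worst level hides every other difference, whereas the leximin comparison prefers `a`. Vectors that are permutations of each other remain tied under leximin.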

Keywords

Markov Decision Process, Possibility theory, Lexicographic comparisons, Possibilistic qualitative utilities

References

  1. Bauters, K., Liu, W., Godo, L.: Anytime algorithms for solving possibilistic MDPs and hybrid MDPs. In: Gyssens, M., Simari, G. (eds.) FoIKS 2016. LNCS, vol. 9616, pp. 24–41. Springer, Cham (2016)
  2. Bellman, R.: A Markovian decision process. J. Math. Mech. 6, 679–684 (1957)
  3. Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R.: Lexicographic refinements in possibilistic decision trees. In: Proceedings ECAI 2016, pp. 202–208 (2016)
  4. Drougard, N., Teichteil-Konigsbuch, F., Farges, J.L., Dubois, D.: Qualitative possibilistic mixed-observable MDPs. In: Proceedings UAI 2013, pp. 192–201 (2013)
  5. Dubois, D., Prade, H.: Possibility theory as a basis for qualitative decision theory. In: Proceedings IJCAI 1995, pp. 1925–1930 (1995)
  6. Fargier, H., Sabbadin, R.: Qualitative decision under uncertainty: back to expected utility. Artif. Intell. 164, 245–280 (2005)
  7. Gilbert, H., Weng, P.: Quantile reinforcement learning. In: Proceedings JMLR 2016, pp. 1–16 (2016)
  8. Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov decision processes. In: Proceedings AAAI 2017, pp. 3569–3575 (2017)
  9. Montes, I., Miranda, E., Montes, S.: Decision making with imprecise probabilities and utilities by means of statistical preference and stochastic dominance. Eur. J. Oper. Res. 234(1), 209–220 (2014)
  10. Moulin, H.: Axioms of Cooperative Decision Making. Cambridge University Press, Cambridge (1988)
  11. Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)
  12. Sabbadin, R.: Possibilistic Markov decision processes. Eng. Appl. Artif. Intell. 14, 287–300 (2001)
  13. Sabbadin, R., Fargier, H.: Towards qualitative approaches to multi-stage decision making. Int. J. Approximate Reasoning 19, 441–471 (1998)
  14. Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning. MIT Press, Cambridge (1998)
  15. Szörényi, B., Busa-Fekete, R., Weng, P., Hüllermeier, E.: Qualitative multi-armed bandits: a quantile-based approach. In: Proceedings ICML 2015, pp. 1660–1668 (2015)
  16. Weng, P.: Qualitative decision making under possibilistic uncertainty: toward more discriminating criteria. In: Proceedings UAI 2005, pp. 615–622 (2005)
  17. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings ICAPS 2011, pp. 282–289 (2011)
  18. Yue, Y., Broder, J., Kleinberg, R., Joachims, T.: The k-armed dueling bandits problem. J. Comput. Syst. Sci. 78(5), 1538–1556 (2012)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. LARODEC, Le Bardo, Tunisia
  2. IRIT, Toulouse, France
  3. INRA-MIAT, Toulouse, France
