Efficient Policies for Stationary Possibilistic Markov Decision Processes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10369)

Abstract

Possibilistic Markov Decision Processes offer a compact and tractable way to represent and solve problems of sequential decision under qualitative uncertainty. Although appealing for its ability to handle qualitative problems, this model suffers from the drowning effect inherent to possibilistic decision theory. The present paper proposes to escape the drowning effect by extending the lexicographic preference relations defined in [6] for non-sequential decision problems to stationary possibilistic MDPs, and provides a value iteration algorithm that computes policies optimal for these new criteria.
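For intuition, the sketch below shows the classical optimistic possibilistic value iteration that the paper's lexicographic criteria refine; it is not the authors' algorithm, and all names (states, actions, pi, mu, horizon) and the toy instance are illustrative assumptions. Here pi[s][a][sp] is the possibility of reaching sp from s under action a, and mu[s] is the qualitative utility of state s, both on a shared ordinal scale in [0, 1].

    # Minimal sketch of finite-horizon OPTIMISTIC possibilistic value
    # iteration (in the spirit of [12]), not the paper's lexicographic
    # algorithm. pi[s][a] is a possibility distribution over successor
    # states (max-normalized); mu[s] is a qualitative utility.
    def optimistic_value_iteration(states, actions, pi, mu, horizon):
        # At the horizon, only the utility of the current state matters.
        u = {s: mu[s] for s in states}
        policy = {s: None for s in states}
        for _ in range(horizon):
            new_u = {}
            for s in states:
                # Bellman-like update on the ordinal scale:
                # u_t(s) = max_a max_{s'} min(pi[s][a][s'], u_{t+1}(s'))
                best_a, best_v = None, 0.0
                for a in actions:
                    v = max(min(pi[s][a][sp], u[sp]) for sp in states)
                    if v > best_v:
                        best_a, best_v = a, v
                new_u[s] = best_v
                policy[s] = best_a
            u = new_u
        return u, policy

    # Toy instance: from s0, action 'go' fully possibly reaches the
    # good state s1, while 'stay' keeps the agent in the poor state s0.
    states, actions = ["s0", "s1"], ["go", "stay"]
    pi = {
        "s0": {"go": {"s0": 0.3, "s1": 1.0}, "stay": {"s0": 1.0, "s1": 0.0}},
        "s1": {"go": {"s0": 0.0, "s1": 1.0}, "stay": {"s0": 0.0, "s1": 1.0}},
    }
    mu = {"s0": 0.2, "s1": 1.0}
    u, policy = optimistic_value_iteration(states, actions, pi, mu, horizon=3)
    # u["s0"] == 1.0 and policy["s0"] == "go"

Because min and max saturate, two actions whose best outcomes tie at the same possibility and utility level are indistinguishable under this criterion, even if one of them dominates the other on every other outcome. This is the drowning effect; the lexicographic refinements studied in the paper break exactly these ties.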

Keywords

Markov Decision Process · Possibility theory · Lexicographic comparisons · Possibilistic qualitative utilities

References

  1. Bauters, K., Liu, W., Godo, L.: Anytime algorithms for solving possibilistic MDPs and hybrid MDPs. In: Gyssens, M., Simari, G. (eds.) FoIKS 2016. LNCS, vol. 9616, pp. 24–41. Springer, Cham (2016)
  2. Bellman, R.: A Markovian decision process. J. Math. Mech. 6, 679–684 (1957)
  3. Ben Amor, N., El Khalfi, Z., Fargier, H., Sabbadin, R.: Lexicographic refinements in possibilistic decision trees. In: Proceedings ECAI 2016, pp. 202–208 (2016)
  4. Drougard, N., Teichteil-Konigsbuch, F., Farges, J.L., Dubois, D.: Qualitative possibilistic mixed-observable MDPs. In: Proceedings UAI 2013, pp. 192–201 (2013)
  5. Dubois, D., Prade, H.: Possibility theory as a basis for qualitative decision theory. In: Proceedings IJCAI 1995, pp. 1925–1930 (1995)
  6. Fargier, H., Sabbadin, R.: Qualitative decision under uncertainty: back to expected utility. Artif. Intell. 164, 245–280 (2005)
  7. Gilbert, H., Weng, P.: Quantile reinforcement learning. In: Proceedings JMLR 2016, pp. 1–16 (2016)
  8. Gilbert, H., Weng, P., Xu, Y.: Optimizing quantiles in preference-based Markov decision processes. In: Proceedings AAAI 2017, pp. 3569–3575 (2017)
  9. Montes, I., Miranda, E., Montes, S.: Decision making with imprecise probabilities and utilities by means of statistical preference and stochastic dominance. Eur. J. Oper. Res. 234(1), 209–220 (2014)
  10. Moulin, H.: Axioms of Cooperative Decision Making. Cambridge University Press, Cambridge (1988)
  11. Puterman, M.L.: Markov Decision Processes. Wiley, Hoboken (1994)
  12. Sabbadin, R.: Possibilistic Markov decision processes. Eng. Appl. Artif. Intell. 14, 287–300 (2001)
  13. Sabbadin, R., Fargier, H.: Towards qualitative approaches to multi-stage decision making. Int. J. Approximate Reasoning 19, 441–471 (1998)
  14. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  15. Szörényi, B., Busa-Fekete, R., Weng, P., Hüllermeier, E.: Qualitative multi-armed bandits: a quantile-based approach. In: Proceedings ICML 2015, pp. 1660–1668 (2015)
  16. Weng, P.: Qualitative decision making under possibilistic uncertainty: toward more discriminating criteria. In: Proceedings UAI 2005, pp. 615–622 (2005)
  17. Weng, P.: Markov decision processes with ordinal rewards: reference point-based preferences. In: Proceedings ICAPS 2011, pp. 282–289 (2011)
  18. Yue, Y., Broder, J., Kleinberg, R., Joachims, T.: The k-armed dueling bandits problem. J. Comput. Syst. Sci. 78(5), 1538–1556 (2012)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. LARODEC, Le Bardo, Tunisia
  2. IRIT, Toulouse, France
  3. INRA-MIAT, Toulouse, France