Graphical models for interactive POMDPs: representations and solutions

Article

Abstract

We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models, called interactive influence diagrams (I-IDs) and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables and the dependencies between them. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. I-DIDs may be used to compute the policy of an agent given its belief as the agent acts and observes in a setting populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms applicable to DIDs. Solving I-DIDs exactly requires knowing the solutions of the possible models of the other agents, and the space of these models grows exponentially with the number of time steps. We present a method of solving I-DIDs approximately by limiting the number of the other agents' candidate models at each time step to a constant: we cluster models that are likely to be behaviorally equivalent and select a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.
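
The approximation described above, keeping only a constant number K of the other agent's candidate models per time step by clustering models that are likely to be behaviorally equivalent, can be illustrated with a minimal sketch. The sketch below is not the authors' implementation: it assumes each candidate model can be summarized by its belief vector over the physical state, uses plain k-means as the clustering routine with Euclidean distance between beliefs as a stand-in proxy for likely behavioral equivalence, and transfers each cluster's probability mass to its representative. The `Model` class and all names are hypothetical.

```python
# Illustrative sketch (assumptions noted above): prune the other agent's
# candidate models at a time step down to at most K representatives.
import numpy as np

class Model:
    def __init__(self, belief, prob):
        self.belief = np.asarray(belief, dtype=float)  # belief over physical states
        self.prob = prob                               # probability assigned to this model

def cluster_models(models, K, iters=20, seed=0):
    """Return at most K representative models; each cluster's probability
    mass is transferred to the member closest to the cluster centroid."""
    if len(models) <= K:
        return models
    rng = np.random.default_rng(seed)
    points = np.stack([m.belief for m in models])
    # Initialize centroids with K randomly chosen model beliefs.
    centroids = points[rng.choice(len(points), size=K, replace=False)]
    for _ in range(iters):
        # Assign each model to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for k in range(K):
            members = points[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    representatives = []
    for k in range(K):
        idx = np.where(labels == k)[0]
        if len(idx) == 0:
            continue
        # Representative: the member model closest to the cluster centroid.
        best = idx[np.argmin(np.linalg.norm(points[idx] - centroids[k], axis=1))]
        representatives.append(Model(models[best].belief,
                                     sum(models[i].prob for i in idx)))
    return representatives
```

Under this sketch, reducing several hundred candidate models to, say, K = 5 representatives keeps the branching over models constant at every horizon, which is the effect the approximation aims for; the paper itself defines the clustering over likely behavioral equivalence rather than raw belief distance.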

Keywords

Probabilistic graphical models · Interactive POMDPs · Sequential multiagent decision making

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Computer Science and Institute for AI, University of Georgia, Athens, USA
  2. Department of Computer Science, Aalborg University, Aalborg, Denmark
  3. Department of Computer Science, National University of Singapore, Singapore