
Graphical models for interactive POMDPs: representations and solutions


Abstract

We develop new graphical representations for the problem of sequential decision making in partially observable multiagent environments, as formalized by interactive partially observable Markov decision processes (I-POMDPs). The graphical models, called interactive influence diagrams (I-IDs), and their dynamic counterparts, interactive dynamic influence diagrams (I-DIDs), seek to explicitly model the structure that is often present in real-world problems by decomposing the situation into chance and decision variables and the dependencies between them. I-DIDs generalize DIDs, which may be viewed as graphical representations of POMDPs, to multiagent settings in the same way that I-POMDPs generalize POMDPs. An I-DID may be used to compute the policy of an agent, given its belief, as the agent acts and observes in a setting populated by other interacting agents. Using several examples, we show how I-IDs and I-DIDs may be applied and demonstrate their usefulness. We also show how the models may be solved using the standard algorithms applicable to DIDs. Solving I-DIDs exactly requires knowing the solutions of the possible models of the other agents, and the space of these models grows exponentially with the number of time steps. We present a method for solving I-DIDs approximately by limiting the number of candidate models of the other agents at each time step to a constant: we cluster models that are likely to be behaviorally equivalent and select a representative set from the clusters. We discuss the error bound of the approximation technique and demonstrate its empirical performance.
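The clustering step admits a short illustration. The following is a minimal sketch, not the authors' implementation: it assumes each candidate model of the other agent is summarized by its belief vector, treats models with nearby beliefs as likely to be behaviorally equivalent (the heuristic named above), and uses hypothetical helpers `kmeans` and `prune_models` to stand in for the paper's actual procedure.

```python
# Illustrative sketch of the model-clustering approximation: at each time
# step, cluster the other agent's candidate models (here, k-means over
# their belief vectors, on the assumption that nearby beliefs imply likely
# behavioral equivalence), keep a constant number K of representatives,
# and transfer pruned models' probability mass to the nearest survivor.

import random
from typing import List, Tuple

Belief = Tuple[float, ...]      # a model's belief: a distribution over states
Model = Tuple[Belief, float]    # (belief vector, probability of this model)

def _dist(a: Belief, b: Belief) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Belief], k: int, iters: int = 50) -> List[Belief]:
    """Plain k-means over belief vectors; returns the cluster means."""
    means = random.sample(points, k)
    for _ in range(iters):
        clusters: List[List[Belief]] = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: _dist(p, means[i]))].append(p)
        means = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else means[i]
            for i, cl in enumerate(clusters)
        ]
    return means

def prune_models(models: List[Model], k: int) -> List[Model]:
    """Keep at most k representative models; reassign pruned mass."""
    if len(models) <= k:
        return models
    means = kmeans([b for b, _ in models], k)
    # Representative = the actual model closest to each cluster mean.
    # (A full implementation would also deduplicate representatives.)
    reps = [min(models, key=lambda m: _dist(m[0], mean)) for mean in means]
    rep_beliefs = [b for b, _ in reps]
    mass = [0.0] * len(reps)
    for b, p in models:
        mass[min(range(len(reps)), key=lambda i: _dist(b, rep_beliefs[i]))] += p
    return [(b, m) for (b, _), m in zip(reps, mass)]

if __name__ == "__main__":
    random.seed(0)
    # Three candidate models over a two-state domain; the first two have
    # nearby beliefs and are merged into a single representative.
    models = [((0.90, 0.10), 0.25), ((0.85, 0.15), 0.25), ((0.20, 0.80), 0.50)]
    print(prune_models(models, k=2))  # two models remain, total mass 1.0
```

Running the sketch collapses the first two models into one representative, keeping the retained set at the constant K while preserving the total probability mass over models.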



Author information

Correspondence to Prashant Doshi.

Cite this article

Doshi, P., Zeng, Y. & Chen, Q. Graphical models for interactive POMDPs: representations and solutions. Auton Agent Multi-Agent Syst 18, 376–416 (2009). https://doi.org/10.1007/s10458-008-9064-7
