A Decision-Theoretic Approach to Collaboration: Principal Description Methods and Efficient Heuristic Approximations

  • Frans A. Oliehoek
  • Arnoud Visser
Part of the Studies in Computational Intelligence book series (SCI, volume 281)

Abstract

This chapter gives an overview of the state of the art in decision-theoretic models for describing cooperation between multiple agents in a dynamic environment. Making (near-)optimal decisions in such settings becomes harder as the number of agents grows or as uncertainty about the environment increases. Compact models are essential: without them, merely representing the decision problem becomes intractable. Several such model descriptions and approximate solution methods, studied within the Interactive Collaborative Information Systems (ICIS) project, are presented and illustrated in the context of crisis management.
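To make the kind of model the abstract refers to concrete, the sketch below encodes a finite decentralized POMDP (Dec-POMDP), the standard decision-theoretic model for cooperative multiagent decision making under uncertainty. This is a minimal illustrative sketch, not code from the chapter; the names DecPOMDP, joint_actions, and the fire-fighting toy instance are assumptions introduced here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

JointAction = Tuple[str, ...]

@dataclass(frozen=True)
class DecPOMDP:
    """A finite Dec-POMDP <S, {A_i}, T, R, {O_i}, O, h> (illustrative sketch)."""
    states: Sequence[str]                                     # S
    agent_actions: Sequence[Sequence[str]]                    # A_i, one set per agent
    agent_obs: Sequence[Sequence[str]]                        # O_i, one set per agent
    transition: Callable[[str, JointAction], Dict[str, float]]        # Pr(s' | s, a)
    observation: Callable[[JointAction, str], Dict[JointAction, float]]  # Pr(o | a, s')
    reward: Callable[[str, JointAction], float]               # R(s, a), shared by the team
    horizon: int

    def joint_actions(self) -> list:
        # The joint action set is the Cartesian product of the individual
        # action sets, so it grows exponentially in the number of agents:
        # one source of the intractability the abstract mentions.
        return list(product(*self.agent_actions))

# Toy two-agent instance: 2 individual actions each -> 2^2 = 4 joint actions.
toy = DecPOMDP(
    states=["quiet", "fire"],
    agent_actions=[["patrol", "extinguish"], ["patrol", "extinguish"]],
    agent_obs=[["smoke", "clear"], ["smoke", "clear"]],
    transition=lambda s, a: {"quiet": 0.9, "fire": 0.1},
    observation=lambda a, s2: {("smoke", "smoke"): 1.0} if s2 == "fire"
                              else {("clear", "clear"): 1.0},
    reward=lambda s, a: -1.0 if s == "fire" else 0.0,
    horizon=3,
)
print(len(toy.joint_actions()))  # 4
```

Even in this toy setting the numbers explode quickly: with n agents of k individual actions each there are k^n joint actions per stage, and the number of deterministic joint policies grows doubly exponentially in the planning horizon. This is precisely why compact model descriptions and approximate, heuristic solution methods are central to the chapter.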

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Frans A. Oliehoek (1)
  • Arnoud Visser (1)

  1. Intelligent Systems Laboratory Amsterdam, Amsterdam, The Netherlands
