Autonomous Agents and Multi-Agent Systems, Volume 17, Issue 2, pp. 190–250

Formal models and algorithms for decentralized decision making under uncertainty

Abstract

Over the last 5 years, the AI community has shown considerable interest in decentralized control of multiple decision makers or “agents” under uncertainty. This problem arises in many application domains, such as multi-robot coordination, manufacturing, information gathering, and load balancing. Such problems must be treated as decentralized decision problems because each agent may have different partial information about the other agents and about the state of the world. It has been shown that these problems are significantly harder than their centralized counterparts, requiring new formal models and algorithms to be developed. Rapid progress in recent years has produced a number of different frameworks, complexity results, and planning algorithms. The objectives of this paper are to provide a comprehensive overview of these results, to compare and contrast the existing frameworks, and to provide a deeper understanding of their relationships with one another, their strengths, and their weaknesses. While we focus on cooperative systems, we do point out important connections with game-theoretic approaches. We analyze five different formal frameworks and three different optimal algorithms, as well as a series of approximation techniques. The paper provides interesting insights into the structure of decentralized problems, the expressiveness of the various models, and the relative advantages and limitations of the different solution techniques. A better understanding of these issues will facilitate further progress in the field and help resolve several open problems that we identify.
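
For orientation, one of the central models covered by the surveyed frameworks is the decentralized POMDP (DEC-POMDP). The sketch below uses the standard formulation in conventional notation ($I$, $S$, $A_i$, $P$, $R$, $\Omega_i$, $O$, $T$); it is included here as background and is not reproduced verbatim from the article.

\[
\mathcal{M} = \langle I,\ S,\ \{A_i\},\ P,\ R,\ \{\Omega_i\},\ O,\ T \rangle
\]
\begin{itemize}
  \item $I$: a finite set of $n$ agents.
  \item $S$: a finite set of world states, with initial state distribution $b^0$.
  \item $A_i$: the finite action set of agent $i$; a joint action is $\vec{a} \in A_1 \times \dots \times A_n$.
  \item $P(s' \mid s, \vec{a})$: the state transition probabilities.
  \item $R(s, \vec{a})$: the immediate reward shared by all agents.
  \item $\Omega_i$: the finite observation set of agent $i$; a joint observation is $\vec{o} \in \Omega_1 \times \dots \times \Omega_n$.
  \item $O(\vec{o} \mid s', \vec{a})$: the joint observation probabilities.
  \item $T$: the horizon of the problem.
\end{itemize}

A solution is a joint policy, one local policy per agent mapping that agent's own observation history to actions, chosen to maximize the expected cumulative reward over the horizon $T$; because no agent observes the full state or the other agents' observations, the problem is genuinely decentralized.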

Keywords

Decentralized decision making · Cooperative agents · Multi-agent planning

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Sven Seuken, School of Engineering and Applied Sciences, Harvard University, Cambridge, USA
  2. Shlomo Zilberstein, Department of Computer Science, University of Massachusetts, Amherst, USA
