
Formal models and algorithms for decentralized decision making under uncertainty

Autonomous Agents and Multi-Agent Systems

Abstract

Over the last 5 years, the AI community has shown considerable interest in decentralized control of multiple decision makers or “agents” under uncertainty. This problem arises in many application domains, such as multi-robot coordination, manufacturing, information gathering, and load balancing. Such problems must be treated as decentralized decision problems because each agent may have different partial information about the other agents and about the state of the world. It has been shown that these problems are significantly harder than their centralized counterparts, requiring new formal models and algorithms to be developed. Rapid progress in recent years has produced a number of different frameworks, complexity results, and planning algorithms. The objectives of this paper are to provide a comprehensive overview of these results, to compare and contrast the existing frameworks, and to provide a deeper understanding of their relationships with one another, their strengths, and their weaknesses. While we focus on cooperative systems, we do point out important connections with game-theoretic approaches. We analyze five different formal frameworks, three different optimal algorithms, as well as a series of approximation techniques. The paper provides interesting insights into the structure of decentralized problems, the expressiveness of the various models, and the relative advantages and limitations of the different solution techniques. A better understanding of these issues will facilitate further progress in the field and help resolve several open problems that we identify.
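Problems of this kind are commonly formalized in the literature as DEC-POMDPs, given by a tuple of states, per-agent actions, transitions, a shared reward, and per-agent observations. As a rough one-step illustration only (this is not the paper's notation; the class name, fields, and all numbers below are invented for the sketch), consider why shared uncertainty about the state already limits what uncoordinated action choices can achieve:

```python
from dataclasses import dataclass

# Hypothetical minimal encoding of a cooperative DEC-POMDP tuple
# <S, {A_i}, P, R, {Omega_i}, O, b0>; transitions and observations are
# omitted because this sketch evaluates only a single step.
@dataclass
class DecPOMDP:
    states: list          # S
    joint_actions: list   # cross product of the per-agent action sets
    reward: dict          # R[(s, a)] -> float, shared by all agents
    b0: dict              # initial belief: state -> probability

    def expected_reward(self, a):
        """One-step expected reward of joint action a under the initial belief."""
        return sum(p * self.reward[(s, a)] for s, p in self.b0.items())

# Tiny two-agent, two-state example (made up for illustration): the agents
# are rewarded for matching in s0 and for mismatching in s1.
m = DecPOMDP(
    states=["s0", "s1"],
    joint_actions=[("left", "left"), ("left", "right"),
                   ("right", "left"), ("right", "right")],
    reward={("s0", ("left", "left")): 1.0, ("s0", ("left", "right")): 0.0,
            ("s0", ("right", "left")): 0.0, ("s0", ("right", "right")): 1.0,
            ("s1", ("left", "left")): 0.0, ("s1", ("left", "right")): 1.0,
            ("s1", ("right", "left")): 1.0, ("s1", ("right", "right")): 0.0},
    b0={"s0": 0.5, "s1": 0.5},
)

# Under a uniform belief, every joint action has expected reward 0.5.
best = max(m.expected_reward(a) for a in m.joint_actions)
```

With no observations, no open-loop joint action here earns more than 0.5 in expectation, which hints at why decentralized policies must map each agent's private observation history to actions, and why such problems are harder than their centralized counterparts.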



Author information


Correspondence to Sven Seuken.

Additional information

This work was done while S. Seuken was a graduate student in the Computer Science Department of the University of Massachusetts, Amherst.


Cite this article

Seuken, S., Zilberstein, S. Formal models and algorithms for decentralized decision making under uncertainty. Auton Agent Multi-Agent Syst 17, 190–250 (2008). https://doi.org/10.1007/s10458-007-9026-5
