An Optimal Best-First Search Algorithm for Solving Infinite Horizon DEC-POMDPs

  • Daniel Szer
  • François Charpillet
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


In the domain of decentralized Markov decision processes (DEC-POMDPs), we develop the first complete and optimal algorithm that extracts deterministic policy vectors based on finite-state controllers for a cooperative team of agents. The algorithm applies to the discounted infinite-horizon case and extends best-first search methods to the domain of decentralized control. We prove the optimality of our approach and present first experimental results on two small test problems. We believe this to be an important step forward in learning and planning for stochastic multi-agent systems.
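At its core, the best-first search described in the abstract is an A*-style procedure: it maintains a frontier of partially specified candidate solutions, always expands the one with the highest admissible upper bound, and can stop as soon as a fully specified candidate is popped, since its exact value then dominates every remaining bound. The Python sketch below illustrates that generic scheme on a toy assignment problem; the node representation, reward table, and bound function are illustrative assumptions, not the paper's actual joint-controller evaluation for DEC-POMDPs.

```python
import heapq

def best_first_search(root, expand, upper_bound, is_complete, value):
    """Generic best-first search with an admissible upper bound.

    The first *complete* node popped from the max-priority frontier is
    optimal: its exact value equals its bound, and every other frontier
    node's bound (an overestimate) is no larger.
    """
    # heapq is a min-heap, so negate bounds; the counter breaks ties.
    frontier = [(-upper_bound(root), 0, root)]
    counter = 1
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if is_complete(node):
            return node, value(node)
        for child in expand(node):
            heapq.heappush(frontier, (-upper_bound(child), counter, child))
            counter += 1
    return None, float("-inf")

# Toy stand-in problem (hypothetical): pick one action per "agent slot"
# to maximize a table of rewards. A partial node is a tuple of choices.
rewards = [[3, 1], [2, 5]]

def expand(node):
    return [node + (a,) for a in range(2)] if len(node) < 2 else []

def value(node):
    return sum(rewards[i][a] for i, a in enumerate(node))

def upper_bound(node):
    # Exact value so far + best achievable reward for each unfilled slot
    # (admissible: never underestimates the completed value).
    return value(node) + sum(max(rewards[i]) for i in range(len(node), 2))

def is_complete(node):
    return len(node) == 2

best, v = best_first_search((), expand, upper_bound, is_complete, value)
# best == (0, 1), v == 8
```

In the paper's setting, a node would instead be a partially specified vector of deterministic finite-state controllers, and the upper bound a heuristic overestimate of the joint discounted value; the optimality argument carries over as long as that bound remains admissible.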





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Daniel Szer (1)
  • François Charpillet (1)

  1. MAIA Group, INRIA Lorraine – LORIA, Vandœuvre-lès-Nancy, France
