Solving F\(^3\)MDPs: Collaborative Multiagent Markov Decision Processes with Factored Transitions, Rewards and Stochastic Policies

  • Julia Radoszycki
  • Nathalie Peyrard
  • Régis Sabbadin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9387)

Abstract

Multiagent Markov Decision Processes provide a rich framework for modelling problems of multiagent sequential decision making under uncertainty, as in robotics. However, when the state space is also factored and of high dimension, even dedicated solution algorithms (exact or approximate) no longer apply once the dimension of the state space and the number of agents both exceed 30, except under strong assumptions on the state transitions or the value function. In this paper we introduce the F\(^3\)MDP framework and associated approximate solution algorithms which can tackle much larger problems. An F\(^3\)MDP is a collaborative multiagent MDP whose state space is factored, whose reward function is additively factored, and whose solution policies are constrained to be factored and may be stochastic. The proposed algorithms belong to the family of Policy Iteration (PI) algorithms. On small problems, where the optimal policy is available, they return policies close to optimal. On larger problems belonging to the subclass of GMDPs, they compete well with state-of-the-art resolution algorithms in terms of solution quality. Finally, we show that our algorithms can tackle very large F\(^3\)MDPs, with 100 agents and a state space of size \(2^{100}\).
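As a rough illustration of the structure described in the abstract, and written in standard factored-MDP notation rather than the paper's own (the neighbourhood sets \(N(i)\) and the per-component factors \(P_i\), \(R_i\), \(\pi_i\) below are assumptions, not taken from the paper), the three factorisations would read:

\[
P(s' \mid s, a) \;=\; \prod_{i=1}^{n} P_i\bigl(s'_i \mid s_{N(i)}, a_{N(i)}\bigr),
\qquad
R(s, a) \;=\; \sum_{i=1}^{n} R_i\bigl(s_{N(i)}, a_{N(i)}\bigr),
\qquad
\pi(a \mid s) \;=\; \prod_{i=1}^{n} \pi_i\bigl(a_i \mid s_{N(i)}\bigr),
\]

where each \(N(i)\) is a small set of state and agent indices, so that every factor depends on only a few variables even when \(n\) is large (e.g. \(n = 100\) as in the experiments mentioned above).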

Keywords

Multiagent Markov decision processes, Policy gradient, Inference in graphical models



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Julia Radoszycki (1)
  • Nathalie Peyrard (1)
  • Régis Sabbadin (1)
  1. INRA-MIAT (UR 875), Castanet-Tolosan, France
