Learning to Coordinate Using Commitment Sequences in Cooperative Multi-agent Systems
We report on an investigation into the learning of coordination in cooperative multi-agent systems. Specifically, we study solutions applicable to independent agents, i.e., agents that do not observe one another's actions. In previous research we presented a reinforcement learning approach that converges to the optimal joint action even in scenarios with high miscoordination costs; however, that approach fails in fully stochastic environments. In this paper, we present a novel approach based on reward estimation with a shared action-selection protocol. The new technique is applicable in fully stochastic environments where mutual observation of actions is not possible. We demonstrate empirically that with our approach the agents almost always converge to the optimal joint action, even in difficult stochastic scenarios with high miscoordination penalties.
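The core idea of the abstract can be illustrated with a minimal sketch. The payoff matrix, sequence length, and schedule function below are all hypothetical choices, not the authors' exact protocol: two independent agents play a climbing-game-style stochastic matrix game, both derive their current action from the same shared schedule (a commitment sequence keeps each joint action fixed for a block of episodes), and each agent averages the noisy rewards to estimate the value of every joint action without ever observing its partner's moves.

```python
import random

# Hypothetical 3x3 cooperative payoff matrix with a high
# miscoordination penalty (-30); both agents receive the same
# reward, perturbed by Gaussian noise to make the game stochastic.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]

N_ACTIONS = 3
SEQ_LEN = 50  # episodes per commitment sequence (assumed value)

def noisy_reward(a, b, sigma=1.0):
    """Fully stochastic common reward for joint action (a, b)."""
    return PAYOFF[a][b] + random.gauss(0.0, sigma)

def schedule(t):
    """Shared action-selection protocol: the joint action explored in
    episode t. Both agents compute this independently, so neither needs
    to observe the other's action."""
    block = t // SEQ_LEN                      # commitment-sequence index
    joint = block % (N_ACTIONS * N_ACTIONS)   # cycle over joint actions
    return joint // N_ACTIONS, joint % N_ACTIONS

def run(episodes=SEQ_LEN * 9 * 20, seed=0):
    random.seed(seed)
    total = [[0.0] * N_ACTIONS for _ in range(N_ACTIONS)]
    count = [[0] * N_ACTIONS for _ in range(N_ACTIONS)]
    for t in range(episodes):
        a, b = schedule(t)        # each agent plays its own component
        total[a][b] += noisy_reward(a, b)
        count[a][b] += 1
    # Averaging over each commitment sequence filters out the reward
    # noise; since the schedule is shared, both agents arrive at the
    # same estimates and hence the same greedy joint action.
    est = [[total[i][j] / count[i][j] for j in range(N_ACTIONS)]
           for i in range(N_ACTIONS)]
    best = max((est[i][j], i, j)
               for i in range(N_ACTIONS) for j in range(N_ACTIONS))
    return best[1], best[2]

print(run())  # with enough samples per joint action this selects (0, 0)
```

Note the design choice this sketch highlights: because exploration is driven by a deterministic shared protocol rather than by each agent's independent randomization, the reward samples an agent collects for one of its actions are never contaminated by unmodeled variation in the partner's action, which is what defeats naive independent learners in stochastic games with miscoordination penalties.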
Keywords: Reinforcement Learning · Joint Action · Multiagent System · Stochastic Game · Average Reward