A modular approach to multi-agent reinforcement learning
Several attempts have been reported to let multiple monolithic reinforcement-learning agents synthesize the coordinated decision policies needed to accomplish their common goal effectively. Most of these straightforward approaches, however, scale poorly to more complex multi-agent learning problems, because the state space of each learning agent grows exponentially in the number of partner agents engaged in the joint task. To cope with this exponentially large state space, we previously proposed a modular approach and demonstrated its effectiveness by applying it to a modified version of the pursuit problem. In this paper, we further demonstrate the effectiveness of the approach on several variants of the pursuit problem. As in the previous study, our modular Q-learning hunters successfully capture a randomly evading prey agent by synthesizing and exploiting effective coordinated behavior.
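The scaling argument can be made concrete with a little arithmetic, and the modular remedy can be sketched in a few lines. The Python below is a minimal, hypothetical sketch, not the authors' implementation. The abstract does not say how the modules are decomposed or combined, so the code assumes that each hunter keeps one Q-learning module per partner, that each module observes only the prey and that one partner, and that actions are chosen by summing the module Q-values (a "greatest mass"-style merge). The names `ModularQAgent`, `monolithic_state_count`, `modular_state_count`, the 7x7 visual field, and the encoding of local states as pairs of relative positions are all illustrative assumptions.

```python
import random
from collections import defaultdict


def monolithic_state_count(cells: int, n_hunters: int) -> int:
    """States for one hunter that observes the prey and all n_hunters - 1 partners.

    Each observed object (prey plus every partner) can occupy any of `cells`
    positions, so the count is cells ** n_hunters (out-of-view cases ignored).
    """
    return cells ** n_hunters


def modular_state_count(cells: int, n_hunters: int) -> int:
    """Total states over n_hunters - 1 modules, each observing the prey and one partner."""
    return (n_hunters - 1) * cells ** 2


class ModularQAgent:
    """Hypothetical modular Q-learner: one Q-table (module) per partner agent."""

    def __init__(self, n_partners, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Each module maps (local_state, action) -> Q-value, defaulting to 0.0.
        self.modules = [defaultdict(float) for _ in range(n_partners)]
        self.rng = random.Random(seed)

    def merged_q(self, local_states, action):
        # Assumed merge rule: sum the Q-values every module assigns to this action.
        return sum(q[(s, action)] for q, s in zip(self.modules, local_states))

    def select_action(self, local_states):
        # Epsilon-greedy action selection over the merged Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.merged_q(local_states, a))

    def update(self, local_states, action, reward, next_local_states):
        # Each module performs an ordinary one-step Q-learning update on its
        # own local state, driven by the reward shared by the whole hunter.
        for q, s, s_next in zip(self.modules, local_states, next_local_states):
            best_next = max(q[(s_next, a)] for a in self.actions)
            td_target = reward + self.gamma * best_next
            q[(s, action)] += self.alpha * (td_target - q[(s, action)])


if __name__ == "__main__":
    # Hypothetical 7x7 visual field (49 cells) and 4 hunters.
    print(monolithic_state_count(49, 4))  # 49**4 = 5764801
    print(modular_state_count(49, 4))     # 3 * 49**2 = 7203
```

With four hunters and a 49-cell field, a monolithic hunter would face 5,764,801 states, while the three assumed modules together cover only 7,203. Adding a hunter multiplies the first count by 49 but adds just 2,401 to the second, which is the kind of saving a modular decomposition is meant to buy. The price is that no single module sees the full joint configuration, so coordination must emerge from how the modules' preferences are merged.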