Learning coordinated behavior in a continuous environment
Considerable effort has been devoted to enabling multiple agents to learn to interact appropriately, using a variety of reinforcement-learning algorithms. In most of this work, however, the state space of each agent is assumed to be discrete, and it remains unclear how effectively multiple reinforcement-learning agents can acquire appropriate coordinated behavior in continuous state spaces. The objective of this research is to explore the applicability of Q-learning in multi-agent continuous environments, when combined with a generalization technique based on the CMAC. We consider a modified version of the multi-agent block pushing problem, in which two learning agents interact in a continuous environment to accomplish a common goal. To allow each agent to handle two-dimensional vector-valued inputs, we apply a CMAC-based Q-learning algorithm, a variant of L.-J. Lin's QCON algorithm. The aim is to incrementally elaborate a set of CMACs that approximate the action-value function under an optimal policy for the learning agent. The performance of our block-pushing CMAC-based Q-learning agents is evaluated quantitatively and qualitatively through simulation runs. Although the task is not intended to model any particular real-world problem, the results are encouraging.
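As a rough illustration of the approach described in the abstract (one CMAC function approximator per discrete action, in the spirit of Lin's QCON, trained by Q-learning backups over 2-D continuous states), the following Python sketch shows a minimal CMAC with overlapping offset tilings and a one-step Q-learning update. All parameters here (number of tilings, tiles per dimension, input range, learning rate, discount factor) are illustrative assumptions, not values from the paper.

```python
class CMAC:
    """A minimal CMAC over a 2-D continuous input, built from several
    overlapping grid tilings, each offset by a fraction of a tile width.
    One instance per action approximates Q(s, a) for that action
    (QCON-style, one approximator per action).
    Parameters are illustrative, not from the paper."""

    def __init__(self, n_tilings=8, n_tiles=10, low=0.0, high=1.0):
        self.n_tilings = n_tilings
        self.n_tiles = n_tiles
        self.low, self.high = low, high
        # One weight table per tiling: an n_tiles x n_tiles grid of weights.
        self.weights = [[[0.0] * n_tiles for _ in range(n_tiles)]
                        for _ in range(n_tilings)]

    def _active_tiles(self, x, y):
        """Return the (tiling, i, j) cells activated by input (x, y);
        exactly one cell per tiling is active."""
        width = (self.high - self.low) / (self.n_tiles - 1)
        cells = []
        for t in range(self.n_tilings):
            off = (t / self.n_tilings) * width  # per-tiling diagonal offset
            i = min(int((x - self.low + off) / width), self.n_tiles - 1)
            j = min(int((y - self.low + off) / width), self.n_tiles - 1)
            cells.append((t, i, j))
        return cells

    def value(self, x, y):
        """Approximate value: sum of the active weights, one per tiling."""
        return sum(self.weights[t][i][j] for t, i, j in self._active_tiles(x, y))

    def update(self, x, y, target, alpha):
        """Move the output toward `target`; the correction is split
        evenly over the active tiles, so the output moves by alpha * error."""
        err = target - self.value(x, y)
        step = alpha * err / self.n_tilings
        for t, i, j in self._active_tiles(x, y):
            self.weights[t][i][j] += step


def q_update(cmacs, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning backup, Q(s,a) <- r + gamma * max_a' Q(s',a'),
    applied to the CMAC associated with the taken action."""
    target = r + gamma * max(c.value(*s_next) for c in cmacs)
    cmacs[a].update(*s, target, alpha)
```

Because nearby inputs share most of their active tiles, an update at one state also raises the estimated value at neighboring states, which is the generalization property that makes tabular Q-learning workable in a continuous state space.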
- 1. Albus, J. S.: Brain, Behavior, and Robotics. Byte Books, Chapter 6, pp. 139–179, 1981.
- 2. Drogoul, A., J. Ferber, B. Corbara, and D. Fresneau: A Behavioral Simulation Model for the Study of Emergent Social Structures. In F. J. Varela et al. (Eds.): Toward a Practice of Autonomous Systems: Proc. of the First European Conference on Artificial Life, The MIT Press, 1991.
- 3. Lin, L.-J.: Self-Improving Reactive Agents Based On Reinforcement Learning, Planning and Teaching. Machine Learning, Vol. 8, 1992.
- 4. Ono, N., and A. T. Rahmani: Self-Organization of Communication in Distributed Learning Classifier Systems. In R. F. Albrecht et al. (Eds.): Artificial Neural Nets and Genetic Algorithms: Proc. of the International Conference on Artificial Neural Nets and Genetic Algorithms, Springer-Verlag Wien New York, 1993.
- 5. Ono, N., T. Ohira, and A. T. Rahmani: Emergent Organization of Interspecies Communication in Q-learning Artificial Organisms. In F. Morán et al. (Eds.): Advances in Artificial Life: Proc. of the 3rd European Conference on Artificial Life, Springer, 1995.
- 6. Ono, N., and K. Fukumoto: Collective Behavior by Modular Reinforcement-Learning Animats. In P. Maes et al. (Eds.): From Animals to Animats 4: Proc. of the 4th International Conference on Simulation of Adaptive Behavior, The MIT Press, 1996.
- 7. Ono, N., and K. Fukumoto: Multi-agent Reinforcement Learning: A Modular Approach. Proc. of the 2nd International Conference on Multi-agent Systems, AAAI Press, 1996.
- 8. Sen, S., M. Sekaran, and J. Hale: Learning to Coordinate without Sharing Information. Proc. of AAAI-94, 1994.
- 9. Sen, S., and M. Sekaran: Multiagent Coordination with Learning Classifier Systems. In G. Weiß and S. Sen (Eds.): Adaption and Learning in Multi-agent Systems, Springer, 1996.
- 10. Tan, M.: Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents. Proc. of the 10th International Conference on Machine Learning, 1993.
- 11. Yanco, H., and L. A. Stein: An Adaptive Communication Protocol for Cooperating Mobile Robots. In J.-A. Meyer et al. (Eds.): From Animals to Animats 2: Proc. of the 2nd International Conference on Simulation of Adaptive Behavior, The MIT Press, 1992.
- 12. Watkins, C. J. C. H.: Learning from Delayed Rewards. Ph.D. thesis, Cambridge University, 1989.