On learning soccer strategies
We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy but may behave differently because their inputs depend on position. All agents on a team are rewarded or punished collectively when a goal is scored. We run simulations with varying team sizes and compare two learning algorithms: TD-Q learning with linear neural networks (TD-Q) and Probabilistic Incremental Program Evolution (PIPE). TD-Q is based on evaluation functions (EFs) that map input/action pairs to expected reward, while PIPE searches policy space directly. PIPE uses an adaptive probability distribution to synthesize programs that calculate action probabilities from current inputs. Our results show that TD-Q has difficulty learning appropriate shared EFs. PIPE, however, does not depend on EFs and finds good policies faster and more reliably.
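To make the idea of a shared evaluation function concrete, the following is a minimal sketch of one-step Q-learning with one linear unit per action, where all agents on a team query and update the same weights using their own position-dependent inputs. It is an illustration under stated assumptions, not the paper's implementation: the class name `LinearQ`, the learning rate, the discount factor, the two-dimensional inputs, and the use of a plain one-step update are all hypothetical choices made for this example.

```python
class LinearQ:
    """Hypothetical shared EF: Q(x, a) = w_a . x + b_a, one linear unit per action."""

    def __init__(self, n_inputs, n_actions, lr=0.1, gamma=0.9):
        self.w = [[0.0] * n_inputs for _ in range(n_actions)]
        self.b = [0.0] * n_actions
        self.lr = lr        # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q(self, x, a):
        # Expected reward estimate for taking action a given input x.
        return sum(wi * xi for wi, xi in zip(self.w[a], x)) + self.b[a]

    def greedy(self, x):
        # Action with the highest estimated value for this input.
        return max(range(len(self.b)), key=lambda a: self.q(x, a))

    def update(self, x, a, reward, x_next, terminal):
        # One-step Q-learning target; no bootstrapping on terminal steps.
        if terminal:
            target = reward
        else:
            target = reward + self.gamma * max(
                self.q(x_next, k) for k in range(len(self.b))
            )
        delta = target - self.q(x, a)
        # Gradient step on the linear unit of the chosen action only.
        for i, xi in enumerate(x):
            self.w[a][i] += self.lr * delta * xi
        self.b[a] += self.lr * delta
        return delta


# Two teammates share one EF but see different position-dependent inputs.
team_q = LinearQ(n_inputs=2, n_actions=2)
inputs = {"agent_0": [1.0, 0.0], "agent_1": [0.0, 1.0]}

# Collective reward: after a goal, every agent's (input, action) pair
# is updated with the same reward on the same shared weights.
for x in inputs.values():
    team_q.update(x, a=0, reward=1.0, x_next=x, terminal=True)
```

Because the weights are shared, each agent's update also shifts the estimates seen by its teammates. This coupling is one way to picture why learning an appropriate shared EF can be hard.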