
Machine Learning, Volume 33, Issue 2–3, pp 263–282

Learning Team Strategies: Soccer Case Studies

  • Rafał P. Sałustowicz
  • Marco A. Wiering
  • Jürgen Schmidhuber

Abstract

We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) that map input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly: they use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs, whereas PIPE and CO-PIPE, which do not depend on EFs, find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
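To make the contrast between the two families of methods concrete, the following Python sketch illustrates (i) a shared linear evaluation function trained with a one-step TD-Q update and (ii) a PBIL-style direct search over a probability distribution from which candidate policies are sampled. This is only a minimal illustration under assumed names and hyperparameters (n_features, n_actions, alpha, gamma, lr, pop_size are not from the paper); the paper's actual setup uses position-dependent soccer inputs, and PIPE/CO-PIPE adapt probabilistic prototype trees over program instructions rather than the flat distribution shown here.

    import numpy as np

    class LinearTDQ:
        """Shared linear evaluation function: Q(s, a) = w[a] . x(s).
        All agents of a team use the same weights; only their
        (position-dependent) input features x differ."""

        def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.95):
            self.w = np.zeros((n_actions, n_features))  # one weight vector per action
            self.alpha = alpha   # learning rate
            self.gamma = gamma   # discount factor

        def q_values(self, x):
            # Expected-reward estimates for every action given input features x.
            return self.w @ x

        def update(self, x, a, reward, x_next, done):
            # One-step TD-Q update toward r + gamma * max_a' Q(x', a').
            target = reward if done else reward + self.gamma * np.max(self.q_values(x_next))
            td_error = target - self.q_values(x)[a]
            self.w[a] += self.alpha * td_error * x

    class DirectPolicySearch:
        """PBIL-style stand-in for direct search in policy space: keep a
        probability distribution over discrete policy parameters, sample
        candidate policies, and shift the distribution toward the best one.
        (PIPE itself adapts probabilistic prototype trees over programs.)"""

        def __init__(self, n_params, n_choices, lr=0.1, pop_size=20, rng=None):
            self.p = np.full((n_params, n_choices), 1.0 / n_choices)
            self.lr = lr
            self.pop_size = pop_size
            self.rng = rng or np.random.default_rng()

        def sample_policy(self):
            # One candidate policy: a discrete choice per parameter.
            return np.array([self.rng.choice(len(row), p=row) for row in self.p])

        def update(self, evaluate):
            # evaluate: callable mapping a sampled policy to a fitness score
            # (e.g., goal difference over a number of simulated games).
            candidates = [self.sample_policy() for _ in range(self.pop_size)]
            best = max(candidates, key=evaluate)
            for i, choice in enumerate(best):
                self.p[i] *= (1.0 - self.lr)   # decay all probabilities
                self.p[i, choice] += self.lr   # reinforce the best choice
            return best

In the coevolutionary variant (CO-PIPE), the fitness evaluation would presumably come from games played against the opposing team's concurrently learned policies rather than against a fixed opponent.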

Keywords: multiagent reinforcement learning, soccer, TD-Q learning, evaluation functions, probabilistic incremental program evolution, coevolution

References

  1. Albus, J.S. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Dynamic Systems, Measurement and Control, 97, 220–227.
  2. Asada, M., Uchibe, E., Noda, S., Tawaratsumida, S., & Hosoda, K. (1994). A vision-based reinforcement learning for coordination of soccer playing behaviors. Proceedings of AAAI-94 Workshop on AI and A-life and Entertainment (pp. 16–21).
  3. Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning (Technical Report CMU-CS-94-163), Carnegie Mellon University, Pittsburgh.
  4. Baluja, S., & Caruana, R. (1995). Removing the genetics from the standard genetic algorithm. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (pp. 38–46). San Francisco, CA: Morgan Kaufmann Publishers.
  5. Bertsekas, D.P., & Tsitsiklis, J.N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
  6. Cramer, N.L. (1985). A representation for the adaptive generation of simple sequential programs. In J. Grefenstette (Ed.), Proceedings of an International Conference on Genetic Algorithms and their Applications (pp. 183–187). Hillsdale, NJ: Lawrence Erlbaum Associates.
  7. Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1017–1023). Cambridge, MA: MIT Press.
  8. Dickmanns, D., Schmidhuber, J., & Winklhofer, A. (1987). Der genetische Algorithmus: Eine Implementierung in Prolog [The genetic algorithm: An implementation in Prolog]. Fortgeschrittenenpraktikum, Institut für Informatik, Lehrstuhl Prof. Radig, Technische Universität München.
  9. Gallant, S.I. (1993). Neural Network Learning and Expert Systems. Cambridge, MA: MIT Press.
  10. Koza, J.R. (1992). Genetic Programming—On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
  11. Levin, L.A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.
  12. Levin, L.A. (1984). Randomness conservation inequalities: Information and independence in mathematical theories. Information and Control, 61, 15–37.
  13. Li, M., & Vitányi, P.M.B. (1993). An Introduction to Kolmogorov Complexity and its Applications. New York, NY: Springer-Verlag.
  14. Lin, L.J. (1993). Reinforcement learning for robots using neural networks. Ph.D. thesis, Carnegie Mellon University, Pittsburgh, PA.
  15. Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. In A. Prieditis, & S. Russell (Eds.), Machine Learning: Proceedings of the Eleventh International Conference (pp. 157–163). San Francisco, CA: Morgan Kaufmann Publishers.
  16. Luke, S., Hohn, C., Farris, J., Jackson, G., & Hendler, J. (1997). Co-evolving soccer softbot team coordination with genetic programming. Proceedings of the First International Workshop on RoboCup, at the International Joint Conference on Artificial Intelligence (IJCAI-97).
  17. Matsubara, H., Noda, I., & Hiraki, K. (1996). Learning of cooperative actions in multi-agent systems: A case study of pass play in soccer. In S. Sen (Ed.), Working Notes for the AAAI-96 Spring Symposium on Adaptation, Coevolution and Learning in Multi-agent Systems (pp. 63–67). Menlo Park, CA: AAAI Press.
  18. Nadella, R., & Sen, S. (1996). Correlating internal parameters and external performance: Learning soccer agents. In G. Weiss (Ed.), Distributed Artificial Intelligence Meets Machine Learning. Learning in Multi-Agent Environments (pp. 137–150). Berlin: Springer-Verlag.
  19. Nowlan, S.J., & Hinton, G.E. (1992). Simplifying neural networks by soft weight sharing. Neural Computation, 4, 173–193.
  20. Peng, J., & Williams, R. (1996). Incremental multi-step Q-learning. Machine Learning, 22, 283–290.
  21. Sahota, M. (1993). Real-time intelligent behaviour in dynamic environments: Soccer-playing robots. Master's thesis, University of British Columbia.
  22. Sałustowicz, R.P., & Schmidhuber, J. (1997). Probabilistic incremental program evolution. Evolutionary Computation, 5(2), 123–141.
  23. Sałustowicz, R.P., Wiering, M.A., & Schmidhuber, J. (1997a). Evolving soccer strategies. Proceedings of the Fourth International Conference on Neural Information Processing (ICONIP'97) (pp. 502–506). Singapore: Springer-Verlag.
  24. Sałustowicz, R.P., Wiering, M.A., & Schmidhuber, J. (1997b). On learning soccer strategies. In W. Gerstner, A. Germond, M. Hasler, & J.-D. Nicoud (Eds.), Proceedings of the Seventh International Conference on Artificial Neural Networks (ICANN'97), volume 1327 of Lecture Notes in Computer Science (pp. 769–774). Berlin Heidelberg: Springer-Verlag.
  25. Schmidhuber, J. (1997a). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5), 857–873.
  26. Schmidhuber, J. (1997b). A general method for incremental self-improvement and multi-agent learning in unrestricted environments. In X. Yao (Ed.), Evolutionary Computation: Theory and Applications. Singapore: Scientific Publ. Co., in press.
  27. Schmidhuber, J., Zhao, J., & Schraudolph, N. (1997a). Reinforcement learning with self-modifying policies. In S. Thrun & L. Pratt (Eds.), Learning to Learn (pp. 293–309). Boston, MA: Kluwer.
  28. Schmidhuber, J., Zhao, J., & Wiering, M. (1997b). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28, 105–130.
  29. Solomonoff, R. (1986). An application of algorithmic probability to problems in artificial intelligence. In L.N. Kanal & J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence (pp. 473–491). Elsevier Science Publishers.
  30. Stone, P., & Veloso, M. (1996a). Beating a defender in robotic soccer: Memory-based learning of a continuous function. In G. Tesauro, D.S. Touretzky, & T.K. Leen (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 896–902). Cambridge, MA: MIT Press.
  31. Stone, P., & Veloso, M. (1996b). A layered approach to learning client behaviors in the RoboCup soccer server. Applied Artificial Intelligence (AAI), 1998, to appear.
  32. Sutton, R.S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.
  33. Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D.S. Touretzky, M.C. Mozer, & M.E. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1038–1045). Cambridge, MA: MIT Press.
  34. Sutton, R.S. (1997). Personal communication at the Seventh International Conference on Artificial Neural Networks (ICANN'97).
  35. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215–219.
  36. Versino, C., & Gambardella, L.M. (1997). Learning real team solutions. In G. Weiss (Ed.), DAI Meets Machine Learning, volume 1221 of Lecture Notes in Artificial Intelligence (pp. 40–61). Berlin: Springer-Verlag.
  37. Watkins, C. (1989). Learning from Delayed Rewards. Ph.D. thesis, King's College, Cambridge.
  38. Weiss, G. (1996). Adaptation and learning in multi-agent systems: Some remarks and a bibliography. In G. Weiss & S. Sen (Eds.), Adaptation and Learning in Multi-Agent Systems, volume 1042 of Lecture Notes in Artificial Intelligence (pp. 1–21). Berlin Heidelberg: Springer-Verlag.
  39. Widrow, B., & Hoff, M.E. (1960). Adaptive switching circuits. 1960 IRE WESCON Convention Record (Vol. 4, pp. 96–104). New York: IRE. Reprinted in Anderson and Rosenfeld (1988).
  40. Wiering, M.A., & Schmidhuber, J. (1996). Solving POMDPs with Levin search and EIRA. In L. Saitta (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (pp. 534–542). San Francisco, CA: Morgan Kaufmann Publishers.
  41. Wiering, M.A., & Schmidhuber, J. (1997). Fast online Q(λ) (Technical Report IDSIA-21-97), IDSIA, Lugano, Switzerland.

Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Rafał P. Sałustowicz (1)
  • Marco A. Wiering (1)
  • Jürgen Schmidhuber (1)

  1. IDSIA, Lugano, Switzerland
