Abstract
Reinforcement learning (RL) tasks have often assumed a Markov decision process, which is to say, state information is ‘complete’, hence there is no need to learn what to learn from. However, recent advances—such as visual reinforcement learning—have enabled the tasks typically addressed using RL to expand to include significant amounts of partial observability. This implies that the representation needs to support multiple forms of memory, thus credit assignment needs to: find efficient ways to encode high dimensional data, as well has, determining under what conditions to save and recall specific pieces of information, and for what purpose. In this work, we assume the tangled program graph (TPG) formulation for genetic programming, where this has already demonstrated competitiveness with deep learning solutions to multiple RL tasks (under complete information). In this work, TPG is augmented with indexed memory using a probabilistic formulation of a write operation (defines long and short term memory) and an indexed read. Moreover, the register information specific to the programs co-operating within a program is used to provide the low dimensional encoding of state. We demonstrate that TPG can then successfully evolve a behaviour for a hero bot in the Dota 2 game engine when playing in a single lane 1-on-1 configuration with the game engine hero bot as the opponent. Specific recommendations are made regarding the design of an appropriate fitness function. We show that TPG without indexed memory completely fails to learn any useful behaviour. Only with indexed memory are useful hero behaviours discovered.
This is a preview of subscription content, access via your institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
- 3.
- 4.
The computational resources used by OpenAI are in the order of 180 years of gameplay per day.
- 5.
- 6.
LSTM is widely used as it addresses one of the potential pathologies of recurrent connectivity under gradient decent, that of vanishing gradients.
- 7.
Conditional instructions could change this [6].
- 8.
Indexed memory is initialized once at generation zero with NULL content.
- 9.
Up to 20,000 features are available, however, we are only interested in the case of a 1-on-1 single lane configuration of the game (as opposed to 5 heroes per team over 3 lanes).
- 10.
- 11.
- 12.
Heat map produced using [5].
References
Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: EvoApplications, LNCS, vol. 9597, pp. 19–34 (2016)
Aiyer, S.V.B., Niranjan, N., Fallside, F.: A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks 15, 204–215 (1990)
Andersson, B., Nordin, P., Nordahl, M.: Reactive and memory-based genetic programming for robot control. In: European Conference on Genetic Programming, LNCS, vol. 1598, pp. 161–172 (1999)
Andre, D.: Evolution of mapmaking ability: Strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)
Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A., Wishart, D.S.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research (2016). http://www.heatmapper.ca/
Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)
Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)
Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)
Graves, A., Wayne, G., Danihelka, I.: Neural turing machines. CoRR abs/1410.5401 (2014)
Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S.G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A.P., Hermann, K.M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)
Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10), 2222–2231 (2017)
Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)
Grossberg, S.: Content-addressable memory storage by neural networks: A general model and global Liapunov method. In: E.L. Schwartz (ed.) Computational Neuroscience, pp. 56–65. MIT Press (1990)
Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: EvoApplications, LNCS, vol. 7835, pp. 529–539 (2013)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)
Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)
Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)
Kelly, S., Banzhaf, W.: Temporal memory sharing in visual reinforcement learning. In: W. Banzhaf, E. Goodman, L. Sheneman, L. Trujillo, B. Worzel (eds.) Genetic Programming Theory and Practice, vol. XVII. Springer (2020)
Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)
Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)
Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)
Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: A tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice, vol. XVI, chap. 3, pp. 37–57. Springer (2019)
Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic (1998)
Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)
Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)
Merrild, J., Rasmussen, M.A., Risi, S.: Hyperntm: Evolving scalable neural turing machines through hyperneat. In: EvoApplications, pp. 750–766 (2018)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press (1994)
Poli, R., McPhee, N.F., Citi, L., Crane, E.: Memory with memory in genetic programming. Journal of Artificial Evolution and Applications (2009)
Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2016)
Sapienza, A., Peng, H., Ferrara, E.: Performance dynamics and success in online games. In: IEEE International Conference on Data Mining Workshops, pp. 902–909 (2017)
Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)
Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2019)
Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)
Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)
Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)
Teller, A.: The evolution of mental models. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press (1994)
Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)
Wayne, G., Hung, C.C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwińska, A., Rae, J., Mirowski, P., Leibo, J.Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hasssbis, D., Lillicrap, T.: Unsupervised predictive memory in a goal-directed agent. CoRR abs/1803.10760 (2018)
Wydmuch, M., Kempka, M., Jaśkowski, W.: ViZDoom competitions: Playing doom from pixels. IEEE Transactions on Games to appear (2019)
Acknowledgements
We gratefully acknowledge support from the NSERC CRD program (Canada).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Smith, R.J., Heywood, M.I. (2020). Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_17
Download citation
DOI: https://doi.org/10.1007/978-3-030-39958-0_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39957-3
Online ISBN: 978-3-030-39958-0
eBook Packages: Computer ScienceComputer Science (R0)