
Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model


Part of the Genetic and Evolutionary Computation book series (GEVO)

Abstract

Reinforcement learning (RL) tasks have often assumed a Markov decision process, which is to say that state information is 'complete', hence there is no need to learn what to remember. However, recent advances, such as visual reinforcement learning, have expanded the tasks typically addressed with RL to include significant amounts of partial observability. This implies that the representation needs to support multiple forms of memory; credit assignment therefore needs to find efficient ways to encode high-dimensional data, as well as to determine under what conditions to save and recall specific pieces of information, and for what purpose. In this work, we assume the tangled program graph (TPG) formulation for genetic programming, which has already demonstrated competitiveness with deep learning solutions on multiple RL tasks (under complete information). TPG is augmented with indexed memory using a probabilistic formulation of a write operation (defining long- and short-term memory) and an indexed read. Moreover, the register information specific to the programs co-operating within a team is used to provide the low-dimensional encoding of state. We demonstrate that TPG can then successfully evolve a behaviour for a hero bot in the Dota 2 game engine when playing a single-lane 1-on-1 configuration against the game engine's hero bot. Specific recommendations are made regarding the design of an appropriate fitness function. We show that TPG without indexed memory completely fails to learn any useful behaviour; only with indexed memory are useful hero behaviours discovered.
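The abstract's memory model, a probabilistic write combined with a deterministic indexed read, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cell count, register width, and the geometrically decaying write-probability profile are all assumed for the example. The only details taken from the chapter are that indexed memory starts empty (footnote 8) and that writes are probabilistic (low-probability cells turn over rarely, acting as long-term memory) while reads are indexed.

```python
import random


class ProbabilisticIndexedMemory:
    """Hypothetical sketch of a shared indexed memory with a probabilistic
    write and an indexed read, in the spirit of the chapter's model."""

    def __init__(self, n_cells=32, n_registers=8, seed=0):
        self.rng = random.Random(seed)
        # Footnote 8: memory starts with NULL content; zeros stand in here.
        self.cells = [[0.0] * n_registers for _ in range(n_cells)]
        # Assumed profile: cell 0 is overwritten on every write (short-term),
        # higher-index cells accept writes ever more rarely (long-term).
        # The 0.5**i decay is an illustrative choice, not the authors'.
        self.p_write = [0.5 ** i for i in range(n_cells)]

    def write(self, registers):
        """Probabilistic write: each cell independently accepts the
        team's register values with its own fixed probability."""
        for i, p in enumerate(self.p_write):
            if self.rng.random() < p:
                self.cells[i] = list(registers)

    def read(self, index):
        """Indexed (deterministic) read of a single cell."""
        return list(self.cells[index % len(self.cells)])
```

Under this sketch, a program would call `write` with the low-dimensional register state of its team after execution, and `read` with an evolved index; cells near index 0 reflect recent state while high-index cells retain older writes.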

Figures 17.1–17.5 appear in the full chapter.

Notes

  1. http://blog.dota2.com/?l=english.

  2. https://openai.com/blog/how-to-train-your-openai-five/.

  3. https://openai.com/blog/openai-baselines-ppo/.

  4. The computational resources used by OpenAI are in the order of 180 years of gameplay per day.

  5. https://dota2.gamepedia.com/Heroes.

  6. LSTM is widely used as it addresses one of the potential pathologies of recurrent connectivity under gradient descent, that of vanishing gradients.

  7. Conditional instructions could change this [6].

  8. Indexed memory is initialized once at generation zero with NULL content.

  9. Up to 20,000 features are available; however, we are only interested in the case of a 1-on-1 single-lane configuration of the game (as opposed to 5 heroes per team over 3 lanes).

  10. https://dota2.gamepedia.com/Shadow_Fiend.

  11. https://dota2.gamepedia.com/Creep_control_techniques.

  12. Heat map produced using [5].

References

  1. Agapitos, A., Brabazon, A., O’Neill, M.: Genetic programming with memory for financial trading. In: EvoApplications, LNCS, vol. 9597, pp. 19–34 (2016)

  2. Aiyer, S.V.B., Niranjan, M., Fallside, F.: A theoretical investigation into the performance of the Hopfield model. IEEE Transactions on Neural Networks 1(2), 204–215 (1990)

  3. Andersson, B., Nordin, P., Nordahl, M.: Reactive and memory-based genetic programming for robot control. In: European Conference on Genetic Programming, LNCS, vol. 1598, pp. 161–172 (1999)

  4. Andre, D.: Evolution of mapmaking ability: Strategies for the evolution of learning, planning, and memory using genetic programming. In: IEEE World Congress on Computational Intelligence, pp. 250–255 (1994)

  5. Babicki, S., Arndt, D., Marcu, A., Liang, Y., Grant, J.R., Maciejewski, A., Wishart, D.S.: Heatmapper: web-enabled heat mapping for all. Nucleic Acids Research (2016). http://www.heatmapper.ca/

  6. Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)

  7. Brave, S.: The evolution of memory and mental models using genetic programming. In: Proceedings of the Annual Conference on Genetic Programming (1996)

  8. Elman, J.L.: Finding structure in time. Cognitive Science 14, 179–211 (1990)

  9. Graves, A., Wayne, G., Danihelka, I.: Neural Turing machines. CoRR abs/1410.5401 (2014)

  10. Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwinska, A., Colmenarejo, S.G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A.P., Hermann, K.M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., Hassabis, D.: Hybrid computing using a neural network with dynamic external memory. Nature 538(7626), 471–476 (2016)

  11. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems 28(10), 2222–2231 (2017)

  12. Greve, R.B., Jacobsen, E.J., Risi, S.: Evolving neural Turing machines for reward-based learning. In: ACM Genetic and Evolutionary Computation Conference, pp. 117–124 (2016)

  13. Grossberg, S.: Content-addressable memory storage by neural networks: A general model and global Liapunov method. In: E.L. Schwartz (ed.) Computational Neuroscience, pp. 56–65. MIT Press (1990)

  14. Haddadi, F., Kayacik, H.G., Zincir-Heywood, A.N., Heywood, M.I.: Malicious automatically generated domain name detection using stateful-SBB. In: EvoApplications, LNCS, vol. 7835, pp. 529–539 (2013)

  15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735–1780 (1997)

  16. Huelsbergen, L.: Toward simulated evolution of machine language iteration. In: Proceedings of the Annual Conference on Genetic Programming, pp. 315–320 (1996)

  17. Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)

  18. Kelly, S., Banzhaf, W.: Temporal memory sharing in visual reinforcement learning. In: W. Banzhaf, E. Goodman, L. Sheneman, L. Trujillo, B. Worzel (eds.) Genetic Programming Theory and Practice, vol. XVII. Springer (2020)

  19. Kelly, S., Heywood, M.I.: Emergent tangled graph representations for Atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)

  20. Kelly, S., Heywood, M.I.: Multi-task learning in Atari video games with emergent tangled program graphs. In: ACM Genetic and Evolutionary Computation Conference, pp. 195–202 (2017)

  21. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evolutionary Computation 26(3), 347–380 (2018)

  22. Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: A tutorial. In: W. Banzhaf, L. Spector, L. Sheneman (eds.) Genetic Programming Theory and Practice, vol. XVI, chap. 3, pp. 37–57. Springer (2019)

  23. Langdon, W.B.: Genetic Programming and Data Structures. Kluwer Academic (1998)

  24. Lichodzijewski, P., Heywood, M.I.: Symbiosis, complexification and simplicity under GP. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 853–860 (2010)

  25. Machado, M.C., Bellemare, M.G., Talvitie, E., Veness, J., Hausknecht, M., Bowling, M.: Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. Journal of Artificial Intelligence Research 61, 523–562 (2018)

  26. Merrild, J., Rasmussen, M.A., Risi, S.: HyperNTM: Evolving scalable neural Turing machines through HyperNEAT. In: EvoApplications, pp. 750–766 (2018)

  27. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  28. Nordin, P.: A compiling genetic programming system that directly manipulates the machine code. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 311–332. MIT Press (1994)

  29. Poli, R., McPhee, N.F., Citi, L., Crane, E.: Memory with memory in genetic programming. Journal of Artificial Evolution and Applications (2009)

  30. Salimans, T., Ho, J., Chen, X., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning. CoRR abs/1703.03864 (2017)

  31. Sapienza, A., Peng, H., Ferrara, E.: Performance dynamics and success in online games. In: IEEE International Conference on Data Mining Workshops, pp. 902–909 (2017)

  32. Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)

  33. Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2019)

  34. Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)

  35. Spector, L., Luke, S.: Cultural transmission of information in genetic programming. In: Annual Conference on Genetic Programming, pp. 209–214 (1996)

  36. Such, F.P., Madhavan, V., Conti, E., Lehman, J., Stanley, K.O., Clune, J.: Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR abs/1712.06567 (2018)

  37. Teller, A.: The evolution of mental models. In: K.E. Kinnear (ed.) Advances in Genetic Programming, pp. 199–220. MIT Press (1994)

  38. Teller, A.: Turing completeness in the language of genetic programming with indexed memory. In: IEEE Congress on Evolutionary Computation, pp. 136–141 (1994)

  39. Wayne, G., Hung, C.C., Amos, D., Mirza, M., Ahuja, A., Grabska-Barwińska, A., Rae, J., Mirowski, P., Leibo, J.Z., Santoro, A., Gemici, M., Reynolds, M., Harley, T., Abramson, J., Mohamed, S., Rezende, D., Saxton, D., Cain, A., Hillier, C., Silver, D., Kavukcuoglu, K., Botvinick, M., Hassabis, D., Lillicrap, T.: Unsupervised predictive memory in a goal-directed agent. CoRR abs/1803.10760 (2018)

  40. Wydmuch, M., Kempka, M., Jaśkowski, W.: ViZDoom competitions: Playing Doom from pixels. IEEE Transactions on Games (2019, to appear)


Acknowledgements

We gratefully acknowledge support from the NSERC CRD program (Canada).

Author information

Correspondence to Malcolm I. Heywood.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Smith, R.J., Heywood, M.I. (2020). Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model. In: Banzhaf, W., Goodman, E., Sheneman, L., Trujillo, L., Worzel, B. (eds) Genetic Programming Theory and Practice XVII. Genetic and Evolutionary Computation. Springer, Cham. https://doi.org/10.1007/978-3-030-39958-0_17

  • DOI: https://doi.org/10.1007/978-3-030-39958-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-39957-3

  • Online ISBN: 978-3-030-39958-0