A Model of External Memory for Navigation in Partially Observable Visual Reinforcement Learning Tasks

  • Robert J. Smith
  • Malcolm I. Heywood
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11451)


Visual reinforcement learning implies that decision-making policies are identified under delayed rewards from an environment. Moreover, state information takes the form of high-dimensional data, such as video. In addition, although the video might characterize a 3D world in high resolution, partial observability places significant limits on what the agent can actually perceive of the world. This means that the agent also has to: (1) encode state efficiently, (2) store those encodings of state in some form of memory, and (3) recall such memories after arbitrary delays for decision making. In this work, we demonstrate how an external memory model facilitates decision making in the complex world of multi-agent ‘deathmatches’ in the ViZDoom first-person shooter environment. ViZDoom provides a complex environment of multiple rooms and resources in which agents are spawned at multiple different locations. A unique approach is adopted to defining external memory for genetic programming agents in which: (1) the state of memory is shared across all programs; (2) writing is formulated as a probabilistic process, resulting in different regions of memory having short- versus long-term retention; (3) read operations are indexed, enabling programs to identify regions of external memory with specific temporal properties. We demonstrate that agents purposefully navigate the world when external memory is provided, whereas those without external memory are limited to mere ‘fight or flight’ behaviour.
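The three memory properties listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name, the flat-list layout, and the geometric write-probability schedule are all illustrative assumptions chosen to mimic the described behaviour (shared state, probabilistic writes that create short- versus long-term regions, and indexed reads).

```python
import random


class ExternalMemory:
    """Toy sketch of a shared external memory (illustrative assumptions).

    - The memory is one flat list of cells, shared by all programs.
    - Region i is overwritten with probability p_write ** (i + 1), so
      low-index regions turn over quickly (short-term) while high-index
      regions persist (long-term).
    - Reads are indexed, so a program can target a region whose temporal
      character it has learned to exploit.
    """

    def __init__(self, n_regions=8, p_write=0.5, seed=None):
        self.cells = [0.0] * n_regions
        self.p_write = p_write
        self.rng = random.Random(seed)

    def write(self, value):
        # Probabilistic write: each region independently accepts the value,
        # with geometrically decreasing probability by index.
        for i in range(len(self.cells)):
            if self.rng.random() < self.p_write ** (i + 1):
                self.cells[i] = value

    def read(self, index):
        # Indexed read: address a specific region (index wraps for safety).
        return self.cells[index % len(self.cells)]
```

Under this schedule, after many writes the first cell tends to hold a recent value while the last cell may still hold a much older one, giving programs access to state observed at different points in the past.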


Keywords: External memory · Visual reinforcement learning · First person shooter · Partially observable · Tangled program graphs



This research was supported by NSERC grant CRDJ 499792.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Dalhousie University, Halifax, Canada
