Emergent Tangled Graph Representations for Atari Game Playing Agents
Abstract
Organizing code into coherent programs and relating different programs to one another is an underlying requirement for scaling genetic programming to more difficult task domains. A model in which policies are defined by teams of programs, with team and program represented as independently coevolved populations, has previously been shown to support the development of variable-sized teams. In this work, we generalize the approach into a complete framework for organizing multiple teams into arbitrarily deep/wide structures through a process of continuous evolution; hereafter, the Tangled Program Graph (TPG). Benchmarking is conducted using a subset of 20 games from the Arcade Learning Environment (ALE), an Atari 2600 video game emulator. The games considered here correspond to those in which deep learning was unable to reach a threshold of play consistent with that of a human. Information provided to the learning agent is limited to what a human player would experience: screen-capture sensory input, Atari joystick actions, and the game score. The proposed approach exceeds the performance of deep learning in 15 of the 20 games, with 7 of those 15 also exceeding human-level competence. Moreover, in contrast to solutions from deep learning, the solutions discovered by TPG are very 'sparse': rather than assuming that the entire state space contributes to every decision, each action in TPG is resolved by executing only a subset of an individual's graph. This results in significantly lower computational requirements for model building than is presently the case for deep learning.
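To make the execution model concrete, the following is a minimal sketch of the graph-traversal idea described above. The class structure, the bidding function (here a weighted sum over a sparse sample of state features), and all names are illustrative assumptions, not the authors' implementation; in the paper, programs are evolved genetic programs that index the screen-capture state.

```python
import random

STATE_DIM = 1344  # hypothetical size of the flattened screen-feature vector

class Program:
    """Bids on the current state and carries an action: either an atomic
    joystick action (a string) or a pointer to another Team."""
    def __init__(self, action):
        self.action = action
        # Each program references only a sparse subset of the state features.
        self.indices = random.sample(range(STATE_DIM), 8)
        self.weights = [random.uniform(-1.0, 1.0) for _ in self.indices]

    def bid(self, state):
        # Placeholder bid: a weighted sum over the sampled state features.
        return sum(w * state[i] for w, i in zip(self.weights, self.indices))

class Team:
    """A node in the graph: a variable-sized set of programs."""
    def __init__(self, programs):
        self.programs = programs  # at least one must hold an atomic action

    def act(self, state, visited=None):
        visited = set() if visited is None else visited
        visited.add(id(self))
        # The highest bidder wins; pointers that would revisit a team on the
        # current path are skipped, so traversal always terminates.
        for prog in sorted(self.programs, key=lambda p: p.bid(state), reverse=True):
            if isinstance(prog.action, Team):
                if id(prog.action) not in visited:
                    return prog.action.act(state, visited)  # descend into subgraph
            else:
                return prog.action  # atomic joystick action
        return "NOOP"  # unreachable if each team keeps one atomic-action program

# Usage: a two-team graph in which the root may delegate to a leaf team.
leaf = Team([Program("FIRE"), Program("LEFT")])
root = Team([Program("UP"), Program(leaf)])
state = [random.random() for _ in range(STATE_DIM)]
print(root.act(state))  # executes at most two of the graph's teams
```

Under these assumptions, each decision executes only the teams along a single path through the graph, which is the source of the 'sparse' computation the abstract contrasts with deep learning.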
Keywords
Reinforcement learning · Task decomposition · Emergent modularity · Arcade learning environment
Acknowledgments
S. Kelly gratefully acknowledges support from the Nova Scotia Graduate Scholarship program. M. Heywood gratefully acknowledges support from the NSERC Discovery program. All runs were completed on cloud computing infrastructure provided by ACENET, the regional computing consortium for universities in Atlantic Canada. The TPG code base is not itself parallel; however, ACENET made it possible to conduct the five independent runs for each of the 20 games in parallel.