Abstract
Tangled Program Graphs (TPG) is a genetic programming framework in which emergent modularity incrementally composes programs into teams of programs, and teams into graphs of teams. To date, the framework has been demonstrated on reinforcement learning tasks with stochastic, partially observable state spaces and on time series prediction. However, evolving solutions to reinforcement learning tasks often requires agents to juggle multiple properties simultaneously. Hence, we are interested in maintaining a population of diverse agents. Specifically, an agent's performance on a reinforcement learning task controls how much of the task it is exposed to. Premature convergence might therefore preclude solving aspects of a task that the agent only encounters later. Moreover, ‘pointless complexity’ may also result, in which graphs largely consist of hitchhikers. In this research we benchmark rampant mutation (multiple mutations applied simultaneously during offspring creation) and action programs (multiple actions per state). Several parameterizations are also introduced that potentially penalize the introduction of hitchhikers. Benchmarking over five VizDoom tasks demonstrates that rampant mutation reduces the likelihood of encountering pathologically bad offspring, while action programs appear to improve performance in four out of five tasks. Finally, TPG parameterizations that actively limit solution complexity appear to yield very efficient, low-dimensional solutions that generalize best across all combinations of 3, 4 and 5 VizDoom tasks.
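To illustrate the rampant-mutation idea described in the abstract (several variation operators applied back-to-back when creating a single offspring, rather than one), here is a minimal Python sketch. The operator names, the flat-list team representation, and the `rampant_magnitude` parameter name are invented for illustration; they do not reflect the chapter's actual TPG implementation.

```python
import random

# Hypothetical mutation operators acting on a "team", represented
# here as a simple list of learner parameters (illustrative only).
def add_learner(team):
    team = list(team)
    team.append(random.random())
    return team

def remove_learner(team):
    team = list(team)
    if len(team) > 2:  # keep at least two learners (cf. Note 2)
        team.pop(random.randrange(len(team)))
    return team

def mutate_learner(team):
    team = list(team)
    i = random.randrange(len(team))
    team[i] = random.random()
    return team

OPERATORS = [add_learner, remove_learner, mutate_learner]

def make_offspring(parent, rampant_magnitude=1):
    """Standard variation applies one operator per offspring;
    rampant mutation applies `rampant_magnitude` operators in sequence."""
    child = list(parent)
    for _ in range(rampant_magnitude):
        op = random.choice(OPERATORS)
        child = op(child)
    return child

parent = [random.random() for _ in range(4)]
child = make_offspring(parent, rampant_magnitude=3)
```

With `rampant_magnitude=1` this reduces to conventional single-step variation; larger values correspond to the ‘Rampant Magnitude’ parameter referred to in Note 6.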
Notes
- 1.
Implies that the interaction represents the special case of an episodic task [24].
- 2.
A minimum of two learners (with different actions) is necessary to avoid defining a degenerate team (Sect. 1.2.2).
- 3.
An arc marking scheme has since been proposed [9], however, for the purpose of this work the original team formulation was assumed.
- 4.
The stochastic nature of each subtask requires that agents are evaluated over multiple initializations.
- 5.
- 6.
Reflected in the parameterization of the ‘Rampant Magnitude’ row in Table 1.1.
- 7.
Includes introns and hitchhikers.
References
Bjedov, I., Tenaillon, O., Gerard, B., Souza, V., Denamur, E., Radman, M., Taddei, F., Matic, I.: Stress-induced mutagenesis in bacteria. Science 300, 1404–1409 (2003)
Brameier, M., Banzhaf, W.: Linear Genetic Programming. Springer (2007)
Branke, J.: Evolutionary approaches to dynamic environments—a survey. In: GECCO Workshop on Dynamic Optimization Problems, pp. 134–137 (1999)
Cobb, H.G.: An investigation into the use of hypermutation as an adaptive operator in genetic algorithms having continuous, time-dependent non-stationary environments. Technical Report TR AIC-90-001, Naval Research Laboratory (1990)
Demsar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Ghosh, A., Tstutsui, S., Tanaka, H.: Function optimization in non-stationary environment using steady state genetic algorithms with aging of individuals. In: IEEE Congress on Evolutionary Computation, pp. 666–671 (1998)
Grefenstette, J.J.: Genetic algorithms for changing environments. In: PPSN, pp. 137–144 (1992)
Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., Hutter, M.: Learning agile and dynamic motor skills for legged robots. CoRR (2019). arXiv:1901.08652
Ianta, A., Amaral, R., Bayer, C., Smith, R.J., Heywood, M.I.: On the impact of tangled program graph marking schemes under the Atari reinforcement learning benchmark. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference (2021, to appear)
Jaderberg, M., Czarnecki, W.M., Dunning, I., Marris, L., Lever, G., Castañeda, A.G., Beattie, C., Rabinowitz, N.C., Morcos, A.S., Ruderman, A., Sonnerat, N., Green, T., Deason, L., Leibo, J.Z., Silver, D., Hassabis, D., Kavukcuoglu, K., Graepel, T.: Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 364, 859–865 (2019)
Kelly, S., Heywood, M.I.: Emergent tangled graph representations for atari game playing agents. In: European Conference on Genetic Programming, LNCS, vol. 10196, pp. 64–79 (2017)
Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)
Kelly, S., Newsted, J., Banzhaf, W., Gondro, C.: A modular memory framework for time series prediction. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 949–957 (2020)
Kelly, S., Smith, R.J., Heywood, M.I.: Emergent policy discovery for visual reinforcement learning through tangled program graphs: a tutorial. In: Banzhaf, W., Spector, L., Sheneman L (eds.) Genetic Programming Theory and Practice XVI, Genetic and Evolutionary Computation, pp. 37–57 (2018)
Kelly, S., Smith, R.J., Heywood, M.I., Banzhaf, W.: Emergent tangled program graphs in partially observable recursive forecasting and ViZDoom navigation tasks. ACM Trans. Evol. Learn. Optim. 1 (2021)
Kempka, M., Wydmuch, M., Runc, G., Toczek, J., Jaskowski, W.: ViZDoom: A Doom-based AI research platform for visual reinforcement learning. In: IEEE Conference on Computational Intelligence and Games, pp. 1–8 (2016)
Koza, J.R.: Genetic Programming—On the Programming of Computers by Means of Natural Selection. Complex Adaptive Systems. MIT Press (1993)
Moriarty, D.E., Schultz, A.C., Grefenstette, J.J.: Evolutionary algorithms for reinforcement learning. J. Artif. Intell. Res. 11, 199–229 (1999)
Parter, M., Kashtan, N., Alon, U.: Facilitated variation: how evolution learns from past environments to generalize to new environments. PLOS Comput. Biol. 4(11), 1–15 (2008)
Smith, R.J., Heywood, M.I.: Scaling tangled program graphs to visual reinforcement learning in ViZDoom. In: European Conference on Genetic Programming, LNCS, vol. 10781, pp. 135–150 (2018)
Smith, R.J., Heywood, M.I.: Evolving Dota 2 shadow fiend bots using genetic programming with external memory. In: Proceedings of the ACM Genetic and Evolutionary Computation Conference, pp. 179–187 (2019)
Smith, R.J., Heywood, M.I.: A model of external memory for navigation in partially observable visual reinforcement learning tasks. In: European Conference on Genetic Programming, LNCS, vol. 11451, pp. 162–177 (2019)
Sünderhauf, N., Brock, O., Scheirer, W.J., Hadsell, R., Fox, D., Leitner, J., Upcroft, B., Abbeel, P., Burgard, W., Milford, M., Corke, P.: The limits and potentials of deep learning for robotics. Int. J. Robot. Res. 37(4–5), 405–420 (2018)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (2018)
Teng, G., Papavasiliou, F.N.: Immunoglobulin somatic hypermutation. Annu. Rev. Genet. 41, 107–120 (2007)
Acknowledgements
We gratefully acknowledge support from the NSERC CRD and Discovery programs (Canada).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Bayer, C., Amaral, R., Smith, R.J., Ianta, A., Heywood, M.I. (2022). Finding Simple Solutions to Multi-Task Visual Reinforcement Learning Problems with Tangled Program Graphs. In: Banzhaf, W., Trujillo, L., Winkler, S., Worzel, B. (eds) Genetic Programming Theory and Practice XVIII. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-16-8113-4_1
DOI: https://doi.org/10.1007/978-981-16-8113-4_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8112-7
Online ISBN: 978-981-16-8113-4