
Evolutionary Computation and the Reinforcement Learning Problem

Chapter in: Handbook of Evolutionary Machine Learning

Abstract

Evolution by natural selection has built a vast array of highly efficient lifelong learning organisms, as evidenced by the spectacular diversity of species that rapidly adapt to environmental change and acquire new problem-solving skills through experience. Reinforcement Learning (RL) is a machine learning problem in which an agent must learn how to map situations to actions in an unknown world in order to maximise the sum of future rewards. There are no labelled examples of situation\(\rightarrow \)action mappings to learn from, and we assume that no model of environment dynamics is available. As such, learning requires active trial-and-error interaction with the world. Evolutionary Reinforcement Learning (EvoRL), the application of evolutionary computation to RL, models this search process at multiple time scales: individual learning during the lifetime of an agent (i.e., operant conditioning) and population-wide learning through natural selection. Both modes of adaptation are wildly creative and fundamental to natural systems. This chapter discusses how EvoRL addresses critical challenges in RL, including the computational cost of extended interactions, the temporal credit assignment problem, partial observability of state, nonstationary and multi-task environments, transfer learning, and hierarchical problem decomposition. In each case, the unique potential of EvoRL is highlighted in parallel with open challenges and research opportunities.
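The population-level search the abstract describes can be sketched minimally: candidate policies are scored only by the reward they accumulate during interaction, then improved by selection and mutation. The toy `episode_return` fitness function, the truncation-selection scheme, and all parameter values below are illustrative assumptions for exposition, not material from the chapter.

```python
import random

def episode_return(theta):
    # Stand-in for one RL episode: the agent's summed reward peaks when the
    # scalar policy parameter theta matches an optimum (3.0) that the
    # algorithm never sees directly -- it only observes returns.
    return -(theta - 3.0) ** 2

def evolve(fitness, generations=60, pop_size=20, sigma=0.3, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate every candidate policy by "running" an episode.
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 4]  # truncation selection (elitist)
        # Refill the population with Gaussian-mutated copies of parents.
        pop = parents + [
            rng.choice(parents) + rng.gauss(0.0, sigma)
            for _ in range(pop_size - len(parents))
        ]
    return max(pop, key=fitness)

best = evolve(episode_return)
```

Note that fitness here is the whole-episode return, so no per-step temporal credit assignment is needed; this is one of the properties of EvoRL the chapter examines.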


Notes

  1. In real-world tasks, observations \(\overrightarrow{obs}_t\) and actions \(\vec {a}_t\) are likely to be multidimensional and expressed as a mix of discrete and continuous values.

  2. In this case ES might be considered a form of neuroevolution.

References

  1. Sasha, A., Geoff, Nitschke.: Scalable evolutionary hierarchical reinforcement learning. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’22, pp. 272–275. Association for Computing Machinery, New York, NY, USA (2022)

    Google Scholar 

  2. Adami, C.: Making artificial brains: Components, topology, and optimization. Artif. Life 28(1), 157–166 (2022)

    Google Scholar 

  3. Alexandros, A., Julian, T., Simon Mark, L.: Evolving controllers for simulated car racing using object oriented genetic programming. In: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, GECCO ’07, pp. 1543–1550. Association for Computing Machinery, New York, NY, USA (2007)

    Google Scholar 

  4. Agogino, A., Tumer, K.: Efficient evaluation functions for evolving coordination. Evol. Comput. 16(2), 257–288 (2008)

    Google Scholar 

  5. Agogino, A., Tumer, K.: Efficient evaluation functions for evolving coordination. Evol. Comput. 16(2), 257–288 (2008)

    Article  Google Scholar 

  6. Andre, D.: Evolution of mapmaking: learning, planning, and memory using genetic programming. In: Proceedings of the First IEEE Conference on Evolutionary Computation. IEEE World Congress on Computational Intelligence, vol. 1, pp. 250–255 (1994)

    Google Scholar 

  7. David, A., Stuart, J.R.: State abstraction for programmable reinforcement learning agents. In: Eighteenth National Conference on Artificial Intelligence, pp. 119–125. American Association for Artificial Intelligence , USA (2002)

    Google Scholar 

  8. André, M.S.B., Douglas, A.A., Helio, J.C.B.: On the characteristics of sequential decision problems and their impact on evolutionary computation and reinforcement learning. In: Pierre, C., Nicolas, M., Pierrick, L., Marc, S., Evelyne, L. (eds.) Artifical Evolution, pp. 194–205. Springer, Berlin (2010)

    Google Scholar 

  9. Bai, H., Cheng, R., Jin, Y.: Evolutionary reinforcement learning: A survey. Intell. Comput. 2, 0025 (2023)

    Google Scholar 

  10. Hui, B., Ruimin, S., Yue, L., Botian, X., Ran, C.: Lamarckian platform: Pushing the boundaries of evolutionary reinforcement learning towards asynchronous commercial games. IEEE Trans. Games 1–14 (2022)

    Google Scholar 

  11. Mark Baldwin, J.: A new factor in evolution. In: Adaptive Individuals in Evolving Populations: Models and Algorithms, pp. 59–80 (1896)

    Google Scholar 

  12. Banzhaf, W., et al.: Defining and simulating open-ended novelty: Requirements, guidelines, and challenges. Theory Biosci. 135(3), 131–161 (2016)

    Google Scholar 

  13. Wolfgang, B., Peter, N., Robert, E.K., Frank, D.F.: Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and its Applications. Morgan Kaufmann Publishers Inc. (1998)

    Google Scholar 

  14. Aspen, H.Y., Anne, G.E.C.: How working memory and reinforcement learning are intertwined: a cognitive, neural, and computational perspective. J. Cogn. Neurosci. 34(4), 551–568 (2022)

    Article  Google Scholar 

  15. Badcock, P.B., et al.: The hierarchically mechanistic mind: An evolutionary systems theory of the human brain, cognition, and behavior. Cognitive Affect. Behav. Neurosci. 19(6), 1319–1351 (2019)

    Article  Google Scholar 

  16. Bai, H., Cheng, R., Jin, Y.: Evolutionary reinforcement learning: A survey. Intell. Comput. 2, 0025 (2023)

    Article  Google Scholar 

  17. Banzhaf, W., et al.: Defining and simulating open-ended novelty: Requirements, guidelines, and challenges. Theory Biosci. 135(3), 131–161 (2016)

    Article  Google Scholar 

  18. Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag, Berlin (2006)

    MATH  Google Scholar 

  19. Boden, M.A.: Creative Mind: Myths and Mechanisms, 2nd edn. Routledge, USA (2003)

    Google Scholar 

  20. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym. arXiv 1606, 01540 (2016)

    Google Scholar 

  21. Clifford, B., Douglas, K., Arend, H.: Understanding memories of the past in the context of different complex neural network architectures. Neural Comput. 34(3), 754–780 (2022)

    Google Scholar 

  22. Buchanan, B.G.: Creativity at the metalevel: Aaai-2000 presidential address. AI Mag. 22(3), 13 (2001)

    Google Scholar 

  23. Josh, B.: Behavior chaining: Incremental behavior integration for evolutionary robotics. Artif. Life 11(64), 01 (2008)

    Google Scholar 

  24. Jessica, P.C.B., Stephen, K., Andrew, R.M., Malcolm, I.: Heywood. On synergies between diversity and task decomposition in constructing complex systems with gp. In: Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion, GECCO ’16 Companion, pp. 969–976. Association for Computing Machinery, New York, NY, USA (2016)

    Google Scholar 

  25. Matthew, M.B., Yael, N., Andew, G.B.: Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition 113(3), 262–280 (2009) Reinforcement learning and higher cognition

    Google Scholar 

  26. Markus, B., Wolfgang, B.: Linear Genetic Programming. Springer (2007)

    Google Scholar 

  27. Christopher, C., Wesley, P., Greg, S., Caleb, B., Benjamin, H.: Using fpga devices to accelerate tree-based genetic programming: A preliminary exploration with recent technologies. In: Gisele, P., Mario, G., Zdenek, V. (eds.) Genetic Programming, pp. 182–197. Springer Nature Switzerland, Cham (2023)

    Google Scholar 

  28. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym. arXiv 1606, 01540 (2016)

    Google Scholar 

  29. Clifford, B., Douglas, K., Arend, H.: Understanding memories of the past in the context of different complex neural network architectures. Neural Comput. 34(3), 754–780 (2022)

    Article  Google Scholar 

  30. Colin, T.R., Belpaeme, T., Cangelosi, A., Hemion, N.: Hierarchical reinforcement learning as creative problem solving. Robot. Auton. Syst. 86, 196–206 (2016)

    Article  Google Scholar 

  31. John, C., Seth, B.: Combating coevolutionary disengagement by reducing parasite virulence. Evol. Comput. 12(2), 193–222 (2004)

    Google Scholar 

  32. Cully, A., Clune, J., Tarapore, D., Mouret, J.-B.: Robots that can adapt like animals. Nature 521(7553), 503–507 (2015)

    Article  Google Scholar 

  33. Cussat-Blanc, S., Harrington, K., Banzhaf, W.: Artificial gene regulatory networks-a review. Artif. Life 24(4), 296–328 (2019)

    Article  Google Scholar 

  34. Cédric, C., Vashisht, M., Joost, H., Jeff, C.: Scaling map-elites to deep neuroevolution. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference, GECCO ’20, pp. 67–75. Association for Computing Machinery, New York, NY, USA (2020)

    Google Scholar 

  35. Colin, T.R., Belpaeme, T., Cangelosi, A., Hemion, N.: Hierarchical reinforcement learning as creative problem solving. Robot. Auton. Syst. 86, 196–206 (2016)

    Google Scholar 

  36. Edoardo, C., Vashisht, M., Felipe, P.S., Joel, L., Kenneth, O.S., Jeff, C.: Improving exploration in evolution strategies for deep reinforcement learning via a population of novelty-seeking agents. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 5032–5043. Curran Associates Inc, Red Hook, NY, USA (2018)

    Google Scholar 

  37. Christopher, C., Wesley, P., Greg, S., Caleb, B., Benjamin, H.: Using fpga devices to accelerate tree-based genetic programming: A preliminary exploration with recent technologies. In: Gisele, P., Mario, G., Zdenek, V. (eds) Genetic Programming, pp. 182–197. Springer Nature Switzerland, Cham (2023)

    Google Scholar 

  38. Daw, N.D., Niv, Y., Dayan, P.: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8(12), 1704–1711 (2005)

    Article  Google Scholar 

  39. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: Nsga-ii. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)

    Article  Google Scholar 

  40. Cussat-Blanc, S., Harrington, K., Banzhaf, W.: Artificial gene regulatory networks-a review. Artif. Life 24(4), 296–328 (2019)

    Google Scholar 

  41. Daw, N.D., Niv, Y., Dayan, P.: Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nat. Neurosci. 8(12), 1704–1711 (2005)

    Google Scholar 

  42. Diederik, P.: Kingma and Jimmy Ba. A Method for Stochastic Optimization, Adam (2017)

    Google Scholar 

  43. Dietterich, T.G.: Hierarchical reinforcement learning with the maxq value function decomposition. J. Artif. Int. Res. 13(1), 227–303 (2000)

    MathSciNet  MATH  Google Scholar 

  44. Karol, D., Nicolas, S., Pierre-Yves, R., Olivier, G., Maxime, P.: Gegelati: Lightweight artificial intelligence through generic and evolvable tangled program graphs. In: Workshop on Design and Architectures for Signal and Image Processing (DASIP), International Conference Proceedings Series (ICPS). ACM, Budapest, Hungary (2021)

    Google Scholar 

  45. Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: First return, then explore. Nature 590(7847), 580–586 (2021)

    Article  Google Scholar 

  46. Stephane, D., Jean-Baptiste, M.: Behavioral diversity with multiple behavioral distances. In: 2013 IEEE Congress on Evolutionary Computation, pp. 1427–1434 (2013)

    Google Scholar 

  47. Stephane, D., Giuseppe, P., Alban, L., Alexandre, C.: Novelty search makes evolvability inevitable. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference, GECCO ’20, pp. 85–93. Association for Computing Machinery, New York, NY, USA (2020)

    Google Scholar 

  48. Elfwing, S., Uchibe, E., Doya, K., Christensen, H.I.: Evolutionary development of hierarchical learning structures. IEEE Trans. Evol. Comput. 11(2), 249–264 (2007)

    Article  Google Scholar 

  49. Elfwing, S., Uchibe, E., Doya, K., Christensen, H.I.: Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adapt. Behav. 16(6), 400–412 (2008)

    Article  Google Scholar 

  50. Ecoffet, A., Huizinga, J., Lehman, J., Stanley, K.O., Clune, J.: First return, then explore. Nature 590(7847), 580–586 (2021)

    Google Scholar 

  51. Elfwing, S., Uchibe, E., Doya, K., Christensen, H.I.: Evolutionary development of hierarchical learning structures. IEEE Trans. Evol. Comput. 11(2), 249–264 (2007)

    Google Scholar 

  52. Elfwing, S., Uchibe, E., Doya, K., Christensen, H.I.: Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adapt. Behav. 16(6), 400–412 (2008)

    Google Scholar 

  53. William, F., Barret, Z., Noam, S.: Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res. 23(1) (2022)

    Google Scholar 

  54. Floreano, D., Urzelai, J.: Evolutionary robots with on-line self-organization and behavioral fitness. Neural Networks 13(4), 431–443 (2000)

    Google Scholar 

  55. Evgenia, P., Jan, C., Bart, J.: A Systematic Literature Review of the Successors of “NeuroEvolution of Augmenting Topologies’’. Evol. Comput. 29(1), 1–73 (2021)

    Article  Google Scholar 

  56. Daniel Freeman, C., Erik, F., Anton, R., Sertan, G., Igor, M., Olivier, B.: Brax–a differentiable physics engine for large scale rigid body simulation. ArXiv preprint arXiv:2106.13281 (2021)

  57. Andrea, G., Jeff, D.: Munet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems (2022)

    Google Scholar 

  58. Faustino, G., Jürgen, S., Risto, M.: Accelerated neural evolution through cooperatively coevolved synapses. J. Mach. Learn. Res. 9, 937–965 (2008)

    MathSciNet  MATH  Google Scholar 

  59. Floreano, D., Urzelai, J.: Evolutionary robots with on-line self-organization and behavioral fitness. Neural Networks 13(4), 431–443 (2000)

    Article  Google Scholar 

  60. Georgios, N.: Yannakakis and Julian Togelius. Springer, Artificial Intelligence and Games (2018)

    Google Scholar 

  61. Faustino, G., Jürgen, S., Risto, M.: Accelerated neural evolution through cooperatively coevolved synapses. J. Mach. Learn. Res. 9, 937–965 (2008)

    Google Scholar 

  62. Faustino, J.G., Risto, M.: Solving non-markovian control tasks with neuroevolution. In: Proceedings of the 16th International Joint Conference on Artificial Intelligence–Volume 2, IJCAI’99, pp. 1356–1361. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA (1999)

    Google Scholar 

  63. Gomez, F., Mikkulainen, R.: Incremental evolution of complex general behavior. Adapt. Behav. 5(3–4), 317–342 (1997)

    Article  Google Scholar 

  64. Gravina, D., Liapis, A., Yannakakis, G.N.: Quality diversity through surprise. IEEE Trans. Evol. Comput. 23(4), 603–616 (2019)

    Article  Google Scholar 

  65. Greenfield, P.M.: Language, tools and brain: The ontogeny and phylogeny of hierarchically organized sequential behavior. Behav. Brain Sci. 14(4), 531–551 (1991)

    Article  Google Scholar 

  66. Gupta, A., Savarese, S., Ganguli, S., Fei-Fei, L.: Embodied intelligence via learning and evolution. Nat. Commun. 12(1), 5721 (2021)

    Article  Google Scholar 

  67. Harrison, G.D.: Stated meeting. Trans. New York Acad. Sci. 15, 141–143 (1896)

    Google Scholar 

  68. Hawkins, J., Ahmad, S., Cui, Y.: A theory of how columns in the neocortex enable learning the structure of the world. Front. Neural Circuits 11, 81 (2017)

    Article  Google Scholar 

  69. Arend, H., Christoph, A.: Neuroevolution gives rise to more focused information transfer compared to backpropagation in recurrent neural networks. In: Neural Computing and Applications (2022)

    Google Scholar 

  70. Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology. Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA (1992)

    Book  Google Scholar 

  71. Arend, H., Jeffrey, A.E., Randal, S.O., David, B.K., Jory, S., Larissa, A., Ali, T., Peter, D.K., Leigh, S., Heather, G., Clifford, B., Christoph, A.: Markov brains: A technical introduction. CoRR arXiv:abs/1709.05601 (2017)

  72. Arend, H., Jory, S.: Towards an fpga accelerator for markov brains. In: Artificial Life Conference Proceedings 34, vol. 2022, p. 34. MIT Press One Rogers Street, Cambridge, MA 02142–1209, USA (2022)

    Google Scholar 

  73. Sepp, H., Yoshua, B., Paolo, F., Jürgen, S., et al. Gradient Flow in Recurrent Nets: The Difficulty of Learning LongTerm Dependencies, pp. 237–243. In: A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press In (2001)

    Google Scholar 

  74. John, H.H.: Properties of the bucket brigade. In: Proceedings of the 1st International Conference on Genetic Algorithms, pp. 1–7. L. Erlbaum Associates Inc, USA (1985)

    Google Scholar 

  75. Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology. Control and Artificial Intelligence. MIT Press, Cambridge, MA, USA (1992)

    Google Scholar 

  76. William, H.H., Scott, J.H., Edwin, R., Christopher, A.Z.: Empirical comparison of incremental learning strategies for genetic programming-based keep-away soccer agents. In: Papers from the 2004 AAAI Fall Symposium (2004)

    Google Scholar 

  77. Jianjun, H., Goodman, E., Seo, K., Fan, Z., Rosenberg, R.: The hierarchical fair competition (hfc) framework for sustainable evolutionary algorithms. Evol. Comput. 13(2), 241–277 (2005)

    Google Scholar 

  78. Jianjun, H., Goodman, E., Seo, K., Fan, Z., Rosenberg, R.: The hierarchical fair competition (hfc) framework for sustainable evolutionary algorithms. Evol. Comput. 13(2), 241–277 (2005)

    Article  Google Scholar 

  79. Joel, L., et al.: The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities. Artif. Life 26(2), 274–306 (2020)

    Article  Google Scholar 

  80. Aditya, J., Aditya, M., Akshansh, R., Sanjay, K.: A systematic study of deep q-networks and its variations. In: 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), pp. 2157–2162 (2022)

    Google Scholar 

  81. Chi, J., Zeyuan, A-Z., Sebastien, B., Michael, I.J.: Is q-learning provably efficient? In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31. Curran Associates, Inc. (2018)

    Google Scholar 

  82. Nicholas, K.J., Peter, S.: State abstraction discovery from irrelevant state variables. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI’05, pp. 752–757. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA (2005)

    Google Scholar 

  83. Leslie, P.K., Michael, L.L., Andrew, W.M.: Reinforcement learning: A survey. J. Artif. Int. Res. 4(1), 237–285 (1996)

    Google Scholar 

  84. Kashtan, N., Noor, E., Alon, U.: Varying environments can speed up evolution. Proceed. Nat. Acad. Sci. 104(34), 13711–13716 (2007)

    Google Scholar 

  85. John, C., Seth, B.: Combating coevolutionary disengagement by reducing parasite virulence. Evol. Comput. 12(2), 193–222 (2004)

    Article  Google Scholar 

  86. Stephen, K., Malcolm, I.H.: Discovering agent behaviors through code reuse: Examples from half-field offense and ms. pac-man. IEEE Trans. Games 10(2), 195–208 (2018)

    Google Scholar 

  87. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)

    Google Scholar 

  88. Stephen, K., Robert, J.S., Malcolm, I.H.: Emergent Policy Discovery for Visual Reinforcement Learning Through Tangled Program Graphs: A Tutorial, pp. 37–57. Springer International Publishing, Cham (2019)

    Google Scholar 

  89. Josh, B.: Behavior chaining: Incremental behavior integration for evolutionary robotics. Artif. Life 11(64), 01 (2008)

    Google Scholar 

  90. Julian, F.M.: IMPROBED: Multiple problem-solving brain via evolved developmental programs. Artif. Life 27(3–4), 300–335 (2022)

    Google Scholar 

  91. Shauharda, K., Kagan, T.: Evolution-guided policy gradient in reinforcement learning. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, pp. 1196–1208. Curran Associates Inc, Red Hook, NY, USA (2018)

    Google Scholar 

  92. Diederik, P.: Kingma and Jimmy Ba. A Method for Stochastic Optimization. Adam (2017)

    Google Scholar 

  93. Douglas, K., Arend, H.: The role of ambient noise in the evolution of robust mental representations in cognitive systems. In: ALIFE 2019: The 2019 Conference on Artificial Life, pp. 432–439. MIT Press (2019)

    Google Scholar 

  94. Matt, K., Kagan, T.: Coevolution of heterogeneous multi-robot teams. In: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO ’10, pp. 127–134. Association for Computing Machinery, New York, NY, USA (2010)

    Google Scholar 

  95. Kashtan, N., Noor, E., Alon, U.: Varying environments can speed up evolution. Proceed. Nat. Acad. Sci. 104(34), 13711–13716 (2007)

    Article  Google Scholar 

  96. Kelly, S., Heywood, M.I.: Emergent solutions to high-dimensional multitask reinforcement learning. Evol. Comput. 26(3), 347–380 (2018)

    Article  Google Scholar 

  97. Kelly, S., Voegerl, T., Banzhaf, W., Gondro, C.: Evolving hierarchical memory-prediction machines in multi-task reinforcement learning. Genet. Program. Evol. Mach. 22(4), 573–605 (2021)

    Article  Google Scholar 

  98. Koza, J.R., Andre, D., Bennett, F.H., Keane, M.A.: Genetic Programming III: Darwinian Invention & Problem Solving. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)

    Google Scholar 

  99. Koza, J.R., Andre, D., Bennett, F.H., Keane, M.A.: Genetic Programming III: Darwinian Invention & Problem Solving. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (1999)

    MATH  Google Scholar 

  100. Dhireesha, K., Mario, A-S., Jonathan, B., Maxim, B., Douglas, B., Josh, B., Andrew, P.B., Suraj, C.R., Nick, C., Jeff, C., Anurag, D., Stefano, F., Peter, H., Leslie, K., Nicholas, K., Zsolt, K., Soheil, K., Jeffrey, L.K., Sam, K., Michael, L., Sandeep, M., Santosh, M., Ali, M., Bruce, M., Risto, M., Zaneta, N., Tej, P., Alice, P., Praveen, K.P., Sebastian, R., Terrence, J.S., Andrea, S., Nicholas, S., Andreas, S., Tolias, D.U., Francisco, J.V-C., Gido, M.V., Joshua, T., Vogelstein, F.W., Ron, W., Angel, Y-G., Xinyun, Z., Hava, S.: Biological underpinnings for lifelong learning machines. Nat. Mach. Intell. 4(3), 196–210 (2022)

    Google Scholar 

  101. Landi, F., Baraldi, L., Cornia, M., Cucchiara, R.: Working memory connections for lstm. Neural Networks 144, 334–341 (2021)

    Article  Google Scholar 

  102. Lehman, J., Stanley, K.O.: Abandoning objectives: Evolution through the search for novelty alone. Evol. Comput. 19(2), 189–223 (2011)

    Article  Google Scholar 

  103. Leslie, P.K., Michael, L.L., Andrew, W.M.: Reinforcement learning: A survey. J. Artif. Int. Res. 4(1), 237–285 (1996)

    Google Scholar 

  104. Kyunghyun, L., Byeong-Uk, L., Ukcheol, S., In So, K.: An efficient asynchronous method for integrating evolutionary and gradient-based policy search. In: Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20. Curran Associates Inc, Red Hook, NY, USA (2020)

    Google Scholar 

  105. Joel, L., et al.: The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities. Artif. Life 26(2), 274–306 (2020)

    Google Scholar 

  106. Marco, A.W.: Convergence and divergence in standard and averaging reinforcement learning. In: Jean-François, B., Floriana, E., Fosca, G., Dino, P. (eds.) Machine Learning: ECML 2004, pp. 477–488. Springer, Berlin (2004)

    Google Scholar 

  107. Eric, L., Richard, L., Robert, N., Philipp, M., Roy, F., Ken, G., Joseph, G., Michael, J., Ion, S.: RLlib: Abstractions for distributed reinforcement learning. In: Jennifer, D., Andreas, K., (eds.) Proceedings of the 35th International Conference on Machine Learning, vol. 80 of Proceedings of Machine Learning Research, pp. 3053–3062. PMLR, 10–15 (2018)

    Google Scholar 

  108. Bryan, L., Maxime, A., Luca, G., Antoine, C.: Accelerated quality-diversity for robotics through massive parallelism. arXiv preprint arXiv:2202.01258 (2022)

  109. Soo, L.L., Peter, J.B.: The “agent-based modeling for human behavior” special issue. Artif. Life 29(1), 1–2 (2023)

    Google Scholar 

  110. Qinjie, L., Han, L., Biswa, S.: Switch Trajectory Transformer with Distributional Value Approximation for Multi-task Reinforcement Learning (2022)

    Google Scholar 

  111. Siqi, L., Guy, L., Josh, M., Saran, T., Nicolas, H., Thore, G.: Emergent coordination through competition. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net (2019)

    Google Scholar 

  112. Qian, L, Zihan, Z, Abhinav, G, Fei, F, Yi, W., Xiaolong, W.: Evolutionary population curriculum for scaling multi-agent reinforcement learning. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net (2020)

    Google Scholar 

  113. Max, J., Wojciech, M.C., Iain, D., Luke, M., Guy, L., Antonio, G.C., Charles, B., Neil, C.R., Ari, S.M., Avraham, R., Nicolas, S., Tim, G., Louise, D., Joel, Z.L., David, S., Demis, H., Koray, K., Thore, G.: Human-level performance in 3d multiplayer games with population-based reinforcement learning. Science 364(6443), 859–865 (2019)

    Article  MathSciNet  Google Scholar 

  114. Maynard Smith, J.: Group selection and kin selection. Nature 201(4924), 1145–1147 (1964)

    Article  Google Scholar 

  115. Merav, P., Nadav, K., Uri, A.: Facilitated variation: How evolution learns from past environments to generalize to new environments. PLOS Comput. Biol. 4(11), 1–15 (2008)

    Google Scholar 

  116. Mihyar Al, M., Malcolm, H.: Benchmarking ensemble genetic programming with a linked list external memory on scalable partially observable tasks. Genet. Program. Evolvable Mach. 23(Suppl 1), 1–29 (2022)

    Google Scholar 

  117. Miikkulainen, R.: Creative ai through evolutionary computation: Principles and examples. SN Comput. Sci. 2(3), 163 (2021)

    Article  Google Scholar 

  118. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

    Article  Google Scholar 

  119. Moriarty, D.E., Miikkulainen, R.: Forming neural networks through efficient and adaptive coevolution. Evol. Comput. 5(4), 373–399 (1997)

    Article  Google Scholar 

  120. Moriarty, D.E., Schultz, A.C., Grefenstette, J.J.: Evolutionary algorithms for reinforcement learning. J. Artif. Int. Res. 11(1), 241–276 (1999)

    MATH  Google Scholar 

  121. Mouret, J.B., Doncieux, S.: Encouraging behavioral diversity in evolutionary robotics: An empirical study. Evol. Comput. 20(1), 91–133 (2012)

    Article  Google Scholar 

  122. Vernon, B.M.: The columnar organization of the neocortex. Brain 120, 701–722 (1997)

    Google Scholar 

  123. Mouret, J.B., Doncieux, S.: Encouraging behavioral diversity in evolutionary robotics: An empirical study. Evol. Comput. 20(1), 91–133 (2012)

    Google Scholar 

  124. Jean-Baptiste, M., Jeff, C.: Illuminating search spaces by mapping elites. CoRR arXiv:1504.04909 (2015)

  125. Niekum, S., Barto, A.G., Spector, L.: Genetic programming for reward function search. IEEE Trans. Autonom. Mental Develop. 2(2), 83–90 (2010)

    Article  Google Scholar 

  126. Nordin, P., Banzhaf, W., Brameier, M.: Evolution of a world model for a miniature robot using genetic programming. Robot. Autonom. Syst. 25, 105–116 (1998)

    Article  Google Scholar 

  127. Niekum, S., Barto, A.G., Spector, L.: Genetic programming for reward function search. IEEE Trans. Autonom. Mental Develop. 2(2), 83–90 (2010)

    Google Scholar 

  128. Yael, N.: Reinforcement learning in the brain. J. Math. Psychol. 53(3), 139–154 (2009). Special Issue: Dynamic Decision Making

    Google Scholar 

  129. Jason, N., Richard, A.W.: Pareto coevolution: Using performance against coevolved opponents in a game as dimensions for pareto selection. In: Proceedings of the 3rd Annual Conference on Genetic and Evolutionary Computation, GECCO’01, pp. 493–500. Morgan Kaufmann Publishers Inc, San Francisco, CA, USA (2001)

    Google Scholar 

  130. Nordin, P., Banzhaf, W., Brameier, M.: Evolution of a world model for a miniature robot using genetic programming. Robot. Autonom. Syst. 25, 105–116 (1998)

    Google Scholar 

  131. Pollack, J.B., Blair, A.D.: Co-evolution in the successful learning of backgammon strategy. Mach. Learn. 32(3), 225–240 (1998)

  132. Papavasileiou, E., Cornelis, J., Jansen, B.: A systematic literature review of the successors of “NeuroEvolution of Augmenting Topologies”. Evol. Comput. 29(1), 1–73 (2021)

  133. Parter, M., Kashtan, N., Alon, U.: Facilitated variation: How evolution learns from past environments to generalize to new environments. PLOS Comput. Biol. 4(11), 1–15 (2008)

  134. Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7), 1180–1190 (2008). Progress in Modeling, Theory, and Application of Computational Intelligence

  135. Pollack, J.B., Blair, A.D.: Co-evolution in the successful learning of backgammon strategy. Mach. Learn. 32(3), 225–240 (1998)

  136. Risi, S., Stanley, K.O.: Deep innovation protection: Confronting the credit assignment problem in training heterogeneous neural architectures. Proceed. AAAI Conf. Artif. Intell. 35(14), 12391–12399 (2021)

  137. Rawal, A., Miikkulainen, R.: Evolving deep LSTM-based memory networks using an information maximization objective. In: Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO ’16, pp. 501–508. Association for Computing Machinery, New York, NY, USA (2016)

  138. Risi, S., Stanley, K.O.: Deep innovation protection: Confronting the credit assignment problem in training heterogeneous neural architectures. Proceed. AAAI Conf. Artif. Intell. 35(14), 12391–12399 (2021)

  139. Salimans, T., Ho, J., Chen, X., Sidor, S., Sutskever, I.: Evolution strategies as a scalable alternative to reinforcement learning (2017)

  140. Schmidhuber, J.: Curious model-building control systems. In: Proceedings 1991 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 1458–1463 (1991)

  141. Brooks, R.A.: Intelligence without representation. Artif. Intell. 47(1), 139–159 (1991)

  142. Jory, S., Bamshad, S., Arend, H.: Incentivising cooperation by rewarding the weakest member. arXiv preprint arXiv:2212.00119 (2022)

  143. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR arXiv:1707.06347 (2017)

  144. Sheneman, L., Hintze, A.: Evolving autonomous learning in cognitive networks. Sci. Rep. 7(1), 16712 (2017)

  145. Sigaud, O.: Combining evolution and deep reinforcement learning for policy search: A survey. ACM Trans. Evol. Learn. Optim. (2022). Just Accepted

  146. Schmidhuber, J.: Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Trans. Autonom. Mental Develop. 2(3), 230–247 (2010)

  147. Simione, L., Nolfi, S.: Achieving long-term progress in competitive co-evolution. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8 (2017)

  148. Singh, S., Lewis, R.L., Barto, A.G., Sorg, J.: Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans. Autonom. Mental Develop. 2(2), 70–82 (2010)

  149. Skinner, B.F.: The Behavior of Organisms. Appleton-Century-Crofts, New York, NY (1938)

  150. Sheneman, L., Hintze, A.: Evolving autonomous learning in cognitive networks. Sci. Rep. 7(1), 16712 (2017)

  151. Smith, R.J., Heywood, M.I.: Evolving Dota 2 Shadow Fiend bots using genetic programming with external memory. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’19, pp. 179–187. Association for Computing Machinery, New York, NY, USA (2019)

  152. Smith, R.J., Heywood, M.I.: Evolving a Dota 2 Hero Bot with a Probabilistic Shared Memory Model, pp. 345–366. Springer International Publishing, Cham (2020)

  153. Soltoggio, A., Stanley, K.O., Risi, S.: Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks. Neural Networks 108, 48–67 (2018)

  154. Song, X., Gao, W., Yang, Y., Choromanski, K., Pacchiano, A., Tang, Y.: ES-MAML: Simple Hessian-free meta learning. In: International Conference on Learning Representations (2020)

  155. Silverman, B.: The Phantom Fish Tank: An Ecology of Mind. Logo Computer Systems, Montreal (1987)

  156. Singh, S., Lewis, R.L., Barto, A.G., Sorg, J.: Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Trans. Autonom. Mental Develop. 2(2), 70–82 (2010)

  157. Stanley, K.O., Miikkulainen, R.: Competitive coevolution through evolutionary complexification. J. Artif. Int. Res. 21(1), 63–100 (2004)

  158. Skinner, B.F.: The Behavior of Organisms. Appleton-Century-Crofts, New York, NY (1938)

  159. Soltoggio, A., Stanley, K.O., Risi, S.: Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks. Neural Networks 108, 48–67 (2018)

  160. Lim, S.L., Bentley, P.J.: The “agent-based modeling for human behavior” special issue. Artif. Life 29(1), 1–2 (2023)

  161. Stone, P.: Layered Learning in Multiagent Systems. PhD thesis, Carnegie Mellon University, USA (1998). AAI9918612

  162. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evol. Comput. 10(2), 99–127 (2002)

  163. Stanley, K.O., Miikkulainen, R.: Competitive coevolution through evolutionary complexification. J. Artif. Int. Res. 21(1), 63–100 (2004)

  164. Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R.: Designing neural networks through neuroevolution. Nat. Mach. Intell. 1(1), 24–35 (2019)

  165. Szubert, M., Jaśkowski, W., Krawiec, K.: Coevolutionary temporal difference learning for Othello. In: 2009 IEEE Symposium on Computational Intelligence and Games, pp. 104–111 (2009)

  166. Szubert, M., Jaśkowski, W., Krawiec, K.: On scalability, generalization, and hybridization of coevolutionary learning: A case study for Othello. IEEE Trans. Comput. Intell. AI Games 5(3), 214–226 (2013)

  167. Tan, H., Zhou, Y., Tao, Q., Rosen, J., van Dijken, S.: Bioinspired multisensory neural network with crossmodal integration and recognition. Nat. Commun. 12(1), 1120 (2021)

  168. Tang, Y., Tian, Y., Ha, D.: EvoJAX: Hardware-accelerated neuroevolution. arXiv preprint arXiv:2202.05008 (2022)

  169. Tangri, R., Mandic, D.P., Constantinides, A.G.: Pearl: Parallel evolutionary and reinforcement learning library (2022)

  170. Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: A survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)

  171. Stone, P.: Layered Learning in Multiagent Systems: A Winning Approach to Robotic Soccer. MIT Press, Cambridge, MA, USA (2000)

  172. Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1), 181–211 (1999)

  173. Szubert, M., Jaśkowski, W., Krawiec, K.: On scalability, generalization, and hybridization of coevolutionary learning: A case study for Othello. IEEE Trans. Comput. Intell. AI Games 5(3), 214–226 (2013)

  174. Tan, H., Zhou, Y., Tao, Q., Rosen, J., van Dijken, S.: Bioinspired multisensory neural network with crossmodal integration and recognition. Nat. Commun. 12(1), 1120 (2021)

  175. Taylor, M.E., Stone, P.: Transfer learning for reinforcement learning domains: A survey. J. Mach. Learn. Res. 10, 1633–1685 (2009)

  176. Tupper, A., Neshatian, K.: Evolving neural network agents to play Atari games with compact state representations. In: Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, GECCO ’20, pp. 99–100. Association for Computing Machinery, New York, NY, USA (2020)

  177. Vašíček, Z., Sekanina, L.: Hardware accelerators for Cartesian genetic programming. In: O’Neill, M., Vanneschi, L., Gustafson, S., Esparcia-Alcázar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) Genetic Programming, pp. 230–241. Springer, Berlin (2008)

  178. Vassiliades, V., Chatzilygeroudis, K., Mouret, J.-B.: Using centroidal Voronoi tessellations to scale up the multidimensional archive of phenotypic elites algorithm. IEEE Trans. Evol. Comput. 22(4), 623–630 (2018)

  179. Vassiliades, V., Chatzilygeroudis, K., Mouret, J.-B.: Using centroidal Voronoi tessellations to scale up the multidimensional archive of phenotypic elites algorithm. IEEE Trans. Evol. Comput. 22(4), 623–630 (2018)

  180. Verbancsics, P., Stanley, K.O.: Evolving static representations for task transfer. J. Mach. Learn. Res. 11, 1737–1769 (2010)

  181. Mountcastle, V.B.: The columnar organization of the neocortex. Brain 120, 701–722 (1997)

  182. Wang, J., Zhang, Y., Kim, T.-K., Gu, Y.: Shapley Q-value: A local reward approach to solve global reward games. Proceed. AAAI Conf. Artif. Intell. 34, 7285–7292 (2020)

  183. Wang, R., Lehman, J., Clune, J., Stanley, K.O.: POET: Open-ended coevolution of environments and their optimized solutions. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’19, pp. 142–151. Association for Computing Machinery, New York, NY, USA (2019)

  184. Wang, J., Zhang, Y., Kim, T.-K., Gu, Y.: Shapley Q-value: A local reward approach to solve global reward games. Proceed. AAAI Conf. Artif. Intell. 34, 7285–7292 (2020)

  185. Watson, R.A., Pollack, J.B.: Modular interdependency in complex dynamical systems. Artif. Life 11(4), 445–457 (2005)

  186. Whiteson, S., Kohl, N., Miikkulainen, R., Stone, P.: Evolving soccer keepaway players through task decomposition. Mach. Learn. 59(1), 5–30 (2005)

  187. Whiteson, S., Kohl, N., Miikkulainen, R., Stone, P.: Evolving soccer keepaway players through task decomposition. Mach. Learn. 59(1), 5–30 (2005)

  188. Whitley, D., Dominic, S., Das, R., Anderson, C.W.: Genetic reinforcement learning for neurocontrol problems. Mach. Learn. 13(2–3), 259–284 (1993)

  189. Wiggins, G.A.: A preliminary framework for description, analysis and comparison of creative systems. Knowl. Based Syst. 19(7), 449–458 (2006). Creative Systems

  190. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)

  191. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3–4), 229–256 (1992)

  192. Yannakakis, G.N., Togelius, J.: Artificial Intelligence and Games. Springer (2018)

  193. Yoo, A.H., Collins, A.G.E.: How working memory and reinforcement learning are intertwined: A cognitive, neural, and computational perspective. J. Cogn. Neurosci. 34(4), 551–568 (2022)

  194. Yu, W., Liu, C.K., Turk, G.: Policy transfer with strategy optimization. In: International Conference on Learning Representations (2019)

  195. Zhou, S., Seay, M., Taxidis, J., Golshani, P., Buonomano, D.V.: Multiplexing working memory and time in the trajectories of neural networks. Nat. Hum. Behav. (2023)

  196. Vašíček, Z., Sekanina, L.: Hardware accelerators for Cartesian genetic programming. In: O’Neill, M., Vanneschi, L., Gustafson, S., Esparcia-Alcázar, A.I., De Falco, I., Della Cioppa, A., Tarantino, E. (eds.) Genetic Programming, pp. 230–241. Springer, Berlin (2008)

Author information

Correspondence to Stephen Kelly.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Kelly, S., Schossau, J. (2024). Evolutionary Computation and the Reinforcement Learning Problem. In: Banzhaf, W., Machado, P., Zhang, M. (eds) Handbook of Evolutionary Machine Learning. Genetic and Evolutionary Computation. Springer, Singapore. https://doi.org/10.1007/978-981-99-3814-8_4

  • DOI: https://doi.org/10.1007/978-981-99-3814-8_4

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-3813-1

  • Online ISBN: 978-981-99-3814-8

  • eBook Packages: Computer Science (R0)
