Co-evolution of Rewards and Meta-parameters in Embodied Evolution

  • Stefan Elfwing
  • Eiji Uchibe
  • Kenji Doya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5436)


Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. Evaluation, selection, and reproduction are carried out through cooperation and competition among the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to studying adaptive learning mechanisms for artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction. In this paper we propose a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing in subpopulations of virtual agents. Within this framework, we explore the combination of within-generation learning of basic survival behaviors by reinforcement learning, and evolutionary adaptation over the generations of the basic behavior selection policy, the reward functions, and the meta-parameters for reinforcement learning. We apply a biologically inspired selection scheme in which there is no explicit communication of the individuals’ fitness information. Individuals can only produce offspring by mating, a pair-wise exchange of genotypes, and the probability that an individual produces offspring in its own subpopulation depends on the individual’s “health”, i.e., energy level, at the mating occasion. We validate the proposed method by comparing it with evolution using standard centralized selection, in simulation, and by transferring the obtained solutions to hardware using two real robots.
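As a rough illustration of the scheme described above, the sketch below shows one way the pieces could fit together: each virtual agent carries a genotype of reinforcement-learning meta-parameters (learning rate, discount factor, softmax temperature) and a shaping-reward weight, learns within its lifetime by tabular Sarsa, and reproduces at mating events with a probability that grows with its energy level. All names, the logistic mating probability, and the tabular state space are illustrative assumptions, not details taken from the paper.

```python
import math
import random

# Illustrative constants (assumptions, not from the paper).
N_STATES, N_ACTIONS = 16, 4

class Agent:
    def __init__(self, genotype):
        # genotype: (alpha, gamma, tau, w_shape) -- evolved over
        # generations rather than hand-tuned.
        self.alpha, self.gamma, self.tau, self.w_shape = genotype
        self.q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        self.energy = 0.0  # the agent's "health" at mating time

    def act(self, s):
        # Softmax (Boltzmann) action selection; tau is an evolved
        # meta-parameter controlling exploration.
        prefs = [math.exp(self.q[s][a] / self.tau) for a in range(N_ACTIONS)]
        z = sum(prefs)
        r, acc = random.random() * z, 0.0
        for a, p in enumerate(prefs):
            acc += p
            if r <= acc:
                return a
        return N_ACTIONS - 1

    def sarsa_update(self, s, a, reward, s2, a2):
        # Within-generation learning: one on-policy Sarsa step.
        # The evolved shaping term is a placeholder here (set to zero);
        # the actual shaping-reward form is paper-specific.
        shaped = reward + self.w_shape * 0.0
        td = shaped + self.gamma * self.q[s2][a2] - self.q[s][a]
        self.q[s][a] += self.alpha * td

def mating_probability(energy, k=1.0):
    # Probability of producing offspring at a mating event rises with
    # energy; a logistic curve is one plausible choice, not necessarily
    # the paper's exact form.
    return 1.0 / (1.0 + math.exp(-k * energy))
```

In this reading, evolution never sees an explicit fitness value: selection pressure enters only through `mating_probability`, so healthier learners are simply more likely to pass on their reward functions and meta-parameters.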


Keywords: Embodied Evolution · Evolutionary Robotics · Reinforcement Learning · Meta-learning · Shaping Rewards · Meta-parameters





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Stefan Elfwing¹
  • Eiji Uchibe¹
  • Kenji Doya¹

  1. Neural Computation Unit, Okinawa Institute of Science and Technology, Okinawa, Japan
