Feature Discovery in Reinforcement Learning Using Genetic Programming

  • Sertan Girgin
  • Philippe Preux
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4971)

Abstract

The goal of reinforcement learning is to find a policy that maximizes the expected reward accumulated by an agent over time, based on its interactions with the environment; to this end, a function of the agent's state has to be learned. States are often better characterized by a set of features. However, finding a “good” set of features is generally a tedious task that requires good domain knowledge. In this paper, we propose a genetic programming based approach for feature discovery in reinforcement learning. A population of individuals, each representing a set of features, is evolved, and individuals are evaluated by their average performance on short reinforcement learning trials. The results of experiments conducted on several benchmark problems demonstrate that the resulting features allow the agent to learn better policies in fewer episodes.
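The evolutionary loop summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the toy corridor task, the expression-tree encoding over a single normalised state variable `x`, the tanh squashing of feature values, linear Q-learning as the learner inside each short trial, and mutation-only truncation selection (no crossover) are all assumptions made for illustration, as are every function and parameter name (`short_trial`, `fitness`, `evolve`, and so on). Each individual is a set of features, and its fitness is the average return obtained over a few short reinforcement learning trials that use those features.

```python
import math
import random

# Hypothetical toy task (not from the paper): a corridor of N_CELLS cells,
# start in cell 0, goal in the last cell, reward -1 per step; 0 = left, 1 = right.
N_CELLS = 10
MAX_STEPS = 50

# Function set of the GP trees; terminals are the state variable and constants.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def random_tree(rng, depth=3):
    """Grow a random expression tree with at most `depth` levels of operators."""
    if depth <= 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.6 else ("c", rng.uniform(-1.0, 1.0))
    op = rng.choice(sorted(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def eval_tree(tree, x):
    """Evaluate an expression tree at the normalised state value x."""
    if tree[0] == "x":
        return x
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](eval_tree(tree[1], x), eval_tree(tree[2], x))


def features(feature_set, state):
    """Bias term plus one tanh-squashed value per evolved feature (assumption)."""
    x = state / (N_CELLS - 1)
    return [1.0] + [math.tanh(eval_tree(t, x)) for t in feature_set]


def short_trial(feature_set, rng, episodes=20, alpha=0.1, gamma=0.95, eps=0.1):
    """Run a few episodes of linear Q-learning; return the mean episode return."""
    n = len(feature_set) + 1
    w = [[0.0] * n for _ in range(2)]  # one weight vector per action

    def q(phi, a):
        return sum(wi * fi for wi, fi in zip(w[a], phi))

    total = 0.0
    for _ in range(episodes):
        s, ret = 0, 0.0
        for _ in range(MAX_STEPS):
            phi = features(feature_set, s)
            if rng.random() < eps:
                a = rng.randrange(2)  # explore
            else:
                a = max((0, 1), key=lambda b: q(phi, b))  # exploit
            s2 = max(0, s - 1) if a == 0 else min(N_CELLS - 1, s + 1)
            ret -= 1.0
            done = s2 == N_CELLS - 1
            target = -1.0
            if not done:
                phi2 = features(feature_set, s2)
                target += gamma * max(q(phi2, b) for b in (0, 1))
            delta = target - q(phi, a)
            w[a] = [wi + (alpha / n) * delta * fi for wi, fi in zip(w[a], phi)]
            s = s2
            if done:
                break
        total += ret
    return total / episodes


def fitness(feature_set, seed, trials=3):
    """Average performance over several short trials with fixed seeds."""
    return sum(short_trial(feature_set, random.Random(seed + k))
               for k in range(trials)) / trials


def mutate(tree, rng, depth=3):
    """Replace a randomly chosen subtree with a freshly grown one."""
    if tree[0] in ("x", "c") or rng.random() < 0.3:
        return random_tree(rng, depth)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(left, rng, depth - 1), right)
    return (op, left, mutate(right, rng, depth - 1))


def evolve(pop_size=8, n_features=3, generations=5, seed=0):
    """Mutation-only truncation selection over sets of feature trees."""
    rng = random.Random(seed)
    pop = [[random_tree(rng) for _ in range(n_features)] for _ in range(pop_size)]
    for gen in range(generations):
        # All individuals of a generation share the same trial seeds.
        ranked = sorted(pop, key=lambda ind: fitness(ind, seed=gen), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        for _ in range(pop_size - len(elite)):
            child = list(rng.choice(elite))
            i = rng.randrange(n_features)
            child[i] = mutate(child[i], rng)
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda ind: fitness(ind, seed=generations))
```

For example, `evolve(pop_size=8, n_features=3, generations=5)` returns the best feature set found, as a list of expression trees. Scoring every individual of a generation with the same trial seeds is a design choice meant to reduce, though not remove, the noise when comparing feature sets; a benchmark with a multi-dimensional state would need one terminal per state variable instead of the single `x`.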

Keywords

Genetic Programming; Reinforcement Learning; Feature Discovery; Policy Iteration; Reinforcement Learning Algorithm
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Sertan Girgin (1)
  • Philippe Preux (1, 2)

  1. Team-Project SequeL, INRIA Futurs Lille
  2. LIFL (UMR CNRS), Université de Lille

Personalised recommendations