
Generalized State Values in an Anticipatory Learning Classifier System

  • Martin V. Butz
  • David E. Goldberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2684)

Abstract

This paper introduces generalized state values to the anticipatory learning classifier system ACS2. Previous studies showed that the generalized model representation evolving in ACS2 might be overgeneral for a proper policy representation. The policy representation is therefore separated from the model representation, and a function approximation module is added that approximates state values. Action choice then depends on the learned generalized state values of the states anticipated by the predictive model, yielding anticipatory behavior. It is shown that the function approximation module accurately generalizes the state-value function in the investigated Markov decision process (MDP). Improving the approach by means of further anticipatory interaction between the predictive model learner and the state-value learner is suggested. We also propose the implementation of task-dependent anticipatory attentional mechanisms that exploit the representation of the generalized state-value function. Finally, the anticipatory framework may be extended to support multiple motivations, integrated in a motivational module that could be influenced by emotional biases.
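The action-selection scheme summarized above, in which the predictive model supplies anticipated next states and the function approximation module supplies their values, can be illustrated with a minimal sketch. The code below is not the authors' implementation: the names (predict_next_state, value_estimate, choose_action, update_value), the tabular stand-ins for the predictive model and the value approximator, and the TD(0)-style update rule are all illustrative assumptions.

```python
import random

# Example action set for a grid-world MDP (hypothetical).
ACTIONS = ["north", "east", "south", "west"]

def predict_next_state(model, state, action):
    """Ask the learned predictive model (e.g. the ACS2 classifier list) for the
    anticipated next state; fall back to the current state if no prediction
    is available (illustrative assumption)."""
    return model.get((state, action), state)

def value_estimate(values, state):
    """Generalized state value V(s) as estimated by the function approximation
    module (here simply a dictionary, for illustration)."""
    return values.get(state, 0.0)

def choose_action(model, values, state, epsilon=0.1):
    """Anticipatory action selection: evaluate the value of each anticipated
    next state and act greedily, with epsilon-greedy exploration."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS,
               key=lambda a: value_estimate(values,
                                            predict_next_state(model, state, a)))

def update_value(values, state, reward, next_state, alpha=0.1, gamma=0.95):
    """TD(0)-style update V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s));
    a placeholder for whatever update the state-value learner actually uses."""
    target = reward + gamma * value_estimate(values, next_state)
    values[state] = value_estimate(values, state) + alpha * (target - value_estimate(values, state))
```

The key point of the sketch is that values are attached to anticipated states rather than to state-action pairs, so the policy emerges from querying the predictive model one step ahead, which is what makes the behavior anticipatory.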

Keywords

Optimal Policy · Reinforcement Learning · Markov Decision Process · State List · Generalization Mechanism



Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Martin V. Butz 1, 2
  • David E. Goldberg 1
  1. Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, USA
  2. Department of Cognitive Psychology, University of Würzburg, Germany
