Feedback of Delayed Rewards in XCS for Environments with Aliasing States

  • Conference paper
Artificial Life: Borrowing from Biology (ACAL 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5865)

Abstract

Wilson [13] showed how delayed reward feedback can be used to solve many multi-step problems with the widely used XCS learning classifier system. However, Wilson’s method, which is based on back-propagation with discounting from Q-learning, runs into difficulties in environments with aliasing states, since the local reward function often does not converge. This paper describes a different approach to reward feedback, in which a layered reward scheme for XCS classifiers is learnt during training. We show that, with a relatively minor modification to XCS feedback, the approach not only solves problems such as Woods1 but can also solve aliasing-state problems such as Littman57, MiyazakiA and MazeB.
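
To make the convergence issue concrete, the following is a minimal sketch of Q-learning-style discounted feedback, not the layered scheme proposed in the paper and not a full XCS. It assumes a one-action corridor, a single shared prediction standing in for a classifier that matches two aliased cells, and illustrative values for the discount factor `GAMMA`, the learning rate `BETA` and the goal reward. All function names are hypothetical.

```python
GAMMA = 0.71  # discount factor (assumed value for this sketch)
BETA = 0.2    # learning rate (assumed value for this sketch)


def discounted_target(reward, next_best_prediction, gamma=GAMMA):
    """Q-learning-style target: reward plus discounted best next prediction."""
    return reward + gamma * next_best_prediction


def update_prediction(prediction, target, beta=BETA):
    """Delta-rule step that moves the prediction a fraction beta toward the target."""
    return prediction + beta * (target - prediction)


def corridor_targets(length, goal_reward=1000.0, gamma=GAMMA):
    """Converged targets for a one-action corridor.

    Index 0 is the cell adjacent to the goal; each further cell is one
    more step away, so its target is discounted once more.
    """
    targets = []
    next_best = 0.0  # nothing follows the goal
    for distance in range(length):
        reward = goal_reward if distance == 0 else 0.0
        next_best = discounted_target(reward, next_best, gamma)
        targets.append(next_best)
    return targets


def shared_prediction_trace(targets, steps=200, beta=BETA):
    """Update ONE prediction against several targets in turn.

    This mimics aliasing: cells that look identical to the agent share a
    prediction, yet each cell feeds back a different discounted target.
    """
    prediction = 0.0
    trace = []
    for step in range(steps):
        prediction = update_prediction(prediction, targets[step % len(targets)], beta)
        trace.append(prediction)
    return trace


if __name__ == "__main__":
    levels = corridor_targets(3)
    # Cells at distance 0 and 2 are assumed to look identical to the agent.
    trace = shared_prediction_trace([levels[0], levels[2]])
    print("targets per cell:", levels)
    print("last two shared predictions:", trace[-2:])
```

In this sketch the shared prediction keeps swinging between two values that lie strictly between the two targets, so its error never settles near zero for either cell. That is the kind of behaviour the abstract refers to when it says the local reward function often does not converge.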

References

  1. Barry, A.M., Down, C.: Limits in Long Path Learning with XCS. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 1832–1843. Springer, Heidelberg (2003)

  2. Butz, M.V., Kaloxylos, R., Liu, J., Chou, P.H., Bagherzadeh, N., Kurdahi, F.J.: Analysis and Improvement of Fitness Exploitation in XCS: Bounding Models, Tournament Selection, and Bilateral Accuracy. Evolutionary Computation 11, 239–277 (2003)

  3. Butz, M.V., Kovacs, T., Lanzi, P.L., Wilson, S.W.: How XCS Evolves Accurate Classifiers. In: Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2001), pp. 927–934 (2001)

  4. Butz, M.V., Kovacs, T., Lanzi, P.L., Wilson, S.W.: Toward a theory of generalization and learning in XCS. Evolutionary Computation 8, 28–46 (2004)

  5. Chen, K.-Y., Dam, H.H., Lindsay, P.A., Abbass, H.A.: Biasing XCS with Domain Knowledge for Planning Flight Trajectories in a Moving Sector Free Flight Environment. In: Proceedings of the IEEE Symposium on Artificial Life (ALIFE 2007), Honolulu, HI, USA, pp. 456–462 (2007)

  6. Lang, K.J., Witbrock, M.J.: Learning to Tell Two Spirals Apart. In: Proceedings of the 1988 Connectionist Models Summer School, San Mateo, CA, pp. 52–59 (1988)

  7. Lanzi, P.L.: A Model of the Environment to Avoid Local Learning with XCS. Dipartimento di Elettronica e Informazione, Politecnico di Milano, IT, Tech. Rep. N. 97.46 (1997)

  8. Lanzi, P.L.: Solving Problems in Partially Observable Environments with Classifier Systems. Dipartimento di Elettronica e Informazione, Politecnico di Milano, IT, Tech. Rep. N. 97.45 (1997)

  9. Littman, M.L., Cassandra, A.R., Kaelbling, L.P.: Learning Policies for Partially Observable Environments: Scaling up. In: Proceedings of the Twelfth International Conference on Machine Learning (1995)

  10. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)

  11. Tang, K.W., Jarvis, R.A.: Is XCS Suitable for Problems with Temporal Rewards? In: Proceedings of International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA 2005), Vienna, Austria, pp. 258–264 (2005)

  12. Tangamchit, P., Dolan, J.M., Khosla, P.K.: The Necessity of Average Rewards in Cooperative Multirobot Learning. In: Proceedings of the 2002 IEEE International Conference on Robotics & Automation, Washington, DC USA, pp. 1296–1301 (2002)

  13. Wilson, S.W.: Classifier Fitness Based on Accuracy. Evolutionary Computation 3, 149–175 (1995)

  14. Wilson, S.W.: Generalization in the XCS Classifier System. In: Genetic Programming 1998: Proceedings of the Third Annual Conference, San Francisco, CA, pp. 665–674 (1998)

  15. Wilson, S.W.: Structure and Function of the XCS Classifier System. In: Cantú-Paz, E., Foster, J.A., Deb, K., Davis, L., Roy, R., O’Reilly, U.-M., Beyer, H.-G., Kendall, G., Wilson, S.W., Harman, M., Wegener, J., Dasgupta, D., Potter, M.A., Schultz, A., Dowsland, K.A., Jonoska, N., Miller, J., Standish, R.K. (eds.) GECCO 2003. LNCS, vol. 2724, pp. 1857–1869. Springer, Heidelberg (2003)

  16. Wilson, S.W.: ZCS: A Zeroth Level Classifier System. Evolutionary Computation 2, 1–18 (1994)

  17. Zatuchna, Z.V.: AgentP: a Learning Classifier System with Associative Perception in Maze Environments, University of East Anglia, PhD thesis (2005)

  18. Zatuchna, Z.V., Bagnall, A.: Learning Mazes with Aliasing States: An LCS Algorithm with Associative Perception. Adaptive Behavior - Animals, Animats, Software Agents, Robots, Adaptive Systems 17, 28–57 (2009)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chen, K.-Y., Lindsay, P.A. (2009). Feedback of Delayed Rewards in XCS for Environments with Aliasing States. In: Korb, K., Randall, M., Hendtlass, T. (eds) Artificial Life: Borrowing from Biology. ACAL 2009. Lecture Notes in Computer Science, vol 5865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10427-5_25

  • DOI: https://doi.org/10.1007/978-3-642-10427-5_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10426-8

  • Online ISBN: 978-3-642-10427-5

  • eBook Packages: Computer Science, Computer Science (R0)