XCSF with tile coding in discontinuous action-value landscapes

  • Special Issue
Evolutionary Intelligence

Abstract

Tile coding is an effective reinforcement learning method whose generalization mechanism relies on (1) a carefully designed parameter setting and (2) the assumption that nearby states in the problem space correspond to similar payoff predictions in the action-value function. Previously, we extended XCSF with tile coding prediction and compared it to tabular tile coding, showing that XCSF (1) performs as well as parameter-optimized tile coding and (2) evolves individualized parameter settings in each problem subspace. That comparison was based on two well-known reinforcement learning environments (2D Gridworld and the Mountain Car) that involve no action-value discontinuities and so pose no challenge to tabular tile coding. In this paper, we go a step further and test XCSF with tile coding on a set of problems designed to challenge tile coding by introducing discontinuities in the action-value landscape. The new testbed, called MazeWorld, extends 2D Gridworld with impenetrable obstacles, a conceptually simple modification that can dramatically increase the problem complexity for tabular tile coding. We compare four versions of XCSF with tile coding (each adapting a different set of parameters) to tabular tile coding on four problems of increasing complexity. We show that our system (1) needs fewer training problems than standard tile coding to reach an optimal policy; (2) can evolve adequate coding parameters in each subspace without any prior knowledge; and (3) even when it is not allowed to evolve these parameters, still adapts classifier conditions through the genetic algorithm to properly decompose the problem into subspaces, which makes it much less sensitive to the parameter settings than tabular tile coding.
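To make the generalization assumption concrete, the following is a minimal sketch of tabular tile coding on a unit square. It is not the authors' implementation: the number of tilings, the tile width, the uniform tiling offsets, and all function names are assumptions chosen for illustration. It shows that a payoff update at one state also changes the prediction at a nearby state, regardless of whether an obstacle separates the two, which is the effect that discontinuities in the action-value landscape are designed to exploit.

```python
from typing import Dict, List, Tuple

Tile = Tuple[int, int, int]  # (tiling index, column, row)

N_TILINGS = 4      # assumed number of overlapping tilings
TILE_WIDTH = 0.25  # assumed tile width on the unit square


def active_tiles(x: float, y: float) -> List[Tile]:
    """Return the single active tile of each tiling for a point (x, y)."""
    tiles = []
    for t in range(N_TILINGS):
        # Each tiling is shifted by a fraction of the tile width
        # (uniform offsets are an assumption of this sketch).
        offset = t * TILE_WIDTH / N_TILINGS
        col = int((x + offset) // TILE_WIDTH)
        row = int((y + offset) // TILE_WIDTH)
        tiles.append((t, col, row))
    return tiles


def predict(weights: Dict[Tile, float], x: float, y: float) -> float:
    """Payoff prediction: the sum of the weights of the active tiles."""
    return sum(weights.get(tile, 0.0) for tile in active_tiles(x, y))


def update(weights: Dict[Tile, float], x: float, y: float,
           target: float, alpha: float = 0.5) -> None:
    """Move the prediction at (x, y) toward the target payoff."""
    error = target - predict(weights, x, y)
    for tile in active_tiles(x, y):
        weights[tile] = weights.get(tile, 0.0) + (alpha / N_TILINGS) * error


weights: Dict[Tile, float] = {}
update(weights, 0.30, 0.30, target=1.0)

near = predict(weights, 0.32, 0.30)  # shares most tiles with the updated state
far = predict(weights, 0.90, 0.90)   # shares no tiles with the updated state
print(f"near={near:.3f} far={far:.3f}")
```

The nearby state inherits part of the update because it shares tiles with the updated state. If an impenetrable obstacle lay between the two states, their true action values could differ sharply, yet a fixed tiling would still couple their predictions.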

Acknowledgments

Pier Luca wishes to thank Stewart Wilson, a mentor since 1997 in everything related to English writing and learning classifier systems. The authors wish to thank the reviewers for their invaluable comments and for their suggestions regarding possible extensions of the approach using a more competent genetic algorithm.

Author information

Corresponding author

Correspondence to Pier Luca Lanzi.

Cite this article

Lanzi, P.L., Loiacono, D. XCSF with tile coding in discontinuous action-value landscapes. Evol. Intel. 8, 117–132 (2015). https://doi.org/10.1007/s12065-015-0129-7

  • DOI: https://doi.org/10.1007/s12065-015-0129-7
