Evolutionary Intelligence, Volume 8, Issue 2–3, pp 117–132

XCSF with tile coding in discontinuous action-value landscapes

Special Issue

Abstract

Tile coding is an effective reinforcement learning method whose generalization mechanism relies on (1) a carefully designed parameter setting and (2) the assumption that nearby states in the problem space correspond to similar payoff predictions in the action-value function. Previously, we extended XCSF with tile coding prediction and compared it to tabular tile coding, showing that XCSF (1) performs as well as parameter-optimized tile coding while (2) evolving individualized parameter settings in each problem subspace. That comparison was based on a set of well-known reinforcement learning environments (2D Gridworld and the Mountain Car) that involved no action-value discontinuities and so posed no challenge to tabular tile coding. In this paper, we go a step further and test XCSF with tile coding on a set of problems designed to challenge tile coding by introducing discontinuities in the action-value landscape. The new testbed (called MazeWorld) extends 2D Gridworld with impenetrable obstacles, a conceptually simple modification that can dramatically increase the problem complexity for tabular tile coding. We compare four versions of XCSF with tile coding (each adapting a different set of parameters) to tabular tile coding on four problems of increasing complexity. We show that our system (1) needs fewer training problems than standard tile coding to reach an optimal policy; (2) can evolve adequate coding parameters in each subspace without any prior knowledge; and (3) even when not allowed to evolve these parameters, still adapts classifier conditions through its genetic algorithm to properly decompose the problem into subspaces, making it much less sensitive to parameter settings than tabular tile coding.
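The generalization assumption described in the abstract can be illustrated with a minimal sketch of tabular tile coding over the unit square. This is not the implementation used in the paper: the class name `TileCoder`, its parameters, the uniform offset scheme, and the sample coordinates are all assumptions chosen for illustration. The sketch shows that a single update at one state also changes the prediction at a nearby state that shares tiles with it, which is harmless in a smooth landscape but misleading when an obstacle separates the two states.

```python
import numpy as np


class TileCoder:
    """Illustrative tabular tile coding over [0, 1) x [0, 1) (hypothetical names)."""

    def __init__(self, n_tilings=4, tiles_per_dim=5, learning_rate=0.5):
        self.n_tilings = n_tilings
        self.tiles_per_dim = tiles_per_dim
        self.learning_rate = learning_rate
        # One extra tile per dimension so that shifted tilings still cover the space.
        side = tiles_per_dim + 1
        self.weights = np.zeros((n_tilings, side, side))

    def active_tiles(self, x, y):
        """Return one (tiling, row, col) index per tiling for the point (x, y)."""
        tiles = []
        for t in range(self.n_tilings):
            # Assumed offset scheme: each tiling is shifted by a fraction of a tile.
            offset = t / (self.n_tilings * self.tiles_per_dim)
            col = int((x + offset) * self.tiles_per_dim)
            row = int((y + offset) * self.tiles_per_dim)
            tiles.append((t, row, col))
        return tiles

    def predict(self, x, y):
        """The prediction is the sum of the weights of all active tiles."""
        return sum(self.weights[idx] for idx in self.active_tiles(x, y))

    def update(self, x, y, target):
        """Move the prediction toward the target, spreading the step over the tilings."""
        error = target - self.predict(x, y)
        step = self.learning_rate * error / self.n_tilings
        for idx in self.active_tiles(x, y):
            self.weights[idx] += step


coder = TileCoder()
coder.update(0.42, 0.42, target=10.0)    # one payoff update at a single state

trained = coder.predict(0.42, 0.42)      # the updated state itself
neighbour = coder.predict(0.48, 0.42)    # nearby state sharing most of its tiles
distant = coder.predict(0.90, 0.90)      # far state sharing no tiles

print(trained, neighbour, distant)
```

If a wall lay between x = 0.42 and x = 0.48 (a hypothetical obstacle in the spirit of MazeWorld), the true action values on the two sides could differ sharply, yet the neighbour still inherits most of the update. Avoiding this with a fixed tabular coder requires hand-tuning the tile size and number of tilings for the whole space, which is the sensitivity the abstract refers to.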

Keywords

Learning classifier systems, Tile coding, Reinforcement learning, XCSF

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
