Abstract
Tile coding is an effective reinforcement learning method that uses a rather ingenious generalization mechanism based on (1) a carefully designed parameter setting and (2) the assumption that nearby states in the problem space have similar payoff predictions in the action-value function. Previously, we extended XCSF with tile coding prediction and compared it to tabular tile coding, showing that (1) XCSF performs as well as parameter-optimized tile coding, while also (2) evolving individualized parameter settings in each problem subspace. That comparison was based on a set of well-known reinforcement learning environments (2D Gridworld and Mountain Car) that involve no action-value discontinuities and so pose no challenge to tabular tile coding. In this paper, we go a step further and test XCSF with tile coding on a set of problems designed to challenge tile coding by introducing discontinuities in the action-value landscape. The new testbed, called MazeWorld, extends 2D Gridworld with impenetrable obstacles, a conceptually simple modification that can dramatically increase the problem complexity for tabular tile coding. We compare four versions of XCSF with tile coding (each adapting a different set of parameters) to tabular tile coding on four problems of increasing complexity. We show that our system (1) needs fewer training problems than standard tile coding to reach an optimal policy; (2) can evolve adequate coding parameters in each subspace without any prior knowledge; and (3) even when XCSF is not allowed to evolve these parameters, still uses the genetic algorithm to adapt classifier conditions so that the problem is properly decomposed into subspaces, which makes it much less sensitive to the parameter settings than tabular tile coding.
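The generalization mechanism of tabular tile coding can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 2D state in a square region, uniformly offset tilings, and hypothetical names and parameters (`TileCoder2D`, `n_tilings`, `tile_width`, `alpha`). It shows how an update at one state spills over to nearby states that share tiles; one such coder per action would represent an action-value function.

```python
import math


class TileCoder2D:
    """Hypothetical minimal tabular tile coder for states in [0, size) x [0, size).

    Uses n_tilings overlapping grids of square tiles (side tile_width), each
    shifted diagonally by tile_width / n_tilings relative to the previous one.
    """

    def __init__(self, size, n_tilings, tile_width):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        # One extra tile per dimension so that shifted tilings still cover the space.
        self.tiles_per_dim = math.ceil(size / tile_width) + 1
        self.tiles_per_tiling = self.tiles_per_dim ** 2
        self.weights = [0.0] * (n_tilings * self.tiles_per_tiling)

    def active_tiles(self, x, y):
        """Return the index of the single active tile in each tiling."""
        indices = []
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            col = int((x + offset) // self.tile_width)
            row = int((y + offset) // self.tile_width)
            indices.append(t * self.tiles_per_tiling + row * self.tiles_per_dim + col)
        return indices

    def predict(self, x, y):
        """Prediction is the sum of the weights of the active tiles."""
        return sum(self.weights[i] for i in self.active_tiles(x, y))

    def update(self, x, y, target, alpha=0.1):
        """Gradient-style update; the step size is divided across tilings."""
        error = target - self.predict(x, y)
        step = alpha / self.n_tilings * error
        for i in self.active_tiles(x, y):
            self.weights[i] += step


if __name__ == "__main__":
    coder = TileCoder2D(size=10.0, n_tilings=4, tile_width=2.0)
    # Repeatedly train only the state (2.0, 5.0) toward a payoff of 1.0.
    for _ in range(200):
        coder.update(2.0, 5.0, target=1.0, alpha=0.5)
    print(round(coder.predict(2.0, 5.0), 3))  # 1.0: the trained state
    print(round(coder.predict(3.0, 5.0), 3))  # 0.5: shares 2 of 4 tiles
    print(round(coder.predict(8.0, 5.0), 3))  # 0.0: shares no tiles
```

The neighboring state (3.0, 5.0) shares two of its four tiles with the trained state and so inherits half of the learned value, while the distant state (8.0, 5.0) shares none. If, hypothetically, an impenetrable obstacle separated the first two states, as in MazeWorld, their true action values could differ sharply, yet the shared tiles would still pull their predictions together; this is the kind of discontinuity that a fixed, uniform tile coding cannot represent without finer tiles.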
Acknowledgments
Pier Luca wishes to thank Stewart Wilson, his mentor since 1997 in everything related to English writing and learning classifier systems. The authors wish to thank the reviewers for their invaluable comments and suggestions regarding possible extensions of the approach using a more competent genetic algorithm.
Cite this article
Lanzi, P.L., Loiacono, D. XCSF with tile coding in discontinuous action-value landscapes. Evol. Intel. 8, 117–132 (2015). https://doi.org/10.1007/s12065-015-0129-7