XCSF with tile coding in discontinuous action-value landscapes

Lanzi, Pier Luca; Loiacono, Daniele

doi:10.1007/s12065-015-0129-7


Special Issue
Published: 11 April 2015

Volume 8, pages 117–132, (2015)

Evolutionary Intelligence


Abstract

Tile coding is an effective reinforcement learning method whose generalization mechanism relies on (1) a carefully designed parameter setting and (2) the assumption that nearby states in the problem space correspond to similar payoff predictions in the action-value function. Previously, we extended XCSF with tile coding prediction and compared it to tabular tile coding, showing that XCSF (1) performs as well as parameter-optimized tile coding while (2) evolving individualized parameter settings in each problem subspace. That comparison was based on a set of well-known reinforcement learning environments (2D Gridworld and Mountain Car) that involve no action-value discontinuities and so pose no challenge to tabular tile coding. In this paper, we go a step further and test XCSF with tile coding on a set of problems designed to challenge tile coding by introducing discontinuities in the action-value landscape. The new testbed, called MazeWorld, extends 2D Gridworld with impenetrable obstacles, a conceptually simple modification that can dramatically increase the problem complexity for tabular tile coding. We compare four versions of XCSF with tile coding (each adapting a different set of parameters) to tabular tile coding on four problems of increasing complexity. We show that our system (1) needs fewer training problems than standard tile coding to reach an optimal policy; (2) can evolve adequate coding parameters in each subspace without any prior knowledge; and (3) even when it is not allowed to evolve these parameters, still adapts classifier conditions through the genetic algorithm to properly decompose the problem into subspaces, making it much less sensitive to parameter settings than tabular tile coding.
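
To make the generalization assumption in the abstract concrete, the following is a minimal sketch of tabular tile coding over a 2D state in [0, 1)^2. It is an illustration only, not the authors' implementation: the names `active_tiles` and `TileCodedValue`, the diagonal offset scheme, and the parameter values (4 tilings, tile width 0.25, learning rate 0.1) are all assumptions chosen for the example.

```python
import math


def active_tiles(x, y, n_tilings=4, tile_width=0.25):
    """Return one (tiling, column, row) index per tiling for a point in [0, 1)^2."""
    tiles = []
    for t in range(n_tilings):
        # Assumption: each tiling is shifted diagonally by a fraction of a tile.
        offset = t * tile_width / n_tilings
        col = math.floor((x + offset) / tile_width)
        row = math.floor((y + offset) / tile_width)
        tiles.append((t, col, row))
    return tiles


class TileCodedValue:
    """Value estimate stored as one weight per tile, summed over the active tiles."""

    def __init__(self, n_tilings=4, tile_width=0.25, learning_rate=0.1):
        self.n_tilings = n_tilings
        self.tile_width = tile_width
        self.learning_rate = learning_rate
        self.weights = {}  # sparse table: tile index -> weight

    def _tiles(self, x, y):
        return active_tiles(x, y, self.n_tilings, self.tile_width)

    def predict(self, x, y):
        return sum(self.weights.get(tile, 0.0) for tile in self._tiles(x, y))

    def update(self, x, y, target):
        # Spread the correction evenly over the tiles active at (x, y).
        error = target - self.predict(x, y)
        step = self.learning_rate / self.n_tilings * error
        for tile in self._tiles(x, y):
            self.weights[tile] = self.weights.get(tile, 0.0) + step


value = TileCodedValue()
value.update(0.30, 0.30, target=1.0)

# A nearby state shares most tiles with the updated one, so its estimate moves too.
shared = set(active_tiles(0.30, 0.30)) & set(active_tiles(0.32, 0.30))
print(len(shared), value.predict(0.32, 0.30), value.predict(0.90, 0.90))
```

In this sketch, an update at one state also changes the estimate of any nearby state that shares tiles with it, while distant states are unaffected. If an impenetrable obstacle separated two such nearby states, their true action values could differ sharply even though they share weights, which is the kind of discontinuity the abstract says is difficult for tabular tile coding.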



Acknowledgments

Pier Luca wishes to thank Stewart Wilson, a mentor since 1997 for anything related to English writing and learning classifier systems. The authors wish to thank the reviewers for their invaluable comments and suggestions regarding possible extensions of the approach using a more competent genetic algorithm.

Author information

Authors and Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Pier Luca Lanzi & Daniele Loiacono


Corresponding author

Correspondence to Pier Luca Lanzi.


About this article

Cite this article

Lanzi, P.L., Loiacono, D. XCSF with tile coding in discontinuous action-value landscapes. Evol. Intel. 8, 117–132 (2015). https://doi.org/10.1007/s12065-015-0129-7


Received: 31 July 2014
Revised: 23 February 2015
Accepted: 06 March 2015
Published: 11 April 2015
Issue Date: September 2015
DOI: https://doi.org/10.1007/s12065-015-0129-7
