Joint European Conference on Machine Learning and Knowledge Discovery in Databases

ECML PKDD 2012: Machine Learning and Knowledge Discovery in Databases, pp 180–194

Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization


  • Gabriel Dulac-Arnold
  • Ludovic Denoyer
  • Philippe Preux
  • Patrick Gallinari

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7524)

Abstract

The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollout-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from \(\mathcal{O}(A^2)\) to \(\mathcal{O}(A \log(A))\). We then propose a novel method that profits from the ECOC’s coding dictionary to split the initial MDP into \(\mathcal{O}(\log(A))\) separate two-action MDPs. This second method reduces learning complexity even further, from \(\mathcal{O}(A^2)\) to \(\mathcal{O}(\log(A))\), thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance.
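The coding-dictionary idea at the heart of the abstract can be illustrated with a minimal sketch (hypothetical code, not the authors' implementation): each of the \(A\) actions is assigned a ±1 codeword of length \(\mathcal{O}(\log A)\), one binary decision is made per code bit, and the chosen action is the nearest codeword.

```python
import numpy as np

A = 16                           # number of actions
B = int(np.ceil(np.log2(A)))     # O(log A) bits suffice to index every action

# Minimal {-1,+1} coding dictionary: row a is the binary expansion of a.
# (Illustrative only: practical ECOC dictionaries use longer codes, whose
# extra Hamming distance between rows is what provides error correction.)
M = np.array([[1 if (a >> b) & 1 else -1 for b in range(B)]
              for a in range(A)])

def decode(bits):
    """Nearest-codeword decoding: for +/-1 codes, minimizing Hamming
    distance is equivalent to maximizing the dot product with rows of M."""
    return int(np.argmax(M @ bits))

# Selecting an action now costs B = log2(A) binary decisions plus decoding,
# rather than scoring all A actions; in the paper's second method, each bit
# would come from the policy of one of the two-action sub-MDPs.
assert all(decode(M[a]) == a for a in range(A))
```

Each row of `M` here stands in for the outputs of the per-bit binary policies; the names `M`, `B`, and `decode` are assumptions made for this sketch.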

Keywords

  • Optimal Policy
  • Reinforcement Learning
  • Action Space
  • Markov Decision Process
  • Policy Iteration

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Author information

Authors and Affiliations

  1. LIP6, UPMC, Case 169, 4 Place Jussieu, Paris, 75005, France

    Gabriel Dulac-Arnold, Ludovic Denoyer & Patrick Gallinari

  2. LIFL (UMR CNRS) & INRIA Lille Nord-Europe, Université de Lille, Villeneuve d’Ascq, France

    Philippe Preux


Editor information

Editors and Affiliations

  1. Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK

     Peter A. Flach, Tijl De Bie & Nello Cristianini


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dulac-Arnold, G., Denoyer, L., Preux, P., Gallinari, P. (2012). Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_12

  • DOI: https://doi.org/10.1007/978-3-642-33486-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33485-6

  • Online ISBN: 978-3-642-33486-3

  • eBook Packages: Computer Science, Computer Science (R0)
