
Meta-learning of Exploration/Exploitation Strategies: The Multi-armed Bandit Case

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 358)

Abstract

The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy.

To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii) solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution.
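
To make the three steps concrete, here is a minimal sketch in Python, assuming two-armed Bernoulli problems drawn from a uniform prior. The names (sample_bernoulli_problem, play, meta_learn) are hypothetical illustrations, and exhaustive search over a finite candidate set stands in for the optimization algorithms proposed in the paper:

```python
import random

def sample_bernoulli_problem(num_arms=2):
    """Step (i): draw a bandit problem from the prior distribution.
    Here the prior is uniform over each arm's expected reward."""
    return [random.random() for _ in range(num_arms)]

def play(strategy, arm_probs, budget):
    """Simulate one bandit problem for `budget` pulls; return total reward."""
    counts = [0] * len(arm_probs)   # pulls per arm
    sums = [0.0] * len(arm_probs)   # cumulated reward per arm
    total = 0.0
    for t in range(budget):
        arm = strategy(t, counts, sums)
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

def meta_learn(candidates, budget=100, num_problems=1000):
    """Steps (ii)-(iii): over a sample of problems drawn from the prior,
    return the candidate E/E strategy with maximal average total reward."""
    problems = [sample_bernoulli_problem() for _ in range(num_problems)]
    return max(candidates,
               key=lambda s: sum(play(s, p, budget) for p in problems))
```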

We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed “Bernoulli” bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies from the literature (UCB1, UCB1-Tuned, UCB-V, KL-UCB and ε_n-Greedy); they also evaluate the robustness of the learnt E/E strategies through tests carried out on arms whose rewards follow a truncated Gaussian distribution.
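
As a hedged illustration of the two hypothesis spaces, the sketch below pairs a numerically parameterized family of UCB-style index strategies (the constant C = sqrt(2) recovers the classic UCB1 index) with one small symbolic-formula candidate. The exact parameterizations and formula grammar used in the paper differ; these candidates simply plug into the meta_learn routine sketched above:

```python
import math

def make_ucb_c(C):
    """Numerically parameterized family: UCB-style index with
    exploration constant C (C = sqrt(2) gives the UCB1 index)."""
    def strategy(t, counts, sums):
        for arm, n in enumerate(counts):
            if n == 0:        # pull each arm once before using the index
                return arm
        return max(range(len(counts)),
                   key=lambda a: sums[a] / counts[a]
                   + C * math.sqrt(math.log(t) / counts[a]))
    return strategy

def formula_strategy(t, counts, sums):
    """A small 'symbolic formula' candidate: empirical mean plus a 1/n bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: sums[a] / counts[a] + 1.0 / counts[a])

# Example usage with the meta_learn sketch above:
# best = meta_learn([make_ucb_c(c) for c in (0.5, 1.0, math.sqrt(2), 2.0)]
#                   + [formula_strategy])
```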


References

  1. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58, 527–536 (1952)

  2. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)

  3. Agrawal, R.: Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27, 1054–1078 (1995)

  4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multi-armed bandit problem. Machine Learning 47, 235–256 (2002)

  5. Audibert, J.-Y., Munos, R., Szepesvári, C.: Tuning Bandit Algorithms in Stochastic Environments. In: Hutter, M., Servedio, R.A., Takimoto, E. (eds.) ALT 2007. LNCS (LNAI), vol. 4754, pp. 150–165. Springer, Heidelberg (2007)

  6. Audibert, J.-Y., Munos, R., Szepesvári, C.: Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science 410, 1876–1902 (2009)

  7. Maes, F., Wehenkel, L., Ernst, D.: Learning to play K-armed bandit problems. In: Proc. of the 4th International Conference on Agents and Artificial Intelligence (2012)

  8. Maes, F., Wehenkel, L., Ernst, D.: Automatic Discovery of Ranking Formulas for Playing with Multi-armed Bandits. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS, vol. 7188, pp. 5–17. Springer, Heidelberg (2012)

  9. Gonzalez, C., Lozano, J., Larrañaga, P.: Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers (2002)

  10. Pelikan, M., Mühlenbein, H.: Marginal distributions in evolutionary algorithms. In: Proceedings of the 4th International Conference on Genetic Algorithms (1998)

  11. Bubeck, S., Munos, R., Stoltz, G.: Pure Exploration in Multi-armed Bandits Problems. In: Gavaldà, R., Lugosi, G., Zeugmann, T., Zilles, S. (eds.) ALT 2009. LNCS, vol. 5809, pp. 23–37. Springer, Heidelberg (2009)

  12. Bubeck, S., Munos, R., Stoltz, G., Szepesvári, C.: X-armed bandits. Journal of Machine Learning Research 12, 1655–1695 (2011)

  13. Garivier, A., Cappé, O.: The KL-UCB algorithm for bounded stochastic bandits and beyond. CoRR abs/1102.2490 (2011)

  14. Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Springer, New York (2004)

  15. Castronovo, M., Maes, F., Fonteneau, R., Ernst, D.: Learning exploration/exploitation strategies for single trajectory reinforcement learning. In: Proc. of 10th European Workshop on Reinforcement Learning (2012)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maes, F., Wehenkel, L., Ernst, D. (2013). Meta-learning of Exploration/Exploitation Strategies: The Multi-armed Bandit Case. In: Filipe, J., Fred, A. (eds) Agents and Artificial Intelligence. ICAART 2012. Communications in Computer and Information Science, vol 358. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36907-0_7

  • DOI: https://doi.org/10.1007/978-3-642-36907-0_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36906-3

  • Online ISBN: 978-3-642-36907-0

  • eBook Packages: Computer Science (R0)
