Sub-sampling for Multi-armed Bandits

  • Akram Baransi
  • Odalric-Ambrym Maillard
  • Shie Mannor
Conference paper

DOI: 10.1007/978-3-662-44848-9_8

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8724)
Cite this paper as:
Baransi A., Maillard OA., Mannor S. (2014) Sub-sampling for Multi-armed Bandits. In: Calders T., Esposito F., Hüllermeier E., Meo R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol 8724. Springer, Berlin, Heidelberg

Abstract

The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it needs to know neither the set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, explain the main intuition behind the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.
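
The abstract does not spell out the sub-sampling mechanism itself, so the following is an illustrative sketch only, not the paper's exact procedure. It shows one plausible sub-sampling comparison between two arms: the history of the more-pulled arm is sub-sampled without replacement down to the size of the other arm's history before the two empirical means are compared, with ties broken in favor of the less-pulled arm. The function names subsample_duel and run are hypothetical.

    import random

    def subsample_duel(history_a, history_b, rng=random):
        # Compare the two arms on equally many samples: the arm with the
        # longer reward history is sub-sampled without replacement down to
        # the size of the shorter one, then empirical means are compared.
        # Ties go to the less-sampled arm, keeping it in play.
        n_a, n_b = len(history_a), len(history_b)
        if n_a <= n_b:
            mean_a = sum(history_a) / n_a
            mean_b = sum(rng.sample(history_b, n_a)) / n_a
            return 0 if mean_a >= mean_b else 1
        mean_a = sum(rng.sample(history_a, n_b)) / n_b
        mean_b = sum(history_b) / n_b
        return 1 if mean_b >= mean_a else 0

    def run(means, horizon, rng=random):
        # Two-arm Bernoulli bandit loop (hypothetical harness): pull each
        # arm once, then let the sub-sampling duel pick the arm each round.
        histories = [[], []]
        for arm, p in enumerate(means):
            histories[arm].append(1.0 if rng.random() < p else 0.0)
        for _ in range(horizon - 2):
            arm = subsample_duel(histories[0], histories[1], rng)
            histories[arm].append(1.0 if rng.random() < means[arm] else 0.0)
        return histories  # histories[k] holds arm k's observed rewards

Note that the comparison uses only the raw observed rewards, so it is invariant under rescaling and needs no prior knowledge of the reward distributions or their range, consistent with the flexibility claims in the abstract. On a run such as run([0.5, 0.6], 10000), pulls should tend to concentrate on the better arm.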

Keywords

Multi-armed bandits · Sub-sampling · Reinforcement learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Akram Baransi (1)
  • Odalric-Ambrym Maillard (1)
  • Shie Mannor (1)

  1. Department of Electrical Engineering, Technion - Israel Institute of Technology, Haifa, Israel