Robust Risk-Averse Stochastic Multi-armed Bandits

  • Odalric-Ambrym Maillard
Conference paper

DOI: 10.1007/978-3-642-40935-6_16

Part of the Lecture Notes in Computer Science book series (LNCS, volume 8139)
Cite this paper as:
Maillard OA. (2013) Robust Risk-Averse Stochastic Multi-armed Bandits. In: Jain S., Munos R., Stephan F., Zeugmann T. (eds) Algorithmic Learning Theory. ALT 2013. Lecture Notes in Computer Science, vol 8139. Springer, Berlin, Heidelberg


We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion. Further, we study the deviations of the regret instead of the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
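To illustrate why the arm with the best mean and the arm maximizing a risk criterion can differ, here is a minimal sketch. It is not the RA-UCB algorithm, whose details the abstract does not give. It assumes conditional value-at-risk (CVaR) as an example of a coherent risk measure. The function name `empirical_cvar`, the level `alpha`, and the reward samples are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def empirical_cvar(rewards, alpha):
    """Empirical lower-tail CVaR: the average of the worst
    ceil(alpha * n) observed rewards (assumed example criterion)."""
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


# Hypothetical reward samples for two arms (illustrative data only).
arms = {
    "risky": [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # higher mean, heavy lower tail
    "safe": [0.6, 0.6, 0.7, 0.7, 0.7, 0.7, 0.8, 0.8],   # lower mean, tight spread
}

# Arm selected by the usual mean criterion.
best_by_mean = max(arms, key=lambda name: mean(arms[name]))

# Arm selected by the assumed risk-averse criterion (CVaR at level 0.25).
best_by_cvar = max(arms, key=lambda name: empirical_cvar(arms[name], alpha=0.25))

print(best_by_mean, best_by_cvar)
```

On these samples the mean criterion prefers the "risky" arm, while the CVaR criterion prefers the "safe" arm, because CVaR only looks at the worst quarter of the observed rewards.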


Multi-armed bandits, coherent risk measure, cumulant generative function, concentration of measure



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Odalric-Ambrym Maillard (1)
  1. Faculty of Electrical Engineering, Technion, Haifa, Israel
