Robust Risk-Averse Stochastic Multi-armed Bandits
Cite this paper as:
Maillard OA. (2013) Robust Risk-Averse Stochastic Multi-armed Bandits. In: Jain S., Munos R., Stephan F., Zeugmann T. (eds) Algorithmic Learning Theory. ALT 2013. Lecture Notes in Computer Science, vol 8139. Springer, Berlin, Heidelberg
We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion. Further, we study the deviations of the regret rather than the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
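To illustrate how the arm with the best mean can differ from the arm preferred under a risk-averse criterion, here is a minimal sketch. It uses the empirical conditional value-at-risk (the average of the worst fraction of rewards) as a stand-in coherent risk measure; this choice, the arm names, the reward samples, and the level `alpha` are assumptions made for illustration and are not taken from the paper. The sketch is not RA-UCB; it only compares two selection criteria on fixed data.

```python
def empirical_mean(rewards):
    """Average observed reward of an arm."""
    return sum(rewards) / len(rewards)


def empirical_cvar(rewards, alpha):
    """Average of the worst alpha-fraction of rewards (lower tail).

    Illustrative risk-averse criterion: higher is better, and it
    penalizes arms whose bad outcomes are very bad.
    """
    k = max(1, int(alpha * len(rewards)))  # number of tail samples
    worst = sorted(rewards)[:k]            # k smallest rewards
    return sum(worst) / k


# Hypothetical reward samples for two arms.
samples = {
    "safe": [0.9, 1.0, 1.1, 1.0, 0.9, 1.1, 1.0, 1.0, 0.9, 1.1],
    "risky": [-4.0, 3.0, 2.5, -1.0, 4.0, 3.5, -2.0, 2.0, 3.0, 1.0],
}

alpha = 0.2  # assumed tail level

best_by_mean = max(samples, key=lambda a: empirical_mean(samples[a]))
best_by_cvar = max(samples, key=lambda a: empirical_cvar(samples[a], alpha))

print("best by mean:", best_by_mean)
print("best by CVaR:", best_by_cvar)
```

On this data the "risky" arm has the higher empirical mean, while the "safe" arm has the higher lower-tail average, so the two criteria select different arms.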
Keywords: Multi-armed bandits, coherent risk measure, cumulant generative function, concentration of measure