Reference Work Entry

Encyclopedia of Algorithms, pp. 1356–1359


Multi-armed Bandit Problem

  • Nicolò Cesa-Bianchi, Dipartimento di Informatica, Università degli Studi di Milano


Keywords and Synonyms

Adaptive allocation; Regret minimization; Repeated games; Sequential experiment design

Years and Authors of Summarized Original Work

  • 2002; Auer, Cesa-Bianchi, Freund, Schapire

  • 2002; Auer, Cesa-Bianchi, Fischer

Problem Definition

A multi-armed bandit is a sequential decision problem defined over a set of actions. At each time step, the decision maker selects an action from the set and obtains an observable payoff. The goal is to maximize the total payoff obtained over the sequence of decisions. The name "bandit" refers to the colloquial term for a slot machine ("one-armed bandit" in American slang) and to the decision problem, faced by a casino gambler, of choosing which slot machine to play next. Bandit problems naturally capture the fundamental trade-off between exploration and exploitation in sequential experiments: the decision maker must follow a strategy (called an allocation policy) that balances the exploitation of actions that did well in the past with the exploration of actions that might yield higher payoffs in the future. Although the original motivation came from clin ...
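The interaction protocol above, and one way an allocation policy can balance exploration and exploitation, can be sketched in code. The sketch below implements the UCB1 policy of Auer, Cesa-Bianchi, and Fischer (2002), one of the summarized works: each arm is scored by its empirical mean payoff plus a confidence bonus that shrinks as the arm is played more often. The function name `ucb1` and the Bernoulli reward model in the usage example are illustrative choices, not part of the original entry.

```python
import math
import random

def ucb1(num_arms, draw_payoff, horizon):
    """Run the UCB1 allocation policy for `horizon` rounds.

    `draw_payoff(arm)` returns the observable payoff (in [0, 1]) of the
    chosen arm; payoffs of the other arms remain unobserved, as in the
    bandit protocol. Returns the total payoff accumulated.
    """
    counts = [0] * num_arms    # number of times each arm has been played
    means = [0.0] * num_arms   # empirical mean payoff of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= num_arms:
            arm = t - 1        # initialization: play each arm once
        else:
            # Exploit high empirical means, but add an exploration bonus
            # sqrt(2 ln t / n_i) so rarely played arms are revisited.
            arm = max(
                range(num_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        payoff = draw_payoff(arm)          # only this arm's payoff is observed
        counts[arm] += 1
        means[arm] += (payoff - means[arm]) / counts[arm]  # running mean update
        total += payoff
    return total

# Usage: three Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
# Over many rounds the policy concentrates its plays on the best arm,
# so the total payoff approaches 0.8 per round minus a logarithmic regret.
random.seed(0)
probs = [0.2, 0.5, 0.8]
gain = ucb1(len(probs), lambda i: 1.0 if random.random() < probs[i] else 0.0, 10000)
```

Note the design choice: the policy never sees the payoffs of the arms it did not play, which is precisely what distinguishes the bandit setting from full-information sequential prediction.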
