A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem
The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoffs from a fixed (but unknown) distribution, we wish to allocate n trials among the machines so as to maximize the largest single payoff received. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
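To make the objective concrete, the following is a minimal sketch of the problem setting only, not the algorithm presented in the paper. It assumes a naive epsilon-greedy allocation rule that favors the arm with the largest payoff observed so far; the function name `run_max_bandit`, the `epsilon` parameter, and the example arms are all hypothetical and chosen for illustration.

```python
import random


def run_max_bandit(arms, n, epsilon=0.1, seed=0):
    """Allocate n trials among the arms and return (best payoff, pull counts).

    Illustrative assumption: a naive epsilon-greedy rule, NOT the paper's method.
    Each arm is a callable that takes a random.Random and returns one payoff.
    """
    rng = random.Random(seed)
    k = len(arms)
    best_seen = [float("-inf")] * k  # largest payoff observed per arm
    pulls = [0] * k                  # number of trials given to each arm

    for t in range(n):
        if t < k:
            i = t                    # try every arm once first
        elif rng.random() < epsilon:
            i = rng.randrange(k)     # explore a uniformly random arm
        else:
            # exploit the arm with the largest payoff seen so far
            i = max(range(k), key=lambda j: best_seen[j])
        payoff = arms[i](rng)
        pulls[i] += 1
        best_seen[i] = max(best_seen[i], payoff)

    # The objective is the maximum single payoff, not the cumulative sum.
    return max(best_seen), pulls


if __name__ == "__main__":
    # Hypothetical arms with different tail behavior.
    example_arms = [
        lambda rng: rng.gauss(0.0, 1.0),
        lambda rng: rng.expovariate(1.0),
        lambda rng: rng.uniform(0.0, 2.0),
    ]
    best, counts = run_max_bandit(example_arms, n=1000)
    print("best payoff:", best, "pulls per arm:", counts)
```

The point of the sketch is the return value: unlike the classical k-armed bandit problem, which scores a strategy by the sum of its payoffs, this setting scores it by the single best payoff, so an arm with a low mean but a heavy upper tail can be the right one to favor.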