A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem

  • Matthew J. Streeter
  • Stephen F. Smith
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4204)


The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
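To make the problem setting concrete, the following is a minimal sketch of a distribution-free allocation strategy for the max k-armed bandit: an epsilon-greedy heuristic that tracks the best payoff observed on each machine and preferentially replays the machine with the highest observed maximum. This is an illustrative assumption-laden sketch, not the algorithm presented in the paper; the function name, the epsilon-greedy rule, and the callable-arm interface are all inventions for illustration.

```python
import random

def max_k_armed_bandit(arms, n, epsilon=0.1, rng=random):
    """Illustrative epsilon-greedy heuristic for the max k-armed bandit.

    `arms` is a list of zero-argument callables, each returning one payoff
    drawn from that machine's unknown distribution.  Returns the maximum
    payoff observed over n trials (n must be >= len(arms)).

    NOTE: this is a sketch of the problem setting only, not the
    algorithm from the paper.
    """
    k = len(arms)
    # Pull each arm once so every machine has an observed maximum.
    best_per_arm = [arm() for arm in arms]
    for _ in range(n - k):
        if rng.random() < epsilon:
            i = rng.randrange(k)  # explore: pick a random machine
        else:
            # exploit: replay the machine with the best payoff seen so far
            i = max(range(k), key=best_per_arm.__getitem__)
        best_per_arm[i] = max(best_per_arm[i], arms[i]())
    # The objective is the single best payoff over all n trials.
    return max(best_per_arm)
```

Note the contrast with the classical k-armed bandit: the objective rewards the single best draw rather than the sum of payoffs, so a machine with a heavy right tail can be preferable even if its mean payoff is low.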


Keywords: Generalized Extreme Value, Feasible Schedule, Slot Machine, Generalized Extreme Value Distribution, Project Scheduling Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 235–256 (2002a)
  2. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM Journal on Computing 32(1), 48–77 (2002b)
  3. Berry, D.A., Fristedt, B.: Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London (1986)
  4. Cicirello, V.A., Smith, S.F.: Heuristic selection for stochastic search optimization: Modeling solution quality by extreme value theory. In: Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming, pp. 197–211 (2004)
  5. Cicirello, V.A., Smith, S.F.: The max k-armed bandit: A new model of exploration applied to search heuristic selection. In: Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 1355–1361 (2005)
  6. Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer, London (2001)
  7. Kaelbling, L.P.: Learning in Embedded Systems. The MIT Press, Cambridge (1993)
  8. Lai, T.L.: Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics 15(3), 1091–1114 (1987)
  9. Möhring, R.H., Schulz, A.S., Stork, F., Uetz, M.: Solving project scheduling problems by minimum cut computations. Management Science 49(3), 330–350 (2003)
  10. Neumann, K., Schwindt, C., Zimmerman, J.: Project Scheduling with Time Windows and Scarce Resources. Springer, Heidelberg (2002)
  11. Robbins, H.: Some aspects of sequential design of experiments. Bulletin of the American Mathematical Society 58, 527–535 (1952)
  12. Schwindt, C.: Generation of resource-constrained project scheduling problems with minimal and maximal time lags. Technical Report WIOR-489, Universität Karlsruhe (1996)
  13. Streeter, M.J., Smith, S.F.: An asymptotically optimal algorithm for the max k-armed bandit problem. In: Proceedings of the Twenty-First National Conference on Artificial Intelligence (2006)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Matthew J. Streeter (1)
  • Stephen F. Smith (2)
  1. Computer Science Department and Center for the Neural Basis of Cognition, Carnegie Mellon University, Pittsburgh
  2. The Robotics Institute, Carnegie Mellon University, Pittsburgh
