Algorithm Selection as a Bandit Problem with Unbounded Losses

  • Matteo Gagliolo
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6073)


Algorithm selection is typically based on models of algorithm performance learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance model is iteratively updated and used to guide selection on a sequence of problem instances. The resulting exploration-exploitation trade-off was represented as a bandit problem with expert advice, using an existing solver for this game, but this required the setting of an arbitrary bound on algorithm runtimes, thus invalidating the optimal regret of the solver. In this paper, we propose a simpler framework for representing algorithm selection as a bandit problem, with partial information, and an unknown bound on losses. We adapt an existing solver to this game, proving a bound on its expected regret, which holds also for the resulting algorithm selection technique. We present experiments with a set of SAT solvers on a mixed SAT-UNSAT benchmark.


Problem Instance Algorithm Selection Bandit Problem Cumulative Loss Algorithm Portfolio 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Rice, J.R.: The algorithm selection problem. In: Rubinoff, M., Yovits, M.C. (eds.) Advances in Computers, vol. 15, pp. 65–118. Academic Press, New York (1976)Google Scholar
  2. 2.
    Vilalta, R., Drissi, Y.: A perspective view and survey of meta-learning. Artif. Intell. Rev. 18(2), 77–95 (2002)CrossRefGoogle Scholar
  3. 3.
    Gagliolo, M., Zhumatiy, V., Schmidhuber, J.: Adaptive online time allocation to search algorithms. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 134–143. Springer, Heidelberg (2004)Google Scholar
  4. 4.
    Gagliolo, M., Schmidhuber, J.: A neural network model for inter-problem adaptive online time allocation. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds.) ICANN 2005. LNCS, vol. 3697, pp. 7–12. Springer, Heidelberg (2005)Google Scholar
  5. 5.
    Gagliolo, M., Schmidhuber, J.: Learning dynamic algorithm portfolios. Annals of Mathematics and Artificial Intelligence 47(3-4), 295–328 (2006); AI&MATH 2006 Special Issue zbMATHCrossRefMathSciNetGoogle Scholar
  6. 6.
    Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1), 48–77 (2003)CrossRefMathSciNetGoogle Scholar
  7. 7.
    Gagliolo, M., Schmidhuber, J.: Learning restart strategies. In: Veloso, M.M. (ed.) IJCAI 2007 – Twentieth International Joint Conference on Artificial Intelligence, January 2007, vol. 1, pp. 792–797. AAAI Press, Menlo Park (2007)Google Scholar
  8. 8.
    Cesa-Bianchi, N., Mansour, Y., Stoltz, G.: Improved second-order bounds for prediction with expert advice. In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS (LNAI), vol. 3559, pp. 217–232. Springer, Heidelberg (2005)Google Scholar
  9. 9.
    Hoos, H.H., Stützle, T.: Local search algorithms for SAT: An empirical evaluation. Journal of Automated Reasoning 24(4), 421–481 (2000)zbMATHCrossRefGoogle Scholar
  10. 10.
    Hutter, F., Hamadi, Y.: Parameter adjustment based on performance prediction: Towards an instance-aware problem solver. Technical Report MSR-TR-2005-125, Microsoft Research, Cambridge, UK (December 2005)Google Scholar
  11. 11.
    Petrik, M.: Statistically optimal combination of algorithms. Presented at SOFSEM 2005 - 31st Annual Conference on Current Trends in Theory and Practice of Informatics (2005)Google Scholar
  12. 12.
    Fürnkranz, J.: On-line bibliography on meta-learning. In: EU ESPRIT METAL Project (26.357): A Meta-Learning Assistant for Providing User Support in Machine Learning Mining (2001)Google Scholar
  13. 13.
    Giraud-Carrier, C., Vilalta, R., Brazdil, P.: Introduction to the special issue on meta-learning. Machine Learning 54(3), 187–193 (2004)CrossRefGoogle Scholar
  14. 14.
    Leyton-Brown, K., Nudelman, E., Shoham, Y.: Learning the empirical hardness of optimization problems: The case of combinatorial auctions. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, p. 556. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  15. 15.
    Nudelman, E., Leyton-Brown, K., Hoos, H.H., Devkar, A., Shoham, Y.: Understanding random sat: Beyond the clauses-to-variables ratio. In: Wallace, M. (ed.) CP 2004. LNCS, vol. 3258, pp. 438–452. Springer, Heidelberg (2004)Google Scholar
  16. 16.
    Huberman, B.A., Lukose, R.M., Hogg, T.: An economic approach to hard computational problems. Science 275, 51–54 (1997)CrossRefGoogle Scholar
  17. 17.
    Gomes, C.P., Selman, B.: Algorithm portfolios. Artificial Intelligence 126(1-2), 43–62 (2001)zbMATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    Petrik, M., Zilberstein, S.: Learning static parallel portfolios of algorithms. In: Ninth International Symposium on Artificial Intelligence and Mathematics (2006)Google Scholar
  19. 19.
    Kautz, H.A., Horvitz, E., Ruan, Y., Gomes, C.P., Selman, B.: Dynamic restart policies. In: AAAI/IAAI, pp. 674–681 (2002)Google Scholar
  20. 20.
    Sutton, R., Barto, A.: Reinforcement learning: An introduction. MIT Press, Cambridge (1998)Google Scholar
  21. 21.
    Lagoudakis, M.G., Littman, M.L.: Algorithm selection using reinforcement learning. In: Proc. 17th ICML, pp. 511–518. Morgan Kaufmann, San Francisco (2000)Google Scholar
  22. 22.
    Finkelstein, L., Markovitch, S., Rivlin, E.: Optimal schedules for parallelizing anytime algorithms: the case of independent processes. In: Eighteenth national conference on Artificial intelligence, pp. 719–724. AAAI Press, Menlo Park (2002)Google Scholar
  23. 23.
    Finkelstein, L., Markovitch, S., Rivlin, E.: Optimal schedules for parallelizing anytime algorithms: The case of shared resources. Journal of Artificial Intelligence Research 19, 73–138 (2003)zbMATHMathSciNetGoogle Scholar
  24. 24.
    Sayag, T., Fine, S., Mansour, Y.: Combining multiple heuristics. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol. 3884, pp. 242–253. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  25. 25.
    Streeter, M.J., Golovin, D., Smith, S.F.: Combining multiple heuristics online. In: Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, Vancouver, British Columbia, Canada, July 22-26, pp. 1197–1203. AAAI Press, Menlo Park (2007)Google Scholar
  26. 26.
    Streeter, M., Smith, S.F.: New techniques for algorithm portfolio design. In: UAI 2008: Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (2008)Google Scholar
  27. 27.
    Beck, C.J., Freuder, E.C.: Simple rules for low-knowledge algorithm selection. In: CPAIOR, pp. 50–64 (2004)Google Scholar
  28. 28.
    Carchrae, T., Beck, J.C.: Applying machine learning to low knowledge control of optimization algorithms. Computational Intelligence 21(4), 373–387 (2005)CrossRefMathSciNetGoogle Scholar
  29. 29.
    Cicirello, V.A., Smith, S.F.: The max k-armed bandit: A new model of exploration applied to search heuristic selection. In: Twentieth National Conference on Artificial Intelligence, pp. 1355–1361. AAAI Press, Menlo Park (2005)Google Scholar
  30. 30.
    Streeter, M.J., Smith, S.F.: An asymptotically optimal algorithm for the max k-armed bandit problem. In: Twenty-First National Conference on Artificial Intelligence. AAAI Press, Menlo Park (2006)Google Scholar
  31. 31.
    Smith-Miles, K.A.: Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Computing Surveys 41(1), 1–25 (2008)CrossRefGoogle Scholar
  32. 32.
    Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the AMS 58, 527–535 (1952)zbMATHCrossRefMathSciNetGoogle Scholar
  33. 33.
    Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pp. 322–331. IEEE Computer Society Press, Los Alamitos (1995)Google Scholar
  34. 34.
    Gomes, C.P., Selman, B., Crato, N., Kautz, H.: Heavy-tailed phenomena in satisfiability and constraint satisfaction problems. J. Autom. Reason. 24(1-2), 67–100 (2000)zbMATHCrossRefMathSciNetGoogle Scholar
  35. 35.
    Cesa-Bianchi, N., Mansour, Y., Stoltz, G.: Improved second-order bounds for prediction with expert advice. Machine Learning 66(2-3), 321–352 (2007)CrossRefGoogle Scholar
  36. 36.
    Allenberg, C., Auer, P., Györfi, L., Ottucsák, G.: Hannan consistency in on-line learning in case of unbounded losses under partial monitoring. In: Balcázar, J.L., Long, P.M., Stephan, F. (eds.) ALT 2006. LNCS (LNAI), vol. 4264, pp. 229–243. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  37. 37.
    Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inf. Comput. 108(2), 212–261 (1994)zbMATHCrossRefMathSciNetGoogle Scholar
  38. 38.
    Gagliolo, M., Schmidhuber, J.: Towards distributed algorithm portfolios. In: Corchado, J.M., et al. (eds.) International Symposium on Distributed Computing and Artificial Intelligence 2008 (DCAI 2008). Advances in Soft Computing, vol. 50, pp. 634–643. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  39. 39.
    Li, C.M., Huang, W.: Diversification and determinism in local search for satisfiability. In: Bacchus, F., Walsh, T. (eds.) SAT 2005. LNCS, vol. 3569, pp. 158–172. Springer, Heidelberg (2005)Google Scholar
  40. 40.
    Hoos, H.H., Stützle, T.: SATLIB: An Online Resource for Research on SAT. In: Gent, I.P., et al. (eds.) SAT 2000, pp. 283–292 (2000),
  41. 41.
    Cesa-Bianchi, N.: Personal Communication (2008)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Matteo Gagliolo
    • 1
    • 2
  • Jürgen Schmidhuber
    • 1
    • 2
  1. 1.IDSIAManno (Lugano)Switzerland
  2. 2.Faculty of InformaticsUniversity of LuganoLuganoSwitzerland

Personalised recommendations