Bandit-Based Search for Constraint Programming

  • Manuel Loth
  • Michèle Sebag
  • Youssef Hamadi
  • Marc Schoenauer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8124)


Constraint Programming (CP) solvers classically explore the solution space using tree-search based heuristics. Monte-Carlo Tree Search (MCTS), aimed at optimal sequential decision making under uncertainty, gradually grows a search tree to explore the most promising regions according to a specified reward function. At the crossroad of CP and MCTS, this paper presents the Bandit Search for Constraint Programming (BaSCoP) algorithm, adapting MCTS to the specifics of the CP search. This contribution relies on i) a generic reward function suited to CP and compatible with a multiple restart strategy; ii) the use of depth-first search as roll-out procedure in MCTS. BaSCoP, on the top of the Gecode constraint solver, is shown to significantly improve on depth-first search on some CP benchmark suites, demonstrating its relevance as a generic yet robust CP search method.


adaptive search value selection bandit UCB MCTS 


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    van Beek, P.: Backtracking Search Algorithms. In: Handbook of Constraint Programming (Foundations of Artificial Intelligence), pp. 85–134. Elsevier Science Inc., New York (2006)CrossRefGoogle Scholar
  2. 2.
    Rice, J.: The algorithm selection problem. In: Advances in Computers, pp. 65–118 (1976)Google Scholar
  3. 3.
    Xu, L., Hutter, F., Hoos, H., Leyton-Brown, K.: Satzilla: Portfolio-based algorithm selection for SAT. JAIR 32, 565–606 (2008)zbMATHGoogle Scholar
  4. 4.
    O’Mahony, E., Hebrard, E., Holland, A., Nugent, C., O’Sullivan, B.: Using case-based reasoning in an algorithm portfolio for constraint solving. In: AICS (2008)Google Scholar
  5. 5.
    Samulowitz, H., Memisevic, R.: Learning to solve QBF. In: AAAI, 255–260 (2007)Google Scholar
  6. 6.
    Streeter, M., Golovin, D., Smith, S.: Combining multiple heuristics online. In: AAAI, pp. 1197–1203 (2007)Google Scholar
  7. 7.
    Hutter, F., Hoos, H.H., Leyton-Brown, K., Stützle, T.: Paramils: An automatic algorithm configuration framework. J. Artif. Intell. Res (JAIR) 36, 267–306 (2009)zbMATHGoogle Scholar
  8. 8.
    Sutton, R., Barto, A.: Reinforcement Learning: an introduction. MIT Press (1998)Google Scholar
  9. 9.
    Previti, A., Ramanujan, R., Schaerf, M., Selman, B.: Monte-carlo style UCT search for boolean satisfiability. In: Pirrone, R., Sorbello, F. (eds.) AI*IA 2011. LNCS, vol. 6934, pp. 177–188. Springer, Heidelberg (2011)CrossRefGoogle Scholar
  10. 10.
    Runarsson, T.P., Schoenauer, M., Sebag, M.: Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling. In: Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, vol. 7219, pp. 160–174. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  11. 11.
    Sabharwal, A., Samulowitz, H., Reddy, C.: Guiding combinatorial optimization with UCT. In: Beldiceanu, N., Jussien, N., Pinson, É. (eds.) CPAIOR 2012. LNCS, vol. 7298, pp. 356–361. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  12. 12.
    Loth, M.: Hybridizing constraint programming and Monte-Carlo Tree Search: Application to the job shop problem. In: Nicosia, G., Pardalos, P. (eds.) Learning and Intelligent Optimization Conference (LION 7). Springer, Heidelberg (2013)Google Scholar
  13. 13.
    Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22 (1985)MathSciNetCrossRefzbMATHGoogle Scholar
  14. 14.
    Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3), 235–256 (2002)CrossRefzbMATHGoogle Scholar
  15. 15.
    Kocsis, L., Szepesvári, C.: Bandit based monte-carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  16. 16.
    Gelly, S., Silver, D.: Combining online and offline knowledge in UCT. In: International Conference on Machine Learning, pp. 273–280. ACM (2007)Google Scholar
  17. 17.
    Ciancarini, P., Favini, G.: Monte-Carlo Tree Search techniques in the game of Kriegspiel. In: International Joint Conference on Artificial Intelligence, pp. 474–479 (2009)Google Scholar
  18. 18.
    Nakhost, H., Müller, M.: Monte-Carlo exploration for deterministic planning. In: Boutilier, C. (ed.) International Joint Conference on Artificial Intelligence, pp. 1766–1771 (2009)Google Scholar
  19. 19.
    Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of las vegas algorithms. Information Processing Letters 47(4), 173–180 (1993)MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Gecode Team: Gecode: Generic constraint development environment (2012),
  21. 21.
    Beck, J.C.: Solution-guided multi-point constructive search for job shop scheduling. Journal of Artificial Intelligence Research 29, 49–77 (2007)zbMATHGoogle Scholar
  22. 22.
    Mathon, R., Rosa, A.: Tables of parameters for BIBD’s with r ≤ 41 including existence, enumeration, and resolvability results. Ann. Discrete Math. 26, 275–308 (1985)MathSciNetGoogle Scholar
  23. 23.
    Hutter, F., Hamadi, Y., Hoos, H., Leyton-Brown, K.: Performance prediction and automated tuning of randomized and parametric algorithms. In: Benhamou, F. (ed.) CP 2006. LNCS, vol. 4204, pp. 213–228. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  24. 24.
    Haim, S., Walsh, T.: Restart strategy selection using machine learning techniques. In: Kullmann, O. (ed.) SAT 2009. LNCS, vol. 5584, pp. 312–325. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  25. 25.
    Epstein, S., Freuder, E., Wallace, R., Morozov, A., Samuels, B.: The adaptive constraint engine. In: Van Hentenryck, P. (ed.) CP 2002. LNCS, vol. 2470, pp. 525–540. Springer, Heidelberg (2002)CrossRefGoogle Scholar
  26. 26.
    Wu, H., Van Beek, P.: Portfolios with deadlines for backtracking search. In: IJAIT, vol. 17, pp. 835–856 (2008)Google Scholar
  27. 27.
    Schneider, M., Hoos, H.H.: Quantifying homogeneity of instance sets for algorithm configuration. In: Hamadi, Y., Schoenauer, M. (eds.) LION 2012. LNCS, vol. 7219, pp. 190–204. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  28. 28.
    Wang, Y., Audibert, J., Munos, R.: Algorithms for infinitely many-armed bandits. In: Advances in Neural Information Processing Systems, pp. 1–8 (2008)Google Scholar
  29. 29.
    Refalo, P.: Impact-based search strategies for constraint programming. In: Wallace, M. (ed.) CP 2004. LNCS, vol. 3258, pp. 557–571. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  30. 30.
    Harvey, W., Ginsberg, M.: Limited discrepancy search. In: International Joint Conference on Artificial Intelligence, pp. 607–615 (1995)Google Scholar
  31. 31.
    Taillard, E.: Benchmarks for basic scheduling problems. European Journal of Operational Research 64(2), 278–285 (1993)CrossRefzbMATHGoogle Scholar
  32. 32.
    Gent, I., Walsh, T.: Csplib: A benchmark library for constraints. In: Jaffar, J. (ed.) CP 1999. LNCS, vol. 1713, pp. 480–481. Springer, Heidelberg (1999)CrossRefGoogle Scholar
  33. 33.
    Werner, F., Winkler, A.: Insertion techniques for the heuristic solution of the job shop problem. Discrete Applied Mathematics 58(2), 191–211 (1995)MathSciNetCrossRefzbMATHGoogle Scholar
  34. 34.
    Boussemart, F., Hemery, F., Lecoutre, C., Sais, L.: Boosting systematic search by weighting constraints. In: ECAI, pp. 146–150 (2004)Google Scholar
  35. 35.
    Beck, J., Feng, T., Watson, J.P.: Combining constraint programming and local search for job-shop scheduling. INFORMS Journal on Computing 23(1), 1–14 (2011)MathSciNetCrossRefzbMATHGoogle Scholar
  36. 36.
    Michel, L., Van Hentenryck, P.: Activity-based search for black-box constraint programming solvers. In: Beldiceanu, N., Jussien, N., Pinson, É. (eds.) CPAIOR 2012. LNCS, vol. 7298, pp. 228–243. Springer, Heidelberg (2012)CrossRefGoogle Scholar
  37. 37.
    Perron, L., Shaw, P.: Combining forces to solve the car sequencing problem. In: Régin, J.-C., Rueher, M. (eds.) CPAIOR 2004. LNCS, vol. 3011, pp. 225–239. Springer, Heidelberg (2004)CrossRefGoogle Scholar
  38. 38.
    Coulom, R.: Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M(J.) (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  39. 39.
    Bubeck, S., Munos, R., Stoltz, G., Szepesvári, C.: X-armed bandits. Journal of Machine Learning Research 12, 1655–1695 (2011)Google Scholar
  40. 40.
    Chaslot, G.M.J.-B., Winands, M.H.M., van den Herik, H.J.: Parallel Monte-Carlo Tree Search. In: van den Herik, H.J., Xu, X., Ma, Z., Winands, M.H.M. (eds.) CG 2008. LNCS, vol. 5131, pp. 60–71. Springer, Heidelberg (2008)CrossRefGoogle Scholar
  41. 41.
    Chu, G., Schulte, C., Stuckey, P.: Confidence-based work stealing in parallel constraint programming. In: Gent, I.P. (ed.) CP 2009. LNCS, vol. 5732, pp. 226–241. Springer, Heidelberg (2009)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Manuel Loth
    • 1
  • Michèle Sebag
    • 2
  • Youssef Hamadi
    • 3
  • Marc Schoenauer
    • 2
  1. 1.Microsoft Research – INRIA joint centrePalaiseauFrance
  2. 2.TAO, INRIA − CNRS − LRIUniversité Paris-SudOrsayFrance
  3. 3.Microsoft ResearchCambridgeUnited Kingdom

Personalised recommendations