Guiding Combinatorial Optimization with UCT
We propose a new approach for search tree exploration in the context of combinatorial optimization, specifically Mixed Integer Programming (MIP), that is based on UCT, an algorithm for the multi-armed bandit problem designed for balancing exploration and exploitation in an online fashion. UCT has recently been highly successful in game tree search. We discuss the differences that arise when UCT is applied to search trees as opposed to bandits or game trees, and provide initial results demonstrating that the performance of even a highly optimized state-of-the-art MIP solver such as CPLEX can be boosted using UCT’s guidance on a range of problem instances.
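The abstract does not state the selection rule itself, so the following is only a minimal sketch of the standard UCB1 score on which UCT is based, applied while descending a tree. It is not the authors' MIP-specific adaptation. The `Node` class, its fields, and the function names are hypothetical, and rewards are assumed to be scaled so that larger is better (for a MIP solver they would have to be derived from something like node bound estimates, which is an assumption here).

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical search-tree node; the fields are illustrative only."""
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)


def ucb1_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return mean + bonus


def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCB1 score."""
    return max(parent.children, key=lambda ch: ucb1_score(ch, parent.visits))


def descend(root: Node) -> Node:
    """Follow UCB1 choices from the root until a node without children is reached."""
    node = root
    while node.children:
        node = select_child(node)
    return node
```

With `c = 0` the rule is purely greedy on the mean reward. Larger values of `c` push the search toward rarely visited children, which is the exploration and exploitation balance the abstract refers to.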
Keywords: Mixed Integer Programming, Linear Programming Relaxation, Node Selection, Game Tree, Open Node