Monte-Carlo Tree Search in Board Games

  • Mark H. M. Winands
Living reference work entry

Abstract

Monte-Carlo Tree Search (MCTS) is a best-first search method guided by the results of Monte-Carlo simulations. It is based on randomized exploration of the search space. Using the results of previous explorations, the method gradually builds up a game tree in memory and successively becomes better at accurately estimating the values of the most promising moves. MCTS has substantially advanced the state of the art in board games such as Go, Amazons, Hex, Chinese Checkers, Kriegspiel, and Lines of Action.

This chapter gives an overview of popular and effective enhancements for board-game-playing MCTS agents. It first describes the structure of MCTS and gives pseudocode, and then addresses how to adapt MCTS to prove the game-theoretic value of a board position. Next, popular enhancements such as RAVE, progressive bias, progressive widening, and prior knowledge, which improve the simulation in the tree part of MCTS, are discussed in detail. Subsequently, enhancements such as MAST, N-Grams, and evaluation-function-based strategies are explained for improving the simulation outside the tree. Because modern computers have multiple cores, the chapter also covers techniques to parallelize MCTS in a straightforward but effective way. Finally, approaches to deal with imperfect information and stochasticity in an MCTS context are discussed.
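
The four-step structure summarized above (selection, expansion, simulation, backpropagation) can be illustrated with a minimal sketch. It is not the chapter's pseudocode. It assumes a toy game chosen only for illustration (a Nim variant in which players alternately take 1 to 3 stones and whoever takes the last stone wins), UCT as the selection rule, uniformly random playouts, and an exploration constant of 1.4. The names `Node`, `legal_moves`, `playout`, and `mcts` are hypothetical.

```python
import math
import random


def legal_moves(stones):
    """Toy Nim variant (an assumption for illustration): take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


class Node:
    def __init__(self, stones, player_to_move, parent=None, move=None):
        self.stones = stones
        self.player_to_move = player_to_move  # 0 or 1
        self.parent = parent
        self.move = move                      # move that led to this node
        self.children = []
        self.untried = legal_moves(stones)    # moves not yet expanded
        self.visits = 0
        self.wins = 0.0                       # wins for the player who moved into this node


def uct_select(node, c):
    """Pick the child maximizing the UCT value (mean reward plus exploration term)."""
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def playout(stones, player_to_move, rng):
    """Play uniformly random moves to the end and return the winning player."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        player_to_move = 1 - player_to_move
    # The player who just moved took the last stone and therefore wins.
    return 1 - player_to_move


def mcts(stones, iterations=2000, c=1.4, seed=0):
    """Return the move chosen for player 0 after a fixed number of iterations."""
    rng = random.Random(seed)
    root = Node(stones, 0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = uct_select(node, c)
        # 2. Expansion: add one new child for a randomly chosen untried move.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, 1 - node.player_to_move, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new (or terminal) node.
        winner = playout(node.stones, node.player_to_move, rng)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player_to_move:
                node.wins += 1
            node = node.parent
    # Final move selection: the most visited root child (one common choice).
    return max(root.children, key=lambda ch: ch.visits).move
```

Calling `mcts(5)` asks for a move from a five-stone position. Returning the most visited child is only one of several possible final move selection strategies, and the iteration budget and exploration constant here are arbitrary starting points rather than tuned values.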

Keywords

Minimax search techniques; Monte-Carlo Tree Search (MCTS); MCTS; MCTS-solver; Tree and Root Parallelization; Chance nodes; Multi-Armed Bandit (MAB) problem; Playout; Simulation strategy; Expansion strategy; Backpropagation; Final move selection strategies; Domain-independent; Rapid Action-Value Estimator (RAVE); Progressive Bias (PB); Implicit Minimax; Progressive Widening; Move-Average Sampling Technique (MAST); N-Gram Selection Technique (NST); Greedy strategy; Root parallelization; Tree parallelization; Determinization; Upper Confidence Bounds applied to Trees (UCT)

Copyright information

© Springer Science+Business Media Singapore 2015

Authors and Affiliations

  1. Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
