Computer Go has long been an interesting target for AI research because, among popular two-player zero-sum games, Go was exceptionally difficult for computers.
As is widely known, computers are now superior to human beings in most popular two-player zero-sum perfect information games, including checkers, chess, and shogi. The minimax search-based approach is known to be effective for most games in this category, and since Go also belongs to it, intuitively minimax search should work for Go as well. However, despite simple rules that have changed only slightly over the last 2,000 years, Go is arguably the last two-player zero-sum game in which human beings remained superior to computers.
The solution to the difficulty of Go was a combination of random sampling and tree search. The resulting algorithm, Monte Carlo tree search (MCTS), was not only a major breakthrough for computer Go but also an important invention for many other AI-related domains. The strength of computer Go improved rapidly after the invention of MCTS.
This entry first introduces the game of Go and then describes computer Go before and after the invention of MCTS.
Game of Go
History of the Game
The game of Go originated in China and has been played for more than 2,000 years. It is one of the most popular two-player board games. The game is called Go or Igo in Japan, Baduk in Korea, and Weiqi in China. Because the Japanese Go association played the leading role in spreading the game worldwide, Go became its most widely used name and is the term used in this entry. Most players reside in East Asia, but over the last century the game has become more popular in the rest of the world. The population of Go players is estimated at approximately 40 million. There are professional organizations in East Asian countries, and several hundred professional players belong to them.
Rules of Go
Once placed, stones never move on the board. Stones connect vertically and horizontally (never diagonally). Connected stones form a block, and if a block is completely surrounded by the opponent's stones, it is captured and removed from the board. Capturing the opponent's stones is often advantageous because it results in greater chances to occupy more territory.
Suicide and Eye
It is prohibited to place a stone if the stone (or the block containing the newly placed stone) would have no liberties. In other words, suicide moves are prohibited. For example, white is not allowed to play at C in Fig. 2.

A single empty intersection surrounded by stones of the same color is called an eye (in Fig. 3, A and D are white's eyes and C is black's eye). Making eyes is important for the game (cf. section "Life and Death").
There are rule variations that allow suicide of more than one stone (e.g., New Zealand rules). This slightly affects theoretical analysis but will not be described in detail in this entry because it is rarely used.
Ko and Repetition
There are several variations of repetition-avoidance rules. The Super Ko rule prohibits any global repetition of a board position (which of course includes simple Ko). For human beings, accurately detecting Super Ko during real games is difficult, and it is excluded from some official rules for human tournaments (e.g., the official rules of the Japanese Go association).

However, computer Go tournaments typically use the Super Ko rule because detecting repetitions is not a problem for computers. There are two types of Super Ko rule: situational Super Ko treats two occurrences of the same board position as different if the player to move differs, while positional Super Ko does not.
Life and Death
If a player owns a group of stones (consisting of one or more blocks) that has two or more eyes, the group can never be captured by the opponent, unless the owner intentionally fills one of the eyes (filling one's own eye is almost always a terrible move).
End of Game and Scoring
In Go, pass is always a legal move. Players can pass if no other beneficial move remains, and the game ends after two consecutive passes. If the game ends by passes, the winner is decided by the score. (Players may, of course, resign at any moment, in which case the opponent wins.)
Under area scoring, a player's score is the sum of:

- The number of empty points surrounded only by that player's stones
- The number of that player's stones on the board
- Komi points to compensate for the advantage of the first player

Under territory scoring, a player's score is the sum of:

- The number of empty points surrounded only by that player's stones
- Minus the number of that player's stones captured by the opponent
- Komi points to compensate for the advantage of the first player
The outcome is similar under both rules, and the difference rarely affects human players. However, handling territory scoring correctly is an interesting topic for computer Go; area scoring is more computer friendly and is used in most computer Go tournaments.
Strength of Human Players
Strength is measured in kyu and dan. Human players are given a 25-kyu rank after learning the rules. As players improve, the number decreases until it reaches 1 kyu; beyond that, players advance to 1 dan, and dan ranks increase with strength. Players of different ranks can play balanced games using handicap stones, because placing extra stones in the opening is advantageous. The difference between the ranks gives the number of handicap stones (e.g., a 5-kyu player and a 1-kyu player play with four handicap stones).
Computer Go Difficulty
Table 1 Computer strength in two-player zero-sum games without Monte Carlo tree search (as of 2015):

- Checkers: Perfect play is possible
- Chess: Stronger than human champion
- Othello: Stronger than human champion
- Shogi: Approximately as strong as human champion
- Go 9 × 9: Approximately 3 kyu (based on authors' guess)
- Go 19 × 19: Approximately 3 kyu
Difficulty: Search Space Size
Table 2 Search space sizes of two-player games, including Go (19 × 19) and Go (9 × 9)
It is empirically known that, for games with similar rules, computers tend to be stronger on smaller boards. However, for non-MCTS programs there was only a small difference in strength between 9 × 9 and 19 × 19 Go. This indicates that the search space size is not the only reason for the difficulty of Go.
Difficulty: Evaluation Function
Among the games shown in Table 2, only checkers has been solved by exhaustively searching the game states. For the rest, the search space is too enormous, so minimax search prunes unpromising branches based on evaluation functions. Fast and accurate evaluation functions for these games have been built using a combination of handcrafted code and machine learning techniques.
Despite its simple rules, Go was the only exception. It is widely believed that building an evaluation function for Go is more difficult than for other two-player games, and there are many explanations for this. Unlike chess, there is no clue such as piece values, because all stones are identical. Unlike Othello, focusing on important portions of the board (e.g., corner points or edges) does not work. Because Go is a territory-enclosing game, predicting the final territory seems plausible, but it is only possible in the late endgame. Pursuing local goals such as capturing opponent stones often does not help in finding globally good moves.
The best evaluation functions developed for Go (as of 2014) were either too slow or too inaccurate, and minimax search does not work without an evaluation function. A survey published in 2002 (Müller 2002) listed research challenges in computer Go. The first challenge in the list looks trivial: "Develop a Go program that can automatically take advantage of greater processing power." It emphasizes the fact that Go needed a new approach.
Before Monte Carlo Tree Search
Local Tactical Problems
The Go board is large enough to have multiple local tactical fights. Although there is no guarantee that locally good moves are globally good moves, blunders in local fights are often fatal.
Another tactical problem is the capturing race, or semeai, a variation of the capturing problem. A capturing race occurs when two groups of different colors are adjacent in an encircled space and each can live only by capturing the other. Standard algorithms for two-player games, such as minimax (alpha-beta) search and proof number search, can be used to solve these problems.
Life and Death (Tsumego)
Among the most important local tactical fights are life-and-death problems, also called Tsumego. (Strictly, Tsumego means a life-and-death problem with only one correct move, but the terms are often used interchangeably.)
Killing (and capturing) a large group of stones is generally advantageous, so in real games it is crucial to analyze the life and death of stones correctly. Both alpha-beta search-based solvers and df-pn (depth-first proof number) search-based solvers are known to be effective.
If the problem is enclosed in a small region, these solvers are much faster than human players. However, open-boundary Tsumego is still difficult for computers.
Theoretical and Practical Analysis
Solving Go on Small Boards
The smallest board that makes Go interesting for human players is probably 5 × 5. Go on the 5 × 5, 5 × 6, and 4 × 7 boards has been solved by search (van der Werf 2015).
Go under Japanese rules is proved to be EXPTIME-complete (Robson 1983). Under Chinese rules, the complexity class is only known to lie somewhere between PSPACE-hard and EXPSPACE.
Since Go is a territory-occupying game, the value of each move can be described as the amount of territory it occupies. Combinatorial game theory (CGT) (Berlekamp and Wolfe 1994) shows how to systematically analyze the values of moves as sequences of numerical values and how to choose the optimal move from these analyses. CGT solves difficult artificial positions better than human professionals, but no program actually uses it in play.
One-Ply Monte Carlo Go
Because it was difficult to make a good evaluation function for Go, a different approach called one-ply Monte Carlo Go was tried. (It was originally called Monte Carlo Go; to distinguish it from Monte Carlo tree search, the term one-ply Monte Carlo Go is used throughout this entry.)
As the board fills up, the number of legal moves decreases, so randomized players can end the game naturally according to the rules. If both players choose uniformly among all legal moves, the game continues for a very long time, because filling one's own eyes leads to large blocks being captured repeatedly. However, given a simple rule that avoids filling one's own eyes, the game ends in a reasonably short time (the average number of moves is approximately the number of intersections on the board). In this way it is possible to evaluate a given position by letting random players play both sides and counting the territory. Such a random play sequence to the end of the game is called a playout.
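The playout idea fits in a few dozen lines. The following is a minimal illustrative sketch, not a tournament-quality playout: the board size, the single-point eye test, and the simplified area scoring are all simplifications, and Ko/Super Ko is not handled (a move cap bounds the game length instead).

```python
import random

SIZE = 5
EMPTY, BLACK, WHITE = 0, 1, 2

def neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE:
            yield (x + dx, y + dy)

def block_and_liberties(board, p):
    """Flood-fill the block containing p; return (stones, liberties)."""
    color, stones, libs, stack = board[p], {p}, set(), [p]
    while stack:
        for n in neighbors(stack.pop()):
            if board[n] == EMPTY:
                libs.add(n)
            elif board[n] == color and n not in stones:
                stones.add(n)
                stack.append(n)
    return stones, libs

def is_own_eye(board, p, color):
    """Crude single-point eye test: every neighbor is an own stone."""
    return all(board[n] == color for n in neighbors(p))

def play(board, p, color):
    """Place a stone, remove captured opponent blocks; undo and reject suicide."""
    opponent = BLACK + WHITE - color
    board[p] = color
    captured = False
    for n in neighbors(p):
        if board[n] == opponent:
            stones, libs = block_and_liberties(board, n)
            if not libs:
                captured = True
                for s in stones:
                    board[s] = EMPTY
    if not captured and not block_and_liberties(board, p)[1]:
        board[p] = EMPTY          # suicide: illegal
        return False
    return True

def score(board):
    """Simplified area scoring (black minus white): an empty point counts
    for a color only if all its neighbors are that color."""
    diff = 0
    for p, c in board.items():
        if c == EMPTY:
            around = {board[n] for n in neighbors(p)}
            diff += (around == {BLACK}) - (around == {WHITE})
        else:
            diff += 1 if c == BLACK else -1
    return diff

def playout(board, to_move=BLACK, rng=random):
    """Play random moves (never filling single-point own eyes) until two
    consecutive passes, then return the score."""
    passes = 0
    for _ in range(4 * SIZE * SIZE):   # cap: no Ko handling in this sketch
        candidates = [p for p in board
                      if board[p] == EMPTY and not is_own_eye(board, p, to_move)]
        rng.shuffle(candidates)
        for p in candidates:
            if play(board, p, to_move):
                passes = 0
                break
        else:
            passes += 1                # no sensible move remains: pass
        if passes == 2:
            break
        to_move = BLACK + WHITE - to_move
    return score(board)
```

A one-ply Monte Carlo player then simply averages `playout` results over many runs for each candidate first move and picks the move with the best mean.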
Assume the playout is purely random except for avoiding eye-filling moves. If there is a threatening move with only one correct reply, the opponent is likely to choose a wrong reply in the playouts, so such a move is evaluated highly. As a result, one-ply Monte Carlo Go programs like to play direct atari moves, which are in most cases useless. In short, the approach tends to choose moves that expect the opponent to blunder.
Even given infinite computational time, the chance of choosing nonoptimal moves does not vanish; the limit of the strength with simple playouts has been analyzed. The winning rate against GNU Go on the 9 × 9 board was approximately 10 %, and the approach was also extremely weak on the 19 × 19 board.
The first known work is an unpublished report written by Brügmann in 1993 (Brügmann 1993). More sophisticated approaches based on one-ply Monte Carlo Go followed; they were comparable in strength to other approaches, but clearly not the most successful approach for Go. The idea is nevertheless important because it triggered the invention of the Monte Carlo tree search algorithm.
Monte Carlo Tree Search and Go Programs
As described above, one-ply Monte Carlo Go introduced a new way of evaluating the board position which does not require an evaluation function. But there was also a fundamental weakness. The breakthrough came in the year 2006.
Brief History of MCTS Invention
The Go program Crazy Stone, developed by the French researcher Rémi Coulom, won the 9 × 9 Go division of the 11th Computer Olympiad, held in Turin in 2006. The algorithm used in Crazy Stone was published at the same time at the Computers and Games conference, one of the joint events of the Olympiad (Coulom 2006). It is widely regarded as the first MCTS algorithm.
Following the success of Crazy Stone, Levente Kocsis and Csaba Szepesvári published the Upper Confidence bounds applied to Trees (UCT) algorithm at the ECML 2006 conference (Kocsis and Szepesvári 2006). Unlike Crazy Stone's first approach, UCT has a proof of convergence to the optimal solution (explained in section "UCT Algorithm").
At first it seemed that MCTS worked only on small boards. However, soon after the UCT paper was published, a Go program named MoGo became the first to achieve shodan (1 dan) on the 19 × 19 board (Gelly et al. 2006), on the Internet Go server KGS (KGS Go Server 2015), and became famous among Go players.
Basic Framework of MCTS
At this point, however, the definition of a promising branch is not yet clear. The key point of the algorithm is the selection of promising branches, which is explained in the following sections.
Theoretical Background: Multi-armed Bandit
The basic approach is surprisingly simple; the question is how to decide which branch is promising. Possibly the simplest approach is to select the branch with the highest mean reward, but this is obviously a bad idea: if the first playout of the (unknown) optimal branch happens to lose, that branch will never be selected again. The selection method therefore has to give an advantage to branches with a small number of playouts. More formally, for MCTS to be successful, branches with wide confidence intervals must be given a positive bias. The theory of the multi-armed bandit (MAB) problem, studied since the 1930s, provides a solution.
The problem setting is as follows. You have a certain number of coins, and there is a slot machine with a number of arms. Each arm returns a reward drawn from an unknown distribution. The goal is to find a strategy that minimizes the expected cumulative regret: the difference between the sum of rewards obtained by the strategy and the ideal sum that could be obtained by pulling the optimal arm every time. (There are many formulations of MAB; this entry focuses on the setting related to MCTS and Go.)
Intuitively, some of the coins must be used to explore the arms, while the majority should be spent on the optimal arm. This is called the exploration-exploitation dilemma.
UCB1 selects the arm maximizing x̄_i + C sqrt(ln n / n_i), where x̄_i is the mean reward of arm i so far, n_i is the number of times arm i has been pulled, n is the total number of pulls, and C is an exploration constant (sqrt(2) in the original UCB1). The first term is the mean term and the second term is the bias term. While arms with a higher mean tend to be chosen, the bias term gives an advantage to arms that have been pulled only a small number of times.
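The UCB1 rule can be sketched in a few lines. The toy Bernoulli-arm experiment below is illustrative; the exploration constant sqrt(2) is the textbook choice and is tuned in practice.

```python
import math
import random

def ucb1_index(counts, means, c=math.sqrt(2)):
    """Choose the arm maximizing mean + c * sqrt(ln(total) / pulls).
    An arm that was never pulled is chosen first (infinite bias term)."""
    total = sum(counts)
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + c * math.sqrt(math.log(total) / counts[a]))

def run_bandit(probs, pulls, rng):
    """Toy experiment: Bernoulli arms with the given success probabilities."""
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(pulls):
        a = ucb1_index(counts, means)
        reward = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]   # incremental mean update
    return counts
```

Running `run_bandit([0.2, 0.5, 0.8], 5000, random.Random(1))` concentrates the vast majority of pulls on the best arm while still sampling the others occasionally.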
Follow the branch with the highest UCB1 value until reaching the leaf node.
If the number of playouts at the leaf exceeds a given threshold, expand the node.
Do one playout.
Update the values of the nodes on the path.
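The four steps above can be combined into a compact generic UCT sketch. The game interface (`legal_moves`, `apply_move`, `playout`) is abstract and the names are placeholders; for brevity, rewards are taken from a single player's point of view (a two-player version would negate the reward at alternating depths).

```python
import math
import random

class Node:
    """One tree node: playout statistics plus children created on expansion."""
    def __init__(self, state):
        self.state = state
        self.children = {}      # move -> Node
        self.visits = 0
        self.value_sum = 0.0    # sum of playout rewards seen below this node

def uct_search(root_state, legal_moves, apply_move, playout,
               n_playouts, c=math.sqrt(2), expand_threshold=1):
    root = Node(root_state)
    for _ in range(n_playouts):
        node, path = root, [root]
        # 1. Selection: follow the highest-UCB1 child down to a leaf.
        while node.children:
            node = max(node.children.values(),
                       key=lambda ch: float("inf") if ch.visits == 0 else
                       ch.value_sum / ch.visits
                       + c * math.sqrt(math.log(node.visits) / ch.visits))
            path.append(node)
        # 2. Expansion: once the leaf has enough playouts, add its children.
        if node.visits >= expand_threshold:
            for m in legal_moves(node.state):
                node.children[m] = Node(apply_move(node.state, m))
            if node.children:
                node = random.choice(list(node.children.values()))
                path.append(node)
        # 3. Playout from the selected node's state.
        reward = playout(node.state)
        # 4. Backpropagation: update statistics along the path.
        for n in path:
            n.visits += 1
            n.value_sum += reward
    # The most visited root move is the final answer.
    return max(root.children, key=lambda m: root.children[m].visits)
```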
UCT is a generic algorithm that works for various problems, and it has a proof of convergence to the optimal solution if the playout rewards lie in [0, 1]. However, just like the constant in UCB1, the exploration constant C should be tuned for UCT as well (e.g., to make Go programs stronger).
Reward Definition and Playing Style
Crazy Stone attracted the attention of Go programmers not only with its strength but also with its unique playing style. It won many games by the smallest possible margin, seemingly by intentionally playing safe winning moves.
Playing aggressively when losing and safely when winning is very difficult for minimax search-based programs, but MCTS-based Go programs acquire this ability naturally, and the reason lies in the definition of the playout reward. Since Go is a score-based game, the score itself could be used as the reward. However, if the reward is two-valued (e.g., 1 for a win and 0 for a loss), MCTS maximizes the winning probability, not the score difference. An early version of Crazy Stone used the score as the reward, and its winning rate against GNU Go was in the 30–40 % range; after the reward was changed to 0/1, it jumped to more than 60 %.
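A tiny numeric example (with made-up playout outcomes) shows why the two reward definitions disagree:

```python
# Hypothetical playout outcomes (final score differences) for two candidate moves.
safe = [+1.5] * 95 + [-0.5] * 5       # wins 95 % of playouts, small margins
risky = [+30.5] * 60 + [-10.5] * 40   # wins 60 % of playouts, big margins

def mean(xs):
    return sum(xs) / len(xs)

# Score-based reward prefers the risky move ...
score_reward = {"safe": mean(safe), "risky": mean(risky)}
# ... while a 0/1 (win/loss) reward prefers the safe move.
win_reward = {m: mean([1.0 if s > 0 else 0.0 for s in outcomes])
              for m, outcomes in [("safe", safe), ("risky", risky)]}
```

With these numbers, the score reward is 1.4 versus 14.1 (risky wins), while the win-rate reward is 0.95 versus 0.60 (safe wins), which is exactly the "safe-win" behavior described above.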
Why MCTS Works for Go (Or Weakness of MCTS)
MCTS is a generic framework and it drastically improved the strength of Go programs, but of course it is not an almighty algorithm. Theoretical and practical analyses revealed a weakness of MCTS when the tree has a deceptive structure, or trap.
A trap is a tree in which a small number of branches have significantly better (or worse) values than the others. If a trap with a long correct sequence is in the tree, MCTS is highly unlikely to find the correct solution. In Go this situation typically occurs in a ladder, where only one move is correct and all other moves are blunders. Early MCTS-based Go programs did in fact miss long ladders in real games.
A Go proverb says, "if you don't know ladders, don't play Go." It is impossible to make a strong Go program without correctly recognizing ladders. Recent Go programs handle ladders in the playouts: as explained in section "Playout Enhancements," the playouts used in recent programs are far from random, and the ladder sequences arising in real games are simple enough for playouts to solve. From the viewpoint of the tree search algorithm, the trap is removed by the playouts.
MCTS is a combination of tree search and playouts. Playouts can read simple deep sequences; tree search can select the best branch from various options. When the combination is effective, MCTS works well. However, tactical situations (capturing and life and death are typical) often require reading long sequences of moves, and it is difficult to make playouts read tactical sequences correctly. This is widely regarded as the remaining weakness of MCTS-based Go programs.
Enhancements for MCTS-Based Go Programs
RAVE and AMAF
UCT has a proof of convergence and works fairly well, but state-of-the-art Go programs (as of 2015) do not rely on it. Practitioners ignored the theory and replaced the bias term with other terms based on Go knowledge. Rapid Action Value Estimation (RAVE) is one of the most popular such techniques (Gelly and Silver 2007).
Occupying a given point is often valuable in Go regardless of the order of moves. The All Moves As First (AMAF) heuristic is based on this observation: instead of crediting only the first move of a playout, AMAF updates the values of all moves that appeared in the playout sequence. This is inaccurate, but it improves the update speed by a large margin. In RAVE, branches with few playouts use AMAF-based values, which are gradually replaced by the true playout values as the number of playouts increases.
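A common way to realize this blending is a weighted mean whose weight shifts from the AMAF estimate to the true playout mean as real playouts accumulate. The schedule below, with an "equivalence" constant `k`, is one popular hand-tuned form; the exact formula and constant vary between programs and are assumptions here.

```python
def rave_value(q, n, q_amaf, n_amaf, k=1000):
    """Blend the branch's playout mean q (n playouts) with its AMAF mean
    q_amaf (n_amaf updates). beta is near 1 while n is small (trust AMAF)
    and decays toward 0 as real playouts accumulate."""
    if n + n_amaf == 0:
        return 0.0                  # no information yet
    beta = n_amaf / (n + n_amaf + n * n_amaf / k)
    return (1 - beta) * q + beta * q_amaf
```

With no real playouts (`n == 0`) the value is entirely AMAF-based; after thousands of real playouts the AMAF contribution becomes negligible.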
Playout Enhancements

Improving playout quality is the most important and subtle part of MCTS-based Go programs. Both the handcrafted approach and the machine learning approach have succeeded (as of 2014).
MoGo used handcrafted playouts, and the program Zen (one of the strongest programs in 2014) is said to use an at least partly handcrafted approach as well. Many other programs take a different approach: pattern-based features are defined by the programmers, and their weights are adjusted by machine learning. Typically, game records of strong players are used as training data, and the objective function is the rate of matching the experts' moves. In both approaches the playouts choose more "reasonable" moves, which makes it possible to solve simple tactical situations, including ladders. How to make good playouts is still unclear, because playouts and tree search interact in a complex manner and theoretical analysis is difficult.
Progressive Widening

To find good moves in game playing, search must focus on the promising part of the tree. In MCTS, the progressive widening method is commonly used to prune the unpromising part: while the number of playouts at a node is small, only a few branches are considered as search targets, and as the number of playouts increases, more branches are added.
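Progressive widening can be sketched as a schedule mapping a node's playout count to the number of candidate branches. The polynomial schedule and the constants below are illustrative assumptions; every program tunes its own.

```python
def widened_candidates(ranked_moves, n_playouts, base=2.0, power=0.4):
    """Return the prefix of heuristically ranked moves the search may
    consider after n_playouts playouts at this node (~ base * n^power)."""
    k = max(1, int(base * max(n_playouts, 1) ** power))
    return ranked_moves[:k]
```

On a 19 × 19 board with 361 ranked moves, this schedule starts with only two candidates and grows to a few dozen after thousands of playouts, so most of the search effort stays on the highest-ranked moves.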
Parallelization

Shared-memory parallel MCTS is common among strong Go programs. A normal lock-based implementation achieves speedups on multicore machines. It is also known that performance can be improved by using a lockless hash technique.
For distributed-memory environments, the root parallel approach is used by several strong programs. Each compute node searches independently with different random seeds, and a small part of the tree is shared among the compute nodes (e.g., tree nodes of depth 1–3 are shared). This approach is known to scale well up to several dozen computers.
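Root parallelization is easy to sketch because the workers need not communicate during the search: each runs an independent, differently seeded search, and only the root statistics are merged at the end. The stub worker below stands in for a full MCTS search and is purely hypothetical.

```python
import random
from collections import Counter

def merge_root_searches(search_fn, n_workers):
    """Run one independent search per worker (seeded differently) and
    sum the root visit counts; in a real system each worker is a machine."""
    totals = Counter()
    for worker in range(n_workers):
        totals.update(search_fn(random.Random(worker)))
    best = max(totals, key=totals.get)
    return best, totals

# Stub worker: pretends move "a" is better, with per-worker noise.
def stub_search(rng):
    return {"a": 500 + rng.randrange(100), "b": 300 + rng.randrange(100)}
```

Merging visit counts (rather than mean values) keeps the aggregation robust: a worker that searched a move more also gets proportionally more weight.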
Transpositions and MCTS
Go programs mainly use two approaches. One is to ignore transpositions and use trees. This wastes computational time, but it is possible to build strong programs based on trees. The other is to record the values separately for nodes and branches. UCT is proved to converge to the optimal solution if the values stored in the nodes are used for the mean term and the values of the branches are used for the bias term, as shown on the right of Fig. 13.
Besides the search algorithm itself, a competitive Go program relies on many implementation techniques:

- Fast data structures for the Go board, including block and pattern information
- Fast pattern matching, including a simple 3 × 3 matcher and the heuristic features needed in both the machine learning phase and the playing phase
- Machine learning methods
- Zobrist hashing for fast hash value calculation
- A game database used as training data for machine learning and for opening book construction
- Time control for playing games in tournaments
- Pondering (thinking while the opponent is thinking) and tree (or hash table) reuse
- Dynamic komi, which is especially important for handicap games: a virtual komi is adjusted to avoid playing overly safe (or overly aggressive) moves
- Using the results of tactical searches such as capture search or life-and-death search
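Of these, Zobrist hashing is the easiest to sketch: one random key per (intersection, color) pair, combined by XOR, so adding or removing a stone updates the position hash in constant time. This is also what makes Super Ko detection cheap for programs: keep a set of hashes of positions seen so far.

```python
import random

SIZE = 19
_rng = random.Random(12345)     # fixed seed: identical keys on every run
# One 64-bit key per (point, color); color 0 = black, 1 = white.
KEYS = [[(_rng.getrandbits(64), _rng.getrandbits(64))
         for _ in range(SIZE)] for _ in range(SIZE)]

def toggle_stone(h, x, y, color):
    """XOR a stone's key in or out of hash h; because XOR is its own
    inverse, the same call both adds and removes the stone in O(1)."""
    return h ^ KEYS[x][y][color]
```

A positional Super Ko check is then simply `if new_hash in seen_hashes: illegal`; situational Super Ko would additionally mix in a key for the player to move.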
Current Computer Go Strength
N. Wedd maintains a Web page titled "Human-Computer Go Challenges" (Wedd 2015). After the invention of MCTS, the strength of Go programs improved rapidly. From 2012 to 2014, the strongest Go programs (Zen and Crazy Stone) played several four-stone handicap games against professional players, including former champions (a four-stone handicap corresponds to approximately a 4-dan difference). The results include similar numbers of wins and losses.
Before the invention of MCTS, Go was regarded as a grand challenge of game AI research because of its difficulty. That difficulty led to the invention of an epoch-making algorithm, Monte Carlo tree search. A large body of MCTS-related research now exists, in theory and application, in game and nongame domains. Still, Go is the most intensively studied target for MCTS.
There are many studies on search algorithms and machine learning, combined with many implementation techniques, and many researchers are working on how to exploit the increasing computational power of recent computers. At the end of 2014, the first success of a deep learning approach for Go was reported. Deep learning could be a candidate for the next breakthrough; it is still in an early research phase, but the results seem promising.
Computer Go is improving rapidly, and even the near future is difficult to predict. At least for some years to come, Go is likely to remain one of the most interesting challenges in game AI.
- Berlekamp, E., Wolfe, D.: Mathematical Go: Chilling Gets the Last Point. A K Peters, Wellesley (1994)
- Brügmann, B.: Monte Carlo Go. Unpublished technical report (1993). http://www.althofer.de/Bruegmann-MonteCarloGo.pdf
- Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: Proceedings of the 5th International Conference on Computers and Games (CG 2006). Lecture Notes in Computer Science, vol. 4630, pp. 72–83 (2006)
- Gelly, S., Wang, Y., Munos, R., Teytaud, O.: Modification of UCT with patterns in Monte-Carlo Go. Technical report 6062, INRIA (2006)
- Gelly, S., Silver, D.: Combining online and offline knowledge in UCT. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 273–280 (2007)
- KGS Go Server. https://www.gokgs.com/. Accessed 12 Feb 2015
- Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Proceedings of the 17th European Conference on Machine Learning (ECML 2006), pp. 282–293 (2006)
- Müller, M.: Computer Go. Artificial Intelligence 134(1–2), 145–179 (2002)
- Robson, J.M.: The complexity of Go. In: IFIP Congress, pp. 413–417 (1983)
- Schaeffer, J., Müller, M., Kishimoto, A.: AIs have mastered chess. Will Go be next? IEEE Spectrum, July 2014
- van der Werf, E.C.D.: First player scores for m × n Go. http://erikvanderwerf.tengen.nl/mxngo.html. Accessed Dec 2015
- Wedd, N.: Human-computer Go challenges. http://www.computer-go.info/h-c/index.html. Accessed 12 Feb 2015