Encyclopedia of Computer Graphics and Games

Living Edition
| Editors: Newton Lee

Computer Go

  • Kazuki Yoshizoe
  • Martin Müller
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-08234-9_20-1

Synonyms

Baduk; Igo; Weiqi

Definition

Computer Go has been an important target in the AI domain because, among popular two-player zero-sum games, Go was exceptionally difficult for computers.

Overview

As is widely known, computers are now superior to human beings in most popular two-player zero-sum perfect-information games, including checkers, chess, and shogi. The minimax search-based approach is known to be effective for most games in this category. Since Go belongs to the same category, minimax search should intuitively work for Go as well. However, despite simple rules that have changed only slightly over the past 2,000 years, Go is arguably the last two-player zero-sum game in which human beings are still superior to computers.

The solution to the difficulty of Go was a combination of random sampling and search. The resulting algorithm, Monte Carlo tree search (MCTS), was not only a major breakthrough for computer Go but also an important invention for many other domains related to AI. The strength of computer Go has improved rapidly since the invention of MCTS.

This entry consists of an introduction to the game of Go, followed by computer Go topics before and after the invention of MCTS.

Game of Go

History of the Game

The game of Go originated in China and has been played for more than 2,000 years. It is one of the most popular two-player board games. The game is called Go or Igo in Japan, Baduk in Korea, and Weiqi in China. Because the Japanese Go association played the main part in spreading Go to the world, Go became the most widely used name for the game, and it is the word used throughout this entry. Most players reside in East Asia, but over the last century the game has become more popular in the rest of the world. The population of Go players is thought to be approximately 40 million. There are professional organizations in East Asian countries, and several hundred professional players belong to these organizations.

Rules of Go

Equipment

Go is played on a board with a grid. Players alternately place one stone at a time on an empty intersection of the grid. The first player uses black stones and the second player uses white stones. The aim of the game is to occupy as much territory as possible. Official rules use a 19 × 19 grid; for beginners and short games, 9 × 9 or 13 × 13 grids are also used (the rules are independent of the grid size, and the game can be played on a board of arbitrary size) (see Figs. 1 and 2).
Fig. 1

Left: an empty 19 × 19 board. Right: a middle-game position taken from one of the most famous games in Go history

Capturing

Once placed, stones never move on the board. Stones connect in four directions, vertically and horizontally (never diagonally). Connected stones form a block, and if a block becomes completely surrounded by the opponent's stones, the block is captured and removed from the board. Capturing opponent stones is often advantageous because it creates greater chances to occupy more territory.

Empty intersections adjacent to a block are called liberties. If a block has only one remaining liberty, the block is in atari. Capturing occurs if an opponent stone occupies the last remaining liberty of a block. Examples of capturing are shown in Fig. 2. If black plays on A or B, the white blocks are captured and removed, as shown on the right side.
Fig. 2

Capturing and illegal move
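The block and liberty mechanics described above can be made concrete with a short sketch. The following Python example is illustrative only (the board representation and names are assumptions, not from the original entry): it finds the block containing a given stone by flood fill and counts its liberties; a block whose last liberty is occupied by the opponent would be removed.

    # A minimal sketch of block and liberty computation, assuming a board
    # stored as a dict mapping (row, col) -> 'B', 'W', or None for empty.

    def block_and_liberties(board, size, start):
        """Flood-fill the block containing `start`; return (stones, liberties)."""
        color = board[start]
        stones, liberties, frontier = set(), set(), [start]
        while frontier:
            r, c = frontier.pop()
            if (r, c) in stones:
                continue
            stones.add((r, c))
            # Stones connect in four directions only (no diagonals).
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < size and 0 <= nc < size):
                    continue
                neighbor = board[(nr, nc)]
                if neighbor is None:
                    liberties.add((nr, nc))    # empty point next to the block
                elif neighbor == color:
                    frontier.append((nr, nc))  # same color: part of the block
        return stones, liberties

    # Usage: a single white stone with three black neighbors is in atari.
    size = 5
    board = {(r, c): None for r in range(size) for c in range(size)}
    board[(2, 2)] = 'W'
    for p in ((1, 2), (3, 2), (2, 1)):
        board[p] = 'B'
    stones, libs = block_and_liberties(board, size, (2, 2))
    print(len(libs))  # 1 -> the white stone is in atari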

Suicide and Eye

It is prohibited to place a stone if that stone (or the block containing the newly placed stone) would have no liberties. In other words, suicide moves are prohibited. For example, white is not allowed to play at C in Fig. 2.

However, black is allowed to play at B in Fig. 3, because the move captures the surrounding white block and thereby creates liberties for the black stone at B. In Fig. 3, A, D, and E are all illegal moves for black, and B and C are illegal moves for white.
Fig. 3

Suicide moves, eyes

A single empty intersection surrounded by stones of the same color is called an eye (in Fig. 3, A and D are white's eyes and C is one of black's eyes). Making eyes is important for the game (cf. section "Life and Death").

There is a rule variation that allows suicide of more than one stone (e.g., the New Zealand rules). It has some effect on theoretical analysis but is not described in detail in this entry because it is rarely used.

Ko and Repetition

Like other board games, Go has rules for avoiding repetition. The simplest and most common case of repetition occurs when capturing an opponent stone leaves the capturing stone with only one liberty. An example is shown in Fig. 4: black captures a white stone by playing at A, and white could then immediately recapture the black stone by playing at B. The stones marked C and D are also in Ko. To avoid infinite recapturing, a player must first play another move, called a Ko threat, before capturing back. Ko adds complexity to the game (both in practice and in theory; see section "Computational Complexity") and often makes it more interesting.
Fig. 4

Examples of Ko

There are several variations of repetition-avoiding rules. The Super Ko rule prohibits global repetition (which of course includes simple Ko). For human beings, accurate detection of Super Ko during real games is difficult, and it is excluded from some official rules for human tournaments (e.g., the Japanese Go association official rules).

However, computer Go tournaments typically use the Super Ko rule because detection is not a problem for computers. There are two types of Super Ko rule: situational Super Ko distinguishes two occurrences of the same board position if the player to move differs, while positional Super Ko does not. A minimal detection sketch follows.
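The following Python sketch (illustrative names, not from the original entry) shows how a program might enforce either variant by keeping a set of previously seen situations; it assumes positions are hashable, e.g., a tuple of board contents or a Zobrist hash (cf. section "Implementation Techniques").

    # A minimal sketch of Super Ko detection. Under situational Super Ko
    # the player to move is part of the key; under positional it is not.

    class SuperKoChecker:
        def __init__(self, situational=True):
            self.situational = situational
            self.seen = set()

        def _key(self, position, to_move):
            return (position, to_move) if self.situational else position

        def is_repetition(self, position, to_move):
            return self._key(position, to_move) in self.seen

        def record(self, position, to_move):
            self.seen.add(self._key(position, to_move))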

Life and Death

If a player owns a group of stones (consisting of one or more blocks) that has two or more eyes, the group can never be captured by the opponent, unless the owner intentionally fills one of the group's own eyes (filling one's own eye is almost always a terrible move).

Groups that are safe from capture are alive. If a group cannot avoid capture, it is dead. When the game ends, every stone on the board is either alive or dead. The black blocks in Fig. 5 are all alive. Life and death is not part of the rules, but it is a natural consequence of them, and the concept is crucial for the game.
Fig. 5

Safe (living) blocks

End of Game and Scoring

In the game of Go, pass is always a legal move. Players can pass if no other beneficial move remains. The game ends when two consecutive passes are played, and the winner is then decided by the score. (Players are also allowed to resign at any moment, in which case the opponent wins.)

There are two scoring rules, area scoring and territory scoring. Area scoring counts the sum of:
  • The number of empty points surrounded only by one player's stones

  • The number of stones of each player

  • Komi points to compensate for the advantage of the first player

Territory scoring counts the sum of:
  • The number of empty points surrounded only by one player's stones

  • Minus the number of stones captured by the opponent

  • Komi points to compensate for the advantage of the first player

The outcome is similar under both rules, and the difference rarely affects human players. However, how to handle territory scoring correctly is an interesting topic for computer Go. Area scoring is more computer friendly and is used in most computer Go tournaments. A minimal sketch of area scoring follows.
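The sketch below is illustrative only (helper names are assumptions); it implements the area scoring rule above for a finished board from which dead stones have already been removed: each player counts their stones plus the empty regions bordered only by their own color, and komi is added to white's total.

    # A minimal sketch of area scoring. board maps (r, c) -> 'B', 'W', or
    # None; dead stones are assumed to have been removed already.

    def area_score(board, size, komi=7.5):
        black = sum(1 for v in board.values() if v == 'B')
        white = sum(1 for v in board.values() if v == 'W')
        visited = set()
        for point, v in board.items():
            if v is not None or point in visited:
                continue
            # Flood-fill one empty region and note which colors border it.
            region, borders, frontier = set(), set(), [point]
            while frontier:
                r, c = frontier.pop()
                if (r, c) in region:
                    continue
                region.add((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < size and 0 <= nc < size:
                        n = board[(nr, nc)]
                        if n is None:
                            frontier.append((nr, nc))
                        else:
                            borders.add(n)
            visited |= region
            if borders == {'B'}:
                black += len(region)   # region surrounded only by black
            elif borders == {'W'}:
                white += len(region)   # region surrounded only by white
        return black, white + komi     # white receives komi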

Strength of Human Players

The strength of players is measured in kyu and dan ranks. Human players are given a 25-kyu rank after learning the rules. As players improve, the number decreases until it reaches 1 kyu. Players of different ranks can play even games by using handicap stones, because having more stones in the opening is an advantage. The difference between the ranks gives the number of handicap stones (e.g., a 5-kyu player and a 1-kyu player play with four handicap stones).

Shodan (which means first dan) is given to players who are one stone stronger than 1 kyu, and the number then increases for stronger players. It normally requires more than a year of training to reach shodan. The strongest amateur players are rated approximately 9 dan. Professional players also use dan ranks, but the difference between professional ranks is not measured in handicap stones (Fig. 6).
Fig. 6

9 × 9 board endgame examples

Computer Go Difficulty

Theoretically, minimax search can find the optimal move for two-player zero-sum perfect-information games. For most popular games in this category, minimax search combined with alpha-beta pruning (i.e., alpha-beta search) has succeeded in producing programs at least as strong as the human champion. Go is the only exception in this category of games (Table 1).
Table 1

Computer strength for two-player zero-sum games without Monte Carlo tree search (as of 2015)

  Game       | Strength
  Checkers   | Perfect play is possible
  Othello    | Stronger than the human champion
  Chess      | Stronger than the human champion
  Shogi      | Approximately as strong as the human champion
  Go 9 × 9   | Approximately 3 kyu (based on the authors' guess)
  Go 19 × 19 | Approximately 3 kyu

Difficulty: Search Space Size

One of the difficulties of the game of Go is its enormous search space. The search space sizes of popular two-player zero-sum games are listed in Table 2 (numbers are from Schaeffer et al. 2014); Go has the largest. Checkers was solved by exhaustive search in 2007, but the Go search space is far beyond the limit of current (and at least near-future) computational power.
Table 2

Search space size of two-player games

  Game         | Search space
  Checkers     | 10^20
  Othello      | 10^28
  Chess        | 10^45
  Shogi        | 10^70
  Go (19 × 19) | 10^172
  Go (9 × 9)   | 10^38

It is empirically known that computers tend to be stronger at smaller games when the rules are similar. However, for non-MCTS programs there was only a small difference in strength between 9 × 9 Go and 19 × 19 Go. This fact indicates that search space size is not the only reason for the difficulty of Go.

Difficulty: Evaluation Function

Among the games shown in Table 2, only checkers has been solved by exhaustively searching the game states; for the rest, the search space is too large. Therefore, minimax search prunes unpromising branches based on evaluation functions. Fast and accurate evaluation functions for these games were built using a combination of handcrafted code and machine learning techniques.

Despite its simple rules, Go was the only exception. It is widely believed that making an evaluation function for Go is difficult compared to other two-player games. There are many explanations for this difficulty. Unlike chess, there is no clue such as piece values, because all stones are identical. Unlike Othello, focusing on important portions of the board (e.g., corner points or edges) did not work. Because Go is a territory-enclosing game, it might seem possible to predict the final territory, but this is feasible only in the late endgame. Pursuing local goals such as capturing opponent stones often does not help in finding globally good moves.

The best evaluation functions developed for Go (as of 2014) were either too slow or too inaccurate, and minimax search does not work without an evaluation function. A survey paper published in 2002 (Müller 2002) listed research challenges in computer Go. The first challenge on the list sounds almost trivial: "Develop a Go program that can automatically take advantage of greater processing power." It emphasizes the fact that Go needed a new approach.

Before Monte Carlo Tree Search

Local Tactical Problems

The Go board is large enough to contain multiple local tactical fights. Although there is no guarantee that locally good moves are globally good moves, blunders in local fights are often fatal.

Ladders

The ladder is a simple but important technique in Go. Capturing a block through a sequence of ataris is called a ladder, and it normally produces a zigzag shape (Fig. 7). Many Go programs use ladder search as one of their tactical components.
Fig. 7

Ladder example

Semeai

Another tactical problem is the capturing race, or semeai, which is a variation of the capturing problem. A capturing race occurs when two groups of different colors are adjacent in an enclosed space and each can live only by capturing the other. Standard algorithms for solving two-player games, such as minimax (alpha-beta) search and proof number search, can be used to solve it.

Life and Death (Tsumego)

One of the most important local tactical problems is the life-and-death problem, also called Tsumego. (Formally, Tsumego means a life-and-death problem with only one correct move, but the term is often used in the broader sense.)

Killing (and capturing) a large group of stones is generally advantageous. Therefore, in real games, it is crucial to analyze the life and death of stones correctly. Both alpha-beta search-based solvers and df-pn (depth-first proof number) search-based solvers are known to be effective.

If the problem is enclosed in a small region, these solvers are much faster than human players. However, open-boundary Tsumego remains difficult for computers.

Theoretical and Practical Analysis

Solving Go on Small Boards

The smallest board that makes Go interesting for human players is probably 5 × 5. Go on the 5 × 5, 5 × 6, and 4 × 7 boards has been solved by search (van der Werf 2015).

Computational Complexity

Go under Japanese rules has been proved to be EXPTIME-complete (Robson 1983). Under Chinese rules, the complexity class is only known to lie somewhere between PSPACE-hard and EXPSPACE.

Endgame Theory

Since Go is a territory-occupying game, the value of each move can be described as the amount of territory it will occupy. Combinatorial game theory (CGT) (Berlekamp and Wolfe 1994) shows how to systematically analyze the values of moves as sequences of numerical values and how to choose the optimal move after such analysis. CGT solves difficult artificial positions better than human professionals, but no program actually uses it in play.

One-Ply Monte Carlo Go

Because it was difficult to make a good evaluation function for Go, a different approach called one-ply Monte Carlo Go was tried. (It was originally called Monte Carlo Go, but to distinguish it from Monte Carlo tree search, the term one-ply Monte Carlo Go is used throughout this entry.)

Because the number of legal moves decreases as the board fills, it is possible for randomized players to end the game naturally according to the rules. If both players choose uniformly among all legal moves, the game continues for a very long time, because filling one's own eyes leads to large blocks being repeatedly captured. However, given a simple rule to avoid filling one's own eyes, the game ends in reasonably short time (the average number of moves is approximately the same as the number of intersections on the board). In this way it is possible to evaluate a given position by letting random players play both sides and counting the territory. A random play sequence continued to the end of the game is called a playout.

The basic idea is illustrated in Fig. 8. A given board position is evaluated by the average score of the playouts performed from that position. In the figure, black has won 2 out of 3 playouts; therefore, the position might be promising for black. A minimal sketch of this evaluation appears after Fig. 8.
Fig. 8

A 9 × 9 board evaluation by playout
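The sketch below is illustrative only. The helpers legal_moves, play, and black_wins are assumed interfaces of a board implementation (they are not from the original entry); legal_moves is expected to exclude moves that fill the player's own eyes, and black_wins scores the final position by area scoring.

    import random

    def playout(position, to_move, legal_moves, play, black_wins):
        """One random playout: play until two consecutive passes, then score."""
        passes = 0
        while passes < 2:
            moves = legal_moves(position, to_move)
            if moves:
                position = play(position, random.choice(moves), to_move)
                passes = 0
            else:
                passes += 1                   # nothing reasonable left: pass
            to_move = 'W' if to_move == 'B' else 'B'
        return 1 if black_wins(position) else 0

    def evaluate(position, n_playouts, legal_moves, play, black_wins):
        """Black's winning rate over n_playouts random games from position."""
        wins = sum(playout(position, 'B', legal_moves, play, black_wins)
                   for _ in range(n_playouts))
        return wins / n_playouts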

This is an extremely simple idea: all legal moves are evaluated by playouts, and the move with the highest winning rate is chosen (the leftmost branch in Fig. 9). Unsurprisingly, one-ply Monte Carlo Go is weak because of a fundamental flaw.
Fig. 9

Simplest one-ply Monte Carlo Go

Assume that the playout is purely random except for avoiding eye-filling moves. If there is a threatening move with only one correct reply, the opponent is likely to choose a wrong reply in the playouts, so such a move will be evaluated highly. A one-ply Monte Carlo Go program therefore likes to play direct atari moves, which are in most cases useless. In short, it tends to choose moves that expect the opponent to blunder.

The chance of choosing nonoptimal moves does not vanish even with infinite computational time. The limit of the strength achievable with simple playouts has been analyzed: the winning rate against GNU Go on the 9 × 9 board was approximately 10%, and such programs were also extremely weak on the 19 × 19 board.

The first known work is an unpublished report written by Brügmann in 1993 (Brügmann 1993). More sophisticated approaches based on one-ply Monte Carlo Go followed; they had strengths comparable to other approaches, but it was clearly not the most successful approach to Go. The idea is nevertheless important because it triggered the invention of the Monte Carlo tree search algorithm.

Monte Carlo Tree Search and Go Programs

As described above, one-ply Monte Carlo Go introduced a new way of evaluating board positions that does not require an evaluation function, but it also had a fundamental weakness. The breakthrough came in 2006.

Brief History of MCTS Invention

The Go program Crazy Stone, developed by the French researcher Rémi Coulom, won the 9 × 9 Go division of the 11th Computer Olympiad, held in Turin in 2006. The algorithm used in Crazy Stone was published at the same time at the Computers and Games conference, one of the joint events of the Olympiad (Coulom 2006). The algorithm Coulom developed for Crazy Stone is widely regarded as the first MCTS algorithm.

Building on the success of Crazy Stone, Levente Kocsis and Csaba Szepesvári published the Upper Confidence bounds applied to Trees (UCT) algorithm at the ECML 2006 conference (Kocsis and Szepesvári 2006). UCT has a proof of convergence to the optimal solution, which Crazy Stone's first approach did not have (explained in section "UCT Algorithm").

At first, it seemed that MCTS worked only for small boards. However, soon after the UCT paper was published, a Go program named MoGo became the first Go program to achieve shodan level on the 19 × 19 board (Gelly et al. 2006) (on the Internet Go server KGS (KGS Go Server 2015)) and became famous among Go players.

Basic Framework of MCTS

The differences between one-ply Monte Carlo Go and MCTS seem simple. First, more playouts are performed from more promising branches (Fig. 10). Second, if the number of playouts at a leaf node exceeds a threshold, the leaf is expanded (Fig. 11). With these modifications, the tree grows in an unbalanced manner toward its promising parts. This covers the weakness of one-ply Monte Carlo Go programs and significantly improves playing strength.
Fig. 10

Enough playouts on promising branches

Fig. 11

Expand promising nodes

However, at this point the definition of a promising branch is not yet clear. The key point of the algorithm is the selection of promising branches, which is explained in the following sections.

Theoretical Background: Multi-armed Bandit

The basic approach is surprisingly simple, but promising branches have to be chosen appropriately. Possibly the simplest approach is to select the branch with the highest mean reward, but this is obviously a bad idea: if the first playout of the (unknown) optimal branch is lost, that branch will never be selected again. Therefore, the selection method has to give an advantage to branches with a small number of playouts. More formally, for MCTS to be successful, branches with wide confidence intervals must be given a positive bias. The theory of the multi-armed bandit (MAB) problem, studied since the 1930s, provides a solution.

The problem setting is as follows. You have a certain number of coins, and there is a slot machine with a number of arms. Each arm returns a reward drawn from an unknown distribution. The goal is to find a strategy that minimizes the expected cumulative regret: the difference between the total expected reward of the strategy and the ideal total reward obtainable by pulling the optimal arm every time. (There are many formulations of MAB, but this entry focuses on the setting related to MCTS and Go.)

Intuitively, some of the coins must be used to explore the arms, while the majority should be spent on the optimal arm. This is called the exploration-exploitation dilemma.

An analysis of the theoretically optimal solution was already given in 1985 by Lai and Robbins (1985), but their algorithm was complex and time consuming. Auer et al. proposed a tractable and likewise optimal (up to a constant factor) family of strategies based on upper confidence bounds (UCB) (Auer et al. 2002). They proposed several strategies, but only UCB1 is introduced here. Algorithm UCB1 chooses the arm with the highest UCB1 value, defined as
$$ \mathrm{UCB1} = {\overline{X}}_i + C\sqrt{\frac{\ln t}{s_i}} $$
(1)
where \( {\overline{X}}_i \) is the mean reward of the i-th arm, \( s_i \) is the number of coins spent on the i-th arm, t is the total number of coins spent so far, and C is a constant called the exploration constant, which is chosen based on the range of the reward. For the proof in Auer et al. (2002) to hold, C should be \( \sqrt{2} \) if the reward range is [0, 1]. In practice, however, C is often tuned to the target problem for better performance.

The first term is the mean term and the second is the bias term. While arms with higher means tend to be chosen, the bias term gives an advantage to arms with a small number of coins. A direct transcription of Eq. (1) is shown below.
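This sketch simply evaluates Eq. (1) for every arm and picks the maximizer; the variable names are illustrative.

    import math

    def ucb1_choice(wins, pulls, c=math.sqrt(2)):
        """Pick the arm with the highest UCB1 value.

        wins[i] is the total reward of arm i, pulls[i] the number of coins
        spent on it; rewards are assumed to lie in [0, 1]."""
        t = sum(pulls)                    # total coins spent so far
        best, best_value = None, float('-inf')
        for i in range(len(pulls)):
            if pulls[i] == 0:
                return i                  # try every arm at least once
            mean = wins[i] / pulls[i]                       # mean term
            bias = c * math.sqrt(math.log(t) / pulls[i])    # bias term
            if mean + bias > best_value:
                best, best_value = i, mean + bias
        return best

    # Usage: three arms with rewards in [0, 1].
    print(ucb1_choice([3.0, 5.0, 1.0], [10, 12, 4]))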

There is a close relation between MAB and one-ply Monte Carlo Go (Fig. 12). Each arm is a move, one coin is one playout, and the number of coins corresponds to the amount of thinking time. The goal is slightly different, but as explained in the next section, UCB1 works well when combined with tree search.
Fig. 12

Relation between MAB and one-ply Monte Carlo Go

UCT Algorithm

UCT is a tree search algorithm that uses UCB1 for branch selection. UCT repeats the following procedure until a given time limit is reached or a given number of playouts has been performed:
  1. Follow the branch with the highest UCB1 value until a leaf node is reached.

  2. If the number of playouts at the leaf exceeds a given threshold, expand the node.

  3. Perform one playout.

  4. Update the values of the nodes on the path.

UCT is a generic algorithm that works for various problems, and it also has a proof of convergence to the optimal solution if the playout reward lies in the range [0, 1]. However, just as with the constant in UCB1, the exploration constant C should also be tuned for UCT (e.g., to make Go programs stronger). A minimal code sketch of the procedure follows.
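The sketch below follows the four steps above and is illustrative only. The helpers legal_moves, play (returning the successor position and the player to move), and playout (returning a reward in [0, 1]) are assumed interfaces, not part of the original entry; for brevity, the backup step keeps the reward from a single viewpoint, whereas a real two-player implementation flips it at alternating levels (negamax style).

    import math
    import random

    class Node:
        def __init__(self, position, to_move):
            self.position, self.to_move = position, to_move
            self.children = {}               # move -> Node
            self.wins, self.visits = 0.0, 0

    def ucb1(parent, child, c=math.sqrt(2)):
        if child.visits == 0:
            return float('inf')              # visit every branch at least once
        return (child.wins / child.visits
                + c * math.sqrt(math.log(parent.visits) / child.visits))

    def uct_iteration(root, legal_moves, play, playout, expand_threshold=8):
        # Step 1: follow the highest-UCB1 branch down to a leaf.
        node, path = root, [root]
        while node.children:
            parent = node
            node = max(node.children.values(), key=lambda ch: ucb1(parent, ch))
            path.append(node)
        # Step 2: expand the leaf once it has enough playouts.
        if node.visits >= expand_threshold:
            for m in legal_moves(node.position, node.to_move):
                pos, tm = play(node.position, m, node.to_move)
                node.children[m] = Node(pos, tm)
            if node.children:
                node = random.choice(list(node.children.values()))
                path.append(node)
        # Step 3: perform one playout (reward in [0, 1]).
        reward = playout(node.position, node.to_move)
        # Step 4: update the values of the nodes on the path.
        for n in path:
            n.visits += 1
            n.wins += reward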

Reward Definition and Playing Style

Crazy Stone attracted the attention of Go programmers not only with its strength but also with its unique playing style. It won many games by the smallest possible margin, by playing (what looked like intentionally) safe winning moves.

Play aggressively when losing; play safely when winning. This was a very difficult task for minimax search-based programs, but MCTS-based Go programs acquire the ability naturally. It follows from the definition of the playout reward. Since Go is a score-based game, it is possible to use the score itself as the reward. However, if the reward is two valued (e.g., 1 for a win and 0 for a loss), MCTS tries to maximize the winning probability, not the score difference. An early version of Crazy Stone used the score as the reward, and its winning rate against GNU Go was in the 30-40% range; after the reward was changed to 0/1, it jumped to over 60%.

Why MCTS Works for Go (Or Weakness of MCTS)

MCTS has a generic framework and drastically improved the strength of Go programs. But, of course, it is not an almighty algorithm. Theoretical and practical analysis has revealed a weakness of MCTS when the tree has a deceptive structure, or trap.

A trap is a (sub)tree in which a small number of branches have significantly better (or worse) values than the others. If a long-sequence trap is present in the tree, it is highly unlikely that MCTS finds the correct solution. In Go, this situation typically occurs in a ladder, where only one move is correct and all others are blunders. Early MCTS-based Go programs did in fact miss long ladders in real games.

A Go proverb says, "if you don't know ladders, don't play Go." It is impossible to make a strong Go program without correctly recognizing ladders. Recent Go programs handle ladders in the playouts. As explained later in section "Playout Enhancements," the playouts used in recent Go programs are far from random. The ladder sequences in real games are simple, and playouts can solve them. From the viewpoint of the tree search algorithm, the trap is removed by the playouts.

MCTS is a combination of tree search and playouts. Playouts can read simple deep sequences; tree search can select the best branch among various options. When the combination is effective, MCTS works well. However, tactical situations often require reading long sequences of moves (capturing and life and death are typical), and it is difficult to make playouts read tactical sequences correctly. This is widely regarded as the remaining weakness of MCTS-based Go programs.

Enhancements for MCTS-Based Go Programs

RAVE and AMAF

UCT has a proof of convergence and works fairly well, but state-of-the-art Go programs (as of 2015) do not rely on plain UCT. Practitioners set the theory aside and replaced the bias term with other terms based on Go knowledge. Rapid Action Value Estimation (RAVE) is one of the most popular techniques used in Go (Gelly and Silver 2007).

Occupying a point is often crucial in Go regardless of the order of moves. The All Moves As First (AMAF) heuristic was invented based on this observation. Instead of discarding the move sequence of a playout, AMAF updates the values of all moves that appeared in the playout sequence. The resulting values are inaccurate, but the update speed improves by a large margin. In RAVE, branches with few playouts use AMAF-based values, and as the playouts accumulate, these are gradually replaced by the true playout values. A sketch of the blending is shown below.
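The decay schedule for the blending weight differs between programs; the particular form below (with an equivalence parameter k) is one commonly cited choice and should be read as an assumption of this sketch, not as the entry's formula.

    def rave_value(wins, visits, amaf_wins, amaf_visits, k=1000):
        """Blend the AMAF estimate with the true playout mean.

        With few real playouts the weight beta is close to 1 (trust AMAF);
        it decays toward 0 as playout statistics accumulate."""
        mc = wins / visits if visits else 0.0
        amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
        beta = (k / (k + 3.0 * visits)) ** 0.5 if visits else 1.0
        return beta * amaf + (1.0 - beta) * mc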

Playout Enhancements

Improving playout quality is the most important and most subtle part of an MCTS-based Go program. Both handcrafted approaches and machine learning approaches have succeeded (as of 2014).

MoGo used handcrafted playouts, and the program Zen (one of the strongest programs in 2014) is said to use an at least partly handcrafted approach. Many other programs use a different approach: pattern-based features are defined by the programmers, and the weights are adjusted by machine learning. Typically, game records of strong players are used as training data, and the objective function is the matching rate with the expert moves. In both approaches the playouts choose more "reasonable" moves, which makes it possible to solve simple tactical situations, including ladders. How to make a good playout is still not well understood, because playout and tree search are correlated in a complex manner and theoretical analysis is difficult. A sketch of a weighted playout policy is shown below.
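The sketch samples playout moves in proportion to learned pattern weights; pattern_weight is an assumed lookup (e.g., trained on expert game records), not a real library call.

    import random

    def choose_playout_move(position, moves, pattern_weight):
        """Sample a playout move proportionally to its learned weight."""
        weights = [pattern_weight(position, m) for m in moves]
        if sum(weights) <= 0:
            return random.choice(moves)   # fall back to uniform random
        return random.choices(moves, weights=weights, k=1)[0]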

Progressive Widening

To find good moves in game playing, search must focus on the promising parts of the tree. In MCTS, the progressive widening method is commonly used to prune unpromising branches. While the number of playouts at a node is small, only a few branches are considered as search targets; as the number of playouts increases, more branches are added, as in the sketch below.
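A minimal sketch: the number of candidate branches grows with the node's visit count. The exact growth schedule differs between programs; the power law here is an illustrative choice.

    def widened_candidates(ranked_moves, visits, c=1.0, alpha=0.5):
        """Consider only the top k moves, where k grows with the visits.

        ranked_moves must be sorted best-first by some heuristic prior."""
        k = max(1, int(c * visits ** alpha))
        return ranked_moves[:k]

    # Usage: with 100 playouts at a node, roughly the 10 best-ranked moves
    # are searched; with 10,000 playouts, roughly 100.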

Parallelization

Shared-memory parallel MCTS is common among strong Go programs. Normal implementations based on a lock mechanism achieve speedups on multicore machines. It is also known that performance can be improved further by using a lockless hash technique.

For distributed-memory environments, the root parallel approach is used by several strong programs. Each compute node searches independently with a different random seed, and a small part of the tree is shared among the compute nodes (e.g., tree nodes of depth 1–3 are shared). This is known to scale well up to several dozen computers. A minimal sketch of root parallelization follows.
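The sketch below runs independent searches in separate processes and merges the per-move statistics; run_search is an assumed helper with signature run_search(position, n_playouts, seed) returning a {move: (wins, visits)} dictionary.

    from multiprocessing import Pool

    def root_parallel_best_move(position, run_search, n_workers=4,
                                playouts_each=10000):
        """Merge statistics from independent searches with different seeds."""
        with Pool(n_workers) as pool:
            args = [(position, playouts_each, seed)
                    for seed in range(n_workers)]
            results = pool.starmap(run_search, args)
        merged = {}
        for stats in results:
            for move, (wins, visits) in stats.items():
                w, v = merged.get(move, (0.0, 0))
                merged[move] = (w + wins, v + visits)
        # A common robust choice: play the move with the most total visits.
        return max(merged, key=lambda m: merged[m][1])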

Transpositions and MCTS

The game tree of Go is actually not a tree but a directed graph: transpositions occur when different sequences of moves result in the same board position. As shown on the left of Fig. 13, it is not trivial to decide the win rates of nodes in a DAG. Efficient handling of transpositions in MCTS is still an interesting open problem.
Fig. 13

Transpositions and UCT

Go programs mainly use two approaches. One is to ignore transpositions and use a tree. This wastes computational time, but it is possible to make sufficiently strong programs based on trees. The other is to record values separately for nodes and branches. UCT is proved to converge to the optimal solution if the values stored in the nodes are used for the mean term and the values of the branches are used for the bias term, as shown on the right of Fig. 13.

Implementation Techniques

Here is a list of common components and techniques for modern Go programs:
  • Fast data structures for Go board, including block and pattern information.

  • Fast pattern matchers, including a simple 3 × 3 matcher, and heuristic features needed in both the machine learning phase and the playing phase.

  • Machine learning methods.

  • Zobrist hashing for fast hash value calculation (see the sketch after this list).

  • Game database used as training data for machine learning and opening book construction.

  • Time control for playing games in tournaments.

  • Pondering (thinking while the opponent is thinking) and tree (or hash table) reuse.

  • Dynamic komi, which is especially important for handicap games: a virtual komi is adjusted to avoid playing overly safe (or overly aggressive) moves.

  • Using the results of tactical searches such as capture search or life-and-death search.

  • Opening book.
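As referenced in the list, here is a minimal sketch of Zobrist hashing for Go: one random bitstring is assigned to each (point, color) pair, and the position hash is the XOR of the entries for all occupied points, so placing or removing a stone updates the hash in O(1).

    import random

    SIZE = 19
    random.seed(2015)                       # fixed seed for reproducibility
    ZOBRIST = {(p, color): random.getrandbits(64)
               for p in range(SIZE * SIZE) for color in ('B', 'W')}

    def toggle(hash_value, point, color):
        """Apply or undo a stone at `point` (XOR is its own inverse)."""
        return hash_value ^ ZOBRIST[(point, color)]

    # Usage: placing and then removing the same stone restores the hash.
    h = toggle(0, 72, 'B')
    assert toggle(h, 72, 'B') == 0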

Concluding Remarks

Current Computer Go Strength

N. Wedd maintains a Web page titled "Human-Computer Go Challenges" (Wedd 2015). After the invention of MCTS, the strength of Go programs improved rapidly. From 2012 to 2014, the strongest Go programs (Zen and Crazy Stone) played several four-stone handicap games against professional players, including former champions (a four-stone handicap corresponds to approximately a 4-dan difference). The results include similar numbers of wins and losses.

Discussion

Before the invention of MCTS, Go was regarded as a grand challenge of game AI research because of its difficulty. That difficulty led to the invention of an epoch-making algorithm, Monte Carlo tree search. MCTS-related research now exists both in theory and in applications, and in game and nongame domains. Still, Go is the most intensively studied target for MCTS.

There are many studies of search algorithms and machine learning, combined with many implementation techniques, and many researchers are working on how to exploit the increasing computational power of recent computers. At the end of 2014, the first success of a deep learning approach for Go was reported. Deep learning could be a candidate for the next breakthrough; it is still in an early research phase, but the results seem promising.

Computer Go is improving rapidly, and it is difficult to predict even the near future. At least for some years to come, Go is likely to remain one of the most interesting challenges in game AI.

References

  1. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multi-armed bandit problem. Mach. Learn. 47, 235–256 (2002)
  2. Berlekamp, E., Wolfe, D.: Mathematical Go: Chilling Gets the Last Point. A K Peters, Wellesley (1994)
  3. Brügmann, B.: Monte Carlo Go. Technical report, unpublished draft (1993). http://www.althofer.de/Bruegmann-MonteCarloGo.pdf
  4. Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: Proceedings of the 5th International Conference on Computers and Games (CG 2006). Lecture Notes in Computer Science, vol. 4630, pp. 72–83 (2006)
  5. Gelly, S., Wang, Y., Munos, R., Teytaud, O.: Modification of UCT with patterns in Monte-Carlo Go. Technical report 6062, INRIA (2006)
  6. Gelly, S., Silver, D.: Combining online and offline knowledge in UCT. In: Proceedings of the 24th International Conference on Machine Learning (ICML 2007), pp. 273–280 (2007)
  7. KGS Go Server. https://www.gokgs.com/ . Accessed 12 Feb 2015
  8. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Proceedings of the 17th European Conference on Machine Learning (ECML 2006), pp. 282–293 (2006)
  9. Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
  10. Müller, M.: Computer Go. Artif. Intell. 134(1–2), 145–179 (2002)
  11. Robson, J.M.: The complexity of Go. In: IFIP Congress, pp. 413–417 (1983)
  12. Schaeffer, J., Müller, M., Kishimoto, A.: AIs have mastered chess. Will Go be next? IEEE Spectrum, July 2014
  13. van der Werf, E.C.D.: First player scores for m×n Go. http://erikvanderwerf.tengen.nl/mxngo.html . Accessed Dec 2015
  14. Wedd, N.: Human-computer Go challenges. http://www.computer-go.info/h-c/index.html . Accessed 12 Feb 2015

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
  2. University of Alberta, Edmonton, Canada