Computing and Software Science, pp. 200–216
Rating Computer Science via Chess
Abstract
Computer chess was originally purposed for insight into the human mind. It became a quest to get the most power out of computer hardware and software. The goal was specialized but the advances spanned multiple areas, from heuristic search to massive parallelism. Success was measured not by standard software or hardware benchmarks, nor theoretical aims like improving the exponents of algorithms, but by victory over the best human players. To gear up for limited human challenge opportunities, designers of chess machines needed to forecast their skill on the human rating scale. Our thesis is that this challenge led to ways of rating computers on the whole and also rating the effectiveness of our field at solving hard problems. We describe rating systems, the workings of chess programs, advances from computer science, the history of some prominent machines and programs, and ways of rating them.
1 Ratings
Computer chess was already recognized as a field when LNCS began in 1971. Its early history, from seminal papers by Shannon [1] and Turing [2], after earlier work by Zuse and Wiener, has been told in [3, 4, 5] among other sources. Its later history, climaxing with humanity’s dethronement in the victory by IBM’s Deep Blue over Garry Kasparov and further dominance even by programs on smartphones, will be subordinated to telling how rating the effectiveness of hardware and software components indicates the progress of computing. Whereas computer chess was first viewed as an AI problem, we will note contributions from diverse software and hardware areas that have also graced the volumes of LNCS.
In 1971, David Levy was feeling good about his bet made in 1968 with Allen Newell that no computer would defeat him in a match by 1978 [6]. That year also saw the adoption by the World Chess Federation (FIDE) of the Elo Rating System [7], which had been designed earlier for the United States Chess Federation (USCF). Levy’s FIDE rating of 2380, representative of his International Master (IM) title from FIDE, set a level of proficiency that any computer needed to achieve in order to challenge him on equal terms.
A difference of x rating points to one’s opponent corresponds to an expectation of scoring a \(p_x\) portion of the points in a series of games, by the curve

\(p_x = \frac{1}{1 + 10^{-x/400}}\).   (1)
If your rating is R and you use your opponents’ ratings to add up your \(p_x\) for each of N games, that sum is your expected score s. If your actual score S is higher, then you gain rating points; otherwise your new rating \(R'\) stays even or goes down. Your performance rating over that set of games can be defined as the value \(R_p\) whose expectation \(s_p\) equals S; in practice other formulas, with patches to handle the cases \(S = N\) or \(S = 0\), are employed. The last issue is how far to move R in the direction of \(R_p\) to give \(R'\). The amount of change is governed by a factor called K, whose value is elective: FIDE makes K four times as large for young or beginning players as for those who have ever reached a rating of 2400.
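In code, the expectation-and-update cycle just described is short. This is a minimal sketch assuming the logistic curve with the standard 400-point scale and an illustrative K of 20; the function names are ours:

```python
def expected_score(own_rating, opp_rating):
    """Elo expectation p_x for a rating difference x = own - opp."""
    x = own_rating - opp_rating
    return 1.0 / (1.0 + 10.0 ** (-x / 400.0))

def updated_rating(rating, opp_ratings, actual_score, k=20):
    """R' = R + K*(S - s), where s sums the per-game expectations."""
    s = sum(expected_score(rating, opp) for opp in opp_ratings)
    return rating + k * (actual_score - s)
```

Scoring 3 out of 4 against equally rated opposition with K = 20 thus gains 20 points, since the expected score was 2.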

Some landmarks on the rating scale:

- 2200 is the colloquial threshold to call a player a “master”;
- 2400 is required for granting the IM title, 2500 for grandmaster (GM);
- 2600 and above colloquially distinguishes “Strong GMs”;
- 2800+ has been achieved by 11 players; Bobby Fischer’s top was 2785.
Kasparov was the first player to pass 2800; current world champion Magnus Carlsen topped Kasparov’s peak of 2851 and reached 2882 in May 2014. Computer chess players, however, today range far over 3000. How did they progress through these ranks to get there? Many walks of computer science besides AI contributed to confronting this hard problem. Just how hard in raw complexity terms, we discuss next.
2 Complexity and Endgame Tables
Chess players see all pertinent information. There are no hidden cards as in bridge or poker and no element of chance as in backgammon. Every chess position is well-defined as W, D, or L—that is, winning, drawing, or losing for the player to move. There is a near-universal belief that the starting position is D, as was proved for checkers on an \(8 \times 8\) board [11]. So how can chess players lose? The answer is that chess is complex.
Here is a remarkable fact. Take any program P that runs within n units of memory. We can set up a position \(P'\) on an \(N \times N\) board—where N and the number of extra pieces are “moderately” bigger than n—such that \(P'\) is W if and only if P terminates with a desired answer. Moreover, finding the winning strategy in \(P'\) quickly reveals a solution to the problem for which P was coded.
Most remarkably, even if P runs for \(2^n\) steps, such as for solving the Towers of Hanoi puzzle with n rings, individual plays of the game from \(P'\) will take far less time. The “Fifty Move Rule” in standard chess allows either side to claim a draw if 50 moves have been played with no capture or pawn advance. Various reasonable ways to extend it to \(N \times N\) boards will limit plays to time proportional to \(N^2\) or \(N^3\). The exponential time taken by P is sublimated into the branching of the strategy from \(P'\) within these time bounds. For the tower puzzle, the first move frames the middle step of transferring the bottom ring, then play branches into similar but separate combinations for the ‘before’ and ‘after’ stages of moving the other \(n-1\) rings.
If we allow P on size-n cases z of the problem to use \(2^n\) memory as well as time, then we must lift the time limit on plays from \(P'\), but the size of the board and the time to calculate \(P'\) from P and z remain moderate—that is, bounded by a polynomial in n. In terms of computational complexity as represented by Allender’s contribution [12], \(N \times N\) chess is complete in polynomial space with a generalized fifty-move rule [13], and complete in exponential time without it [14]. This “double-rail” completeness also hints that the decision problem for chess is relatively hard to parallelize. Checkers, Go, Othello, and similar strategy games extended to \(N \times N\) boards enjoy at least one rail of hardness [15, 16, 17, 18].
These results as N grows do not dictate high complexity for \(N = 8\) but their strong hint manifests quickly in chess. The Lomonosov tables [19] give perfect strategies for all positions of up to 7 pieces. They reside only in Moscow and their web-accessible format takes up 140 terabytes. This huge message springs from a small seed because the rules of chess fit on a postcard, yet is computationally deep insofar as the effort required to generate it is extreme. The digits of \(\pi \) are as easy as pie by comparison [20]. These tables may be the deepest message we have ever computed.
Even with just 4 pieces, the first item in our history after 1971 shows how computers tapped complexity unsuspected by human players. When defending with king and rook versus king and queen, it was axiomatic that the rook needed to stay in guarding range of the king to avoid getting picked off by a fork from the queen. Such huddling made life easier for the attacker. Computers showed that the rook could often dance away with impunity and harass from the sides to delay up to 31 moves before falling to capture—out of the 50 allotted for the attacker to convert by reducing (or changing) the material. Ken Thompson tabulated this endgame for his program Belle and in 1978 challenged GM Walter Browne to execute the win. Browne failed in his first try, and after extensive study before a second try, squeaked through by capturing the rook on move 50.
Thompson generated perfect tables for 5 pieces with positions tiered by distance-to-conversion (DTC)—that is, the maximum number of moves the defender could delay conversion. In distance-to-mate (DTM), the king and queen versus king and rook endgame can last 35 moves. The 5-piece tables in Eugene Nalimov’s popular DTM format occupy 7.1 GB uncompressed. Distance-to-zero (DTZ) is the minimum number of moves to force a capture or pawn move while retaining a W value; if the DTZ is over 50 then its “Z50” flavor flips the position value from W to D in strict accordance with the 50-move draw rule.
Thompson also generated tables for all 6-piece positions without pawns. He found positions requiring up to 243 moves to convert and 262 moves to mate. In many more, the winning strategy is so subtle and painstaking as to be thought beyond human capability to execute. The Lomonosov tables, which are DTM-based, have upped the record to 545 moves to mate—more precisely, 1,088 ply with the loser moving first. Some work on 8-piece tablebases is underway but no estimate of when they may finish seems possible. This goes to indicate that positions with full armies are intractably complex, so that navigating them becomes a heuristic activity. What ingredients allow programs to cope?
3 The Machines: Software to Hardware to Software
Three main ingredients go into a chess program:
 1.
Position representation—by which the rules of chess are encoded and legal moves are generated;
 2.
Position evaluation—by which “knowledge” is converted into numbers; and
 3.
Search heuristics—whose ingenuity marches on through the present.
Generating legal moves is cumbersome especially for the sliding pieces bishop, rook, and queen. A software strategy used early on was to maintain and update their limits in each of the compass directions. Limit squares can be off the board, and the trick of situating the board inside a larger array pays a second dividend of disambiguating differences in square indices. For example, the “0x88” layout uses cells 0–7 then 16–23 and so on up to 112–119. Cell pairs with differences in the range [−7,7] must then belong to the same rank (that is, row). The 0x88 layout aligns differences that are multiples of 15 southeast-northwest, 16 south-north, and 17 southwest-northeast. Off-board squares are distinguished by having nonzero bitwise-AND with 10001000, which is 0x88 in hexadecimal.
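A minimal sketch of the 0x88 layout and slider move generation follows; the helper names are ours, and real engines precompute attack tables rather than looping as here. The off-board test is a single bitwise-AND:

```python
# 0x88 board: 16 cells per rank, only the low 8 of each rank are on-board.
def on_board(sq):
    return (sq & 0x88) == 0

def to_0x88(file, rank):          # file, rank in 0..7
    return rank * 16 + file

# Direction offsets: +1 east, +16 north, +15 NW, +17 NE; negatives reversed.
ROOK_DIRS   = (1, -1, 16, -16)
BISHOP_DIRS = (15, -15, 17, -17)

def slider_targets(sq, dirs, occupied=frozenset()):
    """Squares a sliding piece on sq attacks, stopping at the board edge
    or at the first occupied square (included, as a possible capture)."""
    targets = []
    for d in dirs:
        t = sq + d
        while on_board(t):
            targets.append(t)
            if t in occupied:
                break
            t += d
    return targets
```

From a1 an unobstructed rook attacks 14 squares, seven along its rank and seven along its file.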
Such tricks go only yea-far, and it became incumbent to implement board operations directly in hardware. As noted by Lucci and Kopec [21], the best computer players from Belle through Deep Blue went this route in the 1980s and 1990s. They avoided the “von Neumann bottleneck” via multiprocessing of both data support and calculation. Chess programs realize less than full benefits of extra processing cores [22], an echo of the parallel hardness mentioned above.
Evaluation assigns to each position p a numerical value \(e_0(p)\). The values are commonly output in discrete units of 0.01 called centipawns (cp), figuratively 1/100 the base value of a pawn. The knight and bishop usually have base values between 300 and 350 cp, the rook around 500 cp, and the queen somewhere between 850 and 1,000 cp. The values are adjusted for positional factors, such as pawns becoming stronger when further advanced and “passed” but weaker when “doubled” or isolated. Greater mobility and attacks on forward and central squares bring higher values. King safety is a third important category, judged by the structure of the king’s pawn shield and the proximity of attackers and defenders. The fourth factor emphasized by Deep Blue [24] is “tempo,” meaning ability to generate threats and moves that enhance the position score. Additional factors face a tradeoff against the need for speedy evaluation, but this is helped by computing them in parallel pipes and by keeping the formula linear. Much human ingenuity goes into choosing and formulating the factors, but of late their weights have been determined by massive empirical testing (see [25]).
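As a toy illustration only—real evaluations tune hundreds of weighted terms—here is a linear material-plus-mobility evaluation with weights merely inside the ranges quoted above; all names are ours:

```python
# Illustrative centipawn base values (K has no exchange value).
PIECE_CP = {'P': 100, 'N': 320, 'B': 330, 'R': 500, 'Q': 900, 'K': 0}

def evaluate(white_pieces, black_pieces,
             white_mobility, black_mobility, mobility_weight=4):
    """Linear evaluation in centipawns from White's point of view:
    material difference plus a small bonus per available move."""
    material = (sum(PIECE_CP[p] for p in white_pieces)
                - sum(PIECE_CP[p] for p in black_pieces))
    mobility = mobility_weight * (white_mobility - black_mobility)
    return material + mobility
```

Keeping the formula linear, as the text notes, lets the terms be computed speedily and in parallel.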
3.1 Search and Soundness
Search has a natural recursive structure. We can replace \(e_0(p)\) by the maximum—from the player to move’s point of view—of \(e_0(p')\) over the set \(F_1\) of positions \(p'\) reachable by one legal move, calling this \(e_1(p)\). From the other player’s point of view those positions have value \(e'_0(p') = -e_0(p')\). Now let \(F_2\) be the set of positions \(p''\) reachable by a move from some \(p'\) and define \(e'_1(p')\) to be the maximum of \(e'_0(p'')\) over all \(p''\) reached from \(p'\). From the first player’s view this becomes a minimizing update \(e_1(p')\); then redoing the maximization at the root p over these values yields \(e_2(p)\). This so-called negamax form of minimax search is often coded as exactly such a recursion. The sequence \(p',p''\) such that \(e_2(p) = e_1(p') = e_0(p'')\) (breaking any ties in the order nodes were considered) traces out the principal variation (PV) of the search, and the move \(m_1\) leading to \(p'\) is the engine’s bestmove (or firstmove).
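A generic negamax sketch follows; the callbacks `moves` and `make` are hypothetical stand-ins for a real move generator, and `evaluate` gives \(e_0\) from the side to move's view. Each level negates the child's value, so one `max` serves both players:

```python
def negamax(pos, depth, evaluate, moves, make):
    """Return (value, principal variation) from the side to move's view.
    evaluate(pos) is e0 for the player to move at pos."""
    legal = moves(pos)
    if depth == 0 or not legal:
        return evaluate(pos), []
    best, pv = -float('inf'), []
    for m in legal:
        v, line = negamax(make(pos, m), depth - 1, evaluate, moves, make)
        if -v > best:                 # negate: child's view is opponent's
            best, pv = -v, [m] + line
    return best, pv
```

Keeping the first maximizer on ties reproduces the tie-breaking by node order mentioned above.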
Continuing for \(d \ge 3\), we define \(F_d\) to comprise all positions r reached by one move from a position \(q \in F_{d-1}\). Multiple lines of play may go from p to r through different q. Such transpositions may also have different lengths so that \(F_d\) overlaps \(F_i\) for some \(i < d\) of the same parity. Given initial evaluations \(e_0(r)\) for all \(r \in F_d\), minimax well-defines \(e_d(p)\) and a PV to a node \(r \in F_d\) so that all nodes in the PV have value \(e_d(p) = e(r)\). In case of overlap at a node u in \(F_i\) the value with higher generation subscript—namely j in \(e_j(u)\)—is preferred. The simple depth-d search has \(e(r) = e_0(r)\) for all \(r \in F_d\), but we may get other values e(r) by search extension beyond the base depth d, possibly counting them as having higher generation and extending the PV.

In round d the search visits a set E of nodes, its envelope, and returns a value \(v_d(p)\) such that, for some \(c \le d\):

- E includes enough of \(F_c\) that no value \(e_0(q)\) for an unvisited node \(q \in F_c \setminus E\) affects \(v_d(p)\) by minimax;
- most of the time this is true for \(F_d\) in place of \(F_c\); and
- \(v_d(p)\) approximates \(e_D(p)\).
The first clause is solidly defined and says that the search is sound for depth c. The second clause aspires to soundness for a stipulated depth d and motivates our first considering search strategies that alone cannot violate such soundness. The third clause is about trying to extend the search to depths \(D > d\) without reference to soundness.
Nearly all chess programs use a structure of iterative deepening in successive rounds \(d = 1,2,3,\dots \) of search. The sizes of the sets \(E = E_d\) of nodes visited in round d nearly always follow a geometric series so that the effective branching factor (ebf) of the search—variously reckoned as \(E_d/E_{d-1}\) or as \(E_d^{1/d}\) for high enough d—is bounded by a constant. This constant should be significantly less than the “basic” branching factor \(F_d/F_{d-1}\). Similar remarks apply for the overall time \(T_d\) to produce \(v_d(p)\) and the number \(N_d\) of node visits (counting multiple visits to the same node) in place of \(E_d\).
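Both reckonings of the ebf reduce to one-liners. A sketch (names ours) taking the per-round visit counts \(N_1,\dots ,N_d\):

```python
def effective_branching_factor(node_counts):
    """Two common reckonings of the ebf from visit counts N_1..N_d
    of an iteratively deepened search: the last-round ratio, and the
    d-th root of the final count."""
    d = len(node_counts)
    ratio = node_counts[-1] / node_counts[-2]   # N_d / N_(d-1)
    root = node_counts[-1] ** (1.0 / d)         # N_d ** (1/d)
    return ratio, root
```

For a perfectly geometric series both reckonings agree on the common ratio.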
3.2 Alpha-Beta
The first search strategy involves guessing \(\alpha \) and \(\beta \) such that our ultimate \(v_d = v_d(p)\) will belong to a window \((\alpha ,\beta )\) with \(\beta - \alpha \) as small as we dare. One motive for iterative deepening is to compute \(v_{d-1}\) on which to center the window for round d. Values outside the window are reckoned as “\({\ge }\beta \)” or “\({\le }\alpha \)” and these endpoint values work fine in minimax—if \(e_d(p)\) crosses one of them then we fail high or fail low, respectively. After a fail-low we can double the lower window width by taking \(\alpha ' = 2\alpha - v_{d-1}\) and try again, doing similar for a fail-high, and possibly winding back to an earlier round \(d' < d\). Using endpoints relieves the burden of being precise about values away from \(v_d\). This translates into search savings via cutoffs described next.
Suppose we enter node p as shown in Fig. 1 with window (1, 6) and the first child \(p'\) yields value 3 along the current PV. This lets us search the next child \(q'\) with the narrower window (3, 6). Now suppose this fails because its first child \(q''\) gives value 2. It returns the value “\({\le }2\)” for \(q'\) without needing to consider any more of its children, so search is cut off there and we pop back up to p to consider its next child, \(r'\). Next suppose \(r'\) yields value 7. This breaks \(\beta \) for p and all further children of p are considered beta-cutoffs. If p is the root then this fail-high restarts the search until we find a bound \(\beta '\) that holds up when \(v_d(p)\) is returned. If not—say if the \(\beta = 6\) value came from a sibling n of p as shown in the figure—then p gets the value “\({\ge }6\)” and pops up to its parent. A value \(v_{d-1}(r') = 4\), however, would move the PV to go through \(r'\) and keep the search going with exact values in telescoping windows between \(\alpha \) and \(\beta \).
Returning to the beta-cutoff from \(v(r') = 7\), consider what happened along the new PV in nodes below \(r'\). Every defensive move \(m'\) at \(r'\) needed to be tried in order to show that none kept the lid under \(\beta = 6\); there were no alpha-cutoffs on these moves. This situation propagates downward so we’ve searched all children of half the nodes on the PV. If there are always \(\ell \) such children then we’ve done about \(\ell ^{d/2} = (\sqrt{\ell })^d\) work. This is the general best-case for alpha-beta search when soundness is kept at depth d, and it is often approachable. A further move-ordering idea that helps is to try “killer moves” that achieved cutoffs in sibling positions first, even recalling them from searches at previous moves in the game. But with \(\ell \) between 30 and 40 in typical chess positions, optimizing cutoffs alone brings the ebf down only to about 6.
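A fail-soft alpha-beta sketch in the negamax style shows exactly where the beta-cutoff happens; `moves` and `make` are hypothetical callbacks standing in for a real move generator, and `evaluate` is from the side to move's view:

```python
def alphabeta(pos, depth, alpha, beta, evaluate, moves, make):
    """Negamax with alpha-beta cutoffs: the returned value is exact
    inside (alpha, beta) and only a bound when it falls outside."""
    legal = moves(pos)
    if depth == 0 or not legal:
        return evaluate(pos)
    best = -float('inf')
    for m in legal:
        v = -alphabeta(make(pos, m), depth - 1, -beta, -alpha,
                       evaluate, moves, make)
        best = max(best, v)
        alpha = max(alpha, v)
        if alpha >= beta:            # beta-cutoff: skip remaining siblings
            break
    return best
```

With a full-width window the result equals plain minimax; narrower windows save work at the price of possible fail-high or fail-low re-searches.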
Further savings come from storing values \(e_j(q)\) at hashed locations h(q) in the transposition table. The most common scheme assigns a “random”-but-fixed 64-bit code to each combination of 12 kinds of piece and square. This makes \(12 \times 64 = 768\) codes, plus one for the side to move, four for White and Black castling rights, and eight for the files of possible en-passant captures. The primary key H(q) is the bitwise-XOR of the basic codes that apply to q. Then the secondary key h(q) can be defined by H(q) modulo the size N of the hash table, or when \(N = 2^k\) for some k, by taking k bits off one end of H(q). Getting H(r) for the next or previous position r merely requires XORing the codes for the destination and source squares of the piece moved, any piece captured, the side-to-move code, and any other applicable codes. Besides storing \(e_j(q)\) we store H(q) and j (and/or other “age” information), the former to confirm sameness with the position probed and the latter to tell whether \(e_j(q)\) went as deep as we need. If so, we save searching an entire subtree of the current parent of q. We may ignore the possibility of primary-key collisions \(H(q) = H(r)\) for distinct positions q, r in the same search. Collisions of secondary keys \(h(q) = h(r)\) are frequent but errors from them are often “minimaxed away” (see also [26]).
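A minimal sketch of such (Zobrist) hashing along these lines, with a fixed seed playing the role of the “random”-but-fixed codes; castling, en-passant, and capture handling are omitted for brevity, and all names are ours:

```python
import random

random.seed(2017)                        # "random"-but-fixed codes
ZOBRIST = [[random.getrandbits(64) for _ in range(64)]
           for _ in range(12)]           # 12 piece kinds x 64 squares
SIDE_TO_MOVE = random.getrandbits(64)

def full_key(piece_squares, white_to_move):
    """Primary key H(q): XOR of the codes that apply to position q."""
    h = SIDE_TO_MOVE if white_to_move else 0
    for kind, sq in piece_squares:
        h ^= ZOBRIST[kind][sq]
    return h

def update_key(h, kind, src, dst):
    """Incremental update for a quiet move: XOR out the source square,
    XOR in the destination, and flip the side to move."""
    return h ^ ZOBRIST[kind][src] ^ ZOBRIST[kind][dst] ^ SIDE_TO_MOVE

def secondary_key(h, k=20):
    """h(q): the k low-order bits of H(q), for a table of size 2**k."""
    return h & ((1 << k) - 1)
```

Because XOR is its own inverse, undoing a move with the same three XORs restores the previous key exactly.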
3.3 Extensions and Heuristics
We can get more mileage by extending D beyond d. Shannon [1] already noted that many depth-d floor nodes come after a capture or check or flight from check and have moves that continue in that vein. Those may be further expanded until a position identified as quiescent is reached. Human players tend to calculate such forced sequences as a unit. Thus the game-logical floor for round d may be deeper along important branches than the nominal depth-d floor.
Furthermore, the PV may accrue many nodes q whose value hangs strongly on one move m to a position \(q'\), so that a large change to \(e_i(q')\) would change \(e_{i+1}(q)\) by the same amount. The move m is called singular and warrants a better fix on its value by expanding it deeper. Such singular extensions could be reserved for cases of delaying moves by a defender on the ropes or moves known to affect positions already seen in the search, or liberalized to consider groups of two or more move options as “singular” [27, 28].
Other extensions have been tried. Search depths are commonly noted as “d/D” where d is the nominal depth and D is the maximum extended depth. Their values e(r) for \(r \in F_d\) may differ widely from \(e_0(r)\) but this does not violate our notion of depthd soundness which takes those values e(r) as given. We have added more nodes beyond \(F_d\) but not saved any more inside it than we had from cutoffs. Further progress needs compromise on soundness.
From numerous heuristics we mention two credited with much of the software side of performance gain. The idea of late move reductions (LMR) is simply to do only the first yea-many moves from the previous round’s rank order to nominal depth d, the rest to lower depths c. If \(d/c = 2\), say, this can prevent a subtle mate-in-n-ply from being seen until the search has reached round 2n. Even \(c = d-4\) or \(d-3\) can make terms in \((\sqrt{\ell })^c\) minor enough to replace \((\sqrt{\ell })^d\) by \((\sqrt{a})^d\) for \(a < 4\), which is enough to bring the ebf under 2.
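The LMR depth schedule can be caricatured in a few lines; the threshold of four full-width moves and the reduction of two ply below are illustrative, not any engine's actual tuning, and the name is ours:

```python
def lmr_child_depth(move_index, depth, full_width=4, reduction=2):
    """Depth at which to search the child reached by the move ranked
    move_index: early moves get the normal depth-1, late ones less."""
    if move_index < full_width or depth <= reduction + 1:
        return depth - 1
    return depth - 1 - reduction
```

Moves ranked late by the previous round thus reach the floor sooner, which is where the soundness compromise enters.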
The position at left in Fig. 2 illustrates many of the above concepts. The Lomonosov 7-piece tables show it a draw with best play. Evaluation gives White a 100–200 cp edge on material with bishop and knight versus rook, but engines may differ on other factors such as which king is more exposed. After 1. Qd4+ Kc2 2. Qc5+, Black’s king cannot return to d1 because of the fork 3. Nc3+, so Black steps out with 2...Kb3. Then White has the option 3. Qb6+ Kc2 4. Qxb1+ Kxb1 5. Nc3+ Kc2 6. Nxe2. Since Black is not in check and has no captures, this position may be deemed quiescent and given a +600 to +700 value or even higher since the extra bishop plus knight is generally a winning advantage. However, Black has the quiet 6...Kd3 which forks the bishop and knight and wins one of them, leaving a completely drawn game. What makes this harder to see is that White can delay the reckoning over longer horizons by giving more checks: 4. Qc7+ Kb3 5. Qb8+ Kc2 6. Qc8+ Kb3 7. Qb7+ Kc2 8. Qc6+ Kb3. White has not repeated any position and now has three further moves 9. Qc3+ Ka2 (if Black rejects ...Ka4) 10. Qa5+ Kb3 11. Qb4+ Kc2 before needing to decide whether to take the plunge with 12. Qxb1+. Pushing things even further is that White can preface this with 1. Ke7 threatening 2. Nb4 with Black’s queen unable to give check. Black must answer by 1...Rb7+ and after 2. Kd6 must meekly return by 2...Rb1. Especially in the position after 1. Ke7 Rb7+, values can differ widely between engines and between depths for the same engine, and can even be sensitive to changes in the size of the hash table. Evidently the high degree of singularity raises the chance of a rogue e(r) value propagating to the root.
How often is the quality of play compromised? It is one thing to try these heuristics against human players, but surely a “sounder” engine is best equipped to punish any lapses. Silver [29] reports an experiment where a current engine running on a smartphone trounced one from ten years ago that was given hardware fifty times faster. Although asking for depth d really gives a mélange of c and D with envelope E lopsidedly bunched along the PV, it all works.
4 Benchmarking Progress
All the notable human-computer matchups under standard tournament conditions over the past 40 years total in the low hundreds of games. A dozen such games at most are available for major iterations of any one machine or program. Games in computer-computer play do not connect into the human rating system. With ratings based only on a few bits of information—the outcomes of games and opponents’ ratings—the sample size is too small to get a fix. Ratings based on 25 or fewer games are labeled “provisional” by the USCF. However much we feel the lack in retrospect, it applied all the more years ago looking forward.
Various internal ways were used to project skill. Programs could be played against themselves with different depth or time limits of search. The scoring rate of the stronger over the weaker translates into an Elo difference by the curve (1). Thompson [33] carried this out with Belle at single-digit search depths, finding a steady gain of about 200 Elo per extra ply, but a follow-up experiment joined by Condon [34] found diminishing returns beyond depth 7.
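Inverting the rating curve converts such a self-play scoring rate into an Elo difference; a small sketch (name ours):

```python
import math

def elo_difference(score_rate):
    """The Elo gap x whose expectation p_x equals the observed scoring
    rate of the stronger version over the weaker (0 < rate < 1)."""
    return 400.0 * math.log10(score_rate / (1.0 - score_rate))
```

A 76% scoring rate corresponds to roughly a 200-point gap, which is the per-ply gain Thompson observed.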
Hsu et al. [36] reached the opposite conclusion regarding Deep Thought, projecting that a 14- or 15-ply basic search with extensions beyond 30 ply would achieve a 3400 rating. The Thoresen engine competition site today shows no rating above 3230 [37]. One can say that its evolution into Deep Blue landed between the two projections. A chart from 1998 by Moravec [38] seems to justify the extrapolation to 3400 by its notably linear plot of ascribed engine ratings up to Deep Thought II near 2700 and 11 ply in 1991 and 1994, but it plots Deep Blue well under the line at 13 ply and only a 2700–2750 rating.

“Projections of potential gain have time and again been found to overestimate the actual gain. [Our work] suggests that once a certain knowledge gap has been opened up, it cannot be overcome by small increments in searching depth. The conclusion ... is that extending the depth of search without increasing the present level of knowledge will not in any foreseeable time lead to World Championship level chess.”
What further distinguished the Bratko-Kopec work were tests on human subjects rated below 1600, 1600–1799, 1800–1999, 2000–2199, 2200–2399, and 2400+. The results filled the whole range from only two correct out of 24 to 21 of 24, showing a clear correspondence to rating. The Elo rating chart in [40] assigned 2150 to Belle, 2050 to Chess 4.9, and ratings 1900 and under to Duchess and other tested programs. Their results were broadly in accord with those ratings. But all these results were from small data.

“Although one may disagree with the choice of test set, question its adequacy and completeness, and so on, the fact remains that the designers of computer chess programs still do not have an acceptable means of estimating the performance of chess programs, without resorting to time-consuming and expensive ‘matches’ against other subjects. Clearly there is considerable scope for such test sets, as successes in related areas like pattern recognition attest.”
Haworth [42] proposed using endgame tables to benchmark humans—and computers not equipped with them. The DTM, DTC, and/or DTZ metrics furnish numerical scores that are indisputable and objective, and the 6 and later 7piece tables expand the range of realistic test positions. Humans of a given rating class could be benchmarked from games in actual competition that entered these endgames.
Matej Guid led Bratko back into benchmarking with a scheme using depth 12 of Crafty as authority to judge all moves (after the first twelve turns) in all games from world championship matches by intrinsic quality [43]. This was repeated with other engines as judges [44] including then-champion Rybka 3 to reported depth 10, which arguably compares best to depth 13 or 14 on other engines since Rybka treats its bottom four search levels as a unit [45]. Coming back to Haworth and company joined by this writer, two innovations of [46, 47] were doing unpruned full-depth analysis of multiple move options besides the best and played moves, and judging prior likelihoods of those moves by fallible agents modeling player skill profiles in a Bayesian setting. This led in [48] to using Rybka 3 to analyze essentially all legal moves to reported depth 13, training a frequentist model on thousands of games over all rating classes from 1600 to 2700, and conditioning noise from the observed greater magnitude of errors in positions where one side has a non-negligible advantage. The model supplies not only metrics and projections but also error bars for various statistical tests of concordance with the judging engine(s) and an “Intrinsic Performance Rating” (IPR) based only on analysis of one’s moves rather than results of games.
For continuity with this past work—and because an expanded model with versions of Komodo and Stockfish as judges is not fully trained and calibrated at press time—we apply the scheme of [48] to rate the most prominent human-computer matches as well as some ICCA/ICGA World Computer Chess Championships (WCCC). This comes with cupfuls of caveats: Rybka 3 to reported depth 13 is far stronger than Crafty to depth 12 but needs the defense [49] of the latter to justify IPR values over 2900 and probably loses resolution before 3100. The IPR currently measures accuracy more than challenge put to the opponent and is really measuring similarity to Rybka 3. Although moves from turn 9 onward (skipping repeating sequences and positions with one side ahead over 300 cp) give larger sample sizes than games, the wide two-sigma error bars reflect the overall paucity of data and provisional nature of this work.
5 A “Moore’s Law of Games” and Future Prospects

The history and measurements surveyed above support several conclusions:

- There has been steady progress.
- Early estimated ratings of computers were basically right.
- Computers had GM level in sight before Deep Thought’s breakthrough.
- Not long after the retirement of Deep Blue, championship quality became accessible to off-the-shelf hardware and software.
- A few years later smartphones had it, e.g. Hiarcs 13 as “Pocket Fritz.”
- Progress as measured by Elo gain flattens out over time.
The last point bears comparison with Moore’s Law and arguments over its slowing or cessation. Those arguments pivot on whether the law narrowly addresses chip density or clock speed or speaks to a more general measure of productivity. With games we have a fixed measure—results backed by ratings—but a free-for-all on how this productivity is gained.
We may need to use Elo’s transportability to other games to meter future progress. The argument that Elo sets a hard ceiling in chess goes as follows: We can imagine that today’s strong engines E could hold a non-negligible portion d of draws against any strategy. This may need randomly selecting slightly inferior moves to avoid strategies with foresight of deterministic weaknesses. If E has rating R, then no opponent can ever be rated higher than \(R + x\) by playing E, where with reference to (1), \(p_{x} = 1 - 0.5d\). The ceiling \(R + x\) may be near at hand for chess but higher for Go—despite its recent conquest by Google DeepMind’s AlphaGo [50]. Games of Go last over a hundred moves for each player and have a hair-trigger difference between win and loss.
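The ceiling argument is a one-line inversion of the rating curve: if E holds at least a portion d of draws against any opponent, the opponent's expected score is capped at \(1 - d/2\). A sketch (names ours):

```python
import math

def rating_ceiling(engine_rating, draw_rate):
    """Highest rating reachable by playing an engine that holds at
    least a draw_rate portion of draws: the opponent's expected score
    is capped at p = 1 - draw_rate/2, so x solves p_x = p."""
    p = 1.0 - 0.5 * draw_rate
    return engine_rating + 400.0 * math.log10(p / (1.0 - p))
```

With draw_rate = 1 the ceiling is the engine's own rating; lower draw rates push it up.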
A greater potential benefit comes from how large-scale data from deep engine analysis of human games may reveal new regularities of the human mind, especially in decision-making under pressure. Why and when do we stop thinking and take action, and what causes us to err? For instance, this may enable transforming the analysis of blunders in [51] into a smooth treatment of error in perception. Although computer chess left its envisaged mind- and knowledge-based trajectory, its power-play success may boost the original AI aims.
References
 1. Shannon, C.: Programming a computer for playing chess. Philos. Mag. 41, 256–275 (1950)
 2. Turing, A.: Computing machinery and intelligence. Mind 59, 433–460 (1950)
 3. Marsland, T.A.: A short history of computer chess. In: Marsland, T.A., Schaeffer, J. (eds.) Computers, Chess, and Cognition, pp. 3–7. Springer, New York (1990). https://doi.org/10.1007/978-1-4613-9080-0_1
 4. Campbell, M., Feigenbaum, E., Levy, D., McCarthy, J., Newborn, M.: The History of Computer Chess: An AI Perspective (2005). http://www.computerhistory.org/collections/catalog/102651382. Video, The Computer History Museum
 5. Larson, E.: A brief history of computer chess. Best Sch. Mag. (2015)
 6. Levy, D.: Computer chess: past, present and future. Chess Life Rev. 28, 723–726 (1973)
 7. Elo, A.: The Rating of Chessplayers, Past and Present. Arco Pub., New York (1978)
 8. Silver, N.: Introducing Elo Ratings (2014). https://fivethirtyeight.com/datalab/introducing-nfl-elo-ratings/
 9. Glickman, M.E.: Parameter estimation in large dynamic paired comparison experiments. Appl. Stat. 48, 377–394 (1999)
 10. Sonas, J., Kaggle.com: Chess ratings: Elo versus the Rest of the World (2011). http://www.kaggle.com/c/chess
 11. Schaeffer, J., Burch, N., Björnsson, Y., Kishimoto, A., Müller, M., Lake, R., Lu, P., Sutphen, S.: Checkers is solved. Science 317, 1518–1522 (2007)
 12. Allender, E.: The complexity of complexity. In: Day, A., Fellows, M., Greenberg, N., Khoussainov, B., Melnikov, A., Rosamond, F. (eds.) Computability and Complexity. LNCS, vol. 10010, pp. 79–94. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-50062-1_6
 13. Storer, J.: On the complexity of chess. J. Comput. Syst. Sci. 27, 77–100 (1983)
 14. Fraenkel, A., Lichtenstein, D.: Computing a perfect strategy for n x n chess requires time exponential in n. J. Comb. Theory 31, 199–214 (1981)
 15. Lichtenstein, D., Sipser, M.: Go is polynomial-space hard. J. ACM 27, 393–401 (1980)
 16. Robson, J.: The complexity of Go. In: Proceedings of the IFIP Congress, pp. 413–417 (1983)
 17. Robson, J.: N by N checkers is Exptime complete. SIAM J. Comput. 13, 252–267 (1984)
 18. Iwata, S., Kasai, T.: The Othello game on an n*n board is PSPACE-complete. Theoret. Comput. Sci. 123, 329–340 (1994)
 19. Zakharov, V., Makhnychev, V.: Creating tables of chess 7-piece endgames on the Lomonosov supercomputer. Superkomp’yutery 15 (2013)
 20. Bailey, D., Borwein, P., Plouffe, S.: On the rapid computation of various polylogarithmic constants. Math. Comput. 66, 903–913 (1997)
 21. Lucci, S., Kopec, D.: Artificial Intelligence in the 21st Century. Mercury Learning, Dulles (2013)
 22. Chess Programming Wiki: Parallel Search. chessprogramming.wikispaces.com/Parallel+Search. Accessed 2017
 23. Hyatt, R.: Rotated bitmaps, a new twist on an old idea. ICCA J. 22, 213–222 (1999)
 24.IBM Research: How Deep Blue works (1997). https://www.research.ibm.com/deepblue/meet/html/d.3.2.html
 25.Chess Programming Wiki: Automated Tuning. https://chessprogramming.wikispaces.com/Automated+Tuning. Accessed 2017
 26.Hyatt, R., Cozzie, A.: The effect of hash signature collisions in a computer chess program. ICGA J. 28, 131–139 (2005)CrossRefGoogle Scholar
 27.Anantharaman, T., Campbell, M., Hsu, F.: Singular extensions: adding selectivity to bruteforce searching. Artif. Intell. 43, 99–110 (1990)CrossRefGoogle Scholar
 28.Hsu, F.H.: Behind Deep Blue: Building the Computer that Defeated the World Chess Champion. Princeton University Press, Princeton (2002)Google Scholar
 29.Silver, A.: Komodo 8: the smartphone vs desktop challenge (2014). https://en.chessbase.com/post/komodo8thesmartphonevsdesktopchallenge
 30.Berliner, H.: The B* tree search algorithm: a bestfirst proof procedure. Artif. Intell. 12, 23–40 (1979)MathSciNetCrossRefGoogle Scholar
 31.ChessBase: Big 2017 Chess Database (2017)Google Scholar
 32.Ban, A.: Automatic learning of evaluation, with applications to computer chess. Technical Report Discussion Paper 613, Center for the Study of Rationality, Hebrew University (2012)Google Scholar
 33.Thompson, K.: Computer chess strength. In: Advances in Computer Chess, vol. 3, pp. 55–56. Pergamon Press (1982)Google Scholar
 34.Condon, J., Thompson, K.: Belle. In: Frey, P. (ed.) Chess Skill in Man and Machine, pp. 201–210. Springer, Heidelberg (1982). https://www.springer.com/us/book/9780387908151CrossRefGoogle Scholar
 35.Berliner, H., Geotsch, G., Campbell, M., Ebeling, C.: Measuring the performance potential of chess programs. Artif. Intell. 43(1), 7–20 (1990)CrossRefGoogle Scholar
 36.Hsu, F.H., Anantharaman, T., Campbell, M., Nowatzyk, A.: A grandmaster chess machine. Sci. Am. 263, 44–50 (1990)CrossRefGoogle Scholar
 37.Top Chess Engine Championship: Ratings after Season 9  Superfinal. http://tcec.chessdom.com/archive.php. Accessed 2017
 38.Moravec, H.: When will computer hardware match the human brain? J. Evol. Technol. 1 (1998)Google Scholar
 39.Bratko, I., Kopec, D.: A test for comparison of human and computer performance in chess. In: Advances in Computer Chess, vol. 3, pp. 31–56. Elsevier (1982)Google Scholar
 40.Kopec, D., Bratko, I.: The BratkoKopec experiment: a comparison of human and computer performance in chess. In: Advances in Computer Chess, vol. 3, pp. 57–72. Elsevier (1982)Google Scholar
 41.Marsland, T.: The BratkoKopec test revisited. ICCA J. 13, 15–19 (1990)Google Scholar
 42.Haworth, G.: Reference fallible endgame play. ICGA J. 26, 81–91 (2003)CrossRefGoogle Scholar
 43.Guid, M., Bratko, I.: Computer analysis of world chess champions. ICGA J. 29, 65–73 (2006)CrossRefGoogle Scholar
 44.Guid, M., Bratko, I.: Using heuristicsearch based engines for estimating human skill at chess. ICGA J. 34, 71–81 (2011)CrossRefGoogle Scholar
 45.Rajlich, V., Kaufman, L.: Rybka 3 chess engine (2008). www.rybkachess.com
 46.DiFatta, G., Haworth, G., Regan, K.: Skill rating by Bayesian inference. In: Proceedings of 2009 IEEE Symposium on Computational Intelligence and Data Mining (CIDM 2009), Nashville, TN, pp. 89–94 (2009)Google Scholar
 47.Haworth, G., Regan, K., Di Fatta, G.: Performance and prediction: Bayesian modelling of fallible choice in chess. In: van den Herik, H.J., Spronck, P. (eds.) ACG 2009. LNCS, vol. 6048, pp. 99–110. Springer, Heidelberg (2010). https://doi.org/10.1007/9783642129933_10CrossRefGoogle Scholar
 48.Regan, K., Haworth, G.: Intrinsic chess ratings. In: Proceedings of AAAI 2011, San Francisco, pp. 834–839 (2011)Google Scholar
 49.Guid, M., Pérez, A., Bratko, I.: How trustworthy is Crafty’s analysis of world chess champions? ICGA J. 31, 131–144 (2008)CrossRefGoogle Scholar
 50.Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016)CrossRefGoogle Scholar
 51.Chabris, C., Hearst, E.: Visualization, pattern recognition, and forward search: effects of playing speed and sight of the position on grandmaster chess errors. Cogn. Sci. 27, 637–648 (2003)CrossRefGoogle Scholar