Our interest here is to explore the aims, methods, relative strengths and limitations of human and machine learning in the context of games. To this end, it might be useful to distinguish between learning to play a game and learning to play a game intelligently.Footnote 2 A mere ability to play a game only requires knowledge of the rules – the allowable or legal operations – of the game in question. For human beings this may involve specific skills, while for a machine (digital computer) it requires detailed instructions in the form of programs. However, this aspect of learning is relatively straightforward for the present purpose and can be taught. We will be primarily concerned with learning to play in the latter sense, where play is explicitly goal-oriented and there is an improvement in the performance (in terms of some defined criterion) or capability of the decision-making entity.Footnote 3
Learning: A Game-Theoretic View
Learning has been a well-researched subject in different branches of game theory. We briefly (even if only inadequately) review how learning is conceived within the conventional game-theoretic literature and behavioural game theory (Camerer 2003). Learning in situations that call for strategic interaction can be traced as far back as Cournot's analysis of duopoly, but research in this area has been revitalized only since the 1950s. To speak about learning in a meaningful manner, the setting must naturally be dynamic (at least in principle), where agents can be seen as learning something over time through experience, trial and error, acquiring new information and so on.Footnote 4 Hence, much of the discussion on learning in the game-theoretic literature concerns dynamic, repeated games.
There are broadly two classes of models of learning in conventional non-cooperative game theory. We will call them, for want of better terms, the rational and the adaptive class of models. In the former class, agents are rational, but may not necessarily be in equilibrium. The learning activity for these agents involves forecasting the behaviour of their opponents and responding to it optimally. These learning processes can vary greatly in the sophistication that agents employ while forecasting their opponent's behaviour. Typically, agents possess a prediction rule that maps the history of the game up to a given point to a probability distribution over the future actions of the opponent. This prediction rule can be deterministic or stochastic, with perfect or imperfect information, and agents are assumed to respond optimally with respect to it when choosing their actions at every stage of the game. The broader question of interest is whether and how these rational agents learn to play equilibrium strategies of the game starting from out-of-equilibrium situations. Learning in this context is seen as a dynamic adjustment process of agents groping towards equilibrium. A well-researched learning model in this class is belief learning, in which agents dynamically update their beliefs about what their opponent will do based on actions carried out in previous periods of the game. Models such as best-response dynamics and fictitious play (Brown 1951) fall under this category. In fictitious play, the agent tracks the relative frequency with which different strategies were played by their opponent in the past. This information is then used to form beliefs or expectations about the strategic choices of the opponent in the current period.Footnote 5 Agents are typically assumed to choose a strategy which maximizes their expected payoffs in the light of their beliefs about the opponent's behaviour. In contrast, agents in Bayesian learning models choose a strategy that is a best response to a prior, i.e., a probability distribution over the strategies of the opponent.Footnote 6 Apart from these relatively unsophisticated models of learning, which use only information about the history of the game, there are also models of sophisticated learning (Kalai and Lehrer 1993) that take into account the information available to the opponents, their payoffs and their degree of rationality.Footnote 7
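To make the mechanics of fictitious play concrete, the following is a minimal sketch (our illustration, not drawn from the cited papers) in which two players repeatedly best-respond to the empirical frequencies of each other's past actions; the function names, the uniform initial counts and the matching-pennies example are assumptions for illustration.

```python
import numpy as np

def fictitious_play(payoff_A, payoff_B, n_rounds=1000):
    """Minimal two-player fictitious play: at every stage each player
    best-responds to the empirical frequency of the opponent's past actions."""
    n_A, n_B = payoff_A.shape              # pure strategies of players A and B
    counts_A = np.ones(n_A)                # how often A has played each action (tracked by B)
    counts_B = np.ones(n_B)                # how often B has played each action (tracked by A)
    for _ in range(n_rounds):
        belief_about_B = counts_B / counts_B.sum()     # A's empirical belief about B
        belief_about_A = counts_A / counts_A.sum()     # B's empirical belief about A
        a = int(np.argmax(payoff_A @ belief_about_B))  # A's best response to that belief
        b = int(np.argmax(belief_about_A @ payoff_B))  # B's best response to that belief
        counts_A[a] += 1
        counts_B[b] += 1
    return counts_A / counts_A.sum(), counts_B / counts_B.sum()

# Matching pennies: the empirical frequencies approach the mixed equilibrium (1/2, 1/2)
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(fictitious_play(A, -A))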
The second class of learning models, viz., adaptive models, does not assume that agents optimize and behave rationally as understood in the conventional sense. Instead, agents employ heuristic methods or behavioural rules to choose their actions at every stage of the game. A simple example of such a behavioural rule is imitation, which is a basic form of social learning. Another learning rule that has received a lot of attention in the literature is reinforcement learning (or stimulus-response learning), which has its origins in behavioural psychology (Bush and Mosteller 1951; Roth and Erev 1995; Erev and Roth 1998). The intuition behind this learning scheme is that better-performing choices are reinforced over time (and hence more likely to be chosen in the future), while those that lead to unfavourable outcomes are not. Other examples of adaptive learning methods include regret minimization (Hart and Mas-Colell 2000), imitate-the-best (Axelrod 1984) and learning direction theory (Selten and Stoecker 1986).Footnote 8 In addition, there are hybrid learning models, such as the experience-weighted attraction (EWA) model (Camerer and Ho 1999), which combine elements of belief-learning and reinforcement-learning models. Note that belief learning pays no heed to the choices made by agents themselves in the past (the focus is only on the history of the opponent's strategies and choices). Similarly, in reinforcement learning, agents ignore the structure of the game, information about strategies used by their opponents and foregone payoffs. EWA combines this potentially relevant information, taking both forces, i.e., attractions and experience weights, into account in a single learning model.
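As an illustration of the reinforcement-learning idea, the following is a stylized sketch in the spirit of Roth–Erev cumulative reinforcement (not their exact specification): an agent's propensity for an action grows with the payoff that action has earned, so better-performing choices become more likely over time. The two-action example and all parameter values are assumptions for illustration.

```python
import numpy as np

def roth_erev(payoff_of, n_actions, n_rounds=500, seed=0):
    """Cumulative reinforcement: the propensity of the chosen action is
    increased by the (non-negative) payoff it earned, and choice
    probabilities are proportional to propensities."""
    rng = np.random.default_rng(seed)
    propensities = np.ones(n_actions)               # initial attractions
    for _ in range(n_rounds):
        probs = propensities / propensities.sum()
        a = rng.choice(n_actions, p=probs)
        propensities[a] += payoff_of(a, rng)        # reinforce what worked
    return propensities / propensities.sum()        # learned choice probabilities

# Two actions with expected payoffs 0.5 and 1.0: the better one is reinforced over time
print(roth_erev(lambda a, rng: rng.uniform(0, 1 + a), n_actions=2))
```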
The extent to which the above game-theoretic models shed light on how players actually play and learn games in reality is a relevant question. Research on experimental and behavioural game theory (Camerer 2003) explores the empirical relevance of these models. However, much of this research is limited to laboratory experiments and to fitting learning models to experimental data, which poses a restriction.Footnote 9 In this paper, we are concerned in particular with learning and gaining expertise in complex, perfect-information games like chess and Go. These games typically have very large search spaces, making it practically impossible for a player to optimize over all possible strategies. We will analyse this in the next section.
Learning, Chess and Procedural Rationality: A Simonian View
Board games, chess in particular, have always piqued the interest of scholars pursuing the holy grail of human intelligence and AI. Chess has often been viewed as the Drosophila of artificial intelligence (Ensmenger 2012) and cognitive science (Simon and Schaeffer 1992, p. 2)Footnote 10. What makes chess (and Go) different from the games considered in traditional non-cooperative game theory, and why does it provide such fertile ground for studying actual human intelligence?
First, the game of chess may be considered trivial or uninteresting from the classical game-theoretic standpoint.Footnote 11 Chess is a finite, alternating, two-person, zero-sum game of perfect information, without any chance moves.Footnote 12, Footnote 13 For such games, a classical game theorist might prescribe a rational strategy in which one goes through every branch of the game tree to see whether it leads to a win, loss or draw, assigns values to each of those accordingly and applies the minimax rule backwards. However, the existence of such a strategy does not imply that learning it or implementing it is a trivial task. Second, classical game theory focuses largely on substantive rationality (the focus is on what), concentrating on optimal strategies and consequently often remaining silent on how agents might choose a particular move. The actual decision processes and their plausibility are not of central concern. Third, skills such as pattern recognition, knowledge acquisition, long-term memory and intuition are often vital in the actual game of chess. However, these factors, and cognitive limitations in general, are seldom seriously considered in the games investigated by classical game theory.
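The backward-induction prescription mentioned above can be sketched schematically as follows; the game interface (is_terminal, value, legal_moves, result, to_move) is a hypothetical abstraction, and the point is only that the recursion must visit every branch of the game tree.

```python
def minimax_value(state, game):
    """Exhaustive backward induction on a finite game tree: label terminal
    positions (+1 win, 0 draw, -1 loss from the maximizing player's view)
    and back the values up the tree with alternating max/min."""
    if game.is_terminal(state):
        return game.value(state)               # +1, 0 or -1
    values = [minimax_value(game.result(state, m), game)
              for m in game.legal_moves(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)

def best_move(state, game):
    """The move prescribed by the minimax strategy at the root."""
    moves = game.legal_moves(state)
    key = lambda m: minimax_value(game.result(state, m), game)
    return max(moves, key=key) if game.to_move(state) == "MAX" else min(moves, key=key)
```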
Apart from these differences, a crucial distinguishing factor is that games like chess have deterministic but very large search spaces. The associated state space and game tree are too big for agents to realistically engage in the exhaustive search and calculation assumed in conventional game-theoretic models. These considerations hold qualitatively, pari passu, for the game of Go, if anything with greater severity, given its larger and wider search space. Consequently, highly selective search may be the only viable option in the light of the computational limitations faced by agents. Yet the skill that human players display in games like chess and Go is truly astounding. Relatively good players often employ highly selective search and engage in in-depth reasoning about a select few variations.
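A rough back-of-the-envelope calculation illustrates why exhaustive search is out of the question; the branching factors and game lengths below are commonly cited approximations, not exact figures.

```python
from math import log10

# Rough, commonly cited approximations (assumptions, not exact counts):
# chess: about 35 legal moves per position, games lasting about 80 plies;
# Go:    about 250 legal moves per position, games lasting about 150 plies.
for game, branching, plies in [("chess", 35, 80), ("Go", 250, 150)]:
    print(f"{game}: roughly 10^{plies * log10(branching):.0f} paths through the game tree")
# Both figures (on the order of 10^124 and 10^360) dwarf what any player,
# human or machine, could ever search exhaustively.
```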
Not surprisingly, these factors made games – at least chess – an ideal experimental bed for AI research attempting to gain insights into human intelligence. The goal was to unearth how computationally constrained humans coped with complex and rich task environments like chess. Decision processes that transformed insights from the weird and mysterious world of intuition into actual intelligent behaviour were seen to embody the essence of human thought. Thus, chess presented an ideal microcosm within which to develop and test theories concerning intelligence and cognition. The relatively straightforward rules of chess, its discrete nature readily amenable to analysis by digital computers, and centuries of accumulated knowledge together cemented its position as a testing bed for AI. Newell and Simon, who were among the participants in the seminal 1956 Dartmouth AI conference, declared shortly thereafter: ‘chess is the intellectual game par excellence....If one could devise a successful chess machine, one would seem to have penetrated to the core of human intellectual endeavor.’ (Newell et al. 1958, p. 320, italics added).
For Simon, human decision makers are constrained by computational limitations while operating in complex environments. In such environments, Simon argued, they are best seen as problem solvers (not maximizers or optimizers), who in turn are characterized as Information Processing Systems (IPS) performing symbolic manipulations (Newell and Simon 1972, pp. 4–13 and ch. 14). They engage in a structured search of the problem space, and only the relevant aspects of the task environment are represented in this structuring. Optimization or fully rational play is out of reach for such players in these complex problems. Unlike the maximizing or optimizing agents of classical game theory, these agents can at best satisfice.Footnote 14 According to him, ‘The task is not to characterize optimality or substantive rationality, but to define strategies for finding good moves – procedural rationality’ (Simon and Schaeffer 1992). All of this gives Simon’s research programme a completely different character from that of the von Neumann–Morgenstern approach.
In chess, the practically feasible amount of computation for players and the inability to exhaustively search the entire search space create a wedge between the best moves over shorter and longer horizons. The task of choosing a ‘best move’ in response to a given problem on the board, involving a smaller search space, may differ from the strategy or choices that might be better when the entire course of the game, i.e., a larger search space (or horizon), is considered.Footnote 15 This is because the actual search space under consideration by the problem solver varies over the different stages of the game. The oft-discussed distinction between well-structured and ill-structured problems can be understood in terms of this wedge. On this view, the actual problems presented to agents are best seen as ill-structured problems that are continually transformed into well-structured problems. These transformations in structuring the problem representation, and the use of relevant heuristics for a particular representation, are at the core of problem solving (Simon 1973, pp. 185–187). It is here that the notion of learning in Simon becomes evident:
If the continuing alteration of the problem representation is short-term and reversible, we generally call it “adaptation” or “feedback”, if the alteration is more or less permanent (e.g., revising the “laws of nature”), we refer to it as “learning”. Thus the robot modifies its problem representation temporarily by attending in turn to selected features of the environment; it modifies it more permanently by changing its conceptions of the structure of the external environment or the laws that govern it.
Borrowing from Miller, Simon postulated that learning involves increasingly complex informative cognitive representations, or chunks, in long-term memory. Since short-term memory is rather limited, the number of chunks that agents can handle at any given point in time is no more than a handful (seven plus or minus two, according to Miller). A learned or expert player is seen as employing processes that recall only a few, but increasingly complex, chunks (or groups) from her long-term memory. In addition, Newell and Simon also distinguish between adaptive changes in heuristics in the short and the long run. For them, learning is a change in the repertoire of heuristics itself and not just a change in the specific heuristics actively guiding a search. Thus, learning from a Simonian standpoint can be seen as a mixture of (i) increasing nuance in structuring problems, (ii) the ability to group relevant information or knowledge into chunks in long-term memory and (iii) reshaping the repertoire of heuristics that can be employed to choose a good move for the problem at hand (Simon 1979, p. 167). Note, however, that when Simon speaks about learning there is no reference to equilibrium of any sort or a presumed movement towards it.
It is quite evident that the approaches of classical game theory and Simon are drastically different. Taking the route of classical game theory, one would consider a simplified approximation of the actual game and focus on a game-theoretic optimum for that approximation. Simon’s route was instead to depart more ‘from exhaustive minimax search in the approximation and use a variety of pattern-recognition and selective search strategies to seek satisfactory moves’ (Simon and Schaeffer 1992, p. 16). The latter approach to game theory is inherently computational (procedural) and intimately related to the ideas of bounded rationality and satisficing.
What is emerging, therefore, from research on games like chess, is a computational theory of games: a theory of what it is reasonable to do when it is impossible to determine what is best - a theory of bounded rationality. - (Simon and Schaeffer 1992, p.16)
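To make the contrast with exhaustive minimax concrete, the following stylized sketch (our illustration, not Simon's actual programs) shows satisficing as a stopping rule in selective search: heuristically ordered candidate moves are examined with shallow look-ahead, and the search stops at the first move whose evaluation meets an aspiration level. The names candidate_moves, evaluate and aspiration_level are assumed placeholders.

```python
def satisficing_move(state, candidate_moves, evaluate, aspiration_level, depth=3):
    """Illustrative satisficing search: examine heuristically ordered candidate
    moves and stop at the first one whose shallow evaluation meets the
    aspiration level, instead of exhaustively searching for the optimum."""
    best_so_far, best_value = None, float("-inf")
    for move in candidate_moves(state):          # heuristic ordering: promising moves first
        value = evaluate(state, move, depth)     # selective, depth-limited look-ahead
        if value >= aspiration_level:
            return move                          # good enough: stop searching
        if value > best_value:
            best_so_far, best_value = move, value
    return best_so_far                           # otherwise fall back to the best move seen
```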
Before we end this section, two remarks are in order. First, as Simon clarified in various places, he concerned himself with the procedures that agents use to solve the problems at hand. In this goal-oriented problem-solving set-up, boundedly rational agents should not be viewed against the benchmark of substantive rationality or utility maximization in economics. Bounded rationality is, in fact, the more general notion: a procedural theory that can sufficiently account for boundedly rational behaviour can naturally accommodate substantive rationality, but not vice versa.Footnote 16 They are best viewed as different approaches altogether. Second, considering the computational limitations of the decision maker alone does not make an approach Simonian, since this is insufficient for understanding the nature and emergence of procedural rationality. For Simon, the complexity of the task environment and the practical limitations on the computational capabilities of the agent are equally important.Footnote 17