Reinforcement Learning in Games

Part of the Adaptation, Learning, and Optimization book series (ALO, volume 12)


Reinforcement learning and games have a long and mutually beneficial common history. On the one hand, games are rich and challenging domains for testing reinforcement learning algorithms. On the other hand, in several games the best computer players use reinforcement learning. The chapter begins with a selection of games and notable reinforcement learning implementations. Without modification, the basic reinforcement learning algorithms are rarely sufficient for high-level gameplay, so it is essential to discuss the additional ideas, ways of inserting domain knowledge, and implementation decisions that are necessary for scaling up. These are reviewed in sufficient detail to understand their potential and their limitations. The second part of the chapter lists challenges for reinforcement learning in games, together with a review of proposed solution methods. While this listing has a game-centric viewpoint, and some of the items are specific to games (like opponent modelling), a large portion of this overview can provide insight for other kinds of applications, too. In the third part we review how reinforcement learning can be useful in game development and how it can find its way into commercial computer games. Finally, we provide pointers to more in-depth reviews of specific games and solution approaches.


Keywords: Computer Game, Reinforcement Learning, Reinforcement Learning Algorithm, Temporal Difference Learning, Reinforcement Learning Approach





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. University of Alberta, Alberta, Canada
