An Efficient Training Strategy for a Temporal Difference Learning Based Tic-Tac-Toe Automatic Player

  • Jesús Fernández-Conde
  • Pedro Cuenca-Jiménez
  • José María Cañas
Conference paper
Part of the Lecture Notes in Networks and Systems book series (LNNS, volume 98)


Temporal Difference (TD) learning is a well-known technique for training automatic players through self-play in board games with a relatively small number of possible states. TD learning has been widely used because of its simplicity, but important issues remain to be addressed. Training the AI agent against a random player is not effective, since several million games are needed before the automatic player starts to play intelligently. Training it against a perfect player, on the other hand, is not an acceptable option because of exploration concerns. In this paper we present an efficient training strategy for a TD-based automatic game player that is shown to outperform other techniques, needing only about two hundred thousand training games to behave like a perfect player. We present results obtained by simulation for the classic game of Tic-Tac-Toe.
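To make the setting concrete, here is a minimal sketch of tabular TD(0) learning for Tic-Tac-Toe against a uniformly random opponent, which is the baseline the abstract describes as slow. It is not the training strategy proposed in the paper, which the abstract does not detail. The function names, the learning rate `alpha`, the exploration rate `epsilon`, the reward values, the default value of 0.5 for unseen states, and the choice to store values for positions reached after the agent's move are all assumptions made for illustration.

```python
import random

# All eight three-in-a-row lines on a board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

# Assumed terminal rewards from the point of view of the agent, which plays 'X'.
REWARD = {'X': 1.0, 'O': 0.0, 'draw': 0.5}


def winner(board):
    """Return 'X' or 'O' for a completed line, 'draw' for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if ' ' not in board else None


def afterstate(board, move):
    """Key of the position reached if the agent plays `move` on `board`."""
    nxt = board.copy()
    nxt[move] = 'X'
    return ''.join(nxt)


def td_update(values, state, target, alpha):
    """TD(0) rule: V(s) <- V(s) + alpha * (target - V(s))."""
    old = values.get(state, 0.5)
    values[state] = old + alpha * (target - old)


def play_training_game(values, alpha=0.1, epsilon=0.1, rng=random):
    """Play one game of agent 'X' against a random 'O', updating `values` in place."""
    board = [' '] * 9
    prev_state = None
    while True:
        moves = [i for i, c in enumerate(board) if c == ' ']
        if rng.random() < epsilon:
            move = rng.choice(moves)  # exploratory move
        else:
            # Greedy move: pick the afterstate with the highest current estimate.
            move = max(moves, key=lambda m: values.get(afterstate(board, m), 0.5))
        board[move] = 'X'
        state = ''.join(board)
        if prev_state is not None:
            # Move the previous estimate toward the estimate of the new position.
            td_update(values, prev_state, values.get(state, 0.5), alpha)
        result = winner(board)
        if result is None:
            replies = [i for i, c in enumerate(board) if c == ' ']
            board[rng.choice(replies)] = 'O'  # uniformly random opponent
            result = winner(board)
        if result is not None:
            # Game over: move the last estimate toward the terminal reward.
            td_update(values, state, REWARD[result], alpha)
            return result
        prev_state = state


if __name__ == "__main__":
    rng = random.Random(0)
    values = {}
    outcomes = [play_training_game(values, rng=rng) for _ in range(5000)]
    print({r: outcomes.count(r) for r in ('X', 'O', 'draw')})
```

For simplicity the sketch also updates after exploratory moves; some formulations update only after greedy moves. Replacing the random opponent in `play_training_game` with a different one is where an alternative training strategy would plug in.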


Reinforcement Learning, Temporal difference learning, Computer board games, AI efficient training


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Jesús Fernández-Conde (1)
  • Pedro Cuenca-Jiménez (1)
  • José María Cañas (1)
  1. GSyC Department (ETSIT), Universidad Rey Juan Carlos, Fuenlabrada, Madrid, Spain
