Abstract
Szubert and Jaskowski recently applied TD learning together with n-tuple networks to the game 2048. In this paper, we first improve on their result by modifying the n-tuple networks. However, we observe that programs based on plain TD learning still rarely reach large tiles, such as 32768-tiles (tiles with value 32768). We therefore propose a new learning method, named multi-stage TD learning, which effectively improves performance, especially the maximum score and the rate of reaching 32768-tiles. After incorporating shallow expectimax search, our 2048 program reaches 32768-tiles with probability 10.9%, and obtains a maximum score of 605752 and an average score of 328946. To the best of our knowledge, our program outperforms all known 2048 programs to date, except for the program developed by the programmers nicknamed nneonneo and xificurk, which relies heavily on manually tuned deep-search heuristics. That program reaches 32768-tiles with probability 32%, but ours runs about 100 times faster. Interestingly, our new learning method can also easily be applied to other 2048-like games, such as Threes; our program for Threes outperforms all known Threes programs to date.
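To make the core technique concrete, the following is a minimal sketch of TD(0) afterstate learning with an n-tuple network for a 2048-like board, in the spirit of the Szubert and Jaskowski approach the abstract builds on. All names, the tiny two-tuple network, and the update signature below are illustrative assumptions, not the authors' actual implementation (which uses larger tuple sets and handles symmetries, move generation, and multi-stage staging).

```python
# Boards are length-16 lists of tile exponents (0 = empty, 1 = tile 2, ...,
# 15 = tile 32768).  Each n-tuple is a fixed set of board cells whose tile
# pattern indexes a lookup table of weights; the evaluation is the sum of
# the looked-up weights over all tuples.

N_VALUES = 16                              # possible exponents per cell
TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7)]      # toy choice: two horizontal 4-tuples
weights = [[0.0] * (N_VALUES ** 4) for _ in TUPLES]

def index(board, cells):
    """Map the tile exponents on `cells` to a lookup-table index."""
    idx = 0
    for c in cells:
        idx = idx * N_VALUES + board[c]
    return idx

def value(board):
    """Evaluation = sum of the n-tuple lookup tables."""
    return sum(w[index(board, cells)] for w, cells in zip(weights, TUPLES))

def td_update(afterstate, reward, next_afterstate, alpha=0.01):
    """TD(0) update on afterstates: nudge each active weight toward the
    bootstrapped target r + V(s'); `next_afterstate` is None at game end."""
    target = reward + (value(next_afterstate) if next_afterstate else 0.0)
    error = target - value(afterstate)
    for w, cells in zip(weights, TUPLES):
        w[index(afterstate, cells)] += alpha * error
```

The multi-stage idea of the paper can be layered on top of such a learner by training a separate set of weight tables per game stage (e.g. keyed on the largest tile reached) and switching tables as play progresses.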
References
Ballard, B.W.: The *-Minimax Search Procedure for Trees Containing Chance Nodes. Artificial Intelligence 21, 327–350 (1983)
Baxter, J., Tridgell, A., Weaver, L.: Learning to Play Chess Using Temporal Differences. Machine Learning 40(3), 243–263 (2000)
Beal, D.F., Smith, M.C.: First Results from Using Temporal Difference Learning in Shogi. In: van den Herik, H.J., Iida, H. (eds.) CG 1998. LNCS, vol. 1558, pp. 113–125. Springer, Heidelberg (1999)
Buro, M.: Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello. Games in AI Research, 77–96 (1997)
Game 1024, http://1024game.org/
Game Threes!, http://asherv.com/threes/
Game 2048, http://gabrielecirulli.github.io/2048/
Knuth, D.E., Moore, R.W.: An analysis of alpha-beta pruning. Artificial Intelligence 6, 293–326 (1975)
Melko, E., Nagy, B.: Optimal Strategy in games with chance nodes. Acta Cybernetica 18(2), 171–192 (2007)
Nneonneo and xificurk (nicknames), Improved algorithm reaching 32k tile, https://github.com/nneonneo/2048-ai/pull/27
Overlan, M.: 2048 AI, http://ov3y.github.io/2048-AI/
Pearl, J.: The solution for the branching factor of the alpha-beta pruning algorithm and its optimality. Communications of the ACM 25(8), 559–564 (1982)
Schaeffer, J., Hlynka, M., Jussila, V.: Temporal Difference Learning Applied to a High-Performance Game-Playing Program. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 529–534 (August 2001)
Silver, D.: Reinforcement Learning and Simulation-Based Search in Computer Go, Ph.D. Dissertation, Dept. Comput. Sci., Univ. Alberta, Edmonton, AB, Canada (2009)
StackOverflow: What is the optimal algorithm for the game 2048?, http://stackoverflow.com/questions/22342854/what-is-the-optimal-algorithm-for-the-game-2048/22674149#22674149
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Szubert, M., Jaskowski, W.: Temporal Difference Learning of N-tuple Networks for the Game 2048. In: IEEE CIG 2014 Conference (August 2014)
Taiwan 2048-bot, http://2048-botcontest.twbbs.org/
Tesauro, G.: TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation 6, 215–219 (1994)
Trinh, T., Bashi, A., Deshpande, N.: Temporal Difference Learning in Chinese Chess. In: Tasks and Methods in Applied Artificial Intelligence, pp. 612–618 (1998)
Wu, K.C.: 2048-c, https://github.com/kcwu/2048-c/
Wu, I.-C., Tsai, H.-T., Lin, H.-H., Lin, Y.-S., Chang, C.-M., Lin, P.-H.: Temporal Difference Learning for Connect6. In: van den Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 121–133. Springer, Heidelberg (2012)
Zobrist, A.L.: A New Hashing Method With Application For Game Playing. Technical Report #88 (April 1970)
© 2014 Springer International Publishing Switzerland
Cite this paper
Wu, IC., Yeh, KH., Liang, CC., Chang, CC., Chiang, H. (2014). Multi-Stage Temporal Difference Learning for 2048. In: Cheng, SM., Day, MY. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2014. Lecture Notes in Computer Science(), vol 8916. Springer, Cham. https://doi.org/10.1007/978-3-319-13987-6_34
Print ISBN: 978-3-319-13986-9
Online ISBN: 978-3-319-13987-6