Abstract
Szubert and Jaskowski recently applied TD learning together with n-tuple networks to the game 2048. In this paper, we first improve on their result by modifying the n-tuple networks. However, we observe that programs based on plain TD learning still rarely reach large tiles, such as 32768-tiles (tiles with value 32768). We therefore propose a new learning method, named multi-stage TD learning, which effectively improves performance, especially the maximum score and the rate of reaching 32768-tiles. After incorporating shallow expectimax search, our 2048 program reaches 32768-tiles with probability 10.9%, and obtains a maximum score of 605752 and an average score of 328946. To the best of our knowledge, our program outperforms all known 2048 programs to date, except for the program developed by the programmers nicknamed nneonneo and xificurk, which relies heavily on manually tuned deep-search heuristics. That program reaches 32768-tiles with probability 32%, but ours runs about 100 times faster. Interestingly, our new learning method can also easily be applied to other 2048-like games, such as Threes; our program for Threes outperforms all known Threes programs to date.
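To make the core technique concrete, the following is a minimal sketch of TD(0) afterstate learning with an n-tuple network for a 2048-like board, in the spirit of the Szubert and Jaskowski approach the abstract builds on. All names, the tiny two-tuple network, and the update signature below are illustrative assumptions, not the authors' actual implementation (which uses larger tuple sets and handles symmetries, move generation, and multi-stage staging).

```python
# Boards are length-16 lists of tile exponents (0 = empty, 1 = tile 2, ...,
# 15 = tile 32768).  Each n-tuple is a fixed set of board cells whose tile
# pattern indexes a lookup table of weights; the evaluation is the sum of
# the looked-up weights over all tuples.

N_VALUES = 16                              # possible exponents per cell
TUPLES = [(0, 1, 2, 3), (4, 5, 6, 7)]      # toy choice: two horizontal 4-tuples
weights = [[0.0] * (N_VALUES ** 4) for _ in TUPLES]

def index(board, cells):
    """Map the tile exponents on `cells` to a lookup-table index."""
    idx = 0
    for c in cells:
        idx = idx * N_VALUES + board[c]
    return idx

def value(board):
    """Evaluation = sum of the n-tuple lookup tables."""
    return sum(w[index(board, cells)] for w, cells in zip(weights, TUPLES))

def td_update(afterstate, reward, next_afterstate, alpha=0.01):
    """TD(0) update on afterstates: nudge each active weight toward the
    bootstrapped target r + V(s'); `next_afterstate` is None at game end."""
    target = reward + (value(next_afterstate) if next_afterstate else 0.0)
    error = target - value(afterstate)
    for w, cells in zip(weights, TUPLES):
        w[index(afterstate, cells)] += alpha * error
```

The multi-stage idea of the paper can be layered on top of such a learner by training a separate set of weight tables per game stage (e.g. keyed on the largest tile reached) and switching tables as play progresses.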
References
Ballard, B.W.: The *-Minimax Search Procedure for Trees Containing Chance Nodes. Artificial Intelligence 21, 327–350 (1983)
Baxter, J., Tridgell, A., Weaver, L.: Learning to Play Chess Using Temporal Differences. Machine Learning 40(3), 243–263 (2000)
Beal, D.F., Smith, M.C.: First Results from Using Temporal Difference Learning in Shogi. In: van den Herik, H.J., Iida, H. (eds.) CG 1998. LNCS, vol. 1558, pp. 113–125. Springer, Heidelberg (1999)
Buro, M.: Experiments with Multi-ProbCut and a New High-Quality Evaluation Function for Othello. Games in AI Research, 77–96 (1997)
Game 1024, http://1024game.org/
Game Threes!, http://asherv.com/threes/
Game 2048, http://gabrielecirulli.github.io/2048/
Knuth, D.E., Moore, R.W.: An analysis of alpha-beta pruning. Artificial Intelligence 6, 293–326 (1975)
Melko, E., Nagy, B.: Optimal Strategy in games with chance nodes. Acta Cybernetica 18(2), 171–192 (2007)
Nneonneo and xificurk (nicknames), Improved algorithm reaching 32k tile, https://github.com/nneonneo/2048-ai/pull/27
Overlan, M.: 2048 AI, http://ov3y.github.io/2048-AI/
Pearl, J.: The solution for the branching factor of the alpha-beta pruning algorithm and its optimality. Communications of the ACM 25(8), 559–564 (1982)
Schaeffer, J., Hlynka, M., Jussila, V.: Temporal Difference Learning Applied to a High-Performance Game-Playing Program. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 529–534 (August 2001)
Silver, D.: Reinforcement Learning and Simulation-Based Search in Computer Go, Ph.D. Dissertation, Dept. Comput. Sci., Univ. Alberta, Edmonton, AB, Canada (2009)
StackOverflow: What is the optimal algorithm for the game 2048?, http://stackoverflow.com/questions/22342854/what-is-the-optimal-algorithm-for-the-game-2048/22674149#22674149
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Szubert, M., Jaskowski, W.: Temporal Difference Learning of N-tuple Networks for the Game 2048. In: IEEE CIG 2014 Conference (August 2014)
Taiwan 2048-bot, http://2048-botcontest.twbbs.org/
Tesauro, G.: TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play. Neural Computation 6, 215–219 (1994)
Trinh, T., Bashi, A., Deshpande, N.: Temporal Difference Learning in Chinese Chess. In: Tasks and Methods in Applied Artificial Intelligence, pp. 612–618 (1998)
Wu, K.C.: 2048-c, https://github.com/kcwu/2048-c/
Wu, I.-C., Tsai, H.-T., Lin, H.-H., Lin, Y.-S., Chang, C.-M., Lin, P.-H.: Temporal Difference Learning for Connect6. In: van den Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 121–133. Springer, Heidelberg (2012)
Zobrist, A.L.: A New Hashing Method With Application For Game Playing. Technical Report #88 (April 1970)
© 2014 Springer International Publishing Switzerland
Cite this paper
Wu, IC., Yeh, KH., Liang, CC., Chang, CC., Chiang, H. (2014). Multi-Stage Temporal Difference Learning for 2048. In: Cheng, SM., Day, MY. (eds) Technologies and Applications of Artificial Intelligence. TAAI 2014. Lecture Notes in Computer Science(), vol 8916. Springer, Cham. https://doi.org/10.1007/978-3-319-13987-6_34
Print ISBN: 978-3-319-13986-9
Online ISBN: 978-3-319-13987-6