Machine Learning: ECML 2003, pp. 35–46
Abalearn: A Risk-Sensitive Approach to Self-play Learning in Abalone
Abstract
This paper presents Abalearn, a self-teaching Abalone program that automatically reaches an intermediate level of play without expert-labeled training examples, deep search, or exposure to competent play.
Our approach is based on a risk-seeking reinforcement learning algorithm, motivated by the fact that defensive players in Abalone tend never to finish a game.
We show that this risk-sensitivity is what makes self-play training succeed, and we propose a set of features that appear relevant for achieving a good level of play.
We evaluate our approach by benchmarking against a fixed heuristic opponent, by pitting our agents against human players online, and by comparing snapshots of our agents taken at different stages of training.
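The risk-sensitive idea the abstract describes can be sketched as an asymmetrically weighted TD(0) update in the style of Mihatsch and Neuneier [7]: positive and negative temporal-difference errors are scaled differently, and a risk-seeking setting overweights positive surprises. This is an illustrative sketch with hypothetical names (`risk_sensitive_td_update`, a plain value table), not the authors' actual implementation.

```python
def risk_sensitive_td_update(values, state, next_state, reward,
                             alpha=0.1, gamma=1.0, kappa=-0.5):
    """One risk-sensitive TD(0) step (sketch, after Mihatsch & Neuneier).

    kappa in (-1, 1) controls the risk attitude:
      kappa > 0  overweights negative TD errors (risk-averse),
      kappa < 0  overweights positive TD errors (risk-seeking),
      kappa = 0  recovers standard TD(0).
    Returns the raw TD error for inspection.
    """
    delta = reward + gamma * values[next_state] - values[state]
    sign = 1 if delta > 0 else -1 if delta < 0 else 0
    # The asymmetric scaling of the TD error is the risk transformation.
    values[state] += alpha * (1.0 - kappa * sign) * delta
    return delta
```

With `kappa = -0.5`, a positive surprise is amplified by a factor of 1.5 while a negative one is damped to 0.5, which biases a self-play learner toward aggressive, game-ending lines rather than the safe, stalling play that would otherwise dominate.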
Keywords
Reinforcement learning algorithm · Search depth · Greedy policy · Training game · Chess program