Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves
Abstract
Automated tuning of parameters is an important technique for building strong computer game-playing programs. Comparison training is a supervised learning method for tuning the parameters of an evaluation function, and it has proven effective in both chess and Shogi. The training method requires a large number of training positions and moves extracted from game records of human experts; however, the number of such game records is limited. In this paper, we propose a practical approach that creates additional training data for comparison training by using the program itself. We investigate three methods for generating additional positions and moves, and evaluate them with a Shogi program. Experimental results show that the self-generated training data can improve the playing strength of the program.
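To make the idea of comparison training concrete, the following is a minimal sketch of a perceptron-style comparison update for a linear evaluation function, in the spirit of Tesauro's comparison training: the position reached by the expert's move should score at least as high as every sibling position reached by an alternative move. All names here (the feature vectors, learning rate, and helper function) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def comparison_update(w, expert_features, sibling_features_list, lr=1.0):
    """One perceptron-style comparison-training update for a linear
    evaluation eval(p) = w . phi(p): whenever a sibling position scores
    at least as high as the expert move's position, shift the weights
    toward the expert's feature vector and away from the sibling's."""
    for sib in sibling_features_list:
        if w @ sib >= w @ expert_features:  # sibling wrongly preferred
            w = w + lr * (expert_features - sib)
    return w

# Toy example with 3 hypothetical evaluation features.
w = np.zeros(3)
expert = np.array([1.0, 0.0, 1.0])       # features after the expert move
siblings = [np.array([0.0, 1.0, 1.0]),   # features after alternative moves
            np.array([1.0, 1.0, 0.0])]
for _ in range(10):                      # iterate until the ranking holds
    w = comparison_update(w, expert, siblings)
```

After a few passes the learned weights rank the expert move's position above both siblings; in a real engine the feature vectors would come from leaf positions of a shallow search rather than the root positions themselves.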