
Comparison Training of Shogi Evaluation Functions with Self-Generated Training Positions and Moves

  • Akira Ura
  • Makoto Miwa
  • Yoshimasa Tsuruoka
  • Takashi Chikayama
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8427)

Abstract

Automated parameter tuning is an important technique for building strong game-playing programs. Comparison training is a supervised learning method for tuning the parameters of an evaluation function, and it has proven effective in the games of Chess and Shogi. The method requires a large number of training positions and moves extracted from the game records of human experts; however, the number of such records is limited. In this paper, we propose a practical approach that creates additional training data for comparison training by using the program itself. We investigate three methods for generating additional positions and moves, and evaluate them with a Shogi program. Experimental results show that the self-generated training data can improve the program's playing strength.
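
The abstract only names comparison training, so the following is a minimal, hypothetical sketch of the underlying update rule in the spirit of Tesauro's comparison training, not the paper's actual implementation: for each training position, the evaluation function should rank the position reached by the expert's (or self-generated) move above every sibling position reached by an alternative legal move, and each violation of that ordering nudges the linear feature weights. The function name, feature vectors, and learning rate below are illustrative assumptions.

    import numpy as np

    def comparison_update(weights, expert_feats, sibling_feats, lr=0.01):
        """One perceptron-style comparison-training update (a sketch).

        expert_feats  -- features of the position reached by the expert
                         (or self-generated) move.
        sibling_feats -- features of the positions reached by the other
                         legal moves from the same parent position.
        """
        for feats in sibling_feats:
            # Ordering violated: a sibling scores at least as high as the
            # expert's choice, so nudge the weights toward preferring it.
            if weights @ feats >= weights @ expert_feats:
                weights += lr * (expert_feats - feats)
        return weights

    # Toy usage with 3 features and two alternative moves.
    w = np.zeros(3)
    expert = np.array([1.0, 0.0, 2.0])
    others = [np.array([0.0, 1.0, 2.0]), np.array([2.0, 0.0, 1.0])]
    for _ in range(10):
        w = comparison_update(w, expert, others)
    print(w @ expert > max(w @ o for o in others))  # True: expert ranked top

Practical Shogi tuners typically compare the results of shallow searches rather than raw static evaluations, and the paper's contribution lies in generating the training positions and moves with the program itself rather than extracting them from expert game records; the update rule itself is unaffected by where the data come from.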

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Akira Ura (1)
  • Makoto Miwa (2)
  • Yoshimasa Tsuruoka (1)
  • Takashi Chikayama (1)

  1. Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
  2. School of Computer Science, The University of Manchester, Manchester, UK
