Deep Reinforcement Learning in Strategic Board Game Environments

  • Konstantia Xenou
  • Georgios Chalkiadakis
  • Stergos Afantenos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11450)

Abstract

In this paper we propose a novel Deep Reinforcement Learning (DRL) algorithm that uses the concept of “action-dependent state features” and exploits it to approximate the Q-values locally, employing a deep neural network with parallel Long Short-Term Memory (LSTM) components, each one responsible for computing an action-related Q-value. As such, all computations occur simultaneously, and there is no need to employ “target” networks or experience replay, techniques regularly used in the DRL literature. Moreover, our algorithm does not require previous training experiences, but trains itself online during game play. We tested our approach in the Settlers of Catan multi-player strategic board game. Our results confirm the effectiveness of our approach, since it outperforms several competitors, including the state-of-the-art jSettler heuristic algorithm devised for this particular domain.
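
The idea of one approximator per action, each fed its own action-dependent features and trained online, can be illustrated with a minimal sketch. This is not the architecture of the paper: a plain linear approximator stands in for each LSTM component, and the class name `ActionQHead`, the helper functions, the action labels and the feature vectors are all hypothetical choices made for illustration. The sketch only shows how each update uses the current transition directly, with no replay buffer and no separate target network.

```python
import random


class ActionQHead:
    """One independent Q-value approximator per action.

    Assumption: a linear model replaces the LSTM component described
    in the abstract, purely to keep the sketch short.
    """

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def q(self, phi):
        # Q-value of this head's action, given that action's own features.
        return sum(w * x for w, x in zip(self.w, phi))

    def update(self, phi, target):
        # One gradient step on the squared TD error for this head only.
        err = target - self.q(phi)
        self.w = [w + self.lr * err * x for w, x in zip(self.w, phi)]
        return err


def online_q_step(heads, phi_by_action, action, reward,
                  next_phi_by_action, gamma=0.9, terminal=False):
    """Single online Q-learning update from the current transition.

    No experience replay and no target network: the bootstrap value
    comes from the same heads that are being trained.
    """
    if terminal or not next_phi_by_action:
        best_next = 0.0
    else:
        best_next = max(heads[a].q(phi)
                        for a, phi in next_phi_by_action.items())
    target = reward + gamma * best_next
    return heads[action].update(phi_by_action[action], target)


def greedy_action(heads, phi_by_action, epsilon=0.0, rng=random):
    """Epsilon-greedy choice over the per-action Q-values."""
    if rng.random() < epsilon:
        return rng.choice(sorted(phi_by_action))
    return max(phi_by_action, key=lambda a: heads[a].q(phi_by_action[a]))


# Toy usage: two hypothetical actions, each with its own feature vector.
heads = {"build": ActionQHead(2), "trade": ActionQHead(2)}
phi = {"build": [1.0, 0.0], "trade": [0.0, 1.0]}
for _ in range(200):
    # Hypothetical rewards: "build" pays 1, "trade" pays 0, episode ends.
    online_q_step(heads, phi, "build", 1.0, {}, terminal=True)
    online_q_step(heads, phi, "trade", 0.0, {}, terminal=True)
best = greedy_action(heads, phi)
```

In this toy setting the heads are fully independent, so their Q-values could be evaluated in parallel; a faithful implementation would replace `ActionQHead` with a recurrent component and use features drawn from the actual game state.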

Keywords

  • Deep Reinforcement Learning
  • Strategic board games

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Konstantia Xenou (1)
  • Georgios Chalkiadakis (1)
  • Stergos Afantenos (2)
  1. School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece
  2. Institut de recherche en informatique de Toulouse (IRIT), Université Paul Sabatier, Toulouse, France
