Abstract
Jaipur is a challenging two-player, score-based strategy game in which the players take turns trading and selling cards for points, with the objective of having more points than the opponent at the end of the game. Several properties of the game make self-play learning difficult: it is partially observable, it has stochastic actions, and it has a very large action space of 25,469 possible discrete actions. Moreover, the game offers both immediate and long-term rewards, and because it is adversarial, the players can adopt different strategies. In this work we benchmark the state-of-the-art PPO, A2C, DQN and DDQN reinforcement learning algorithms using self-play, without any domain knowledge and starting from random play. Because of the game's large action space, we propose using action masks. The policy generated by each algorithm was evaluated quantitatively against typical Jaipur scores, and qualitatively by inspecting which actions each agent selected. The results show that all the algorithms converged to policies that played the game strongly, with the PPO algorithm obtaining the best results.
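The abstract does not describe how the action masks are applied, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a boolean mask with one entry per discrete action (True when the action is legal in the current state) and shows two common ways to use it: zeroing the probability of illegal actions for a policy-gradient agent such as PPO or A2C, and restricting the greedy choice to legal actions for a value-based agent such as DQN or DDQN. The function names `masked_softmax`, `masked_argmax` and `sample_action`, and the five-action toy example, are hypothetical.

```python
import math
import random


def masked_softmax(logits, mask):
    """Turn raw policy logits into probabilities, forcing illegal actions to 0."""
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit so the exponentials cannot overflow.
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def masked_argmax(q_values, mask):
    """Greedy action for a value-based agent, considering legal actions only."""
    legal = [i for i, ok in enumerate(mask) if ok]
    if not legal:
        raise ValueError("at least one action must be legal")
    return max(legal, key=lambda i: q_values[i])


def sample_action(logits, mask, rng=random):
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy example with 5 actions; the real game has 25,469.
    scores = [2.0, 0.5, -1.0, 3.0, 0.0]
    legal = [True, False, True, False, True]
    print(masked_softmax(scores, legal))
    print(masked_argmax(scores, legal))
    print(sample_action(scores, legal))
```

In this toy example the highest-scoring action (index 3) is illegal, so neither the sampled nor the greedy choice can select it. The agent therefore never spends exploration on moves the rules forbid, which is the usual motivation for masking in very large discrete action spaces.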
Notes
- 1. Jaipur is a strategy card game created by Sébastien Pauchon and published by Space Cowboys (Asmodee).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cutajar, C., Bajada, J. (2023). Mastering the Card Game of Jaipur Through Zero-Knowledge Self-Play Reinforcement Learning and Action Masks. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_16
DOI: https://doi.org/10.1007/978-3-031-47546-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science, Computer Science (R0)