Mastering the Card Game of Jaipur Through Zero-Knowledge Self-Play Reinforcement Learning and Action Masks

Cutajar, Cristina; Bajada, Josef

doi:10.1007/978-3-031-47546-7_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14318)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Jaipur is a challenging two-player score-based strategy game where the players take turns to trade and sell cards for points, with the objective of having more points than the opponent at the end of the game. This game contains multiple factors which make self-play learning challenging, such as being partially observable, having stochastic actions, and having a very large action space of 25,469 possible discrete actions. Moreover, the game contains both immediate and long-term rewards, and the players have the possibility of adopting different strategies as the game is adversarial. In this work we benchmark the state-of-the-art PPO, A2C, DQN and DDQN reinforcement learning algorithms using self-play without any domain knowledge and starting from random play. Due to the large action space of the game, we propose to use action masks. The policy generated by each algorithm was evaluated quantitatively against typical Jaipur scores, and also qualitatively by checking which actions each agent was selecting. The results show that all the algorithms converged to policies that played the game strongly, with the PPO algorithm obtaining the best results.
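
The action-mask idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy action space, and the use of plain NumPy are assumptions made for illustration only. The sketch shows the two common ways a mask is applied: forcing the logits of invalid actions to negative infinity before the softmax (for policy-gradient methods such as PPO and A2C), and excluding invalid actions from the argmax over Q-values (for value-based methods such as DQN and DDQN).

```python
import numpy as np


def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions forced to zero."""
    logits = np.asarray(logits, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("at least one action must be valid")
    # Invalid actions get -inf so that exp() maps them to exactly 0.
    masked = np.where(mask, logits, -np.inf)
    # Subtract the largest valid logit for numerical stability.
    shifted = masked - masked[mask].max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def sample_action(logits, mask, rng):
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return int(rng.choice(len(probs), p=probs))


def greedy_action(q_values, mask):
    """Pick the valid action with the highest Q-value."""
    q = np.where(np.asarray(mask, dtype=bool),
                 np.asarray(q_values, dtype=np.float64),
                 -np.inf)
    return int(np.argmax(q))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy example: 6 actions, of which only 3 are legal.
    logits = rng.normal(size=6)
    mask = np.array([True, False, False, True, True, False])
    print(masked_softmax(logits, mask))
    print(sample_action(logits, mask, rng))
    print(greedy_action(logits, mask))
```

In a game with tens of thousands of discrete actions, of which only a few are legal in any given state, a mask of this kind keeps the agent from spending exploration on moves the rules forbid.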

Notes

1. Jaipur is a strategy card game created by Sébastien Pauchon and published by Space Cowboys (Asmodee).
2. https://github.com/Cristina0702/JaipurRL.
3. https://pettingzoo.farama.org/.
4. https://gymnasium.farama.org/.
5. https://docs.ray.io/en/latest/rllib/index.html.
6. https://boardgamegeek.com/thread/702405/what-your-best-round-score-jaipur.
7. https://www.reddit.com/r/boardgames/comments/dhxwa4/data_analysis_on_jaipur_games_between_my_wife_and/.
8. https://imgur.com/gallery/GizBVGW.

Author information

Authors and Affiliations

Department of Artificial Intelligence, Faculty of ICT, University of Malta, Msida, Malta
Cristina Cutajar & Josef Bajada

Corresponding author

Correspondence to Cristina Cutajar.

Editor information

Editors and Affiliations

University of Rome Tor Vergata, Rome, Italy
Roberto Basili
Sapienza University of Rome, Rome, Italy
Domenico Lembo
Roma Tre University, Rome, Italy
Carla Limongelli
National Research Council, Rome, Italy
Andrea Orlandini

Cite this paper

Cutajar, C., Bajada, J. (2023). Mastering the Card Game of Jaipur Through Zero-Knowledge Self-Play Reinforcement Learning and Action Masks. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science(), vol 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_16

DOI: https://doi.org/10.1007/978-3-031-47546-7_16
Published: 02 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47545-0
Online ISBN: 978-3-031-47546-7
eBook Packages: Computer Science, Computer Science (R0)
