
Mastering the Card Game of Jaipur Through Zero-Knowledge Self-Play Reinforcement Learning and Action Masks

  • Conference paper

AIxIA 2023 – Advances in Artificial Intelligence (AIxIA 2023)

Abstract

Jaipur is a challenging two-player score-based strategy game in which the players take turns trading and selling cards for points, with the objective of finishing the game with more points than the opponent. The game combines several factors that make self-play learning challenging: it is partially observable, some actions are stochastic, and it has a very large discrete action space of 25,469 possible actions. Moreover, the game offers both immediate and long-term rewards, and, since it is adversarial, the players can adopt different strategies. In this work we benchmark the state-of-the-art PPO, A2C, DQN, and DDQN reinforcement learning algorithms using self-play, without any domain knowledge and starting from random play. Due to the game's large action space, we propose the use of action masks. The policy generated by each algorithm was evaluated quantitatively against typical Jaipur scores, and qualitatively by inspecting which actions each agent selected. The results show that all the algorithms converged to policies that played the game strongly, with PPO obtaining the best results.
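The abstract's central engineering idea — masking Jaipur's 25,469 discrete actions so that only legal moves can be sampled — can be made concrete with a short sketch. The following PyTorch-style snippet is one common way to implement it, not the authors' code: the batch layout and the -1e8 fill value are illustrative assumptions.

    import torch
    from torch.distributions import Categorical

    NUM_ACTIONS = 25_469  # size of Jaipur's discrete action space, per the abstract

    def masked_policy(logits: torch.Tensor, action_mask: torch.Tensor) -> Categorical:
        """Build a categorical policy that assigns (effectively) zero
        probability to illegal moves.

        logits:      (batch, NUM_ACTIONS) raw policy-head outputs.
        action_mask: (batch, NUM_ACTIONS), 1 where the action is legal
                     in the current state, 0 otherwise.
        """
        # Replace the logits of illegal actions with a large negative number,
        # so the softmax drives their probability (and gradient) to zero.
        masked_logits = torch.where(action_mask.bool(), logits,
                                    torch.full_like(logits, -1e8))
        return Categorical(logits=masked_logits)

    # Usage: sample only legal actions during self-play rollouts.
    logits = torch.randn(1, NUM_ACTIONS)
    mask = torch.zeros(1, NUM_ACTIONS)
    mask[0, :10] = 1.0  # pretend only the first 10 actions are legal
    action = masked_policy(logits, mask).sample()
    assert mask[0, action].item() == 1.0

This is the on-policy form used with PPO and A2C; for value-based methods such as DQN and DDQN, the analogous trick is to restrict the argmax over Q-values to legal actions.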


Notes

  1. Jaipur is a strategy card game created by Sébastien Pauchon and published by Space Cowboys (Asmodee).

  2. https://github.com/Cristina0702/JaipurRL.

  3. https://pettingzoo.farama.org/.

  4. https://gymnasium.farama.org/.

  5. https://docs.ray.io/en/latest/rllib/index.html.

  6. https://boardgamegeek.com/thread/702405/what-your-best-round-score-jaipur.

  7. https://www.reddit.com/r/boardgames/comments/dhxwa4/data_analysis_on_jaipur_games_between_my_wife_and/.

  8. https://imgur.com/gallery/GizBVGW.
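
Footnotes 3–5 above name the tooling stack (PettingZoo, Gymnasium, RLlib). As a hedged illustration of how a legal-move mask typically reaches the agent in that stack — assuming the Jaipur environment follows PettingZoo's classic-game convention of packing an action_mask array into each observation; make_jaipur_env is a hypothetical placeholder, not a function from the linked repository:

    # Standard PettingZoo AEC loop; only the environment constructor is assumed.
    env = make_jaipur_env()  # hypothetical stand-in for the repo's env factory
    env.reset(seed=42)

    for agent in env.agent_iter():
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            action = None  # PettingZoo expects None once an agent is done
        else:
            mask = observation["action_mask"]  # 1 = legal, 0 = illegal
            # Random legal move; a trained agent would feed `mask` into its
            # policy, as in the earlier masked-logits sketch.
            action = env.action_space(agent).sample(mask)
        env.step(action)
    env.close()

env.action_space(agent).sample(mask) uses Gymnasium's built-in masked sampling for Discrete spaces, so even an untrained agent never plays an illegal move.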


Author information

Correspondence to Cristina Cutajar.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Cutajar, C., Bajada, J. (2023). Mastering the Card Game of Jaipur Through Zero-Knowledge Self-Play Reinforcement Learning and Action Masks. In: Basili, R., Lembo, D., Limongelli, C., Orlandini, A. (eds) AIxIA 2023 – Advances in Artificial Intelligence. AIxIA 2023. Lecture Notes in Computer Science, vol. 14318. Springer, Cham. https://doi.org/10.1007/978-3-031-47546-7_16


  • DOI: https://doi.org/10.1007/978-3-031-47546-7_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47545-0

  • Online ISBN: 978-3-031-47546-7

  • eBook Packages: Computer Science (R0)
