DeepFP for Finding Nash Equilibrium in Continuous Action Spaces

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11836)


Abstract

Finding Nash equilibrium in continuous action spaces is a challenging problem with applications in domains such as protecting geographic areas from potential attackers. We present DeepFP, an approximate extension of fictitious play to continuous action spaces. DeepFP represents players’ approximate best responses via generative neural networks, which are highly expressive implicit density approximators. It additionally uses a game-model network that approximates the players’ expected payoffs given their actions, and trains the networks end-to-end in a model-based learning regime. Further, DeepFP can use domain-specific oracles when available, and can hence exploit techniques such as mathematical programming to compute best responses for structured games. We demonstrate stable convergence to Nash equilibrium on several classic games and also apply DeepFP to a large forest security domain with a novel defender best response oracle. We show that DeepFP learns strategies robust to adversarial exploitation and scales well with a growing number of players’ resources.
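As background for the method the abstract describes, the sketch below shows classical discrete fictitious play, which DeepFP approximately extends to continuous action spaces. Each player repeatedly best-responds to the opponent's empirical action distribution; on matching pennies the empirical frequencies converge to the mixed Nash equilibrium (0.5, 0.5). This is an illustrative sketch only, not the paper's algorithm: DeepFP replaces the exact argmax best response with a generative best-response network and the known payoff matrix with a learned game-model network.

```python
import numpy as np

# Row player's payoff matrix for matching pennies (zero-sum game).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def fictitious_play(A, iters=5000):
    """Classical fictitious play: each iteration, both players best-respond
    to the opponent's empirical mixture of past actions."""
    counts_row = np.ones(A.shape[0])  # action counts, uniform prior
    counts_col = np.ones(A.shape[1])
    for _ in range(iters):
        row_mix = counts_row / counts_row.sum()
        col_mix = counts_col / counts_col.sum()
        # Row player maximizes, column player minimizes, row payoff.
        br_row = int(np.argmax(A @ col_mix))
        br_col = int(np.argmin(row_mix @ A))
        counts_row[br_row] += 1
        counts_col[br_col] += 1
    return counts_row / counts_row.sum(), counts_col / counts_col.sum()

row_strategy, col_strategy = fictitious_play(A)
# Empirical frequencies approach the equilibrium mixture (0.5, 0.5).
```

In DeepFP the best-response step is approximated by training a generative network against a learned payoff model, which makes the same average-play idea applicable when actions are continuous and exact best responses are unavailable.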


Keywords: Security games · Nash equilibrium · Fictitious play



This research was supported in part by NSF Research Grant IIS-1254206, NSF Research Grant IIS-1850477 and MURI Grant W911NF-11-1-0332.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Southern California, Los Angeles, USA
  2. Carnegie Mellon University, Pittsburgh, USA
  3. Harvard University, Cambridge, USA
