Combining Online Learning and Equilibrium Computation in Security Games

  • Richard Klíma
  • Viliam Lisý
  • Christopher Kiekintveld
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9406)


Game-theoretic analysis has emerged as an important method for making resource allocation decisions in both infrastructure protection and cyber security domains. However, static equilibrium models built from the inputs of domain experts have weaknesses: they can be inaccurate, and they do not adapt over time as the situation (and the adversary) evolves. When interactions with an attacker are frequent, using learning to adapt to the adversary's revealed behavior may lead to better solutions in the long run. However, learning approaches need large amounts of data, may perform poorly at the start, and may not be able to take advantage of expert analysis. We explore ways to combine equilibrium analysis with online learning methods, with the goal of gaining the advantages of both approaches. We present several hybrid methods that combine these techniques in different ways, and empirically evaluate their performance in a game that models a border patrolling scenario.
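One plausible shape for such a hybrid method (a hypothetical sketch, not necessarily one of the paper's actual algorithms) is to mix a precomputed equilibrium mixed strategy with a multi-armed bandit learner: with a decaying probability the defender samples a target from the equilibrium prior, and otherwise defends the target that the bandit learner (here UCB1, as in Auer et al.) currently rates highest based on observed payoffs. The names `hybrid_defender`, `observe_reward`, and `mix` are invented for this illustration.

```python
import math
import random

def ucb1_pick(counts, rewards, t):
    """Return the arm maximizing the UCB1 index; play each arm once first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(range(len(counts)),
               key=lambda a: rewards[a] / counts[a]
               + math.sqrt(2.0 * math.log(t) / counts[a]))

def hybrid_defender(equilibrium, observe_reward, rounds,
                    mix=lambda t: 1.0 / math.sqrt(t + 1)):
    """Each round, sample a target from the precomputed equilibrium mixed
    strategy with decaying probability mix(t); otherwise defend the target
    that the UCB1 learner currently rates highest."""
    k = len(equilibrium)
    counts, rewards, total = [0] * k, [0.0] * k, 0.0
    for t in range(1, rounds + 1):
        if random.random() < mix(t):
            arm = random.choices(range(k), weights=equilibrium)[0]
        else:
            arm = ucb1_pick(counts, rewards, t)
        r = observe_reward(arm)  # defender's observed payoff this round
        counts[arm] += 1
        rewards[arm] += r
        total += r
    return total / rounds
```

With this design the equilibrium prior dominates early play, when the learner has little data, and online learning gradually takes over as observations of the attacker accumulate, which is the trade-off the abstract describes.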


Keywords: Game theory · Security games · Online learning · Stackelberg game · Stackelberg equilibrium · Nash equilibrium · Border patrol · Multi-armed bandit problem



This research was supported by the Office of Naval Research Global (grant no. N62909-13-1-N256).


References

  1. 2012–2016 border patrol strategic plan. U.S. Customs and Border Protection (2012)
  2. An, B., Brown, M., Vorobeychik, Y., Tambe, M.: Security games with surveillance cost and optimal timing of attack execution. In: AAMAS, pp. 223–230 (2013)
  3. An, B., Kiekintveld, C., Shieh, E., Singh, S., Tambe, M., Vorobeychik, Y.: Security games with limited surveillance. In: AAAI, pp. 1241–1248 (2012)
  4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multi-armed bandit problem. Mach. Learn. 47, 235–256 (2002)
  5. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The non-stochastic multi-armed bandit problem. SIAM J. Comput. 32(1), 48–77 (2002)
  6. Balcan, M.-F., Blum, A., Haghtalab, N., Procaccia, A.D.: Commitment without regrets: online learning in Stackelberg security games. In: ACM Conference on Economics and Computation (EC 2015), pp. 61–78 (2015)
  7. Bard, N., Johanson, M., Burch, N., Bowling, M.: Online implicit agent modelling. In: AAMAS, pp. 255–262 (2013)
  8. Bard, N., Nicholas, D., Szepesvári, C., Bowling, M.: Decision-theoretic clustering of strategies. In: AAMAS, pp. 17–25 (2015)
  9. Blum, A., Haghtalab, N., Procaccia, A.D.: Lazy defenders are almost optimal against diligent attackers. In: AAAI, pp. 573–579 (2014)
  10. Combes, R., Lelarge, M., Proutière, A., Talebi, M.S.: Stochastic and adversarial combinatorial bandits (2015). arXiv:1502.03475
  11. Cowling, P.I., Powley, E.J., Whitehouse, D.: Information set Monte Carlo tree search. IEEE Trans. Comput. Intell. AI Games 4, 120–143 (2012)
  12. Fudenberg, D., Levine, D.K.: The Theory of Learning in Games. The MIT Press, Cambridge (1998)
  13. Garivier, A., Moulines, E.: On upper-confidence bound policies for non-stationary bandit problems. In: ALT, pp. 174–188 (2011)
  14. Kiekintveld, C., Jain, M., Tsai, J., Pita, J., Ordóñez, F., Tambe, M.: Computing optimal randomized resource allocations for massive security games. In: AAMAS, pp. 689–696 (2009)
  15. Kiekintveld, C., Kreinovich, V.: Efficient approximation for security games with interval uncertainty. In: AAAI, pp. 42–45 (2012)
  16. Kiekintveld, C., Marecki, J., Tambe, M.: Approximation methods for infinite Bayesian Stackelberg games: modeling distributional payoff uncertainty. In: AAMAS, pp. 1005–1012 (2011)
  17. Klíma, R., Kiekintveld, C., Lisý, V.: Online learning methods for border patrol resource allocation. In: GameSec, pp. 340–349 (2014)
  18. Nguyen, T.H., Jiang, A., Tambe, M.: Stop the compartmentalization: unified robust algorithms for handling uncertainties in security games. In: AAMAS, pp. 317–324 (2014)
  19. Pita, J., Jain, M., Ordóñez, F., Portway, C., Tambe, M., Western, C., Paruchuri, P., Kraus, S.: ARMOR security for Los Angeles International Airport. In: AAAI, pp. 1884–1885 (2008)
  20. Pita, J., Jain, M., Ordóñez, F., Tambe, M., Kraus, S.: Robust solutions to Stackelberg games: addressing bounded rationality and limited observations in human cognition. Artif. Intell. J. 174(15), 1142–1171 (2010)
  21. Pita, J., John, R., Maheswaran, R., Tambe, M., Kraus, S.: A robust approach to addressing human adversaries in security games. In: European Conference on Artificial Intelligence (ECAI), pp. 660–665 (2012)
  22. Shieh, E., An, B., Yang, R., Tambe, M., Baldwin, C.W., DiRenzo, J., Maule, B.J., Meyer, G.R.: PROTECT: a deployed game theoretic system to protect the ports of the United States. In: AAMAS, pp. 13–20 (2012)
  23. Tambe, M.: Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, Cambridge (2011)
  24. Tsai, J., Rathi, S., Kiekintveld, C., Ordóñez, F., Tambe, M.: IRIS: a tool for strategic security allocation in transportation networks. In: AAMAS, pp. 37–44 (2009)
  25. Tsai, J., Yin, Z., Kwak, J.-Y., Kempe, D., Kiekintveld, C., Tambe, M.: Urban security: game-theoretic resource allocation in networked physical domains. In: AAAI, pp. 881–886 (2010)
  26. Yang, R., Ford, B., Tambe, M., Lemieux, A.: Adaptive resource allocation for wildlife protection against illegal poachers. In: AAMAS, pp. 453–460 (2014)
  27. Yang, R., Kiekintveld, C., Ordóñez, F., Tambe, M., John, R.: Improving resource allocation strategies against human adversaries in security games: an extended study. Artif. Intell. J. 195, 440–469 (2013)
  28. Yin, Z., Jain, M., Tambe, M., Ordóñez, F.: Risk-averse strategies for security games with execution and observational uncertainty. In: AAAI, pp. 758–763 (2011)
  29. Yin, Z., Korzhyk, D., Kiekintveld, C., Conitzer, V., Tambe, M.: Stackelberg vs. Nash in security games: interchangeability, equivalence, and uniqueness. In: AAMAS, pp. 1139–1146 (2010)
  30. Zhang, C., Sinha, A., Tambe, M.: Keeping pace with criminals: designing patrol allocation against adaptive opportunistic criminals. In: AAMAS, pp. 1351–1359 (2015)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Richard Klíma (1, 4)
  • Viliam Lisý (1, 2)
  • Christopher Kiekintveld (3)

  1. Department of Computer Science, FEE, Czech Technical University in Prague, Prague, Czech Republic
  2. Department of Computing Science, University of Alberta, Edmonton, Canada
  3. Computer Science Department, University of Texas at El Paso, El Paso, USA
  4. Department of Computer Science, University of Liverpool, Liverpool, UK
