Counter-Factual Reinforcement Learning: How to Model Decision-Makers That Anticipate the Future

  • Ritchie Lee
  • David H. Wolpert
  • James Bono
  • Scott Backhaus
  • Russell Bent
  • Brendan Tracey
Part of the Studies in Computational Intelligence book series (SCI, volume 474)


This chapter introduces a novel framework for modeling interacting humans in a multi-stage game. This “iterated semi network-form game” framework has the following desirable characteristics: (1) boundedly rational players, (2) strategic players (i.e., players account for one another’s reward functions when predicting one another’s behavior), and (3) computational tractability even on real-world systems. We achieve these benefits by combining concepts from game theory and reinforcement learning. To be precise, we extend the boundedly rational “level-K reasoning” model to apply to games over multiple stages. Our extension allows the overall modeling problem to be decomposed into a series of smaller ones, each of which can be solved by standard reinforcement learning algorithms. We call this hybrid approach “level-K reinforcement learning”. We investigate these ideas in a cyber battle scenario over a smart power grid and discuss the relationship between the behavior predicted by our model and what one might expect of real human defenders and attackers.
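The decomposition described above can be illustrated with a minimal sketch. It is not the chapter's power grid model or its algorithm: the two-target attacker/defender game, the payoff values in `VALUES`, and every function name below are hypothetical, and each stage is collapsed to a single state to keep the example short. The only idea it tries to show is that a level-k player is trained by ordinary single-agent reinforcement learning against a fixed level-(k-1) opponent, with level 0 acting uniformly at random.

```python
import random

# Hypothetical payoffs: what the attacker gains by hitting each unguarded target.
VALUES = [1.0, 3.0]
N_ACTIONS = len(VALUES)


def reward(role, own, other):
    """Zero-sum stage payoff: the attacker gains VALUES[t] if target t is unguarded."""
    attack, guard = (own, other) if role == "attacker" else (other, own)
    gain = VALUES[attack] if attack != guard else 0.0
    return gain if role == "attacker" else -gain


def uniform_policy(rng):
    """Level-0 behavior (an assumption of this sketch): act uniformly at random."""
    return rng.randrange(N_ACTIONS)


def greedy_policy(q):
    """Freeze a learned action-value table into a fixed policy."""
    best = max(range(N_ACTIONS), key=lambda a: q[a])
    return lambda rng: best


def train(role, opponent_policy, rng, episodes=4000):
    """Learn action values against a FIXED opponent, i.e. a single-agent problem."""
    q = [0.0] * N_ACTIONS
    counts = [0] * N_ACTIONS
    for _ in range(episodes):
        a = rng.randrange(N_ACTIONS)        # explore uniformly while learning
        r = reward(role, a, opponent_policy(rng))
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]      # sample-average value update
    return q


def level_k_policies(max_level, seed=0):
    """Build policies level by level; level k best-responds to level k-1."""
    rng = random.Random(seed)
    policies = {("attacker", 0): uniform_policy, ("defender", 0): uniform_policy}
    for k in range(1, max_level + 1):
        for role, other in (("attacker", "defender"), ("defender", "attacker")):
            q = train(role, policies[(other, k - 1)], rng)
            policies[(role, k)] = greedy_policy(q)
    return policies
```

Calling `level_k_policies(2)` returns a dictionary keyed by `(role, level)`. In this toy game the level-1 attacker should learn to hit the high-value target, while the level-2 attacker, anticipating a defender who guards that target, should switch to the other one. A multi-stage version would replace the single-state value table with a state-dependent one and a standard algorithm such as Q-learning.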


Keywords (added by machine, not by the authors): Reactive Power; Reinforcement Learning; Power Grid; Solution Concept; Reward Function





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ritchie Lee (1)
  • David H. Wolpert (2, 3)
  • James Bono (4)
  • Scott Backhaus (5)
  • Russell Bent (6)
  • Brendan Tracey (7)

  1. Carnegie Mellon University Silicon Valley, NASA Ames Research Park, Moffett Field, USA
  2. Santa Fe Institute, Santa Fe, USA
  3. Los Alamos National Laboratory, Los Alamos, USA
  4. American University, NW, USA
  5. Los Alamos National Laboratory, Los Alamos, USA
  6. Los Alamos National Laboratory, Los Alamos, USA
  7. Stanford University, Stanford, USA
