A Theoretical Analysis of Temporal Difference Learning in the Iterated Prisoner’s Dilemma Game

  • Original Article
Bulletin of Mathematical Biology

Abstract

Direct reciprocity is a chief mechanism of mutual cooperation in social dilemmas. Agents cooperate if future interactions with the same opponents are highly likely. Direct reciprocity has been explored mostly through evolutionary game theory based on natural selection. Everyday experience suggests, however, that real social agents, including humans, learn to cooperate based on experience. In this paper, we analyze a reinforcement learning model called temporal difference learning and study its performance in the iterated Prisoner’s Dilemma game. Temporal difference learning is unique among a variety of learning models in that it inherently aims at increasing future payoffs rather than immediate ones. It also has a neural basis. We show analytically and numerically that learners with only two internal states properly learn to cooperate with retaliatory players and to defect against unconditional cooperators and defectors. Four-state learners are more capable of achieving a high payoff against various opponents. Moreover, we show numerically that four-state learners can learn to establish mutual cooperation for sufficiently small learning rates.
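The qualitative claim about two-state learners can be illustrated with a minimal sketch. It is not the model analyzed in the article, and it rests on several assumptions that do not come from the abstract: the payoff values (T=5, R=3, P=1, S=0), the choice of the learner's own previous move as its internal state, the use of Q-learning as the temporal difference update, and deterministic sweeps over all state-action pairs in place of sampled play with exploration. The function names (`tit_for_tat`, `learn_greedy_policy`, and so on) are hypothetical.

```python
# Payoff to the learner, indexed by (own action, opponent action).
# The values T=5, R=3, P=1, S=0 are illustrative assumptions.
PAYOFF = {("C", "C"): 3.0, ("C", "D"): 0.0, ("D", "C"): 5.0, ("D", "D"): 1.0}
ACTIONS = ("C", "D")


# Each opponent maps the learner's previous action to its current move.
def tit_for_tat(learner_prev):
    return learner_prev  # retaliatory: copies the learner's last move


def always_cooperate(learner_prev):
    return "C"


def always_defect(learner_prev):
    return "D"


def learn_greedy_policy(opponent, gamma=0.9, alpha=0.1, sweeps=5000):
    """Return the greedy action in each state after Q-learning updates.

    Assumption: the state is the learner's own previous action, so there
    are two internal states. Updates visit every (state, action) pair in
    turn instead of following a sampled trajectory.
    """
    q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}
    for _ in range(sweeps):
        for s in ACTIONS:
            for a in ACTIONS:
                reward = PAYOFF[(a, opponent(s))]
                next_state = a  # the learner's move becomes the next state
                # TD target: immediate payoff plus discounted future value.
                target = reward + gamma * max(q[(next_state, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (target - q[(s, a)])
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ACTIONS}


if __name__ == "__main__":
    for opp in (tit_for_tat, always_cooperate, always_defect):
        print(opp.__name__, learn_greedy_policy(opp))
    # A short-sighted learner (small discount factor) against tit-for-tat.
    print("tit_for_tat, gamma=0.1", learn_greedy_policy(tit_for_tat, gamma=0.1))
```

Under these assumptions, the learner with discount factor 0.9 settles on cooperation in both states against tit-for-tat and on defection against the two unconditional opponents. Lowering the discount factor to 0.1 makes it defect against tit-for-tat as well, which is one way to see why weighting future payoffs matters for learned reciprocity.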

Author information

Correspondence to Naoki Masuda.

About this article

Cite this article

Masuda, N., Ohtsuki, H. A Theoretical Analysis of Temporal Difference Learning in the Iterated Prisoner’s Dilemma Game. Bull. Math. Biol. 71, 1818–1850 (2009). https://doi.org/10.1007/s11538-009-9424-8

  • DOI: https://doi.org/10.1007/s11538-009-9424-8
