
Interaction dynamics of two reinforcement learners

Published in: Central European Journal of Operations Research

Abstract

The paper investigates a stochastic model in which two agents (persons, companies, institutions, states, software agents, or others) learn interactive behavior in a series of alternating moves. Each agent is assumed to perform "stimulus-response-consequence" learning, as studied in psychology. In the presented model, the response of one agent to the other agent's move is both the stimulus for the other agent's next move and part of the consequence for the other agent's previous move. After deriving general properties of the model, especially concerning convergence to limit cycles, we concentrate on an asymptotic case where the learning rate tends to zero ("slow learning"). In this case, the dynamics can be described by a system of deterministic differential equations. For reward structures derived from 2×2 bimatrix games, fixed points are determined, and for the special case of the prisoner's dilemma, the dynamics are analyzed in more detail under the assumptions that the two agents start either with the same or with different reaction probabilities.
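The alternating-move "stimulus-response-consequence" scheme can be illustrated with a Bush–Mosteller-type probability update, a classical reinforcement rule of the kind the model builds on. The sketch below is not the paper's exact model: the prisoner's-dilemma payoff values, the normalization constant `R_MAX`, the learning rate `theta`, and the rule that each mover is rewarded directly for its response to the opponent's last move are all illustrative assumptions.

```python
import random

# Illustrative prisoner's dilemma payoffs for the mover (T > R > P > S);
# these values are assumptions, not taken from the paper.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
R_MAX = 5.0  # normalizes rewards into [0, 1]


def bush_mosteller_update(p, action, reward, theta):
    """Shift the cooperation probability toward the action just rewarded."""
    target = 1.0 if action == "C" else 0.0
    return p + theta * reward * (target - p)


def simulate(theta=0.01, steps=50_000, seed=0):
    """Two agents move alternately; each move responds to the opponent's
    last move and is reinforced by the resulting (normalized) payoff."""
    rng = random.Random(seed)
    p = [0.5, 0.5]        # cooperation probabilities of the two agents
    last = [None, None]   # each agent's most recent move
    for t in range(steps):
        i = t % 2         # agents take turns
        action = "C" if rng.random() < p[i] else "D"
        if last[1 - i] is not None:
            # Simplified consequence: reward the mover for its response
            # against the opponent's last move.
            reward = PAYOFF[(action, last[1 - i])] / R_MAX
            p[i] = bush_mosteller_update(p[i], action, reward, theta)
        last[i] = action
    return p
```

Because `theta * reward * (target - p)` moves `p` only part of the way toward 0 or 1, the probabilities stay in [0, 1]; letting `theta` tend to zero corresponds to the "slow learning" regime in which the abstract's deterministic differential-equation description applies.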




Cite this article

Gutjahr, W.J. Interaction dynamics of two reinforcement learners. Central European Journal of Operations Research 14, 59–86 (2006). https://doi.org/10.1007/s10100-006-0160-y
