Interaction dynamics of two reinforcement learners

Gutjahr, Walter J.

doi:10.1007/s10100-006-0160-y

Published: February 2006

Volume 14, pages 59–86 (2006)

Central European Journal of Operations Research

Walter J. Gutjahr¹

Abstract

The paper investigates a stochastic model in which two agents (persons, companies, institutions, states, software agents or others) learn interactive behavior in a series of alternating moves. Each agent is assumed to perform “stimulus-response-consequence” learning, as studied in psychology. In the presented model, the response of one agent to the other agent's move is both the stimulus for the other agent's next move and part of the consequence for the other agent's previous move. After deriving general properties of the model, especially concerning convergence to limit cycles, we concentrate on an asymptotic case where the learning rate tends to zero (“slow learning”). In this case, the dynamics can be described by a system of deterministic differential equations. For reward structures derived from 2×2 bimatrix games, fixed points are determined. For the special case of the prisoner's dilemma, the dynamics are analyzed in more detail under two assumptions: that both agents start with the same reaction probabilities, and that they start with different ones.
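The alternating-move setting described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the model from the paper: the payoff values, the Cross-type update rule in `reinforce`, the choice of a cooperative start, and all function and variable names are hypothetical. In particular, the reward here depends only on an agent's own move and the other agent's response, which simplifies the "part of the consequence" idea.

```python
import random

# Prisoner's dilemma payoffs rescaled to [0, 1]. Illustrative values, not from the paper.
# PAYOFF[own_move][other_move], with 0 = cooperate and 1 = defect.
PAYOFF = [[0.6, 0.0],
          [1.0, 0.2]]


def reinforce(p_coop, action, reward, rate):
    """Assumed Cross-type update: move the cooperation probability toward
    the action just taken, by a step proportional to rate * reward."""
    target = 1.0 if action == 0 else 0.0
    return p_coop + rate * reward * (target - p_coop)


def simulate(steps, rate, p_init=0.5, seed=0):
    """Run `steps` alternating moves and return the learned reaction probabilities.

    probs[agent][stimulus] is the probability that `agent` cooperates
    after the other agent's last move was `stimulus`.
    """
    rng = random.Random(seed)
    probs = [[p_init, p_init], [p_init, p_init]]
    last = [0, 0]    # assumption: both agents are treated as having cooperated initially
    pending = None   # (agent, stimulus, action) still waiting for its consequence
    mover = 0
    for _ in range(steps):
        other = 1 - mover
        stimulus = last[other]
        action = 0 if rng.random() < probs[mover][stimulus] else 1
        # The current move is the response that completes the consequence
        # of the other agent's previous move, so that move is reinforced now.
        if pending is not None:
            agent, s, a = pending
            reward = PAYOFF[a][action]
            probs[agent][s] = reinforce(probs[agent][s], a, reward, rate)
        pending = (mover, stimulus, action)
        last[mover] = action
        mover = other
    return probs


if __name__ == "__main__":
    # A small learning rate loosely mimics the "slow learning" regime.
    print(simulate(steps=20000, rate=0.01))
```

Running the script with smaller values of `rate` and correspondingly more steps gives a rough numerical impression of the slow-learning regime; it does not reproduce the paper's differential equations.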

Author information

Authors and Affiliations

Dept. of Statistics and Decision Support Systems, University of Vienna, Universitaetsstrasse 5/9, A-1010, Wien, Austria
Walter J. Gutjahr

About this article

Cite this article

Gutjahr, W.J. Interaction dynamics of two reinforcement learners. Cent. Eur. J. Oper. Res. 14, 59–86 (2006). https://doi.org/10.1007/s10100-006-0160-y

Issue Date: February 2006
DOI: https://doi.org/10.1007/s10100-006-0160-y
