Abstract
The paper investigates a stochastic model in which two agents (persons, companies, institutions, states, software agents or others) learn interactive behavior in a series of alternating moves. Each agent is assumed to perform “stimulus-response-consequence” learning, as studied in psychology. In the presented model, the response of one agent to the other agent's move is both the stimulus for the other agent's next move and part of the consequence of the other agent's previous move. After deriving general properties of the model, especially concerning convergence to limit cycles, we concentrate on an asymptotic case where the learning rate tends to zero (“slow learning”). In this case, the dynamics can be described by a system of deterministic differential equations. For reward structures derived from 2×2 bimatrix games, fixed points are determined. For the special case of the prisoner's dilemma, the dynamics are analyzed in more detail for two cases: both agents start with the same reaction probabilities, or they start with different ones.
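The alternating stimulus-response-consequence loop can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the model as defined in the paper: the abstract does not give the update rule, so a Cross-type linear reinforcement step is assumed here, the payoff values are hypothetical prisoner's dilemma numbers rescaled to [0, 1], the other agent's response is treated as the whole consequence of a move (the abstract says it is only part of it), and all names (`PAYOFF`, `respond`, `reinforce`, `simulate`) are introduced for illustration.

```python
import random

# Hypothetical prisoner's dilemma payoffs (own move, other's response),
# rescaled to [0, 1] so that rate * reward is a valid step size.
PAYOFF = {("C", "C"): 0.6, ("C", "D"): 0.0, ("D", "C"): 1.0, ("D", "D"): 0.2}


def respond(p_coop, stimulus, rng):
    """Draw a response (C or D) to the other agent's last move."""
    return "C" if rng.random() < p_coop[stimulus] else "D"


def reinforce(p_coop, stimulus, action, reward, rate):
    """Assumed Cross-type update: move the reaction probability toward the rewarded action."""
    target = 1.0 if action == "C" else 0.0
    p_coop[stimulus] += rate * reward * (target - p_coop[stimulus])


def simulate(steps=10000, rate=0.01, p_start=0.5, seed=0):
    rng = random.Random(seed)
    # agents[i][s] = probability that agent i cooperates after the other played s
    agents = [{"C": p_start, "D": p_start}, {"C": p_start, "D": p_start}]
    stimulus = "C"  # hypothetical opening move that agent 0 reacts to
    pending = None  # (stimulus, action) of the other agent's previous move
    mover = 0
    for _ in range(steps):
        action = respond(agents[mover], stimulus, rng)
        if pending is not None:
            # The current response is the consequence of the other agent's previous move.
            prev_stimulus, prev_action = pending
            reward = PAYOFF[(prev_action, action)]
            reinforce(agents[1 - mover], prev_stimulus, prev_action, reward, rate)
        # The same response is the stimulus for the other agent's next move.
        pending = (stimulus, action)
        stimulus = action
        mover = 1 - mover
    return agents
```

Calling `simulate(rate=0.001)` with a small learning rate corresponds loosely to the “slow learning” regime; the returned dictionaries hold each agent's reaction probabilities after the final step.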
Cite this article
Gutjahr, W.J. Interaction dynamics of two reinforcement learners. Central European Journal of Operations Research 14, 59–86 (2006). https://doi.org/10.1007/s10100-006-0160-y