Abstract
We investigate the impact of payoff shocks on the evolution of large populations of myopic players that employ simple strategy revision protocols such as the “imitation of success”. In the noiseless case, this process is governed by the standard (deterministic) replicator dynamics; in the presence of noise, however, the induced stochastic dynamics differ from previous versions of the stochastic replicator dynamics (such as the aggregate-shocks model of Fudenberg and Harris in J Econ Theory 57(2):420–441, 1992). In this context, we show that strict equilibria are always stochastically asymptotically stable, irrespective of the magnitude of the shocks; on the other hand, in the high-noise regime, non-equilibrium states may also become stochastically asymptotically stable and dominated strategies may survive in perpetuity (they become extinct if the noise is low). Such behavior is eliminated if players are less myopic and revise their strategies based on their cumulative payoffs. In this case, we obtain a second-order stochastic dynamical system where non-equilibrium states are no longer attracting and dominated strategies become extinct (a.s.), no matter the noise level.
Notes
Importantly, fluctuations due to randomized choices disappear in the large population limit (Benaïm and Weibull 2003).
In a deterministic setting, Sandholm (2010) calls a dynamical system imitative if (in addition to some monotonicity requirements) strategies that are initially absent from the population do not appear. The biological stochastic replicator dynamics satisfy this condition, but this does not mean that they are derived from a revision protocol based on imitation of other agents; by contrast, the dynamics that we study in this paper are derived from such an imitation model, hence the name “imitation dynamics with payoff shocks”.
In the example of the choice of an itinerary to go to work, the size of the commuting population remains roughly constant over time spans allowing significant evolutions of behavior. Moreover, in the short-term, users typically have a standard itinerary, but may revise their choice occasionally, and may then get information on travel times and travel comfort from discussions with other commuters. These features suggest that our imitation dynamics with payoff shocks are a better fit for this situation than the stochastic dynamics of Fudenberg and Harris (1992). Of course, an even more realistic model of itinerary choice should also allow for innovation, that is, the possibility of trying an itinerary not based on imitation, but on an otherwise informed guess that this itinerary might be appealing.
Note that we are considering general payoff functions and not only multilinear (resp. linear) payoffs arising from asymmetric (resp. symmetric) random matching in finite N-person (resp. 2-person) games. This distinction is important as it allows our model to cover e.g. general traffic games as in Sandholm (2010).
In other words, \(\rho _{\alpha \beta }\) is the probability of an \(\alpha \)-strategist becoming a \(\beta \)-strategist up to normalization by the alarm clocks’ rate.
Modulo an additive constant which ensures that \(\rho \) is positive but which cancels out when it comes to the dynamics.
An important special case where it makes sense to consider correlated shocks is if the payoff functions \(v_{\alpha }(x)\) are derived from random matchings in a finite game whose payoff matrix is subject to stochastic perturbations. This specific disturbance model is discussed in Sect. 5.
The intermediate variable \(y_{\alpha }\) should be thought of as an evaluation of how good the strategy \(\alpha \) is, and the formula for \(x_{\alpha }\) as a way of transforming these evaluations into a strategy.
Elimination is obvious; for survival, simply add \(\frac{1}{2}\sigma _{\min }^{2}t\) to the exponents of (2.18) and recall that any Wiener process has \(\limsup _{t} W(t) > 0\) and \(\liminf _{t} W(t) <0\) (a.s.).
We are implicitly assuming here deterministic initial conditions, i.e. \(X(0) = x\) (a.s.) for some \(x\in \mathcal {X}\).
If several strategies are unaffected by noise, that is, are such that \(\sigma _{\alpha }=0\), then their relative shares remain constant (that is, if \(\alpha \) and \(\beta \) are two such strategies, then \(X_{\alpha }(t)/X_{\beta }(t) = X_{\alpha }(0)/X_{\beta }(0)\) for all \(t\ge 0\)). It follows from this observation and the above result that, almost surely, all these strategies are eliminated or all these strategies survive (and only them).
In the pure noise case of the model of Fudenberg and Harris (1992), what remains constant is the expected number of individuals playing a strategy. A crucial point here is that this number may grow to infinity. For a strategy affected by large aggregate shocks, the total number of individuals playing it becomes huge with small probability, but becomes small (at least compared to the number of individuals playing other strategies) with large probability (going to 1). This can be seen as a gambler’s ruin phenomenon, and it explains why, even with a higher expected payoff than others (and hence a higher expected subpopulation size), the frequency of a strategy may go to zero almost surely (see e.g. Robson and Samuelson 2011, Sect. 3.1.1). This cannot happen in our model since noise is added directly to the frequencies (which are bounded).
Put differently, it is more probable for X(n) to decrease than to increase: \(X(n+2) > X(n)\) with probability 1/4 (i.e. if and only if \(\xi _{n}\) takes two positive steps), while \(X(n+2) < X(n)\) with probability 3/4.
Simply note that \(X_{\alpha ^{*}} = \big (1 + \sum _{\beta \in \mathcal {A}^{*}} \exp (Z_{\beta })\big )^{-1}\).
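The identity in the previous footnote is just the usual logit (exponential-weight) normalization written relative to \(\alpha ^{*}\). As a quick numerical sanity check, the sketch below uses hypothetical scores \(Y\) for four strategies, with strategy 0 in the role of \(\alpha ^{*}\) and \(Z_{\beta } = Y_{\beta } - Y_{\alpha ^{*}}\); the values are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical scores Y for four strategies; strategy 0 plays the role
# of alpha* and the remaining strategies form the set A*.
Y = np.array([1.0, 0.2, -0.5, 0.8])
X = np.exp(Y) / np.exp(Y).sum()            # logit choice map

Z = Y[1:] - Y[0]                            # Z_beta = Y_beta - Y_{alpha*}
X_star = 1.0 / (1.0 + np.exp(Z).sum())      # closed form from the footnote

assert np.isclose(X[0], X_star)             # both expressions agree
```

The equality holds because dividing numerator and denominator of \(X_{\alpha ^{*}}\) by \(\exp (Y_{\alpha ^{*}})\) turns each remaining term into \(\exp (Z_{\beta })\).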
In a discrete-time setting, if \(Z(n+1)= g(n) Z(n)\) and \(g(n)=k_i\) with probability \(p_i\), what we mean is that the quantity that a.s. governs the long-term growth of Z is not \(E(g)=\sum _{i} p_i k_i\), but \(\exp (E (\ln g))= \prod _i k_i^{p_i}\).
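The gap between \(E(g)\) and \(\exp (E(\ln g))\) is easy to see numerically. The sketch below uses hypothetical growth factors \(k = (2, 1/4)\) with \(p = (1/2, 1/2)\) (values chosen for illustration only): the arithmetic mean exceeds 1 while the geometric mean is below 1, so \(Z(n)\) decays along almost every trajectory despite growing in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
k = np.array([2.0, 0.25])   # hypothetical growth factors k_i
p = np.array([0.5, 0.5])    # probabilities p_i

arith = (p * k).sum()       # E(g) = 1.125 > 1
geom = np.prod(k ** p)      # exp(E ln g) = sqrt(1/2) < 1

# Simulate Z(n+1) = g(n) Z(n): by the law of large numbers,
# log Z(n) / n converges to E(ln g) < 0, so Z(n) -> 0 (a.s.).
n = 10_000
g = rng.choice(k, size=n, p=p)
log_growth_rate = np.log(g).mean()

assert arith > 1 > geom
assert log_growth_rate < 0
```

This is exactly the gambler's-ruin effect described two footnotes above: expected size grows, yet almost every sample path shrinks.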
Recall that \(\sum \nolimits _{\alpha } dV_{\alpha } = 0\) since \(\sum \nolimits _{\alpha } X_{\alpha } = 1\).
Recall that \(\int _{0}^{t} \sigma _{\alpha }(X(s)) \,dW_{\alpha }(s)\) is continuous, so there is no Itô correction.
Theorem 4.1 actually applies to mixed dominated strategies as well (even iteratively dominated ones). The proof is a simple adaptation of the pure strategies case, so we omit it.
Recall here that equilibria of \(\mathcal {G}\) are also equilibria of \(\mathcal {G}^{\sigma }\), but the converse need not hold.
References
Akin E (1980) Domination or equilibrium. Math Biosci 50(3–4):239–250
Benaïm M, Weibull JW (2003) Deterministic approximation of stochastic evolution in games. Econometrica 71(3):873–903
Bergstrom TC (2014) On the evolution of hoarding, risk-taking, and wealth distribution in nonhuman and human populations. Proc Natl Acad Sci USA 111(3):10860–10867
Bertsekas DP, Gallager R (1992) Data networks, 2nd edn. Prentice Hall, Englewood Cliffs
Björnerstedt J, Weibull JW (1996) Nash equilibrium and evolution by imitation. In: Arrow KJ, Colombatto E, Perlman M, Schmidt C (eds) The rational foundations of economic behavior. St. Martin’s Press, New York, pp 155–181
Bravo M, Mertikopoulos P (2014) On the robustness of learning in games with stochastically perturbed payoff observations. arXiv:1412.6565
Cabrales A (2000) Stochastic replicator dynamics. Int Econ Rev 41(2):451–481
Fudenberg D, Harris C (1992) Evolutionary dynamics with aggregate shocks. J Econ Theory 57(2):420–441
Hofbauer J, Imhof LA (2009) Time averages, recurrence and transience in the stochastic replicator dynamics. Ann Appl Probab 19(4):1347–1368
Hofbauer J, Sigmund K (2003) Evolutionary game dynamics. Bull Am Math Soc 40(4):479–519
Hofbauer J, Sorin S, Viossat Y (2009) Time average replicator and best reply dynamics. Math Oper Res 34(2):263–269
Imhof LA (2005) The long-run behavior of the stochastic replicator dynamics. Ann Appl Probab 15(1B):1019–1045
Karatzas I, Shreve SE (1998) Brownian motion and stochastic calculus. Springer-Verlag, Berlin
Khasminskii RZ (2012) Stochastic stability of differential equations, vol 66, 2nd edn. Stochastic modelling and applied probability. Springer-Verlag, Berlin
Khasminskii RZ, Potsepun N (2006) On the replicator dynamics behavior under Stratonovich type random perturbations. Stoch Dyn 6:197–211
Kuo HH (2006) Introduction to stochastic integration. Springer, Berlin
Laraki R, Mertikopoulos P (2013) Higher order game dynamics. J Econ Theory 148(6):2666–2695
Laraki R, Mertikopoulos P (2015) Inertial game dynamics and applications to constrained optimization. SIAM J Control Optim (to appear)
Littlestone N, Warmuth MK (1994) The weighted majority algorithm. Inf Comput 108(2):212–261
Mertikopoulos P, Moustakas AL (2009) Learning in the presence of noise. In: GameNets ’09: proceedings of the 1st international conference on game theory for networks
Mertikopoulos P, Moustakas AL (2010) The emergence of rational behavior in the presence of stochastic perturbations. Ann Appl Probab 20(4):1359–1388
Nachbar JH (1990) Evolutionary selection dynamics in games. Int J Game Theory 19:59–89
Øksendal B (2007) Stochastic differential equations, 6th edn. Springer-Verlag, Berlin
Robson AJ, Samuelson L (2011) The evolutionary foundations of preferences. In: Benhabib J, Bisin A, Jackson MO (eds) Handbook of social economics, vol 1, chap 7. North-Holland, Amsterdam, pp 221–310
Rustichini A (1999) Optimal properties of stimulus-response learning models. Games Econ Behav 29:230–244
Samuelson L, Zhang J (1992) Evolutionary stability in asymmetric games. J Econ Theory 57:363–391
Sandholm WH (2010) Population games and evolutionary dynamics. Economic learning and social evolution. MIT Press, Cambridge, MA
Schlag KH (1998) Why imitate, and if so, how? A boundedly rational approach to multi-armed bandits. J Econ Theory 78(1):130–156
Sorin S (2009) Exponential weight algorithm in continuous time. Math Program 116(1):513–528
Taylor PD, Jonker LB (1978) Evolutionary stable strategies and game dynamics. Math Biosci 40(1–2):145–156
van Kampen NG (1981) Itô versus Stratonovich. J Stat Phys 24(1):175–187
Vlasic A (2012) Long-run analysis of the stochastic replicator dynamics in the presence of random jumps. arXiv:1206.0344
Vovk VG (1990) Aggregating strategies. In: COLT ’90: proceedings of the 3rd workshop on computational learning theory, pp 371–383
Weibull JW (1995) Evolutionary game theory. MIT Press, Cambridge, MA
Acknowledgments
Supported in part by the French National Research Agency under Grant No. GAGA–13–JS01–0004–01 and the French National Center for Scientific Research (CNRS) under Grant No. PEPS–GATHERING–2014. The authors are grateful to the associate editor in charge of the manuscript and to two anonymous referees for their insightful comments and remarks.
Dedicated to Abraham “Merale” Neyman on the occasion of his 66th birthday.
Appendix: Auxiliary results from stochastic analysis
In this appendix, we provide an asymptotic growth bound for Wiener processes relying on the law of the iterated logarithm. This result appears in a similar context in Bravo and Mertikopoulos (2014); the proof below is given only for completeness and ease of reference.
Lemma 6.1
Let \(W(t) = (W_{1}(t),\ldots ,W_{n}(t))\), \(t\ge 0\), be an n-dimensional Wiener process and let Z(t) be a bounded, continuous process in \(\mathbb {R}^{n}\). Then:
$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{1}{f(t)} \int _{0}^{t} Z(s) \cdot dW(s) = 0 \quad \text {(a.s.)}, \end{aligned}$$
(6.1)
for any function \(f:[0,\infty )\rightarrow \mathbb {R}\) such that \(\lim _{t\rightarrow \infty } \left( t\log \log t\right) ^{-1/2} f(t) = +\infty \).
Proof
Let \(\xi (t) = \int _{0}^{t} Z(s) \cdot dW(s) = \sum _{i=1}^{n} \int _{0}^{t} Z_{i}(s) \,dW_{i}(s)\). Then, the quadratic variation \(\rho = [\xi ,\xi ]\) of \(\xi \) satisfies:
$$\begin{aligned} \rho (t) = \sum _{i=1}^{n} \int _{0}^{t} Z_{i}^{2}(s) \,ds = \int _{0}^{t} \left\| Z(s) \right\| ^{2} \,ds \le M t, \end{aligned}$$
(6.2)
where \(M = \sup _{t\ge 0} \left\| Z(t) \right\| ^{2} < +\infty \) (recall that Z(t) is bounded by assumption). On the other hand, by the time-change theorem for martingales (Øksendal 2007, Corollary 8.5.4), there exists a Wiener process \(\widetilde{W}(t)\) such that \(\xi (t) = \widetilde{W}(\rho (t))\), and hence:
$$\begin{aligned} \frac{\xi (t)}{f(t)} = \frac{\widetilde{W}(\rho (t))}{f(t)}. \end{aligned}$$
Obviously, if \(\lim _{t\rightarrow \infty } \rho (t) \equiv \rho (\infty ) < +\infty \), \(\widetilde{W}(\rho (\infty ))\) is normally distributed, so \(\widetilde{W}(\rho (t))/f(t) \rightarrow 0\) and there is nothing to show. Otherwise, if \(\lim _{t\rightarrow \infty } \rho (t) = +\infty \), the quadratic variation bound (6.2) and the law of the iterated logarithm yield:
$$\begin{aligned} \limsup _{t\rightarrow \infty } \frac{\big |\widetilde{W}(\rho (t))\big |}{f(t)} \le \limsup _{t\rightarrow \infty } \frac{\sqrt{2 \rho (t) \log \log \rho (t)}}{f(t)} \le \limsup _{t\rightarrow \infty } \frac{\sqrt{2 M t \log \log (M t)}}{f(t)} = 0, \end{aligned}$$
and our claim follows. \(\square \)
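Lemma 6.1 can also be sanity-checked by simulation. The sketch below discretizes \(\xi (t) = \int _{0}^{t} Z(s)\,dW(s)\) with a hypothetical bounded integrand \(Z(s) = \sin s\) (so \(M = 1\) in the lemma's notation) and the test function \(f(t) = t\), which satisfies the growth condition; the ratio \(\xi (t)/f(t)\) should already be tiny at moderate horizons.

```python
import numpy as np

rng = np.random.default_rng(1)

# Euler discretization of xi(t) = int_0^t Z(s) dW(s) with the bounded
# (illustrative) integrand Z(s) = sin(s), so sup |Z|^2 = M = 1.
T, dt = 1_000.0, 0.01
t = np.arange(0, T, dt)
dW = rng.normal(0.0, np.sqrt(dt), size=t.size)
xi = np.cumsum(np.sin(t) * dW)

# f(t) = t grows faster than sqrt(t log log t), so the lemma predicts
# xi(t)/f(t) -> 0 (a.s.); here Var xi(T) is about T/2, so the ratio at
# t = T has standard deviation of order 1/sqrt(2T) ~ 0.02.
ratio = xi[-1] / T
assert abs(ratio) < 0.2
```

The same experiment with \(f(t) = \sqrt{t}\) would not converge, since \(\xi (t)/\sqrt{t}\) stays of constant order, matching the sharpness of the growth condition.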
Mertikopoulos, P., Viossat, Y. Imitation dynamics with payoff shocks. Int J Game Theory 45, 291–320 (2016). https://doi.org/10.1007/s00182-015-0505-7