
Convergence results on stochastic adaptive learning


Abstract

We investigate an adaptive learning model in normal form games that nests several existing learning models, such as payoff assessment learning, valuation learning, stochastic fictitious play learning, experience-weighted attraction learning and delta learning with foregone payoff information. In particular, we consider adaptive players, each of whom assigns payoff assessments to his own actions, chooses, subject to some perturbations, the action with the highest assessment, and updates the assessments using observed payoffs, which may include payoffs from unchosen actions. We then provide conditions under which the learning process converges to a quantal response equilibrium of the game.
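To fix ideas, here is a minimal sketch of one period of such a learning rule for two players. It is illustrative only: the function names, the Gumbel perturbation (which yields a logit choice rule) and the particular way the foregone-payoff weight, the distortion of foregone payoffs and the step size enter the update are assumptions for the sketch, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_period(Q, payoffs, tau, gamma, delta, step):
    """One round of a perturbed-assessment learning sketch for a two-player
    normal form game.  Q[i] is a numpy array of player i's payoff assessments
    over own actions; payoffs[i][s, a_opp] is i's payoff.  Gumbel noise scaled
    by tau[i] makes the action choice logit with temperature tau[i]."""
    choices = [int(np.argmax(Q[i] + tau[i] * rng.gumbel(size=Q[i].size)))
               for i in range(2)]
    for i in range(2):
        opp = choices[1 - i]
        for s in range(Q[i].size):
            if s == choices[i]:
                # chosen action: move its assessment towards the realised payoff
                Q[i][s] += step * (payoffs[i][s, opp] - Q[i][s])
            else:
                # unchosen action: use the foregone payoff, weighted by gamma
                # and distorted by delta (one possible reading of the model)
                Q[i][s] += step * gamma * (delta * payoffs[i][s, opp] - Q[i][s])
    return choices
```

The paper provides conditions under which learning processes of this kind converge to a quantal response equilibrium.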


Notes

  1. In this paper, we focus on the case in which the probability of choosing an action depends on, and is proportional to, the past performance of that action. Hart and Mas-Colell (2000) provide a model of adaptive players in which each player experiences regret over foregone payoffs and the probability of changing his decision depends on that regret. One important difference between our model and theirs is that in their model the probability of a player choosing an action depends on what he is currently choosing: whether a player changes his current action depends on the regret he has experienced from that action, so that the probability of an action profile being chosen depends on which action profile is currently played. This dependency allows convergence to a correlated equilibrium. In this paper, we do not allow such dependency and focus on convergence to a non-correlated equilibrium.

  2. For details, see Funai (2016a).

  3. For the method, see Benaïm (1999) and Borkar (2008).

  4. Among others, see Hopkins (2002) and Leslie and Collins (2005).

  5. When it is obvious, we omit the index set and denote the sequence by \(\{ {\mathcal {F}}_{n}\}\). This rule is also applied to the other sequences with index set \({\mathbb {N}}_{0}\).

  6. The formal description of each player’s decision rule is further specified in Sect. 2.2.2.

  7. Note that when \(\gamma =0\), players do not take into account the foregone payoff information and thus there is no distortion on the foregone payoffs. Mathematically, note that \(\gamma _{n,i,s_{i}}{\tilde{\pi }}_{n,i,s_{i}}= {\mathbb {1}}_{n,s_{i}} \pi _{n,s_{i}} \) if \(\gamma =0\) and thus \(\delta \) does not affect the payoffs. Therefore, we impose this technical assumption for analytical convenience in Sect. 4.2.

  8. Therefore, \({\mathbb {1}}_{n,s_{i}}\) is also \({\mathcal {F}}_{n+1}\)-measurable for each i and \(s_{i}\).

  9. The perturbations can also be interpreted as random payoffs. In the SFPL model, players experience random payoffs in addition to the payoffs from the game. After the realisation of the random payoffs, each player chooses the action with the highest sum of its random payoff and its expected payoff computed under the empirical distribution over opponents' actions. Here, the expected payoff and the random payoff of each action correspond to the payoff assessment and the perturbation, respectively.

  10. Property (i) holds due to the (conditional) dominated convergence theorem and the assumption that there exists a positive density for the perturbation profile.

  11. Another widely acknowledged choice rule is the linear choice rule

    $$\begin{aligned} C_{i,s_{i}}(Q_{n,i}) =\frac{Q_{n, i,s_{i}}}{\sum _{t_{i} \in S_{i}}Q_{n, i,t_{i}}}, \end{aligned}$$

    which is adopted by Beggs (2005), Erev and Roth (1998) and Roth and Erev (1995). We do not consider this choice rule in this paper, as it cannot be obtained by perturbed assessment maximisation; see, for example, Proposition 2.3 of Hofbauer and Sandholm (2002). A small sketch contrasting the two rules follows.
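    The sketch below is illustrative only (the function names and the shift example are not from the paper). One simple way to see the tension with perturbed assessment maximisation is that choice probabilities generated by additive perturbations are invariant to adding a constant to every assessment, whereas the linear rule is not.

    ```python
    import numpy as np

    def linear_choice(Q):
        """Linear choice rule of this note: probabilities proportional to the
        (positive) assessments themselves."""
        Q = np.asarray(Q, dtype=float)
        return Q / Q.sum()

    def logit_choice(Q, tau):
        """Logit choice rule with temperature tau."""
        z = np.asarray(Q, dtype=float) / tau
        z -= z.max()                   # numerical stabilisation
        w = np.exp(z)
        return w / w.sum()

    Q = [1.0, 2.0, 3.0]
    print(linear_choice(Q), linear_choice([q + 10 for q in Q]))          # changes under a shift
    print(logit_choice(Q, 1.0), logit_choice([q + 10 for q in Q], 1.0))  # shift-invariant
    ```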

  12. In this paper, players use the joint empirical distribution over their opponents' actions, rather than the marginal distribution over each individual opponent's actions, which is the more widely adopted version of stochastic fictitious play. If there are only two players, the two versions are equivalent; if there are more than two players, the joint empirical distribution used in this paper can exhibit correlation. Note that the original model of Fudenberg and Kreps (1993) allows for such correlation.

  13. \(\overline{D}\) can be a random variable.

  14. Note that since \(\delta =1\) if \(\gamma \in [0,1)\), for each \(n\), \(i\) and \(s_{i}\),

    1. if \(\gamma =1\), then \(\frac{\lambda _{n,i,s_{i}} \gamma _{n,i,s_{i}} }{\alpha _{n,i,s_{i}}} {\tilde{\pi }}_{n,i,s_{i}} ={\tilde{\pi }}_{n,i,s_{i}}\);

    2. if \(\gamma \in [0,1)\), then \({\tilde{\pi }}_{n,i,s_{i}}=\pi _{n,i,s_{i}}\) and \(\frac{\lambda _{n,i,s_{i}} \gamma _{n,i,s_{i}} }{\alpha _{n,i,s_{i}}} {\tilde{\pi }}_{n,i,s_{i}} = \frac{{\mathbb {1}}_{n,i,s_{i}} + (1-{\mathbb {1}}_{n,i,s_{i}}) \gamma }{ x_{n,i,s_{i}} + (1-x_{n,i,s_{i}}) \gamma } \pi _{n,i,s_{i}}\).

    Therefore, \(E[M_{n,i,s_{i}} \mid {\mathcal {F}}_{n}]=0.\)

  15. Note that we ignore some aspects such as the noise term and stochastic nature of the learning process. To show the convergence formally, we adopt the stochastic approximation method.

  16. If F is a contraction mapping, then there exists \(k \in [0,1)\) such that for any \(Q,Q' \in {\mathbb {R}}^{|{\mathcal {S}}|}\),

    $$\begin{aligned} ||F(Q) - F(Q')||_{\infty } \le k ||Q - Q'||_{\infty }. \end{aligned}$$

    Since \(F(Q^{*})=Q^{*}\) for the fixed point, by replacing \(Q'\) by \(Q^{*}\) in the inequality, we obtain condition (5).

  17. We show in the following argument that if \(\delta =1\), \(x^{*}\) corresponds to a quantal response equilibrium. In addition, if \(\max _{i} \tau _{i}\) approaches 0, then \(x^{*}\) corresponds to a Nash equilibrium.

  18. If \(\gamma =0\), the model coincides with the PAL model.

  19. For instance, see Corollary 2.3 of Hall and Heyde (1980).

  20. The quantal response equilibrium of the prisoner's dilemma game approaches the unique Nash equilibrium \((R, R)\) as \(\tau _{i} \rightarrow 0\) for each i.

  21. For example, consider two close assessment profiles such that under one payoff assessment profile, one player's assessment of L is greater than that of R, while under the other payoff assessment profile his assessment of R is greater than that of L. In this case, his choice probability of L is close to one under the first assessment profile and close to zero under the other. Then, his opponent's expected payoffs from L (and likewise from R) under the two assessment profiles differ by almost 2, which is equal to \(\theta \).

  22. In detail, for the random process \(\{ Q_{n} \}\), which is defined recursively by

    $$\begin{aligned} Q_{n+1} = Q_{n} + \lambda _{n}(h(Q_{n}) + M_{n} + \eta _{n}) \end{aligned}$$
    (13)

    where (i) \(h: {\mathbb {R}}^{m} \rightarrow {\mathbb {R}}^{m}\) is Lipschitz; (ii) \(\{\lambda _{n} \}\) satisfies condition (2); (iii) \(\{ M_{n}\}\) is a martingale difference sequence with respect to \(\{{\mathcal {F}}_{n} \}\) and is square-integrable with

    $$\begin{aligned} E[||M_{n+1}||^{2} \mid {\mathcal {F}}_{n}] < K \ a.s., \ n \ge 0 \end{aligned}$$

    for some \(K>0\); and (iv) \(\sup _{n} ||Q_{n}|| < \infty \) a.s.; then, for any \(T>0\),

    $$\begin{aligned} \lim _{s \rightarrow \infty } \sup _{t \in [s,s+T]} || \overline{Q}_{t} - Q^{s}_{t} || = 0 \ a.s., \end{aligned}$$

    where the process \(\overline{Q}_{t}\) is a continuous interpolated trajectory of \(\{ Q_{n} \}\) and \(Q^{s}_{t} \) is the unique solution of the ordinary differential equation \(\dot{Q}^{s}_{t} = h (Q^{s}_{t})\), \(t \ge s\), started at \(Q^{s}_{s} = \overline{Q}_{s}\).
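    As a toy illustration of this recursion (a hypothetical one-dimensional example, not the paper's learning process), the following sketch iterates (13) with a Lipschitz \(h\) whose limiting ODE has a unique globally stable rest point; with decreasing step sizes and bounded mean-zero noise, the iterates settle near that rest point, consistent with the tracking result above.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Toy one-dimensional instance of recursion (13): h is Lipschitz with a
    # globally stable rest point Q* = 2, the steps lambda_n = 1/(n+1) satisfy
    # the usual step-size conditions, and M_n is bounded mean-zero noise.
    def h(Q):
        return 2.0 - Q                 # limiting ODE dQ/dt = 2 - Q

    Q = 0.0
    for n in range(20000):
        lam = 1.0 / (n + 1)
        M = rng.uniform(-1.0, 1.0)     # martingale-difference noise (i.i.d. here)
        Q += lam * (h(Q) + M)

    print(Q)                           # close to the rest point Q* = 2
    ```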

References

  • Beggs, A.W.: On the convergence of reinforcement learning. J. Econ. Theory 122, 1–36 (2005)

  • Benaïm, M.: Dynamics of stochastic approximation algorithms. In: Azéma, J., Émery, M., Ledoux, M., Yor, M. (eds.) Séminaire de Probabilités, XXXIII. Lecture Notes in Mathematics, vol. 1709, pp. 1–68. Springer, Berlin (1999)

  • Benaïm, M., Hirsch, M.: Mixed equilibria and dynamical systems arising from fictitious play in perturbed games. Games Econ. Behav. 29, 36–72 (1999)

  • Borkar, V.S.: Stochastic Approximation: A Dynamical Systems Viewpoint. Cambridge University Press, Cambridge (2008)

  • Camerer, C., Ho, T.H.: Experience-weighted attraction learning in normal form games. Econometrica 67, 827–874 (1999)

  • Chen, Y., Khoroshilov, Y.: Learning under limited information. Games Econ. Behav. 44, 1–25 (2003)

  • Cominetti, R., Melo, E., Sorin, S.: A payoff-based learning procedure and its application to traffic games. Games Econ. Behav. 70, 71–83 (2010)

  • Conley, T.G., Udry, C.R.: Learning about a new technology: pineapple in Ghana. Am. Econ. Rev. 100, 35–69 (2010)

  • Duffy, J., Feltovich, N.: Does observation of others affect learning in strategic environments? An experimental study. Int. J. Game Theory 28, 131–152 (1999)

  • Erev, I., Roth, A.E.: Predicting how people play games: reinforcement learning in experimental games with unique mixed strategy equilibria. Am. Econ. Rev. 88, 848–881 (1998)

  • Fudenberg, D., Kreps, D.M.: Learning mixed equilibria. Games Econ. Behav. 5, 320–367 (1993)

  • Fudenberg, D., Takahashi, S.: Heterogeneous beliefs and local information in stochastic fictitious play. Games Econ. Behav. 71, 100–120 (2011)

  • Funai, N.: An adaptive learning model with foregone payoff information. B.E. J. Theor. Econ. 14, 149–176 (2014)

  • Funai, N.: A unified model of adaptive learning in normal form games. Working paper (2016a)

  • Funai, N.: Reinforcement learning with foregone payoff information in normal form games. Working paper (2016b)

  • Grosskopf, B., Erev, I., Yechiam, E.: Foregone with the wind: indirect payoff information and its implications for choice. Int. J. Game Theory 34, 285–302 (2006)

  • Hall, P., Heyde, C.C.: Martingale Limit Theory and Its Application. Academic Press, New York (1980)

  • Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilibrium. Econometrica 68, 1127–1150 (2000)

  • Heller, D., Sarin, R.: Adaptive learning with indirect payoff information. Working paper (2001)

  • Hofbauer, J., Hopkins, E.: Learning in perturbed asymmetric games. Games Econ. Behav. 52, 133–152 (2005)

  • Hofbauer, J., Sandholm, W.H.: On the global convergence of stochastic fictitious play. Econometrica 70, 2265–2294 (2002)

  • Hopkins, E.: Two competing models of how people learn in games. Econometrica 70, 2141–2166 (2002)

  • Hopkins, E., Posch, M.: Attainability of boundary points under reinforcement learning. Games Econ. Behav. 53, 110–125 (2005)

  • Ianni, A.: Learning strict Nash equilibria through reinforcement. J. Math. Econ. 50, 148–155 (2014)

  • Jehiel, P., Samet, D.: Learning to play games in extensive form by valuation. J. Econ. Theory 124, 129–148 (2005)

  • Laslier, J.F., Topol, R., Walliser, B.: A behavioural learning process in games. Games Econ. Behav. 37, 340–366 (2001)

  • Leslie, D.S., Collins, E.J.: Individual Q-learning in normal form games. SIAM J. Control Optim. 44, 495–514 (2005)

  • McKelvey, R.D., Palfrey, T.R.: Quantal response equilibria for normal form games. Games Econ. Behav. 10, 6–38 (1995)

  • Roth, A.E., Erev, I.: Learning in extensive-form games: experimental data and simple dynamic models in the intermediate term. Games Econ. Behav. 8, 164–212 (1995)

  • Rustichini, A.: Optimal properties of stimulus-response learning models. Games Econ. Behav. 29, 244–273 (1999)

  • Sarin, R., Vahid, F.: Payoff assessments without probabilities: a simple dynamic model of choice. Games Econ. Behav. 28, 294–309 (1999)

  • Sarin, R., Vahid, F.: Predicting how people play games: a simple dynamic model of choice. Games Econ. Behav. 34, 104–122 (2001)

  • Tsitsiklis, J.N.: Asynchronous stochastic approximation and Q-learning. Mach. Learn. 16, 185–202 (1994)

  • Watkins, C.J.C.H., Dayan, P.: Q-learning. Mach. Learn. 8, 279–292 (1992)

  • Wu, H., Bayer, R.: Learning from inferred foregone payoffs. J. Econ. Dyn. Control 51, 445–458 (2015)

  • Yechiam, E., Busemeyer, J.R.: Comparison of basic assumptions embedded in learning models for experience-based decision making. Psychon. Bull. Rev. 12, 387–402 (2005)

  • Yechiam, E., Busemeyer, J.R.: The effect of foregone payoffs on underweighting small probability events. J. Behav. Dec. Mak. 19, 1–16 (2006)


Author information


Corresponding author

Correspondence to Naoki Funai.

Additional information

This paper was initially circulated under the title “A Unified Model of Adaptive Learning in Normal Form Games”. I am grateful to Andrea Collevecchio, Marco LiCalzi and Massimo Warglien for their support at the Department of Management, Ca’ Foscari University of Venice. I also thank Rajiv Sarin, Farshid Vahid, Ikuo Ishibashi and seminar audiences at Monash University and UECE Game Theory Lisbon Meetings for suggestions and helpful comments. The comments from the anonymous referees and associate editor also greatly improved the paper. Financial support from the MatheMACS project (FP7-ICT, # 318723), the COPE project (supported by a Sapere Aude grant from the Danish Research Council for Independent Research) and Ca’ Foscari University of Venice is gratefully acknowledged. All remaining errors are mine.

Appendices

1.1 The Proof of Proposition 4

The following argument extends the proof of Proposition 5 in Cominetti et al. (2010) to the setting in which players observe foregone payoff information and a discount factor is applied to that information. In the argument, we provide a condition under which \(F\) is a contraction mapping.

Now, for \(Q\) and \(Q'\), let \(x=(C_{i,s_{i}}(Q_{i}))_{i,s_{i}}\) and \(x'=(C_{i,s_{i}}(Q'_{i}))_{i,s_{i}}\), and let i and \(s_{i}\) be such that

$$\begin{aligned} ||F(Q) - F(Q')||_{\infty } = |\overline{\pi }_{i}(s_{i}, x_{-i}) - \overline{\pi }_{i}(s_{i}, x'_{-i}) |. \end{aligned}$$
(A.1)

Consider a telescoping sequence \(y_{0},\ldots ,y_{N}\) such that \(y_{0}=x'\), \(y_{N}= x\) and

$$\begin{aligned} y_{n}= & {} (x_{1},\ldots ,x_{n},x'_{n+1},\ldots ,x'_{N}) \end{aligned}$$

for \(n \notin \{0,N\}\). Then, equation (A.1) is expressed as follows:

$$\begin{aligned} |\overline{\pi }_{i}(s_{i}, x_{-i}) - \overline{\pi }_{i}(s_{i}, x'_{-i})| = |\sum ^{N}_{l = 1}(\overline{\pi }_{i}(s_{i}, y_{l,-i}) - \overline{\pi }_{i}(s_{i}, y_{l-1,-i}))|. \end{aligned}$$

Now, the summand for \(l = i\) is expressed as follows:

$$\begin{aligned} | \overline{\pi }_{i}(s_{i}, y_{l,-i}) - \overline{\pi }_{i}(s_{i}, y_{l-1,-i}) | = | (1-\delta )\pi _{i}(s_{i},y_{l,-i})(x_{i,s_{i}}-x'_{i,s_{i}})|, \end{aligned}$$

whereas the summand for \(l \ne i\) is expressed as follows:

$$\begin{aligned} |\overline{\pi }_{i}(s_{i}, y_{l,-i}) - \overline{\pi }_{i}(s_{i}, y_{l-1,-i}) | \le&| \big ( y_{l,i,s_{i}} + (1-y_{l,i,s_{i}}) \delta \big )| |\pi _{i}(s_{i}, y_{l,-i}) - \pi _{i}(s_{i}, y_{l-1,-i}) | \\ \le&|\pi _{i}(s_{i}, y_{l,-i}) - \pi _{i}(s_{i}, y_{l-1,-i}) | \\ =&|\sum _{s_{l} \in S_{l}} \pi _{i}(s_{i}, s_{l}, y_{l, -(i,l)}) (x_{l,s_{l}} - x'_{l, s_{l}}) | \\ =&|\sum _{s_{l} \in S_{l}} \big ( \pi _{i}(s_{i}, s_{l}, y_{l, -(i,l)}) - \pi _{i}(s_{i}, \overline{s}_{l}, y_{l,-(i,l)})\big ) (x_{l,s_{l}} - x'_{l, s_{l}}) | \end{aligned}$$

where \(\overline{s}_{l}\) is fixed. Note that

$$\begin{aligned} |x_{l, s_{l}} - x'_{l,s_{l}}| \le&\sum _{s \in S_{l}}| \frac{\partial }{\partial Q_{s}}C_{l,s_{l}}(Q^{*}_{l})(Q_{l,s} - Q'_{l,s})| \\ \le&|| Q - Q'||_{\infty } \sum _{s \in S_{l}}| \frac{\partial }{\partial Q_{s}}C_{l,s_{l}}(Q^{*}_{l})| \\ =&|| Q - Q'||_{\infty } 2 \frac{1}{\tau _{l}}C_{l,s_{l}} (Q^{*}_{l})(1-C_{l,s_{l}} (Q^{*}_{l})) \end{aligned}$$

for some \(Q^{*}_{l}\). For the last equality, we use the result of Lemma 1, which we show later. Therefore, for the summand of \(l =i\), we have

$$\begin{aligned} | \overline{\pi }_{i}(s_{i}, y_{l,-i}) - \overline{\pi }_{i}(s_{i}, y_{l-1,-i}) |= & {} |(1-\delta )\pi _{i}(s_{i},y_{l,-i}) (x_{i,s_{i}}-x'_{i,s_{i}})| \\\le & {} 2K || Q - Q'||_{\infty } \frac{1}{\tau _{i}} \end{aligned}$$

where \(K := \frac{(1-\delta )}{4}\max _{i,s_{i},s_{-i}} |\pi _{i}(s_{i},s_{-i})|\), whereas for the summand of \(l \ne i\), we have

$$\begin{aligned} |\overline{\pi }_{i}(s_{i}, y_{l,-i}) - \overline{\pi }_{i}(s_{i}, y_{l-1,-i}) | \le&|\sum _{s_{l} \in S_{l}} \big ( \pi _{i}(s_{i}, s_{l}, y_{l,-(i,l)}) - \pi _{i}(s_{i}, \overline{s}_{l}, y_{l, -(i,l)}) \big ) (x_{l,s_{l}} - x'_{l, s_{l}}) | \\ \le&\sum _{s_{l} \in S_{l}} | \big ( \pi _{i}(s_{i}, s_{l}, y_{l,-(i,l)}) - \pi _{i}(s_{i}, \overline{s}_{l}, y_{l,-(i,l)}) \big ) (x_{l,s_{l}} - x'_{l, s_{l}}) | \\ \le&\sum _{s_{l} \in S_{l}} \theta | x_{l,s_{l}} - x'_{l, s_{l}} | \\ \le&\sum _{s_{l} \in S_{l}} \theta || Q - Q'||_{\infty } 2 \frac{1}{\tau _{l}}C_{l,s_{l}} (Q^{*}_{l}) \\ =\,&2 \theta || Q - Q'||_{\infty } \frac{1}{\tau _{l}} \end{aligned}$$

where \(\theta := \max _{i,s_{i},s_{j},s'_{j}} | \pi _{i}(s_{i},s_{j},s_{-(i,j)}) - \pi _{i}(s_{i},s'_{j},s_{-(i,j)}) |\). Therefore,

$$\begin{aligned} ||F(Q) - F(Q')||_{\infty } \le&\sum _{l \ne i} 2 \theta || Q - Q'||_{\infty } \frac{1}{\tau _{l}} + 2K || Q - Q'||_{\infty } \frac{1}{\tau _{i}}\\ \le&2 \theta ' \beta || Q - Q'||_{\infty } \end{aligned}$$

where \(\theta ':=\max \{ \theta , K \}\) and \(\beta := \sum _{l \in {\mathcal {N}}} \frac{1}{\tau _{l}}\). Hence, if the temperature parameters \((\tau _{l})_{l \in {\mathcal {N}}}\) are large enough that \(2 \theta ' \beta <1\), then \(F\) is a contraction mapping. \(\square \)
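As a quick numerical sanity check of this sufficient condition, the following sketch (restricted to two-player games; the example payoffs and temperatures are hypothetical) computes \(\theta \), \(K\), \(\theta '\) and \(\beta \) as defined above and tests whether \(2 \theta ' \beta <1\).

```python
import itertools

import numpy as np

def contraction_condition(payoffs, taus, delta):
    """Check 2 * theta' * beta < 1 for a two-player normal form game, where
    payoffs[i][s_i, s_j] is player i's payoff when i plays s_i and the
    opponent plays s_j."""
    # theta: largest change in a player's payoff when the opponent switches
    # actions while the player's own action is fixed.
    theta = max(
        abs(payoffs[i][si, sj] - payoffs[i][si, sj2])
        for i in range(2)
        for si in range(payoffs[i].shape[0])
        for sj, sj2 in itertools.product(range(payoffs[i].shape[1]), repeat=2)
    )
    K = (1.0 - delta) / 4.0 * max(np.abs(p).max() for p in payoffs)
    theta_prime = max(theta, K)
    beta = sum(1.0 / t for t in taus)
    return 2.0 * theta_prime * beta < 1.0

# Hypothetical symmetric 2x2 game (each matrix indexed [own action, opponent
# action]); with large enough temperatures the sufficient condition holds.
A = np.array([[3.0, 0.0], [4.0, 1.0]])
print(contraction_condition([A, A], taus=[20.0, 20.0], delta=1.0))   # True
```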

Lemma 1

For the logit choice rule, we have

$$\begin{aligned} \frac{\partial C_{i,s_{i}}}{\partial Q_{i,t_{i}}}(Q_{i}) = \frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) \big ( {\mathbb {1}}\{ t_{i}=s_{i}\} - C_{i,t_{i}}(Q_{i}) \big ) \end{aligned}$$

and

$$\begin{aligned} \sum _{t_{i} \in S_{i}}\left| \frac{\partial C_{i,s_{i}}}{\partial Q_{i,t_{i}}}(Q_{i})\right| = 2 \frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) (1- C_{i,s_{i}}(Q_{i}) ) \end{aligned}$$

for each i, \(Q_{i}\), \(s_{i}\) and \(t_{i}\).

Proof

$$\begin{aligned} \frac{\partial C_{i,s_{i}}}{\partial Q_{i,t_{i}}}(Q_{i})=&\frac{\partial }{\partial Q_{i,t_{i}}} \frac{\exp \left( \frac{1}{\tau _{i}}Q_{i,s_{i}} \right) }{\sum _{s'_{i} } \exp \left( \frac{1}{\tau _{i}}Q_{i,s'_{i}}\right) } \\ =&\frac{{\mathbb {1}}\{ t_{i}=s_{i}\}\frac{1}{\tau _{i}}\exp \left( \frac{1}{\tau _{i}}Q_{s_{i}}\right) \sum _{s'_{i} } \exp \left( \frac{1}{\tau _{i}}Q_{s'_{i}}\right) - \frac{1}{\tau _{i}}\exp \left( \frac{1}{\tau _{i}}Q_{s_{i}}\right) \exp \left( \frac{1}{\tau _{i}}Q_{t_{i}}\right) }{\left( \sum _{s'_{i} } \exp \left( \frac{1}{\tau _{i}}Q_{s'_{i}}\right) \right) ^{2}} \\ =&{\mathbb {1}}\{ t_{i}=s_{i}\} \frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) - \frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) C_{i,t_{i}}(Q_{i}) \\ =&\frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) \left( {\mathbb {1}}\{ t_{i}=s_{i}\} - C_{i,t_{i}}(Q_{i}) \right) \end{aligned}$$

and

$$\begin{aligned} \sum _{t_{i} \in S_{i}}\left| \frac{\partial C_{i,s_{i}}}{\partial Q_{i,t_{i}}}(Q_{i})\right| =&\sum _{t_{i} \in S_{i}} \left| \frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) \big ( {\mathbb {1}}\{ t_{i}=s_{i}\} - C_{i,t_{i}}(Q_{i}) \big )\right| \\ =&\frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) \sum _{t_{i} \in S_{i}} | {\mathbb {1}}\{ t_{i}=s_{i}\} - C_{i,t_{i}}(Q_{i}) | \\ =&2 \frac{1}{\tau _{i}} C_{i,s_{i}}(Q_{i}) (1- C_{i,s_{i}}(Q_{i}) ). \end{aligned}$$

\(\square \)
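As a quick numerical check of Lemma 1 (the assessment values and temperature below are arbitrary illustrative choices), the closed-form Jacobian of the logit choice rule can be compared against central finite differences.

```python
import numpy as np

def logit(Q, tau):
    w = np.exp(Q / tau)
    return w / w.sum()

def logit_jacobian(Q, tau):
    """Closed-form Jacobian from Lemma 1: dC_s/dQ_t = (1/tau) C_s (1{t=s} - C_t)."""
    C = logit(Q, tau)
    return (np.diag(C) - np.outer(C, C)) / tau

Q, tau, eps = np.array([0.3, -1.2, 0.8]), 0.7, 1e-6
num = np.column_stack([
    (logit(Q + eps * np.eye(3)[t], tau) - logit(Q - eps * np.eye(3)[t], tau)) / (2 * eps)
    for t in range(3)
])
print(np.max(np.abs(num - logit_jacobian(Q, tau))))        # close to zero

# Second identity of Lemma 1: row sums of absolute values.
C = logit(Q, tau)
print(np.abs(logit_jacobian(Q, tau)).sum(axis=1), 2 * C * (1 - C) / tau)
```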

1.2 The Proof of Proposition 6

As we focus on a symmetric \(2 \times 2\) game, we omit the player subscript on actions, the set of actions and the payoff function. Since only two actions are available and the weighting parameters of the two actions are the same for each player, it suffices for the following convergence analysis to focus on the difference between the two assessments. In the following argument, with a slight abuse of notation, let \(Q_{n,i}\) denote the difference of player i's assessments in period n: for \(s, t \in S\) and \(s \ne t\),

$$\begin{aligned} Q_{n,i}:=Q_{n, i,t} -Q_{n,i,s}. \end{aligned}$$

Let \(Q_{n}=(Q_{n,i}, Q_{n,-i})\) be the assessment difference profile. Note that each player’s choice rule can be expressed as a function of the difference: let \(C_{i}: {\mathbb {R}} \rightarrow {\mathbb {R}}\) be the choice rule of player i such that

$$\begin{aligned} C_{i}(Q_{n,i}) := \frac{1}{1+ \exp \left( \frac{1}{\tau _{i}}Q_{n,i}\right) } \end{aligned}$$

and \(x_{n,i,s}=C_{i}(Q_{n,i})\).

Now, we express the updating rule of the assessment differences in the following manner: for each n and i,

$$\begin{aligned} Q_{n+1,i} = Q_{n,i} + \alpha _{n,i}\big ( G_{i}(Q_{n}) -Q_{n,i} + M'_{n,i} \big ) \end{aligned}$$

where

  1. \(M'_{n,i} := M_{n, i, t} - M_{n, i, s}\), which is still a martingale difference noise;

  2. \(G=(G_{i}, G_{-i}): {\mathbb {R}}^2 \rightarrow {\mathbb {R}}^{2}\) is defined in the following manner: for each i and \(Q=(Q_{i},Q_{-i}) \in {\mathbb {R}}^{2}\),

    $$\begin{aligned} G_{i}(Q) :=\,&\big ( \pi (t, x_{-i}) - \pi (s, x_{-i}) \big ) \\ =\,&bx_{-i, s}+ c \end{aligned}$$

    where

    (a) \(x_{i, s}:=C_{i}(Q_{i})\), \(x_{i,t}:= 1-x_{i,s}\) and \(x_{i}:=(x_{i,s}, x_{i,t})\) for each i;

    (b) \(b:=( \pi (t, s)- \pi (s, s)) - (\pi (t, t) - \pi (s, t))\);

    (c) \(c:= \pi (t, t)- \pi (s, t)\).

By applying the asynchronous stochastic approximation method of Tsitsiklis (1994) to the assessment difference process, we show the convergence to a quantal response equilibrium, at which the following equations hold: for each i,

$$\begin{aligned} x^{e}_{i, s}= & {} \frac{1}{1+ \exp \left( \frac{1}{\tau _{i}} Q^{e}_{i}\right) } \quad \text {and} \\ Q^{e}_{i}= & {} bx^{e}_{-i, s}+c. \end{aligned}$$

Without loss of generality, let s be the strictly dominant action of the game. Then, \(Q^{e}_{i}<0\) and \(C_{i}(Q^{e}_{i})> \frac{1}{2}\), as \(Q^{e}_{i} = (\pi (t, s) -\pi (s, s) )x^{e}_{-i, s} +(\pi (t, t) -\pi (s, t) )(1-x^{e}_{-i, s}) <0\). Note also that \(x^{e}_{i,s} \rightarrow 1\) as \(\tau _{i} \rightarrow 0\), which means that the quantal response equilibrium approaches the dominant strategy equilibrium of the game.

Now consider

$$\begin{aligned} | G_{i}(Q_{n}) - Q^{e}_{i} | =&|b| |x_{n,-i,s} -x^{e}_{-i, s}| \\ =&k_{n,-i} |Q_{n,-i}-Q^{e}_{-i}| \end{aligned}$$

where

$$\begin{aligned} k_{n,i} := {\left\{ \begin{array}{ll} |b|\frac{ |C_{i}(Q_{n,i})-C_{i}(Q^{e}_{i}) |}{|Q_{n,i}-Q^{e}_{i}| } &{} \text {if } Q_{n,i} \ne Q^{e}_{i}\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Since \(C_{i}\) is decreasing, concave on the negative domain and convex on the positive domain for each i, we know that

$$\begin{aligned} \max _{Q_{i}} \frac{|C_{i}(Q_{i}) - C_{i}(Q^{e}_{i})|}{|Q_{i}-Q^{e}_{i}|} \le \frac{|C_{i}(Q^{e}_{i})|}{|Q^{e}_{i}|} \end{aligned}$$

for \(Q^{e}_{i}<0\) and \(C_{i}(Q^{e}_{i})>0\). Also, note that for each i,

$$\begin{aligned} |\pi (t, s) -\pi (s, s)| \vee |\pi (t, t)-\pi (s, t)| \ge |Q^{e}_{i} |, \end{aligned}$$

and thus

$$\begin{aligned} k_{n,i} \le \frac{|b|}{|\pi (t,s) -\pi (s, s)| \vee |\pi (t, t)-\pi (s, t)|}=:b'. \end{aligned}$$

Therefore, if \(b'<1\), we know that \(x_{n}\) converges to the quantal response equilibrium by the asynchronous stochastic approximation method.

Note that for symmetric \(2 \times 2\) games with a strictly dominant action,

$$\begin{aligned} |( \pi (t, s)- \pi (s, s)) - (\pi (t, t) - \pi (s, t) ) | < |\pi (t,s) -\pi (s, s)| \vee |\pi (t, t)-\pi (s, t)|, \end{aligned}$$

and thus we obtain the convergence for the game. \(\square \)
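To illustrate the convergence result just established, the following sketch (the payoff numbers, temperature, step sizes and Gaussian noise are illustrative assumptions, not taken from the paper) runs the noisy assessment-difference recursion for a prisoner's-dilemma-type symmetric \(2 \times 2\) game in which action s strictly dominates t. For these payoffs, \(|b|=1\) and \(|\pi (t,s) -\pi (s, s)| \vee |\pi (t, t)-\pi (s, t)| = 2\), so \(b'=1/2<1\) and the sufficient condition above holds.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical prisoner's-dilemma payoffs pi[own action, opponent action],
# with action 0 = s (strictly dominant) and action 1 = t.
pi = np.array([[1.0, 5.0],     # payoffs of s against (s, t)
               [0.0, 3.0]])    # payoffs of t against (s, t)
tau = 0.5                      # common temperature (illustrative)

b = (pi[1, 0] - pi[0, 0]) - (pi[1, 1] - pi[0, 1])   # = 1, as defined above
c = pi[1, 1] - pi[0, 1]                             # = -2

def C(Q):
    """Choice probability of the dominant action s given the assessment difference Q."""
    return 1.0 / (1.0 + np.exp(Q / tau))

# Quantal response equilibrium of the difference system, Q^e = b C(Q^e) + c,
# computed by fixed-point iteration (the two players are symmetric).
Qe = 0.0
for _ in range(200):
    Qe = b * C(Qe) + c

# Noisy assessment-difference recursion with decreasing steps; Gaussian noise
# stands in for the martingale-difference term M'.
Q = np.zeros(2)
for n in range(1, 50001):
    x_s = C(Q)                       # both players' current probabilities of s
    G = b * x_s[::-1] + c            # G_i depends on the opponent's probability of s
    Q += (1.0 / n) * (G - Q + rng.normal(scale=0.5, size=2))

print(Qe, Q, C(Qe))   # Q_n approaches (Q^e, Q^e); C(Q^e) is about 0.90 here
```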


About this article


Cite this article

Funai, N. Convergence results on stochastic adaptive learning. Econ Theory 68, 907–934 (2019). https://doi.org/10.1007/s00199-018-1150-8

