Abstract
The present article provides some additional results for the two-player game of strategic experimentation with three-armed exponential bandits analyzed in Klein (Games Econ Behav 82:636–657, 2013). Players play replica bandits, with one safe arm and two risky arms, which are known to be of opposite types. It is initially unknown, however, which risky arm is good and which is bad. A good risky arm yields lump sums at exponentially distributed times when pulled. A bad risky arm never yields any payoff. In this article, I give a necessary and sufficient condition for the state of the world eventually to be found out with probability 1 in any Markov perfect equilibrium in which at least one player’s value function is continuously differentiable. Furthermore, I provide closed-form expressions for the players’ value function in a symmetric Markov perfect equilibrium for low and intermediate stakes.
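The learning process described above can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not the article's formal model: the common arrival rate `lam`, the constant total pull intensities `k_a` and `k_b`, and both function names are hypothetical choices made here. The sketch computes the belief that risky arm A is the good one after a period without any lump sum, and draws the exponentially distributed time of the first lump sum.

```python
import math
import random


def posterior_no_news(p0, lam, k_a, k_b, t):
    """Belief that arm A is good after t units of time without a lump sum.

    Assumptions (not from the source): both arms share the arrival rate
    lam when good, and k_a, k_b are the constant total intensities with
    which arms A and B are pulled over [0, t].
    """
    # Probability of observing silence if A is good (weighted by the prior).
    like_a = p0 * math.exp(-lam * k_a * t)
    # Probability of observing silence if B is good (weighted by the prior).
    like_b = (1.0 - p0) * math.exp(-lam * k_b * t)
    # Bayes' rule.
    return like_a / (like_a + like_b)


def first_breakthrough(a_is_good, lam, k_a, k_b, rng):
    """Time of the first lump sum; infinite if the good arm is never pulled."""
    rate = lam * (k_a if a_is_good else k_b)
    return rng.expovariate(rate) if rate > 0 else math.inf


if __name__ == "__main__":
    rng = random.Random(0)
    # Pulling only arm A without success makes players more pessimistic about A.
    print(posterior_no_news(0.5, 1.0, 1.0, 0.0, 2.0))
    # Pulling both arms equally reveals nothing until a lump sum arrives.
    print(posterior_no_news(0.5, 1.0, 1.0, 1.0, 2.0))
    print(first_breakthrough(True, 1.0, 1.0, 0.0, rng))
```

Because the arms are of opposite types, a lump sum on either arm reveals the state of the world, whereas silence moves the belief only when the two arms are pulled with different intensities. If nobody pulls a risky arm, the first lump sum never arrives and learning stops.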
Notes
- 1.
- 2.
The utilitarian planner maximizes the sum of the players’ utilities. The solution to this problem is the policy the players would want to commit to at the outset of the game if they had commitment power. It thus constitutes a natural efficient benchmark against which to compare our equilibria.
- 3.
By contrast, Bolton and Harris [2] identified an encouragement effect in their model. It makes players experiment at beliefs that are more pessimistic than their single-agent cutoffs. This is because they will receive good news with some probability, which will make the other players more optimistic also. This then induces them to provide more experimentation, from which the first player then benefits in turn. With fully revealing breakthroughs as in [6, 8], or this model, however, a player could not care less what others might do after a breakthrough, as there will not be anything left to learn. Therefore, there is no encouragement effect in these models.
- 4.
The efficient solution in [6] also implies incomplete learning.
- 5.
For perfect negative correlation, this is true in any equilibrium; for general negative correlation, there always exists an equilibrium with this property.
- 6.
The technical requirement that at least one player’s value function be continuously differentiable is needed on account of complications pertaining to the admissibility of strategies. I use it in the proof of Lemma 4.1 to establish that the safe payoff s constitutes a lower bound on the player’s equilibrium value. Without it, a player could, for example, insist on playing (1, 0) at a single belief \(\hat {p}\) while playing (0, 0) everywhere else in a neighborhood of \(\hat {p}\), and thereby force the other player to play (0, 1) at \(\hat {p}\) for mere admissibility reasons. Both players’ equilibrium value functions might thus be pushed below s at certain beliefs \(\hat {p}\). For the purposes of this section, I rule out such implausible behavior by restricting attention to equilibria in which at least one player’s value function is smooth.
- 7.
See Proposition 3.1 in [6].
- 8.
See Proposition 8 in [8].
- 9.
Strictly speaking, the first inequality relies on the admissibility of the action (0, 0) at \(\tilde {p}\). However, even if (0, 0) should not be admissible at \(\tilde {p}\), my definition of strategies still guarantees the existence of a neighborhood of \(\tilde {p}\) in which (0, 0) is admissible everywhere except at \(\tilde {p}\). Hence, by continuous differentiability of u, there exists a belief \(\tilde {\tilde {p}}\) in this neighborhood at which the same contradiction can be derived.
- 10.
Again, strictly speaking, the first inequality relies on the admissibility of the action (1, 0) at the belief in question, and my previous remark applies.
References
1. Bellman, R.: A problem in the sequential design of experiments. Sankhya Indian J. Stat. (1933–1960) 16(3/4), 221–229 (1956)
2. Bolton, P., Harris, C.: Strategic experimentation. Econometrica 67, 349–374 (1999)
3. Bolton, P., Harris, C.: Strategic experimentation: the undiscounted case. In: Hammond, P.J., Myles, G.D. (eds.) Incentives, Organizations and Public Economics – Papers in Honour of Sir James Mirrlees, pp. 53–68. Oxford University Press, Oxford (2000)
4. Bradt, R., Johnson, S., Karlin, S.: On sequential designs for maximizing the sum of n observations. Ann. Math. Stat. 27, 1060–1074 (1956)
5. Gittins, J., Jones, D.: A dynamic allocation index for the sequential design of experiments. In: Progress in Statistics, European Meeting of Statisticians, 1972, vol. 1, pp. 241–266. North-Holland, Amsterdam (1974)
6. Keller, G., Rady, S., Cripps, M.: Strategic experimentation with exponential bandits. Econometrica 73, 39–68 (2005)
7. Klein, N.: Strategic learning in teams. Games Econ. Behav. 82, 636–657 (2013)
8. Klein, N., Rady, S.: Negatively correlated bandits. Rev. Econ. Stud. 78, 693–732 (2011)
9. Robbins, H.: Some aspects of the sequential design of experiments. Bull. Am. Math. Soc. 58, 527–535 (1952)
10. Thompson, W.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Klein, N. (2018). Learning in a Game of Strategic Experimentation with Three-Armed Exponential Bandits. In: Petrosyan, L., Mazalov, V., Zenkevich, N. (eds) Frontiers of Dynamic Games. Static & Dynamic Game Theory: Foundations & Applications. Birkhäuser, Cham. https://doi.org/10.1007/978-3-319-92988-0_4
DOI: https://doi.org/10.1007/978-3-319-92988-0_4
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-319-92987-3
Online ISBN: 978-3-319-92988-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)