
Reputation effects in stochastic games with two long-lived players


Abstract

This paper extends the reputation result of Evans and Thomas (Econometrica 65(5):1153–1173, 1997) to stochastic games. Specifically, I analyze reputation effects with two long-lived (but not equally patient) players in stochastic games, that is, in games in which the payoffs depend on a state variable and the law of motion for states is a function of the current state and of the players’ actions. The results suggest that reputation effects in stochastic games can be expected if the uninformed player has weak control of the law of motion or if there is no irreversibility. Conversely, in economic situations with irreversibility and in which the uninformed player has strong control of the transition, such as hold-up problems, reputation effects are unlikely to be obtained.

Notes

  1. See, for instance, Admati and Perry (1991), Che and Sakovics (2004), Compte and Jehiel (2004), Décamps et al. (2006), Lockwood and Thomas (2002) and Piazza and Roy (2019).

  2. The asymptotic minmax payoff is the payoff obtained when player 2 plays a best response against the worst strategy of player 1 in the infinitely repeated game when the discount factor goes to one.

  3. A deterministic transition is a probability distribution over states that assigns probability 0 or 1 to moving from one state to another, whatever the players do.

  4. Let \(W(s)=co\{v\in {\mathbb {R}}^2: \exists x\in \Omega \text { s.t. } V_i(x,s)=v_i\}\). If \(W(s)=W(s')\) for all \(s,s'\in S\), then the reputation results of this paper can be obtained for a higher reputation bound. When the set of states is a singleton, the condition on W is satisfied, and the reputation bound is identical to that in Evans and Thomas (1997). The proof is in the “Block stationarity” section of the Appendix.

  5. A strategy \(\tau\) belongs to \(\bar{\Sigma }_2\) if, for every history h and every \(b\in B\) with a positive probability under \(\tau (h^t)\), (1) \((b,s^t)\notin \varphi\) and (2) \(\tau (h^t)=\hat{\omega }_2(s^t)\) whenever \(s^t\) belongs to a recurrent set in \({\mathcal {R}}_{\hat{\omega }^{\alpha }}\), where \(h^t\) is the truncation of h at date t.

  6. A reminder of the proof of Evans and Thomas (1997) is provided in the “related literature” part of the introduction.

  7. Formally, a stochastic game is irreducible despite player 2’s actions if for each pair of states \((s,s')\in S\times S\), there is a \(T > 0\) and a sequence of states \(s^1,\ldots ,s^T\) such that \(s^1=s,s^T=s'\), and for each \(t < T\), there is an action \(a\in A\) such that \(q(s^{t+1}|a,b,s^t)>0\) for all \(b\in B\).

  8. Let \(\eta\) be the probability of returning to the pre-investment situation. The probability of being in the pre-investment situation at t is \((1-\eta )^t\). The minmax payoff is \(-(1-\delta _2) \sum _{t=0}^{\infty } (1-\eta )^t \delta _2^t\), which goes to 0 as \(\delta _2\rightarrow 1\).
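
     Explicitly, summing the geometric series gives

$$\begin{aligned} -(1-\delta _2) \sum _{t=0}^{\infty } (1-\eta )^t \delta _2^t=-\frac{1-\delta _2}{1-(1-\eta )\delta _2}\xrightarrow [\delta _2\rightarrow 1]{}-\frac{0}{\eta }=0. \end{aligned}$$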

  9. There are \(C+1\) states: the initial state in which the investment decision is made and C absorbing states in which players bargain.

References

  • Admati, A., Perry, M.: Joint projects without commitment. Rev. Econ. Stud. 58, 259–276 (1991)

  • Aoyagi, M.: Reputation and dynamic Stackelberg leadership in infinitely repeated games. J. Econ. Theory 71, 378–393 (1996)

  • Celentani, M., Pesendorfer, W.: Reputation in dynamic games. J. Econ. Theory 77, 109–132 (1996)

  • Celentani, M., Fudenberg, D., Levine, D.K., Pesendorfer, W.: Maintaining a reputation against a long-lived opponent. Econometrica 64, 691–704 (1996)

  • Che, Y.K., Sakovics, J.: A dynamic theory of holdup. Econometrica 72(4), 1063–1103 (2004)

  • Compte, O., Jehiel, P.: Gradualism in bargaining and contribution games. Rev. Econ. Stud. 71(4), 975–1000 (2004)

  • Cripps, M.W., Schmidt, M., Thomas, J.P.: Reputation in perturbed repeated games. J. Econ. Theory 69, 387–410 (1996)

  • Cripps, M.W., Thomas, J.P.: Reputation and perfection in repeated common interest games. Games Econ. Behav. 18(2), 141–158 (1997)

  • Cripps, M.W., Dekel, E., Pesendorfer, W.: Reputation with equal discounting in repeated games with strictly conflicting interests. J. Econ. Theory 121(2), 259–272 (2004)

  • Décamps, J.P., Mariotti, T., Villeneuve, S.: Irreversible investment in alternative projects. Econ. Theory 28, 425 (2006)

  • Evans, R., Thomas, J.: Reputation and experimentation in repeated games with two long-run players. Econometrica 65(5), 1153–1173 (1997)

  • Lockwood, B., Thomas, J.: Gradualism and irreversibility. Rev. Econ. Stud. 69(2), 339–356 (2002)

  • Piazza, A., Roy, S.: Irreversibility and the economics of forest conservation. Econ. Theory (2019). https://doi.org/10.1007/s00199-019-01175-x

  • Schmidt, K.: Reputation and equilibrium characterization in repeated games with conflicting interests. Econometrica 61, 325–351 (1993)

  • Sorin, S.: Merging, reputation, and repeated games with incomplete information. Games Econ. Behav. 29, 274–308 (1999)

  • Young, P.: The evolution of conventions. Econometrica 61, 57–84 (1993)


Acknowledgements

Financial support from the ANR DBCPG is gratefully acknowledged. This research has been conducted as part of the Labex MME-DII (ANR11-LBX-0023-01) Project. I would also like to thank R. Evans, P. Fleckinger, O. Gossner, J. Hörner, L. Ménager, L. Samuelson, A. Salomon, J. Sobel, J.M. Tallon, N. Vieille and the participants of the Game Theory World Congress in Istanbul (2012) and SAET (2017) for their very helpful comments. I am grateful to an associate editor and three anonymous referees for their very constructive suggestions that greatly improved the paper.

Corresponding author

Correspondence to Chantal Marlats.


Appendix

1.1 Proof of Lemma 1

Fix \(\varepsilon >0\). By Abel's theorem, \(\lim _{T\rightarrow \infty }(1/T)E_{\hat{\omega },s^0}(\sum _{t=0}^{T-1} u^t_i)=\hat{V}_i(\hat{\omega },s^0)\) for all \(s^0\in S\). Let \(\hat{V}_i=\min _{s\in S}\hat{V}_i(\hat{\omega },s)\). Thus, there is a \(\tilde{T}\) such that for all \(T\ge \tilde{T}\), \(\frac{1}{T}E_{\hat{\omega },s^0}(\sum _{t=0}^{T-1} u^t_i)\ge \hat{V}_i-\varepsilon /16\), \(\forall s^0\in S\). There exists \(n^*\) such that for all integers \(m\le \tilde{T}\) and \(n\ge n^*\), \(\frac{1}{n\tilde{T}+m}(m(-M)+n\min _{s\in S}E_{\hat{\omega },s}(\sum _{t=0}^{\tilde{T}-1}u^t_i))\equiv \hat{V}_i^{n\tilde{T}+m}\ge \hat{V}_i-\varepsilon /8.\) Define \(\delta _i^*\) such that for all \(\delta _i\ge \delta _i^*\) and \(m\le \tilde{T}\),

$$\begin{aligned} \frac{1-\delta _i}{1-\delta _i^{n^*\tilde{T}+m}}\bigg (\frac{1-\delta _i^{m}}{1-\delta _i}(-M)+\delta _i^{m}\frac{1-\delta _i^{\tilde{T}n^*}}{1-\delta _i}\min _{s\in S}U_{i}^{\tilde{T}}(\hat{\omega },\delta _i ,s)\bigg )\ge \hat{V}_i^{n^*\tilde{T}+m}-\varepsilon /8. \end{aligned}$$
(1)

This \(\delta _i^*\) can be found because \(\lim _{\delta _i\rightarrow 1}\frac{1-\delta _i}{1-\delta _i^T}=1/T\). Note that the right-hand side of inequality (1) is greater than \(\hat{V}_i-\varepsilon /4\). Furthermore, \(\min _{s\in S}U_{i}^{\tilde{T}}(\hat{\omega },\delta _i ,s)\) is greater than the left-hand side, and thus, \(\min _{s\in S}U_{i}^{\tilde{T}}(\hat{\omega },\delta _i ,s) \ge \hat{V}_i-\varepsilon /4\). Now, I verify that for all \(T\ge \tilde{T}(1+n^*)\equiv T^*\) and \(\delta _i\ge \delta _i^*\), we obtain \(U_{i}^{T}(\hat{\omega },\delta _i ,s^0)\ge \hat{V}_i-\varepsilon /4\) for all \(s^0\in S\). Fix \(T\ge T^*\). There are \(n\ge n^*\) and \(m<\tilde{T}\) such that \(T=n\tilde{T}+m\). Therefore, \(U_{i}^{T}(\hat{\omega },\delta _i ,s)\) is greater than

$$\begin{aligned}&\frac{1-\delta _i}{1-\delta _i^{T}}\bigg (\frac{1- \delta _i^{m}}{1-\delta _i}(-M)+ \delta _i^{m}\frac{1-\delta _i^{\tilde{T}n^*}}{1-\delta _i}\min _{s\in S}U_{i}^{\tilde{T}}(\hat{\omega },\delta _i ,s)\\&\qquad +\,\delta _i^{\tilde{T}n^*+m}\frac{1-\delta _i^{(n-n^*)\tilde{T}}}{1-\delta _i}\min _{s\in S}U_{i}^{\tilde{T}}(\hat{\omega },\delta _i ,s) \bigg )\\&\quad \ge \frac{1-\delta _i}{1-\delta _i^{T}}\bigg (\frac{1- \delta _i^{m+\tilde{T}n^*}}{1-\delta _i}(\hat{V}_i-\varepsilon /4) +\delta _i^{\tilde{T}n^*+m}\frac{1-\delta _i^{(n-n^*)\tilde{T}}}{1-\delta _i}(\hat{V}_i-\varepsilon /4 ) \bigg )\ge \hat{V}_i-\varepsilon /4 . \end{aligned}$$

The penultimate inequality follows from Inequality (1).
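
The limit \(\lim _{\delta _i\rightarrow 1}\frac{1-\delta _i}{1-\delta _i^T}=1/T\) invoked in the choice of \(\delta _i^*\) follows from a direct factorization:

$$\begin{aligned} \frac{1-\delta _i}{1-\delta _i^{T}}=\frac{1-\delta _i}{(1-\delta _i)(1+\delta _i+\cdots +\delta _i^{T-1})}=\frac{1}{1+\delta _i+\cdots +\delta _i^{T-1}}\xrightarrow [\delta _i\rightarrow 1]{}\frac{1}{T}. \end{aligned}$$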

1.2 Proof of Lemma 2

The first point is straightforward. Now, suppose that \(s'\in R\), \(R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}\). Then, \(\hat{\omega }^{\alpha }\) generates an ergodic Markov chain over S, and thus, the limit is unique. Let \({\mathcal {T}}_{\omega }\) be the set of transient states under some stationary strategy profile \(\omega\). Let \(p_s^{\alpha }\) be the frequency of time spent in state s. The asymptotic payoff is given by

$$\begin{aligned} V_i(\hat{\omega }^{\alpha },s')=\sum _{s\in R}p_{s}^{\alpha }\left[ (1-\alpha )u_i(\hat{\omega }(s),s)+\frac{\alpha }{|A|-1}\sum _{a\ne \hat{\omega }_1(s)} u_i(a,\hat{\omega }_2(s),s) \right] . \end{aligned}$$

Note that for all \(s,s'\in R\), \(V_i(\hat{\omega }^{\alpha },s')=V_i(\hat{\omega }^{\alpha },s)\equiv V_i(\hat{\omega }^{\alpha },R)\). Clearly, \(R=\cup _{l=1}^n R_l\cup T\), where \(R_l\in {\mathcal {R}}_{\hat{\omega }}\) and \(T\subseteq {\mathcal {T}}_{\hat{\omega }}\). Using Theorem 4 in Young (1993), there are weights \((\beta _l)_{l=1}^n\) such that

$$\begin{aligned} \lim _{\alpha \rightarrow 0}V_i(\hat{\omega }^{\alpha },s)=\sum _{l=1}^n \beta _l V_i(\hat{\omega },R_l), \end{aligned}$$

where \(V_i(\hat{\omega },R_l)\) is player i’s asymptotic payoff when the initial state is in \(R_l\). Thus, \(\lim _{\alpha \rightarrow 0}V_i(\hat{\omega }^{\alpha },s) \ge \min _{s\in S} V_i(\hat{\omega },s)\).

Now, suppose that \(s'\in {\mathcal {T}}_{\hat{\omega }^{\alpha }}\). Then, for all \(\alpha >0\),

$$\begin{aligned} V_i(\hat{\omega }^{\alpha },s')=\sum _{R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}} p(R) V_i(\hat{\omega }^{\alpha },R), \end{aligned}$$

where p(R) is the probability that \(R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}\) is the first recurrent set that is reached. Because for all recurrent sets R, \(\lim _{\alpha \rightarrow 0}V_i(\hat{\omega }^{\alpha },R) \ge \min _{s\in S} V_i(\hat{\omega },s)\), \(\lim _{\alpha \rightarrow 0}V_i(\hat{\omega }^{\alpha },s')\ge \min _{s\in S} V_i(\hat{\omega },s)\).
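
As a purely numerical illustration of the stationary-average formula above (a minimal sketch: the two-state transition matrix and the stage payoffs below are hypothetical placeholders, not objects of the model), the long-run average payoff started inside a recurrent set coincides with the stationary-distribution-weighted average of stage payoffs:

```python
import numpy as np

# Minimal illustration (hypothetical numbers, not from the paper): a two-state
# ergodic chain standing in for the law of motion induced by a perturbed
# stationary profile, together with the expected stage payoff in each state.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])     # P[s, s'] = probability of moving from s to s'
u = np.array([1.0, 3.0])       # expected stage payoff in each state

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
p = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
p /= p.sum()

# Long-run average payoff computed two ways: by simulating the chain and as
# the stationary-weighted average appearing in the proof of Lemma 2.
rng = np.random.default_rng(0)
s, total, horizon = 0, 0.0, 100_000
for _ in range(horizon):
    total += u[s]
    s = rng.choice(2, p=P[s])
print(total / horizon)   # close to p @ u, whatever the initial state in R
print(p @ u)
```

Both printed values coincide up to simulation noise, regardless of the state of the recurrent set in which the chain starts.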

1.3 Proof of Lemma 3

Suppose that the game is in a state that belongs to a recurrent set \(R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}\). Because \(\hat{\omega }^{\alpha }\) is played, the state follows an ergodic Markov chain on \(R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}\). Thus, \(V_1(\hat{\omega }^{\alpha },s)=V_1(\hat{\omega }^{\alpha },s')\equiv V_1(\hat{\omega }^{\alpha },R)\) for all \(s,s'\in R\). Therefore, by using the same argument as in Lemma 1, for all \(\varepsilon >0\), there are \(\delta _1(\varepsilon )\) and \(T(\varepsilon )\) such that for all \(\delta _1\ge \delta _1(\varepsilon )\) and for all \(T\ge T(\varepsilon )\), \(U_1^T(\hat{\omega }^{\alpha },\tau ,\delta _1,s)\ge V_1(\hat{\omega }^{\alpha },R)-\varepsilon /8\). To complete the proof, I need to show that a recurrent set is reached in finite time for all initial states \(s^0\in S\) and any strategy that does not block the transition. Specifically, I show that for all \(\xi >0\), there is a \(T'(\xi )\) such that for all \(T\ge T'(\xi )\), \(s^0\in S\) and \(\tau \in \bar{\Sigma }_2\),

$$\begin{aligned} P_{\hat{\omega }^{\alpha }_1,\tau ,s^0}\Big (\min \{t:s^t\in R,\; R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}\}\le T'(\xi ) \Big )\ge 1-\xi . \end{aligned}$$

Fix \(s^0\), and let \(\{s_k\}_{k=1}^{K}\), \(K\le |S|\) be a chain of states that leads to a recurrent set if \(\hat{\omega }^{\alpha }\) is played: \(P_{\hat{\omega }^{\alpha },s_k}(s_{k+1})>0\) for all \(k<K\), and \(s_{K}\) belongs to a recurrent set, with \(s_k\ne s_{k'}\) for all \(k\ne k'\). Let \(\mu =\min _{s\in S,s'\in S,a\in A,b\in B}\{P_{a,b,s}(s'): P_{a,b,s}(s')>0\}\), and fix \(\tau \in \bar{\Sigma }_2\). Therefore, \(P_{\hat{\omega }^{\alpha }_1,\tau ,s_k}(s_{k+1})\ge \mu\) for all \(k< K\). The probability of reaching a recurrent set in fewer than |S| stages is greater than \(\mu ^{|S|}\) for any \(s^0\in S\). For all \(n\in {\mathbb {N}}\), consider the following probability:

$$\begin{aligned} P(n)\equiv P_{\hat{\omega }^{\alpha }_1,\tau ,s^0}\Big (\min \{t:s^t\in R,\; R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}\}\le |S|n \Big ). \end{aligned}$$

Let \(P_{\hat{\omega }^{\alpha }_1,\tau ,s^0}(|S|i\ge \min \{t:s^t\in \cup _{R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}}R\}| \min \{t:s^t\in \cup _{R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}}R\} \ge |S| (i-1)) \equiv p_i\). Then, \(P(n) = \sum _{i=1}^n p_i (1-p_1)(1-p_2)\cdots (1-p_{i-1}).\) The derivative of this sum with respect to \(p_i\) is \(\prod _{j=1}^{i-1}(1-p_j)-\sum _{k=i+1}^{n}p_k\prod _{j=1,\; j\ne i}^{k-1}(1-p_j)=\prod _{j=1, j\ne i}^{n}(1-p_j)\ge 0\). Recall that \(p_i\ge \mu ^{|S|}\) for all i. Therefore, \(P(n) \ge \sum _{i=1}^n \mu ^{|S|} (1-\mu ^{|S|})^{i-1}= 1-(1-\mu ^{|S|})^{n}.\) Note that \(1-(1-\mu ^{|S|})^{n}\) tends to 1 as n tends to infinity.
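
The simplification of the derivative uses the telescoping identity

$$\begin{aligned} \sum _{k=i+1}^{n}p_k\prod _{j=i+1}^{k-1}(1-p_j)=1-\prod _{j=i+1}^{n}(1-p_j), \end{aligned}$$

so that \(\prod _{j=1}^{i-1}(1-p_j)-\sum _{k=i+1}^{n}p_k\prod _{j=1,\; j\ne i}^{k-1}(1-p_j)=\prod _{j=1}^{i-1}(1-p_j)\prod _{j=i+1}^{n}(1-p_j)=\prod _{j=1,\; j\ne i}^{n}(1-p_j)\).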

1.4 Proof of Lemma 4

Given a history h, let \(N_1(h)\) be the number of times player 2 places a positive probability on b in s such that \((b,s)\in \varphi\) during a normal phase. Fix a strategy \(\tau\) and a history h such that \(P_{\bar{\sigma },\tau ,s^0}(h)>0\). Let \(\{t_{n}(h)\}_{n=1}^N\equiv \{t_{n}\}_{n=1}^{N}\) be the dates at which the following hold:

  (1) At \(t_n-1\), player 2 places a positive probability on \(b\in B\) in \(s\in S\) such that \((b,s)\in \varphi\) in a normal phase.

  (2) At \(t_n-1\), he anticipates that player 1 plays as the commitment type over the next \(k+1\) periods with a probability greater than \(1-\gamma\).

Let \(k_n\) be the length of the punishment phase beginning at \(t_n\). Consider n such that \(k\le k_n\). At \(t_n-1\), if b is realized, then he obtains at most

$$\begin{aligned} (1-\gamma )\left[ (1-\delta _2 )M+(\delta _2-\delta _2^{k+1} )\sup _{k'\ge k}E(m_2^{k'}(\delta _2,.)|\hat{\omega }_{1}^{\alpha }(s),b,s)+\delta _2 ^{k+1}M\right] +\gamma M, \end{aligned}$$
(2)

where s is the state at date \(t_n-1\). Suppose instead that player 2 plays \(\hat{\omega }_2\) and then switches to the minmax strategy. He anticipates that he receives at least

$$\begin{aligned} (1-\gamma )\left[ -(1-\delta _2 )M+(\delta _2-\delta _2 ^{k+1} )\inf _{k'\ge k}E(m_2^{k'}(\delta _2,.)|\hat{\omega }^{\alpha }(s),s)-\delta _2^{k+1}M\right] -\gamma M. \end{aligned}$$
(3)

If player 2 is sufficiently patient and k is sufficiently large relative to \(\delta _2\), then

$$\begin{aligned} |\sup _{k'\ge k}E(m_2^{k'}(\delta _2,.)|a',b',s)-E(m_2|a',b',s)|<\eta /2, \end{aligned}$$

for all \((a',b')\in A\times B\). Furthermore, given the value of \(\gamma\), the difference between Eqs. (2) and (3) is smaller than

$$\begin{aligned} (1-\delta _2 )2M+(\delta _2-\delta _2^{k+1} )(E(m_2|\hat{\omega }_{1}^{\alpha }(s),b,s)-E(m_2|\hat{\omega }^{\alpha }(s),s))+2\delta _2 ^{k+1}M+2\eta , \end{aligned}$$

for all \((b,s)\in \varphi\). Given the value of \(\alpha\) and Assumption 1, \(E(m_{2}|\hat{\omega }^{\alpha }(s),s)> E(m_2|\hat{\omega }_{1}^{\alpha }(s),b,s)\); hence, there exists a \(\delta _2'\) such that for all \(\delta _2\ge \delta _2'\), there is a \(k_{1}(\delta _2)\ge k_0(\delta _2)\) such that if \(k\ge k_{1}(\delta _2)\), then this difference is negative for all \((b,s)\in \varphi\). Therefore, given \(\delta _2\ge \delta _2'\) and the fact that Point 2 holds, there are fewer than \(k_{1}(\delta _2)\) normal phases in which player 2 assigns a positive probability to b in s such that \((b,s)\in \varphi\), whatever best response \(\tau\) he uses. According to Lemma 5, for all \(\lambda >0\), with a probability greater than \(1-\lambda\) there are no more than \(N_0(\lambda ,\gamma ,k_{1}(\delta _2)+1)\) stages at which Point 2 does not hold. Consequently, with a probability greater than \(1-\lambda\), there are no more than \(N_0(\lambda ,\gamma ,k_{1}(\delta _2)+1)+k_1(\delta _2)\) normal phases in which player 2 places a positive probability on b in s such that \((b,s)\in \varphi\).
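
To see why such a \(k_{1}(\delta _2)\) exists, note that for \(\delta _2\) close to 1 and k large given \(\delta _2\), both \((1-\delta _2)2M\) and \(2\delta _2^{k+1}M\) are negligible, so the displayed bound on the difference is close to

$$\begin{aligned} E(m_2|\hat{\omega }_{1}^{\alpha }(s),b,s)-E(m_2|\hat{\omega }^{\alpha }(s),s)+2\eta , \end{aligned}$$

which is strictly negative under Assumption 1 provided \(\eta\) is smaller than half of the gap \(E(m_{2}|\hat{\omega }^{\alpha }(s),s)-E(m_2|\hat{\omega }_{1}^{\alpha }(s),b,s)\), uniformly over the finitely many pairs \((b,s)\in \varphi\).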

1.5 Proof of Lemma 7

Fix a history h. Note that

$$\begin{aligned} \frac{1}{T}\sum _{t=0}^{T-1}u_{2}(a^{t},b^{t},s^{t})=\frac{1}{T}\sum _{a\in A, s\in S}n_T(a,s)u_{2}(a,\hat{\omega }_2(s),s), \end{aligned}$$

where \(n_T(a,s)\) is the number of times that \((a,s)\) occurs in h between dates 0 and \(T-1\). Because \(\frac{n_t(a,s)}{t} \xrightarrow {a.s.} p(a,s)\) for all \((a,s)\in A\times S\), there is a \(T_h\) such that for all \(T\ge T_h\),

$$\begin{aligned} \frac{1}{T}\sum _{t=0}^{T-1}u_{2}(a^{t},b^{t},s^{t})\ge \min _{s\in S}V_2(\hat{\omega }^{\alpha },s)-\frac{\eta }{16} \end{aligned}$$

Following the same logic as in the proof of Lemma 1, it is possible to find \(T_h^*\) and \(\delta _h^*\) such that

$$\begin{aligned} \inf _{\begin{array}{l} t\ge T^*_h\\ \delta _2\ge \delta ^*_h \end{array}}U_2^t(h,\delta _2)\ge \min _{s\in S}V_2(\hat{\omega }^{\alpha },s)-\eta \end{aligned}$$

Therefore, for all \(\xi >0\), there is a \(\hat{T}\) and \(\hat{\delta }_2\) such that the following holds:

$$\begin{aligned} P_{\hat{\omega }^{\alpha },s^0}\bigg (h:\forall t\ge \hat{T}, \delta _2\ge \hat{\delta }_2 , \;U_2^t(h,\delta _2)\ge \min _{s\in S}V_2(\hat{\omega }^{\alpha },s)-\eta \bigg )\ge 1-\xi . \end{aligned}$$

The second point follows immediately from the fact that player 1 passes the test at \(t-1\).

1.6 Proof of Lemma 9

For all \(s^t\in S\), there is a path toward some \(\underline{s}_R\) under \(\hat{\omega }\): that is, there is a sequence of states \(\{s_k\}_{k=0}^{K}\) such that \(s_k\ne s_{k'}\) for all \(k\ne k'\); \(s_{K}\in \{\underline{s}_R\}_{R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}}\), \(s_0=s^t\) and \(q(s_k|\hat{\omega }(s_{k-1}),s_{k-1})>0\) for all \(1\le k\le K\). Let \(t'\ge t\), \(s^{t'}=s_k\) and \(P_{\hat{\omega }_1^{\alpha },\tau ,s_k}(s_{k+1})\) be the probability of reaching \(s_{k+1}\) when the state at \(t'\) is \(s_k\). Let \(\tau (b|h^{t'})\) be the probability that player 2 places on b after history \(h^{t'}\). Because player 2 does not block the transition, \(q(s_{k+1}|a,b,s_k)=0\; \forall a\in A\Rightarrow \tau (b|h^{t'})=0\). Let \(\mu =\min _{s,s',a,b}\{q(s'|a,b,s):q(s'|a,b,s)>0\}\). If \(\tau (b|h^{t'})>0\), then \(q(s_{k+1}|a,b,s_k)\ge \mu\) for some \(a\in A\). Therefore, \(P_{\hat{\omega }_1^{\alpha },\tau ,s_k}(s_{k+1})\ge \frac{\alpha }{|A|-1} \mu\).

Because the path cannot be longer than |S|, the probability that the path \(\{s_k\}_{k=0}^{K}\) is taken is greater than \((\frac{\alpha }{|A|-1}\mu )^{|S|}\).

Replicating the same argument, if no state in \(\{\underline{s}_R\}_{R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}}\) is reached by \((m-1)|S|+1\), then there is a probability greater than \((\frac{\alpha }{|A|-1}\mu )^{|S|}\) that this set will be reached before m|S|, for any m.

Along the lines of the proof of Lemma 3, the probability that \(\{\underline{s}_R\}_{R\in {\mathcal {R}}_{\hat{\omega }^{\alpha }}}\) is reached before m|S| is greater than \(\sum _{l=0}^{m-1}(\frac{\alpha }{|A|-1}\mu )^{|S|}(1-(\frac{\alpha }{|A|-1}\mu )^{|S|})^l\). Note that this lower bound does not depend on R or on \(\tau\) and goes to 1 as \(m\rightarrow \infty\).

1.7 Proof of Lemma 11

Take n such that the length k of the punishment phase that begins at \(t_n\) satisfies \(k>k^*\). At \(t_n-1\), the expected payoff of player 2 is not greater than

$$\begin{aligned}&(1-\gamma )\bigg [M(1-\delta _2)+\sum _{t=t_{n}}^{t_{n}+k^*-1}P(F^{t})\Big ((\delta _2-\delta _2^{t-t_{n}+1})M\nonumber \\&\qquad +\,\delta _2^{t-t_{n}+1}(1-\delta _2^{t_n+k-t})\min _{s\in R}m_2^{t_n+k-t}(\delta _2,s)\Big )\nonumber \\&\qquad +\,\Big (1-\sum _{t=t_{n}}^{t_{n}+k^*-1}P(F^{t})\Big )(\delta _2-\delta _2^{k+1})M+\delta _2^{k+1}M\bigg ]+\gamma M. \end{aligned}$$
(4)

Recall that \(\sum _{t=t_{n}}^{t_{n}+k^*-1}P(F^{t})\ge 1-\xi\). Moreover, if \(\delta _2\) is sufficiently large and k is sufficiently large relative to \(\delta _2\), then \(\min _{s\in R}m_2^{t_n+k-k^*}(\delta _2,s)\le \min _{s\in R} m_2(s)+\eta /2\). Therefore, Eq. (4) is smaller than

$$\begin{aligned}&(1-\gamma )\left[ (1-\xi )\left( (1-\delta _2^{k^*})M+(\delta _2^{k^*}-\delta _2^{k+1})(\min _{s\in R}m_2(s)+\eta /2)\right) +\xi (1-\delta _2^{k+1})M+\delta _2^{k+1}M\right] +\gamma M. \end{aligned}$$

Because \(\gamma \le \eta /(8M)\) and \(\xi \le \eta /(8M)\), the above expression is smaller than

$$\begin{aligned} (1-\delta _2^{k^*})M+(\delta _2^{k^*}-\delta _2^{k+1})\min _{s\in R}m_2(s) +\delta _2^{k+1}M+\eta \end{aligned}$$

Consequently, there is a \(\delta _2''\ge \delta _2^{'}\) such that for all \(\delta _2\ge \delta _2''\), there is a \(k_2(\delta _2)\) independent of \(\tau\) and R such that for all \(k\ge k_2(\delta _2)\), Eq. (4) is smaller than \(\min _{s\in R}m_{2}(s)+2\eta\).
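
Explicitly, using \(|\min _{s\in R}m_2(s)|\le M\), the bound displayed above satisfies

$$\begin{aligned} (1-\delta _2^{k^*})M+(\delta _2^{k^*}-\delta _2^{k+1})\min _{s\in R}m_2(s) +\delta _2^{k+1}M+\eta \le \min _{s\in R}m_2(s)+2(1-\delta _2^{k^*})M+2\delta _2^{k+1}M+\eta , \end{aligned}$$

and the right-hand side is below \(\min _{s\in R}m_{2}(s)+2\eta\) as soon as \(2(1-\delta _2^{k^*})M+2\delta _2^{k+1}M\le \eta\), that is, for \(\delta _2\) close to 1 and k sufficiently large relative to \(\delta _2\) (with \(k^*\) fixed).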

1.8 Proof of Lemma 12

I rename the recurrent sets as \(\{R_1,\ldots ,R_{r}\}\) and order them such that \(\min _{s\in R_l} m_2(s) \le \min _{s\in R_{l+1}} m_2(s)\) for all \(l\in \{1,\ldots ,r-1\}\). Fix \(l\le m-r\), and let \(\sum _{t=t_{n+l}}^{t_{n+l}+k^*-1}P(F^{t})\equiv P_{n+l}\).

Note that, for all \(r'\in \{1,..,r\}\), if \(s^{t_{n+l}-1}\in R_{r'}\), then

$$\begin{aligned} P_{n+l}\ge P_{\hat{\omega }_1^{\alpha },\tau ,s^{t_{n+l}-1}}\bigg ( \min \{k : s^{t_{n+l}+k}\in \{\underline{s}_{R_1},\underline{s}_{R_{2}},\ldots ,\underline{s}_{R_{r'}}\} \}< k^*\bigg ) \end{aligned}$$

Therefore, by Lemma 9, if \(s^{t_{n+l}-1}\in R_r\), then \(P_{n+l}\ge 1-\xi\).

Now, suppose that \(s^{t_{n+l}-1}\in R_{r-1}\) and \(P_{n+l}<1-\xi\). We have:

$$\begin{aligned}&P_{\hat{\omega }_1^{\alpha },\tau ,s^{t_{n+l}-1}}\bigg ( \min \{k : s^{t_{n+l}+k}\in \{\underline{s}_{R_r}\}\} < k^*\bigg )\\&\quad \ge 1-\xi /2-P_{\hat{\omega }_1^{\alpha },\tau ,s^{t_{n+l}-1}}\bigg ( \min \{k : s^{t_{n+l}+k}\in \{\underline{s}_{R_{r-1}},\ldots ,\underline{s}_{R_{1}}\} \}< k^*\bigg )\ge \xi /2 \end{aligned}$$

The first inequality is a consequence of Lemma 9, and the second comes from \(P_{n+l}<1-\xi\).

Therefore, there is a probability greater than \(\xi /2\) that \(R_r\) is reached. Note that as long as the game remains in \(R_r\), the tough punishment is not triggered.

Additionally, for all recurrent sets R, if \(s\in R\), then the probability of being in R in the next period is greater than \(\frac{\alpha }{|A|-1}\mu\) because player 2 does not block the transition (c.f. proof of Lemma 9). Therefore, if \(R_r\) is reached, then the probability that the next normal phase begins in \(R_r\) is greater than \((\frac{\alpha }{|A|-1} \mu )^{k^*}\).

At \(t_{n+l}-1\), there is a probability greater than \((\xi /2 )(\frac{\alpha }{|A|-1}\mu )^{k^*} \equiv \tilde{\mu }>0\) that the next normal phase begins in \(R_r\). In this case, \(s^{t_{n+l+1}-1}\in R_r\) because the next punishment phase is triggered after a deviation of type 2. Therefore, we can conclude that there is a probability greater than \(\tilde{\mu }\) that \(\sum _{t=t_{n+l+1}}^{t_{n+l+1}+k^*-1}P(F^{t})> 1-\xi\).

If the recurrent set is \(R_{r-2}\), then there is a probability greater than \(\tilde{\mu }\) that the next normal phase begins in \(R_{r-1}\) or \(R_{r}\). Therefore, \(s^{t_{n+l+1}-1}\) is in \(R_{r-1}\) or \(R_{r}\) with a probability greater than \(\tilde{\mu }\). If \(s^{t_{n+l+1}-1}\) is in \(R_{r}\), then \(P_{n+l+1}>1-\xi\). If \(s^{t_{n+l+1}-1}\) is in \(R_{r-1}\) and \(P_{n+l+1}<1-\xi\), then there is a probability \(\tilde{\mu }\) that \(s^{t_{n+l+2}-1}\) will be in \(R_{r}\). Therefore, there is a probability greater than \(\tilde{\mu }^2\) that \(P_{n+l+1}>1-\xi\) or \(P_{n+l+2}>1-\xi\).

Continuing, we find that there is a probability greater than \(\tilde{\mu }^{r}\) that \(P_{n+l+l'}>1-\xi\) for some \(l'\in \{0,\ldots ,r-1\}\), whatever recurrent set contains \(s^{t_{n+l}-1}\). Fix an integer \(m'\) and let \(p_{m'}= P(\exists l'\in \{0,\ldots , r-1\} \text { s.t. } P_{n+m'r+l'}>1-\xi )\). We have just shown that \(p_{m'}>\tilde{\mu }^r\).

Fix m and write \(m=r m^{''}+k\), where \(m^{''}\) is an integer and \(k\in \{0,\ldots ,r-1\}\). Then,

$$\begin{aligned}&P_{\hat{\omega }^\alpha , \tau , s}\bigg (\exists l\in \{n,\ldots , n+m\} : \sum _{t=t_{n+l}}^{t_{n+l}+k^*-1}P(F^{t})\ge 1-\xi \bigg )\\&\quad \ge P_{\hat{\omega }^\alpha , \tau , s}\bigg (\exists l\in \{n,\ldots , n+r m^{''}\} : \sum _{t=t_{n+l}}^{t_{n+l}+k^*-1}P(F^{t})\ge 1-\xi \bigg )\\&\quad =p_0+(1-p_0)p_1+\cdots +p_{m''}(1-p_0)(1-p_1)\cdots (1-p_{m''-1}) \end{aligned}$$

Along the lines of the proof of Lemma 3 and by using the fact that \(p_{m'}>\tilde{\mu }^r\) for all \(m'\in \{0,\ldots ,m^{''}\}\), we can conclude that this probability converges to 1 as m goes to infinity.

1.9 Block stationarity

In this section, I provide a proof of the claim in Footnote 4. Let \(W(s)=co\{w\in {\mathbb {R}}^2: \exists x\in \Omega \text { s.t. } V_i(x,s)=w_i \text { for } i=1,2\}\), where co denotes the convex envelope. By assumption, \(W(s)= W(s')=W\) for all s and \(s'\in S\). Let \(\bar{V}_1=\sup \{v_1\in {\mathbb {R}}:\exists (V_1,V_2)\in W \text { s.t. } V_1=v_1 \text { and } V_2\ge \max _{s\in S}m_2(s) \}\). As W is convex, there are weights \(\{\beta _l\}_{ l=1,\ldots ,L}\) and stationary strategy profiles \(\{x^{l}\}_{l=1}^L\) such that for all \(i=1,2\), \(\bar{V}_i=\sum _{l=1}^L\beta _l V_i(x^l,s)\). For all \(\varepsilon >0\), there are integers \(\{T_l\}_{ l=1,\ldots ,L}\) such that for all \(i=1,2\), \(|\sum _{l=1}^L\frac{ T_l}{T}V_i(x^l,s)-\bar{V}_{i}|\le \varepsilon \text { and }\sum _{l=1}^L\frac{ T_l}{T}V_2(x^l,s)> \max _{s\in S}m_2(s)\), where \(\sum _{l=1}^L T_l=T\). Let \(\bar{\omega }^T\) be the strategy profile that plays \(x^1\) over \(T_1\) periods, then \(x^2\) over \(T_2\) periods, ..., \(x^L\) over \(T_L\) periods, and then restarts the cycle. Thus, \(\bar{\omega }^T\) repeatedly plays cycles of length T that are composed of L phases. The proof of Lemma 1 can be easily adapted.
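
One concrete way to choose the integers \(T_l\) (a standard rounding argument; any construction with the same approximation property works) is \(T_l=\lfloor \beta _l T\rfloor\) for \(l<L\) and \(T_L=T-\sum _{l<L}T_l\), so that \(|T_l/T-\beta _l|\le L/T\) for every l. Since payoffs are bounded by M,

$$\begin{aligned} \Big |\sum _{l=1}^L\frac{T_l}{T}V_i(x^l,s)-\bar{V}_i\Big |\le \sum _{l=1}^L\Big |\frac{T_l}{T}-\beta _l\Big |\,|V_i(x^l,s)|\le \frac{L^2M}{T}, \end{aligned}$$

which is below \(\varepsilon\) whenever \(T\ge L^2M/\varepsilon\).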

If \(m_2(s)=m_2(s')\) for all \(s,s'\in S\) and the definition of the commitment type is unchanged, then the proof of reputation effects is straightforward. Indeed, it suffices to fix \(\varepsilon >0\) and to replace \(\hat{\omega }(s)\) with \(\bar{\omega }^T(s,l)\), where s is the current state and l is the cycle phase. Recall that, by construction, \(\bar{\omega }^T\) yields payoffs that lie in an \(\varepsilon\)-neighborhood of \(\bar{V}_i\) for all \(i=1,2\). Following the lines of the proof of Theorem 2, I find that mimicking the commitment strategy yields more than \(\bar{V}_1-2\varepsilon\).

In the case of state-dependent minmax payoffs, the proof of reputation effects is not straightforward, and some adaptations are needed. The idea is to work on an “extended” set of states on which \(\bar{\omega }^T\) is a Markov chain. Then, the results can be easily adapted. For all \(l=1,\ldots ,L\), let \(x^{l,\alpha }\) be the mixed stationary strategy profile that plays \(x^l\) with probability \(1-\alpha\) and divides the weight \(\alpha\) equally among the other actions in A. Let \(\bar{\omega }^{T,\alpha }=(\bar{\omega }^{T,\alpha }_1,\bar{\omega }^{T}_2)\), where \(\bar{\omega }^{T,\alpha }_1\) is the strategy that plays \(x^{1,\alpha }\) over \(T_1\) periods, then \(x^{2,\alpha }\) over \(T_2\) periods, ..., and \(x^{L,\alpha }\) over \(T_L\) periods.

Rename the original set of states \(\bar{S}\) and the original transition function \(\bar{q}\). Fix T such that the corresponding \(\bar{\omega }^{T,\alpha }\) yields payoffs that lie in an \(\varepsilon\)-neighborhood of \(\bar{V}_i\) for all \(i=1,2\). Consider the extended set of states \(S=\bar{S}\times \{1,\ldots ,T\}\), and define q as the transition function over S. Suppose that \(\bar{\omega }^{T,\alpha }\) has been played for t periods. Then, t can be decomposed as \(t=nT+k\), where n, k are integers and \(k=1,\ldots , T\) (k indexes the period within the cycle of length T). Let \(k\oplus 1\equiv {\mathbb {1}}_{k<T}k+1\). Then, given any pair of states \((s,k),(s',k') \in S\) and action profile \((a,b)\in A\times B\), \(q((s',k')|a,b,(s,k))=\bar{q}(s'|a,b,s)\) if \(k<T, k'=k+1\) or if \(k=T,k'=1\), and \(q((s',k')|a,b,(s,k))=0\) otherwise. Note that under the strategy profile \(\bar{\omega }^{T,\alpha }\), the state follows a Markov chain with transition q over S. The corresponding set of recurrent sets \(\bar{{\mathcal {R}}}\) can be defined as before. Moreover, because the set S is finite, \(\bar{{\mathcal {R}}}\) is nonempty. There exists \(\alpha >0\) sufficiently small such that, for all \(s\in S\) and \(i=1,2\): \(V_i(\bar{\omega }^{T,\alpha },s)\ge \bar{V}_i-2\varepsilon\), \(V_2(\bar{\omega }^{T,\alpha },s)> \max _{s\in S} m_2(s)\), and if \(E(m_2|x^{l}(s),s)>E(m_2|a',b',s)\) for some \((a',b')\in A \times B\) and \(l=1,\ldots ,L\), then \(E(m_2|x^{l, \alpha }(s),s)>E(m_2|a',b',s)\). Let l(k) be the phase in the cycle (\(l=1,\ldots ,L\)) corresponding to the kth period of the cycle. Finally,

$$\begin{aligned} \varphi _k =\left\{ (b,s)\in B\times \bar{S}:\exists s^{\prime }\in \bar{S} \text { s.t. } q(s^{\prime }|x^{l(k),\alpha }(s),s)>0\text { and }q(s^{\prime }|a,b,s)=0, \; \forall a\in A \right\} . \end{aligned}$$
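
To fix ideas, the following sketch (illustrative only: the base state space, the cycle length, and the transition probabilities are hypothetical placeholders rather than objects of the model) constructs the extended transition q on \(S=\bar{S}\times \{1,\ldots ,T\}\) from a base transition \(\bar{q}\), with the cycle index advancing deterministically and wrapping from T back to 1:

```python
from typing import Dict, Tuple

# Illustrative sketch (hypothetical toy inputs, not the paper's game): build the
# extended transition q over S = S_bar x {1,...,T} from a base transition q_bar.
State = str
ExtState = Tuple[State, int]        # (s, k): base state and position in the cycle

S_bar = ["s1", "s2"]
T = 3                               # cycle length (hypothetical)

# q_bar[(a, b, s)][s'] = probability of moving from s to s' (placeholder numbers)
q_bar: Dict[Tuple[str, str, State], Dict[State, float]] = {
    ("a1", "b1", "s1"): {"s1": 0.5, "s2": 0.5},
    ("a1", "b1", "s2"): {"s2": 1.0},
}

def q(next_state: ExtState, a: str, b: str, state: ExtState) -> float:
    """Extended transition: the cycle index advances deterministically
    (k -> k+1, wrapping from T back to 1) while the base state moves
    according to q_bar, as in the definition of q in the text."""
    s, k = state
    s_next, k_next = next_state
    k_plus = k + 1 if k < T else 1          # the operation "k oplus 1"
    if k_next != k_plus:
        return 0.0
    return q_bar.get((a, b, s), {}).get(s_next, 0.0)

# From ("s1", 3) the cycle wraps back to position 1.
print(q(("s2", 1), "a1", "b1", ("s1", 3)))   # 0.5
print(q(("s2", 2), "a1", "b1", ("s1", 3)))   # 0.0: wrong cycle position
```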

Because \(\bar{\omega }^{T,\alpha }\) induces a Markov chain over the extended (finite) set of states, the proofs of Lemmas 1, 2 and 3, which use properties of Markov chains only, apply easily (with the original state set replaced by the extended set S, \(\hat{\omega }^\alpha\) replaced by \(\bar{\omega }^{T,\alpha }\), and \({\mathcal {R}}_{\hat{\omega }^\alpha }\) by \(\bar{{\mathcal {R}}}\)). Therefore, we obtain the following:

Let \(n_t(a,s,k)\) be the number of times, up to date t, that action a was played in state s at the kth period of the cycle of length T. Fix \(R\in \bar{{\mathcal {R}}}\). For all \((s,k)\in R\), let \(p(a,s,k)\equiv p^{\alpha }(s,k)(1-\alpha )\) if \(a=x_1^{l(k)}(s)\) and \(p(a,s,k)\equiv p^{\alpha }(s,k)\frac{\alpha }{|A|-1}\) otherwise, where \(p^{\alpha }(s,k)\) is the long-run average fraction of time spent in \((s,k)\in S\). By using these notations and noting that \(\sum _{a\in A,(s,k)\in S}p(a,s,k)u_2(a,x_2^{l(k)}(s),s)= V_2(\bar{\omega }^{T,\alpha },s)\), the proof of Lemma 7 can be adapted easily.

Lemma 7’ For all \(\xi >0\), there is a \(\hat{T}\) and \(\hat{\delta }_2\) such that for all \(R\in \bar{{\mathcal {R}}}\) and \(s^0\in R\), the following holds:

  1. The probability of passing the test at all \(t\ge \hat{T}\) if \(\bar{\omega }^{T,\alpha }\) is played is greater than \(1-\xi\):

     $$\begin{aligned} P_{\bar{\omega }^{T,\alpha },s^0}\bigg (h:\forall t\ge \hat{T}, D_t^{\hat{\delta }_2}(h)=1\bigg )\ge 1-\xi . \end{aligned}$$

  2. Moreover, if player 1 fails the test at \(t>\hat{T}\), then for all \(\delta _2\ge \hat{\delta }_2\), player 2 obtains a discounted payoff of at least \(\min _{s\in S}V_2(\bar{\omega }^{T,\alpha },s)-\eta\) from date 0 to \(t-1\).

Replacing \(V_2(\hat{\omega }^{\alpha },s)\) with \(V_2(\bar{\omega }^{T,\alpha },s)\), we obtain Lemma 8. To prove the other lemmas, I need to adapt Assumption 1:

Assumption 1’ For all \(k=1,\ldots ,T\), there is no \((b,s)\in \varphi _k\) such that \(E(m_{2}|x^{l(k)}(s),s)\le E(m_{2}|x^{l(k)}_1(s),b,s)\).

I also redefine the commitment strategy as follows:

  • Normal path n: play \(\bar{\omega }^{T,\alpha }_1\) until player 2 deviates from \(\bar{\omega }^{T}_2\). Suppose that he plays an action b instead of \(x^{l(k)}_{2}(s)\) at some period k of the cycle. In this case, go to the punishment path \((n,b,s,k)\) if \((b,s)\in \varphi _k\), or if \((b,s)\notin \varphi _k\) and s belongs to a recurrent set.

  • Punishment path \((n,b,s,k)\):

    (1) If \((b,s)\in \varphi _k\), then play the minmax strategy against player 2 over n periods. Then, proceed to the normal path \(n+1\).

    (2) If \((b,s)\notin \varphi _k\) and \(s\in R, \; R\in \bar{ {\mathcal {R}}}\), then play \(\bar{\omega }_1^{T,\alpha }\) until one of the following events occurs:

       (a) Player 2 plays an action \(b'\) instead of \(x^{l(k')}_2(s')\) during the \(k'^{\text {th}}\) period of the cycle in state \(s'\) such that \((b',s')\in \varphi _{k'}\).

       (b) A state s such that \(m_2(s)\le \min _{s': \exists k=1,\ldots ,T \text { s.t. } (s',k)\in R} m_2(s')\) is reached.

       (c) The punishment phase began \(\min \{n,k^{**}\}\) periods ago.

    If case (a) or (b) occurs after m periods of soft punishment, then minmax player 2 for \(n-m\) periods (play \(\mu _2(\delta _2, n-m)\)) and then go to normal phase \(n+1\). If case (c) occurs, go directly to normal phase \(n+1\).

The integer \(N_1(h)\) is now the number of times a punishment phase in history h begins because player 2 deviates from \(\bar{\omega }^{T,\alpha }_2\) by playing \(b\ne x^{l(k)}_2(s)\) at some period k of the cyclical strategy \(\bar{\omega }^{T,\alpha }\) in state s such that \((b,s)\in \varphi _k\) during a normal phase, \(N_1'(h)\) is the number of times a minmax punishment begins because player 2 played \(b\ne x^{l(k)}_2(s)\) at some period k of \(\bar{\omega }^{T,\alpha }\) in state s such that \((b,s)\in \varphi _{k}\) during one of the \(k^*\) first periods of a soft punishment phase, and \(N_2(h)\) is the number of times player 2 deviates in a recurrent set during a normal phase. Under Assumption 1’, the proofs of Lemmas 4, 9, 11 and 12 apply immediately.

About this article

Cite this article

Marlats, C. Reputation effects in stochastic games with two long-lived players. Econ Theory 71, 1–31 (2021). https://doi.org/10.1007/s00199-020-01252-6
