Strategy Revision Opportunities and Collusion

This paper studies whether and how strategy revision opportunities affect levels of collusion in indefinitely repeated two-player games. Consistent with standard theory, we find that such opportunities do not affect strategy choices, or collusion levels, if the game is one of strategic substitutes. In contrast, there is a strong and positive effect for games of strategic complements: revision opportunities lead to more collusion. We discuss alternative explanations for this result.


Introduction
Strategy revision opportunities describe possibilities for players to change a strategy during the play of a repeated game. The role of strategy revision opportunities is somewhat of a conundrum for economists. From a theory perspective, strategy revision opportunities should not affect behaviour, since behaviour strategies, which allow full flexibility during the course of play, and mixtures over strategies chosen at the start of the game are seen as equivalent in games of perfect recall (Kuhn, 1953; Aumann, 1964). Any unilateral revision a decision-maker may want to make can be encoded in a suitably specified mixed strategy.
Still in many applications, decision-makers seem to care about having the opportunity to revise strategies. In practice, revision opportunities are often set by higher-level management and play an important role in the design of organizations (Daily et al., 2003). In fact, many managers at higher levels of the hierarchy have to make day-to-day decisions on strategic oversight -that is, how much flexibility to give to lower-level management to make choices, or revise initially set strategies, as market conditions unfold. The extent to which corporations regulate franchises "top down" or allow revisions to, for example, pricing strategies is only one example of such decisions. 1 Policy makers should also be concerned about revision opportunities. Legal restrictions on meetings and agreements between market participants do affect the possibility of (orchestrated) strategy revisions and potentially affect prices and competition in local markets. 2 A key question for both policy-makers and managers is how such strategy revision opportunities affect collusion and cooperation across subdivisions of a company, in local or even global markets.
In this paper we conduct a laboratory experiment to study the impact of strategy revision opportunities on strategy choices and levels of collusion. This experimental approach enables us to elicit information about the strategies participants use in the repeated game, allowing us to identify strategy revisions and observe intended behavior off the realized outcome path. Such data is crucial to understand the mechanisms through which revision opportunities affect collusive behavior, and typically would be impossible to obtain from field data. Our experiment systematically varies revision opportunities across treatments. The design isolates the effect of revision opportunities on cooperation while keeping other factors such as the timing of moves, the available strategy sets, the size of the stakes, the number of players and the incentives to deviate or cooperate constant.

1 Different practices can be observed. Online business-to-consumer retailers such as Amazon and CheapTickets restrict their flexibility by the use of price bots. McDonald's typically allows local restaurants to unilaterally deviate from pricing strategies (nationwide commercials involving prices usually have the caveat "in participating restaurants" attached). Ikea, by contrast, does not generally allow local stores to price articles higher than indicated in the nationwide catalogue; lower prices are allowed in some cases (see, for example, the case study in Kaynak, 1991).

2 Some instances of antitrust legislation provide good examples of such policies (see, for example, McCutcheon, 1997).
While standard subgame perfect equilibria are unaffected by revision opportunities, there are two types of considerations that could give revision opportunities a role. The first is renegotiation proofness. While renegotiation proofness implies that revision opportunities cannot increase collusion -since with revision opportunities players may not be able to credibly commit to the required punishment paths -weak renegotiation proofness, for example, has no bite in oligopoly games (Farrell, 2000; Aramendia et al., 2005). The second is miscoordination, which could play an important role given the large number of possible equilibria in indefinitely repeated oligopoly games. As argued above, if arbitrarily complex strategies are allowed, then any possible revision of a strategy during the game can be coded into initial plans. As soon as there is some limit to the complexity (number of states) of strategies, revision opportunities can become important.
In the experiment, participants play an indefinitely repeated game where the stage game is either a game of strategic substitutes or of strategic complements. The stage games are derived from linear duopoly games (Cournot and Bertrand, respectively) and reduced to symmetric, normal-form games in which both players have four actions to choose from. The demand systems and action sets are chosen so that the resulting payoff matrices are as close as possible: they have identical diagonal elements (including the collusion and Nash outcomes), as well as identical temptation and sucker payoffs. The games primarily differ in the location of the (myopic) best response to collusion. In the substitutes game, the best response to collusion is less cooperative than the Nash action, while in the complements game it is more cooperative than the Nash action.
At the beginning of a supergame, participants program a strategy by choosing an initial action choice and a dynamic response machine, which specifies a recommended action in response to their rival's previous action choice. Three treatment variations change the degree to which strategy revisions are possible. These variations are labeled the baseline, unilateral and bilateral variations. In the baseline treatment participants cannot change their dynamic response machine; that is, they lack revision possibilities. Under the unilateral variation, unilateral changes are possible, while under the bilateral variation mutual consent is required to change one's dynamic response. The unilateral variation allows for full revision opportunities, while the bilateral variation is designed to mimic orchestrated revisions, such as those covered by renegotiation theory. In all variations, participants can deviate from their recommendation using one-shot deviations -these deviations come at a small cost. 3 A machine revision allows participants to economize on such costs in future choices.
We find that the existence of revision opportunities has no effect on cooperation under strategic substitutes, yet has a significant, positive effect under strategic complements. The second effect is large enough to reverse the ranking of collusion rates between interaction types: there is more cooperation under strategic substitutes if revision possibilities are absent, while the opposite is true if unilateral revisions are possible. Neither standard risk dominance nor considerations of renegotiation can explain these results. Given the large multiplicity of equilibria in these games, it is intuitive that fear of miscoordination might play an important role. We define a notion of "fear of miscoordination", based on minmax regret, and show that it yields predictions consistent with our main results on the effect of revision opportunities.
In terms of the strategies participants use, we find that most use machines that are familiar from the literature. Out of the 256 possible machines participants could program, the most prominent machines mimic the static Nash, myopic-best-response, tit-for-tat and Nash-reversion strategies. Participants tend to use a machine less often the higher the "fear of miscoordination" associated with it. This is particularly so under strategic complements, where generally "fear of miscoordination" tends to have more bite.
Our paper contributes to the literature on cooperation in games of strategic substitutes and complements. Potters and Suetens (2009) study collusion in a laboratory experiment using finitely repeated games of strategic substitutes and complements. They find more cooperation when actions exhibit strategic complementarities. As all their treatments are within a framework of behavior strategies, their results are best compared to our unilateral variations where strategy revision is possible. Our results with strategy revision opportunities confirm theirs. However, without revision opportunities, we find the opposite is true: there is more collusion with strategic substitutes.
3 This possibility is included so that participants have the full strategy space of the repeated game available. Collusion under this dynamic-response elicitation scheme can be sustained using a simple trigger strategy that implements the grim trigger strategy. This trigger strategy results in the same minimum discount rate for collusion as in the standard repeated game -details are given in Section B of the Supplementary Materials. That behavior is not unduly affected by the small cost imposed on these deviations is also shown empirically using a hot variation where these costs were set to zero. See Embrey et al. (2016) for details of this treatment, where the use of such dynamic response machines to elicit strategies in repeated games is described, along with a detailed investigation of the elicited strategies for the more commonly implemented environment with revision opportunities.
A recent strand of the experimental literature on cooperation/collusion explicitly investigates the role of communication. Fonseca and Normann (2012), Andersson and Wengström (2012) or Cooper and Kühn (2014), for example, study renegotiation with communication and find mixed results as to whether communication, and the timing of communication, leads to more collusion or not. Our setting and results provide insight into renegotiation when explicit communication is not possible. 4 More generally, the results add to the literature on experimental oligopoly games (see, for example, Huck et al., 1999, 2000). Finally, our results also relate to the experimental literature on indefinitely repeated games. Usually this literature either elicits strategies without revision possibilities (Selten et al., 1997; Dal Bó and Fréchette, 2011b) or lets subjects play the game in a "hot" setting without eliciting strategies (Dal Bó, 2005; Casari and Camera, 2009). Our results show that which setting is chosen can potentially affect behavior, at least when the game is one of strategic complements. One exception is Mengel and Peeters (2011), who have a "semi-hot" treatment (hot but with small costs) and a "strong commitment" treatment in a study comparing contributions by partners and strangers in a repeated public good game. Their setting is not suitable to study strategy revisions, however, since, although participants are allowed to deviate from pre-programmed strategies, they are not allowed to revise strategies.
The paper is organized as follows: Section 2 outlines the experimental design and experimental procedures. Section 3 establishes our primary empirical results, in particular that revision opportunities have a positive effect under strategic complements and that this effect is large enough to reverse the ranking of collusion rates between the interaction types. Section 4 provides a discussion of the alternative explanations for our results, and formalises the concept of fear of miscoordination. A final section concludes.

The experiment
Designing experiments to understand strategic behaviour in indefinitely repeated games poses two principal challenges. The first is the well-known theoretical problem of characterising the entire set of equilibrium strategies. The second is the difficulty of directly eliciting strategies, given the size and complexity of the strategy space and the constraints on participants' time, cognitive abilities and experience. Related to this second challenge, the existing experimental literature has usually limited the strategy space considerably in order to elicit strategies (see, for example, Dal Bó and Fréchette, 2011b). Such an approach has clear consequences for studying strategy revision opportunities since, under a restricted strategy space, the equivalence of behaviour and mixed strategies can break down. On the other hand, the impossibility of encoding everything into a mixed strategy -due to a restricted strategy space, or reasoning or other complexity costs -is one of the primary reasons why revision opportunities may matter in many real-life situations of interest.
Our design resolves these tensions by restricting participants to programming a unit-recall dynamic response, while allowing them to deviate from the action proposed by the response, which makes the full strategy space available in all treatments. The experiment then studies revision opportunities in a 3 × 2 design with three levels of strategy revision opportunities and two types of strategic interaction. In the following we provide a detailed specification of the treatment variables, describe how the different levels/types were implemented in the laboratory, and outline the experimental procedures.

Design
The two stage games Participants play one of the two possible games in Figure 1, which differ in the type of strategic interaction: strategic substitutes or strategic complements. Payoffs are in experimental currency units (ECU), which are converted to Euros at the end of the experiment.

       A       B       C       D
A   43, 43  31, 51  25, 52  23, 54
B   51, 31  36, 36  32, 40  29, 38
C   52, 25  40, 32  33, 33  31, 32
D   54, 23  38, 29  32, 31  30, 30

Strategic substitutes.

       A       B       C       D
A   43, 43  23, 54  14, 52   7, 47
B   54, 23  36, 36  32, 40  28, 37
C   52, 14  40, 32  33, 33  31, 32
D   47,  7  37, 28  32, 31  30, 30

Strategic complements.

The structure and payoffs of the games are designed so that, while each game has a natural duopoly analogue, the two are as identical as possible. To provide this analogue, the substitutes game is a discretized version of a differentiated-goods linear Cournot duopoly and the complements game is a discretized version of a differentiated-goods linear Bertrand duopoly. In both cases, the duopolists produce differentiated goods that are product substitutes. To ensure a fair comparison across games, the underlying duopoly games were calibrated so that the majority of payoffs for key action pairs are identical across games:

1. the Nash equilibrium payoffs (π_Nash) that result from both players playing action C are identical.
2. the joint payoff maximizing payoffs (π_Collusion) that result from both choosing action A are identical.
3. the optimal deviation against the co-player playing action A, which requires playing action B in the complements game and action D in the substitutes game, yields the same payoff (π_Dev) for the defector and the sucker across games.
4. the remaining actions in the games, action D for the complements game and action B for the substitutes game, are such that all diagonal elements are identical across games. 5

Sustaining cooperation via trigger strategies requires

    π_Collusion/(1−δ) ≥ π_Dev + δ·π_Nash/(1−δ)

to be satisfied. All the payoff parameters involved in this inequality are the same for both our complements and substitutes variations. 6 As a consequence of these choices, the minimal discount factor needed to sustain collusion via trigger strategies is the same in both games (δ_min = 0.8077), and the chosen continuation probability of δ = 7/8 is above this level.
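As an illustrative consistency check (our own sketch, not taken from the paper's materials), the payoff matrices of Figure 1 can be encoded directly to verify calibration properties 1-4 and the trigger-strategy inequality at the chosen δ = 7/8; all variable and function names below are ours:

```python
# Hypothetical check of the Figure 1 calibration; payoffs are the row
# player's, with rows = own action and columns = the rival's action.
A, B, C, D = 0, 1, 2, 3
ACTIONS = [A, B, C, D]

SUB = [  # strategic substitutes game
    [43, 31, 25, 23],
    [51, 36, 32, 29],
    [52, 40, 33, 31],
    [54, 38, 32, 30],
]
COMP = [  # strategic complements game
    [43, 23, 14, 7],
    [54, 36, 32, 28],
    [52, 40, 33, 31],
    [47, 37, 32, 30],
]

def best_response(game, rival_action):
    """Myopic best response to the rival's action."""
    return max(ACTIONS, key=lambda a: game[a][rival_action])

# Property 4: identical diagonals, including collusion (A,A) and Nash (C,C).
assert [SUB[a][a] for a in ACTIONS] == [COMP[a][a] for a in ACTIONS]
pi_coll, pi_nash = SUB[A][A], SUB[C][C]  # 43 and 33

# (C,C) is the stage-game Nash equilibrium of both games.
assert best_response(SUB, C) == C and best_response(COMP, C) == C

# Property 3: optimal deviation against A is D (substitutes) but B
# (complements), with identical temptation (54) and sucker (23) payoffs.
assert best_response(SUB, A) == D and best_response(COMP, A) == B
pi_dev = SUB[D][A]
assert pi_dev == COMP[B][A] == 54
assert SUB[A][D] == COMP[A][B] == 23

# Trigger-strategy condition at the chosen continuation probability.
d = 7 / 8
assert pi_coll / (1 - d) >= pi_dev + d * pi_nash / (1 - d)
```

Note that the check only confirms the inequality holds at δ = 7/8; the value δ_min = 0.8077 reported in the text accounts for the elicitation scheme and is derived in Section B of the supplementary materials.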
The crucial difference between the two games is the location of the optimal deviation against the co-player playing the joint payoff maximizing action, which is action B with strategic complements and action D with strategic substitutes. In games of strategic complements, as my opponent "increases" her action, I would like to do the same. Consequently, the optimal deviation action is located between the collusive action (A) and the Nash action (C) in the complements game, whereas it is located beyond the Nash action in the substitutes game, where I would like to respond to an "increase" in my opponent's action by a "decrease" myself. 7 This difference in the location of these actions is the primary difference between the games; a difference that will prove to have a significant interaction with the level of strategy revision opportunities. For convenience, we will refer to the actions A, B, C and D as respectively Collusion, Dev.SC, Nash and Dev.SS.

5 After rounding the payoffs to integers, some payoffs were changed by one unit in order to avoid degeneracies caused by rounding. This is done in such a way that the games become even more similar: for instance, this led to the box formed by actions B and C and that formed by actions C and D being identical across games. See Section A of the supplementary materials for the underlying demand systems of the two games, as well as a description of the process that generated the discretized versions. Note that, in order to ensure the incentives to cooperate are balanced across the games, it was necessary to choose different demand systems under price competition and under quantity competition.

6 The same is also true for other collusive strategies, such as tit-for-tat. While such strategies are not subgame perfect, they can be implemented without one-shot deviations or machine changes.
Repeated game strategies At the beginning of a repeated game, participants are asked to specify an intended strategy. This strategy consists of an initial action, to be played in the first stage, and a programmed machine, which recommends at each later stage an action conditional on the co-player's action in the previous stage. The machine is denoted by a quadruple z_A z_B z_C z_D specifying which action z_k ∈ {A, B, C, D} the machine is programmed to play if the opponent has chosen action k ∈ {A, B, C, D} in the previous stage. An intended strategy is denoted by z_∅-z_A z_B z_C z_D, where the first element refers to the initial action choice. The most general strategy one can formulate in a repeated game maps any possible history of observed action profiles into actions. In this design, however, participants' intended strategies are restricted so that actions can only be conditioned on the co-player's action in the previous stage. Some examples of familiar strategies that can be programmed are: unconditional cooperation (A-AAAA), tit-for-tat (A-ABCD), (forgiving) Nash reversion (A-ACCC), and always Nash (C-CCCC). Strategies such as the myopic best response can also be programmed. A well-known strategy that cannot be programmed is grim trigger. The machine that comes closest, A-ACCC, implements a forgiving grim trigger; that is, it reverts to cooperation if the opponent chooses to cooperate in some stage following a deviation.
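A unit-recall machine of this kind is simple to simulate. The sketch below (our own illustration, with our own names; not the experimental software) encodes a few of the familiar strategies above and plays them against each other, assuming no one-shot deviations or machine revisions:

```python
# An intended strategy is an initial action plus a map from the rival's
# previous action to a recommended action (a unit-recall machine).

def play(strat1, strat2, stages):
    """Simulate two intended strategies followed without one-shot
    deviations or revisions; returns the history of action pairs."""
    (a1, m1), (a2, m2) = strat1, strat2
    history = [(a1, a2)]
    for _ in range(stages - 1):
        # Each player responds to the rival's action of the previous stage.
        a1, a2 = m1[a2], m2[a1]
        history.append((a1, a2))
    return history

tit_for_tat    = ("A", {"A": "A", "B": "B", "C": "C", "D": "D"})  # A-ABCD
nash_reversion = ("A", {"A": "A", "B": "C", "C": "C", "D": "C"})  # A-ACCC
always_nash    = ("C", dict.fromkeys("ABCD", "C"))                # C-CCCC

# Two cooperative machines sustain (A, A) in every stage.
assert play(tit_for_tat, nash_reversion, 5) == [("A", "A")] * 5

# Tit-for-tat against always-Nash: miscoordination in stage 1, then (C, C).
assert play(tit_for_tat, always_nash, 4) == [
    ("A", "C"), ("C", "C"), ("C", "C"), ("C", "C"),
]
```

The second simulation illustrates the coordination risk discussed later: a cooperative opener paired with a static-Nash machine pays the sucker payoff once before play settles at the stage-game Nash equilibrium.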
While the focus is on simple strategies with few states and hence lower complexity, we allow participants to play other strategies as well if they have a strong preference to do so. In particular, in all treatments participants are allowed to take an action that differs from the one recommended by their machine. Such changes are referred to as one-shot deviations. Consequently, more general strategies, such as grim trigger, become feasible to implement via one-shot deviations. Indeed, such a trigger strategy can be used to sustain collusion for all discount rates above the δ_min calculated earlier for the standard repeated game (i.e. just action choices in each period). See Section B of the supplementary materials for details. To provide participants with an incentive to program their machines (strategies) carefully, one-shot deviations are costly. Each one-shot deviation costs 3 ECU. Hence, we expect participants to rely mostly on unit-recall strategies. However, if participants have strong enough preferences to choose another strategy, the full strategy space is available in all treatments.

7 In continuous market games, the type of strategic interaction is determined by the second cross-derivative of player i's payoff function with respect to the actions of i and −i. This type is one of complements (substitutes) if this cross-derivative is positive (negative). In our discretized versions, the positive (negative) cross-derivative for complements (substitutes) is reflected in the (myopic) best response to the collusive action being "close to" ("far from") the collusive action itself.
Revision opportunities There are no revision opportunities in the treatments labeled baseline. Here, participants keep their machines for the entire duration of the repeated game, and can only deviate from the recommendations of their machines via one-shot deviations. In the unilateral treatments strategy revisions are possible. Participants can modify their machines after any stage of the repeated game. To provide participants with an incentive to program their machines (strategies) carefully, machine modifications also have a small cost associated with them. In particular, each machine modification costs 1 ECU, irrespective of the number of elements of the machine that are changed. By choosing the costs in this manner, we hoped to ensure that playing with a poorly programmed strategy is more costly in the baseline (where one needs to rely on one-shot deviations) than with unilateral revision opportunities (where machine changes are possible). Nonetheless, collusive equilibria can be supported for all discount rates above δ_min using the same trigger strategy (A-ACCC) as in the baseline, which only relies on one-shot deviations to ensure permanent Nash reversion -see Section B.3 of the supplementary materials for further details. That behavior is not unduly affected by the small cost imposed on these deviations is also shown empirically using a hot variation, where these costs were set to zero (see Embrey et al., 2016, for details). Under the third variation -labeled bilateral -participants can modify their machines if and only if consent to a modification has been given by their opponent. This treatment is closer to the type of revision opportunities the renegotiation literature is concerned with.
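The resulting cost structure is easy to make concrete. Under the stated assumptions (payoffs in ECU, no further adjustments), a participant's match earnings net of deviation and modification costs can be sketched as follows; the function name is our own:

```python
# Hypothetical accounting of match earnings under the experiment's cost
# structure: each one-shot deviation costs 3 ECU, each machine
# modification 1 ECU (modifications only arise outside the baseline).
ONE_SHOT_COST = 3
MODIFICATION_COST = 1

def match_earnings(stage_payoffs, n_deviations, n_modifications):
    """Sum of stage payoffs minus deviation and modification costs."""
    return (sum(stage_payoffs)
            - ONE_SHOT_COST * n_deviations
            - MODIFICATION_COST * n_modifications)

# Eight stages of mutual collusion (43 ECU each), with one one-shot
# deviation and one machine modification along the way: 344 - 3 - 1.
assert match_earnings([43] * 8, 1, 1) == 340
```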

Procedures
The experiment was conducted in the BEElab at Maastricht University during October-December 2011. 288 students were recruited using ORSEE (Greiner, 2015) and participated in one of the six treatments. 8 For each of our treatments we have 6 independent observations. During each session, three matching groups were run in parallel on separate z-Tree servers (Fischbacher, 2007). Sessions lasted an hour and a half on average, including a twenty-minute instruction period. 9 On average participants earned between 12.60 and 15.30 Euro for their participation.
For each treatment six matching groups were run. Each matching group comprised eight participants that all played the repeated game (of the same treatment) ten times. At the beginning of a match, as a single repeated game is referred to, participants within a matching group were randomly paired. At the end of a session, participants were paid in cash according to the amount of ECUs they earned in one randomly drawn match. Table 1 gives the number of observations for each treatment.
Participants were fully informed about all details of the decision task, the environment and procedures in the experimental instructions (see Section C of the supplementary materials for an example of the instructions). Participants were never informed of the machine employed by other participants, but instead observed the history of play. That is, after every stage they were informed of their own action and the action of the person they were matched with, as well as the resulting payoffs.
For all members in a matching group, any given match consisted of the same number of stages, but this number changed across matches. Across matching groups this sequence of match-lengths differed. However, to facilitate comparison between treatments, the sequences were generated at random upfront and the same sequences were used for the different matching groups of each treatment. Table E.1 of the supplementary materials provides further details on the sequence of match lengths for the different matching groups. Table 1 provides a summary of the six treatments. In general, participants had difficulty establishing more cooperative behavior, capturing on average less than 25% of the potential gains from cooperating in all treatments. The treatments with strategic complements produced both the least and the most cooperative behavior, with low levels of cooperation in the baseline and bilateral treatments and high levels in the unilateral treatment. In all treatments, participants incurred very low costs for deviating from or modifying their machines. One-shot deviations were observed in less than 11% of stage games. In the unilateral treatments, machine modifications were infrequent (after less than 4% of stage games), while in the bilateral treatment, in which mutual agreement was required, such changes were rare (after less than 1% of stage games).

Results
The stage game payoff reported in Table 1 is averaged over all matches and all stages.
The majority of the subsequent analysis uses data from the last third of a session (matches 7-10). This sub-sample provides a reasonable trade-off between using the final matches, where subject behavior is most likely to have converged, and retaining enough observations. Analysis that explicitly examines the evolution of behavior across matches, as well as analysis that focuses on particular histories for which the sample size needs to be expanded, uses data from the last two-thirds of a session (matches 4-10). In terms of stages, we use only data from stage twelve or earlier, since later stages did not occur in every match for every matching group (see Table E.1).
The results are presented in three subsections. The first two deal with the impact of strategy revision opportunities on cooperation and the difference between strategic complements and substitutes; the third analyses the intended strategies. All reported regressions and statistical tests use cluster-robust standard errors, corrected for arbitrary correlation at the matching-group level. This clustering accounts for the dependencies generated by randomly re-matching subjects from the same matching group. Consequently, the statistical approach does not assume observations within a matching group to be independent; only those across matching groups. 10,11

Table 2: Linear random-effects regression of payoff efficiency in the stage game.

Impact of revision opportunities on cooperation
We consider two measures of the cooperative nature of subjects' behavior. The primary measure is the implied efficiency of their choices: the actual surplus generated over and above the one-shot Nash equilibrium as a percentage of the maximum available surplus. This measure aggregates the impact of all choices, including partial collusion and deviation choices. The secondary measure focuses on just the percentage of (full) collusion choices by subjects. Figure 2 shows the evolution of efficiency both across matches and within matches. As can be seen, across matches (Panel (a)) revision opportunities do not seem to affect efficiency with strategic substitutes, while there is a clear separation of the unilateral variation from the other two conditions with strategic complements. Within matches (Panel (b)), all treatments show the same pattern of declining efficiency as a match continues.
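The efficiency measure above amounts to a simple normalization. A minimal sketch (our own, using the diagonal payoffs common to both stage games; the function name is ours):

```python
# Efficiency: surplus above the one-shot Nash payoff as a share of the
# maximum attainable surplus. Values below zero are possible when average
# payoffs fall short of the static Nash benchmark (e.g. sucker payoffs).
PI_NASH = 33        # both players choose C
PI_COLLUSION = 43   # both players choose A

def efficiency(avg_payoff):
    return (avg_payoff - PI_NASH) / (PI_COLLUSION - PI_NASH)

assert efficiency(43) == 1.0   # full collusion
assert efficiency(33) == 0.0   # static Nash play
assert efficiency(38) == 0.5   # halfway between Nash and collusion
```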
A linear regression on efficiency is used to formally quantify the effect of strategy revision opportunities and to assess statistical significance. Table 2 reports the results of this analysis separately for strategic substitutes and complements. Columns (1) and (3) confirm the impression that strategy revision opportunities have no impact on rates of collusion under strategic substitutes, but have a significant impact under strategic complements. With strategic complements, the unilateral variation is associated with significantly higher rates of collusion (column (3)), while the baseline and bilateral variations have statistically similar rates.
Columns (2) and (4) expand the specification to include the match number as an explanatory variable. Doing so reveals that strategy revision opportunities also work through the dynamics across matches. This effect is most notable for (full) collusion choice, which is discussed below. These regressions also show significant effects for the bilateral variation in the treatments with strategic complements. Here the mean is about the same as in the baseline but the dynamics are quite different, as can be seen in column (4). 13

Figure 3 shows the evolution of collusion over time. Panel (a) shows this time trend across matches. With strategic substitutes (left panel), collusion is increasing over matches for the baseline and unilateral variations, while no clear trend is seen when bilateral consent is needed to modify machines. An obvious ranking of the levels of revision opportunities, in terms of the extent to which they generate collusion, cannot be made on the basis of this graph. With strategic complements (right panel), the graph illustrates a clear separation of treatments. Collusion rates are highest under unilateral. Under the baseline and bilateral variations, collusion rates are lower. There is a trend for collusion rates to increase over time in the unilateral treatment. However, no such trend is evident for the other treatments. Panel (b) of Figure 3 shows how the rates of collusion change within a match. It displays the typical pattern of collusion decreasing sharply after the first few stages, then remaining approximately constant at a lower level. A logit regression using choice of action A as the dependent variable can be found in Table E.4 of the supplementary materials and mirrors the results of Table 2.
The following summarizes our findings with respect to the impact of strategy revision opportunities on collusion.

Result 1 Strategy revision opportunities have no effect on observed levels of collusion under strategic substitutes.

Under strategic complements, a different picture emerges. Here, strategy revision opportunities do have an effect on observed levels of collusion, and the direction of this effect is the opposite of that predicted by renegotiation theory. Strategic commitment is detrimental to collusion under strategic complements.

Difference between complements and substitutes
While the previous subsection concentrated on the effect of manipulating the level of strategy revision opportunities, it is clear that there is an interaction effect associated with the type of strategic interaction. With fewer or no revision opportunities -that is, under the baseline or bilateral variation -there is more collusion with strategic substitutes than with strategic complements. However, with revision opportunities -that is, under the unilateral variation -there is more collusion with strategic complements.
This strategic interaction effect is quantified using regression specifications analogous to those reported in Table 2, except that the data from both game types are pooled and additional explanatory variables are added: a strategic complements indicator interacted with the levels of revision opportunities and stage, and with the levels of revision opportunities and match. Table 3 reports the results of this exercise. 14 These results confirm the overall message, with respect to the comparison across game types, given in Figures 2 and 3. Namely, there is a significant effect of the type of strategic interaction on the development of collusion across matches. Except for the bilateral variation, which is significantly different, the development of collusion within a match is comparable across game types.
Result 2 There is more collusion with strategic complements with revision opportunities (unilateral) and more collusion with strategic substitutes without revision opportunities (baseline).
The first part of Result 2 is reminiscent of an earlier experimental finding by Potters and Suetens (2009). In a setting with revision opportunities, they found greater collusion under strategic complements than under strategic substitutes in finitely repeated duopoly markets. There are some similarities in the designs. In particular, we follow their example of keeping the best response correspondences constant across interaction types (up to relabeling) by using different (underlying) duopoly markets when constructing the stage-games. There are also important differences in the designs, over and above repeating the stage games indefinitely. 15 Our primary interest in the role of revision opportunities, and thus dynamic response machines, led us to use a smaller choice set in the stage-games: only four choices, rather than a much larger (discretised) price/quantity space. Along with changing the salience of important stage-game actions, and potentially the learning dynamics, the reduced action space allows us to hold some key marginal incentives constant: π^{Collusion}, π^{Nash}, but also π^{Dev} and the corresponding sucker payoff (see Section 2.1). By contrast, the latter payoff differs across game types in Potters and Suetens (2009). Given these differences, it is notable that the observation that strategic complementarity (with revision opportunities) leads to more collusion persists. However, our results add an important new insight to this existing finding, and indeed a dimension in which the prior result is not robust. While strategy revision opportunities have no effect on collusion under strategic substitutes, they have a significantly positive effect under strategic complements, and this latter effect is large enough to reverse the ranking of collusion between the interaction types. That is, without revision opportunities entirely the opposite result holds, and collusion is in fact greater under strategic substitutes.

Notes to Table 3: The baseline case is the strong commitment treatment with strategic substitutes. All regressions use data from matches 7-10 and stages 1-12 and include match-stage composition dummies. VCE clustered at the matching-group level. *** 1%, ** 5%, * 10% significance.

Individual behavior
The previous subsections dealt with the impact of revision opportunities and the type of strategic interaction by assessing outcomes along the realized path of play. To understand further what drives these realized paths, this subsection analyses individuals' intended strategies. To this end, Table 4 gives the distribution of the machines programmed at the beginning of matches 7-10, along with a breakdown of the initial choices associated with each machine.
In principle there are 256 different machines - that is, dynamic response parts of the intended strategy - that participants could use. However, only seven types were used with a frequency of at least 5 percent in at least one of the treatments. 16 These prominent types of machine seem reasonable, with the majority corresponding to strategies that are commonly seen either in prior experimental studies or in the theory of repeated games. The first four - AAAA, ABC(C/D), ACCC and BBCC - attempt to establish some collusion, either unconditionally or conditionally. 17 Of the non-cooperative machines, there is the static Nash machine (CCCC), the myopic best response machine (DCCC under substitutes and BCCC under complements) and the punishing machine (DDDD). Finally, participants mostly choose the "intuitive" initial action to go with the machine they adopt. That is, A or B with the cooperative machines, C with the Nash machine, D (under substitutes) or B (under complements) or C with the myopic best response machine, and D with the punishing machine.
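To fix ideas, the size of the machine space and the classification rules used in Table 4 can be reproduced in a few lines (an illustrative sketch, not the authors' analysis code; the classification rules follow the notes to Table 4):

```python
from itertools import product

ACTIONS = "ABCD"

# A dynamic response vector specifies one reply for each of the four
# actions the partner may have chosen in the previous stage: 4^4 = 256.
machines = ["".join(m) for m in product(ACTIONS, repeat=4)]
print(len(machines))  # 256

# Classifications as in the notes to Table 4:
# a cooperative response to A chooses A after the partner played A;
# a cooperative response to B chooses A or B after the partner played B.
coop_after_A = [m for m in machines if m[0] == "A"]
coop_after_B = [m for m in machines if m[1] in "AB"]
print(len(coop_after_A), len(coop_after_B))  # 64 128
```

Of these 256 possibilities, only the seven machines listed above were programmed with a frequency of at least 5 percent in at least one treatment.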
With respect to revision opportunities, the treatment comparisons in Table 4 mirror those seen in the data from the realized path of play. Under strategic substitutes there is no consistent effect of revision opportunities on intended strategies. In particular, there is no significant difference between the baseline treatment and the unilateral treatment in the likelihood of programming a collusive dynamic response - that is, either responding cooperatively to the other player having played A (40% versus 35%; p-value = 0.520) or to the other player having played B (28% versus 29%; p-value = 0.785). 18 Indeed the only significant difference under substitutes is that subjects are more likely to respond cooperatively to action A with bilateral revision opportunities than unilateral ones. However, it seems this more collusive response under the bilateral is counter-balanced by a greater use of punishing responses, especially after the other player has played the deviation action (although in isolation this difference between the bilateral and the unilateral, 37% versus 23%, is not significantly different, p-value = 0.125).

16 20-36 percent of programmed machines do not fall into one of the seven prominent categories. Table E.6 in the supplementary materials decomposes this "Other" category to show that the seven prominent machines give a fair characterization: many machines categorized as "Other" in Table 4 are minor deviations from these. Using this decomposition, the "Other" category drops to 6-19 percent.

17 AAAA is unconditional cooperation, ABCD is tit-for-tat or also imitation (Apesteguia et al., 2007), ACCC is Nash reversion and BBCC could be interpreted as cautious or partial cooperation.

Notes to Table 4: Distribution of prominent machines programmed at the beginning of matches 7-10 (in bold), along with a breakdown of the prominent initial choices associated with each machine (in italics). Machine combinations that were used with a frequency below 5 percent in every treatment are categorized as "Other". A cooperative response to A is any machine that chooses A in response to A; a cooperative response to B is any that chooses A or B in response to B; a punishing response to the deviation action is any that chooses D in response to D under substitutes and D in response to B under complements.
In contrast, revision opportunities have a significant and positive effect on the cooperativeness of intended strategies under strategic complements. In the unilateral treatment, subjects are significantly more likely to respond in a collusive manner after the other player chose A than in both the baseline (52% versus 33%; p-value = 0.019) and the bilateral (52% versus 30%; p-value < 0.001) treatments. In addition, subjects are also more likely to respond with a collusive action after action B in the unilateral than in the baseline treatment (39% versus 22%; p-value = 0.012).
Result 3 (i) With strategic substitutes, strategy revision opportunities have no clear effect on intended strategies. (ii) With strategic complements, strategy revision opportunities lead to an increase in the use of more cooperative dynamic responses.
Consequently, Result 3 provides the intended-strategy analogue of Result 1. 19 The data on intended strategies also add further detail to the interaction-type contrast reported in Result 2. With revision opportunities, dynamic responses are significantly more collusive in response to action A under strategic complements than under substitutes (52% versus 35%; p-value = 0.005), and marginally so in response to action B (39% versus 29%; p-value = 0.082). 20 Without revision opportunities, it is no longer the case that strategic complements are more collusive, with dynamic responses generally being more cooperative under substitutes (although in the baseline and bilateral treatments most of these across-interaction-type differences are not statistically significant). The use of dynamic responses that respond to full collusion with the one-shot deviation action is consistently of the order of 30-40%, irrespective of interaction type or revision opportunities. However, the response to the use of this deviation action is markedly different across interaction types. For all variations of revision opportunities, the use of action D under substitutes is significantly more likely to result in a punishing (action D) response than the use of action B under complements, where such a response is in fact quite rare.

19 As seen in Table 4, intended strategies are generally composed of the "intuitive" initial choice to go with the chosen machine. Furthermore, as seen in Figures 2 and 3, stage-one choices and efficiency respond strongly to the change in revision opportunities under strategic complements. These observations lead to the question of whether this effect of revision opportunities is driven entirely by initial choices. Since our experimental design collects data on both the realized path of play and intended strategies, it is possible to analyse the role of such path dependency. This analysis is reported in Section D of the supplementary materials (see Mengel and Peeters, 2011, for an earlier example of such analysis). In summary, both initial choices and dynamic responses appear to be important.

20 These p-values are based on a linear random-effects regression on interaction-type treatment dummies. See Table E.11 of the supplementary materials for a complete set of pairwise tests for the four summary statistics reported at the bottom of Table 4, as well as the p-values obtained via a non-parametric test on matching-group averages that was run as a robustness check.
In summary, strategic complementarity appears to induce more collusive outcomes when players have the opportunity to revise their initial intended strategies for two reasons. The first, and most direct, effect is that intended strategies are more collusive, both in terms of more efficient initial choices and more cooperative dynamic responses. However, there is a second, reinforcing effect in that the response to the initial action of the myopic best response strategy - action B under strategic complements and action D under strategic substitutes - is completely different. Under complements, a participant playing the myopic best response is still quite likely to iterate to at least a partially cooperative outcome; under substitutes, such a strategy most likely instigates a spiral towards a Nash or punishment outcome. Without revision opportunities, all these mechanisms (initial choices and cooperative responses to collusion and deviation) are significantly reduced under complements, but are for the most part unaffected under substitutes.
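The contrast between these two dynamics can be illustrated with a small simulation. The pairing below is our own choice of example (a sketch, not the authors' code): tit-for-tat (initial A, response vector ABCD) meets the myopic best responder of each game type, with the "intuitive" initial choices from Table 4.

```python
ACTIONS = "ABCD"

def play(machine1, machine2, stages=6):
    """Simulate the realized path of play between two machines.

    A machine is (initial_action, response_vector), where the response
    vector gives the reply to the partner's previous action A, B, C, D.
    """
    (a1, r1), (a2, r2) = machine1, machine2
    path = [(a1, a2)]
    for _ in range(stages - 1):
        # Simultaneous update: each responds to the partner's last action.
        a1, a2 = r1[ACTIONS.index(a2)], r2[ACTIONS.index(a1)]
        path.append((a1, a2))
    return path

tft = ("A", "ABCD")          # tit-for-tat
myopic_comp = ("B", "BCCC")  # myopic best response, complements
myopic_subs = ("D", "DCCC")  # myopic best response, substitutes

# Complements: the pair passes through partial cooperation (B, B)
# before drifting down to the static Nash outcome (C, C).
print(play(tft, myopic_comp))
# Substitutes: the pair falls straight into mutual punishment (D, D)
# before settling at Nash.
print(play(tft, myopic_subs))
```

Even against an unconditionally cooperative partner strategy, the substitutes-style myopic best responder triggers immediate mutual punishment, whereas its complements counterpart only drifts gradually away from collusion.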

Discussion
Our primary result is that, while strategy revision opportunities have no effect on collusion under strategic substitutes, they have a significant positive effect under strategic complements. What could explain this role of revision opportunities, given that standard game theory predicts they should play no role? In what follows, we discuss two popular concepts from the theoretical and experimental literature on repeated games (see, for example, Farrell and Maskin, 1989; Fonseca and Normann, 2012; Blonski et al., 2011; Dal Bó and Fréchette, 2011a). Neither provides a satisfactory explanation for our results. We then define a notion of fear of miscoordination that yields predictions in line with the observed effect of changing the availability of revision opportunities.

Renegotiation
The observed ranking of collusion rates across treatments goes against the intuition delivered by the renegotiation literature. In particular, with fewer revision opportunities, and hence reduced concerns for renegotiation, we observe less collusion under strategic complements.
Although renegotiation should never happen in equilibrium -whether collusion is weak renegotiation proof or not -it is reasonable to expect the strategic forces that drive the concept would need to be learnt by experience. Consequently, there is still the possibility that subjects engaged in something like renegotiation, but that such efforts did not feed back into reduced collusion at the beginning of a match. We can look at our data from the bilateral treatment to see whether many attempts to "renegotiate" were made. Bilateral modifications take place very rarely. In particular, during matches 7-10 there were only 6 bilateral deviations (out of over 1200 interactions) with strategic substitutes and 9 with strategic complements from respectively 5 and 7 machines. Consequently, there are few instances where participants succeeded in coordinating on a mutual modification of their machines.
Nonetheless, the data collected on strategic decisions throughout the experiment allow some analysis of whether paths are "renegotiated" and, if so, how. To do so, we study when and how machines are (or are attempted to be) modified conditional on the last outcome of the realized path of play. We classify paths into three categories: (i) "failed collusion" (outcomes (B,A) and (A,B)), (ii) "miscoordination" (from the perspective of a cooperative agent; outcomes (A,C), (A,D), (B,C) and (B,D)) and (iii) "punishment paths" (outcomes (C,D), (D,C) and (D,D)). After a "miscoordination" stage, participants mostly try to modify cooperative machines into more punishing machines. Along "punishment paths", participants mostly (want to) modify non-cooperative machines, but rarely modify them into more collusive machines. Renegotiation theory, though, would predict that participants modify punishing machines into more cooperative machines that allow them to leave a punishment stage. 21 Hence, in addition to the treatment comparisons, there is also no direct evidence that participants engaged in something like renegotiation.

Risk-dominance
Given the large number of possible equilibria in these indefinitely repeated games, it seems intuitive that considerations of renegotiation might be overshadowed by concerns of coordination on one of the different equilibria. Hence, it seems reasonable to look at risk-dominance, since it gives a role for a fear of equilibrium miscoordination. As discussed in Dal Bó and Fréchette (2011a), extending the idea of risk-dominance to infinitely repeated games poses a number of difficulties, even with only two actions for each player. These difficulties include extending the definition to repeated-game strategies and the issue that two repeated-game strategies can generate equivalent outcome paths. To these difficulties, our environment also adds the issue of extending the definition to more action choices in the stage game. Blonski et al. (2011) consider an extension of the concept to the repeated prisoners' dilemma that involves only the strategies permanent Nash reversion and static Nash. Translating this approach to our environment results in the prediction that Nash reversion, with an initial choice of cooperation, is the risk-dominant strategy for both types of strategic interaction and all levels of strategy revision opportunities. 22 In what follows, we discuss how a definition of fear of miscoordination, which does not restrict itself to equilibrium miscoordination, can accommodate the observed behavior.

Fear of miscoordination
Since neither renegotiation nor risk dominance considerations are in line with our results, we consider a refinement based on a notion of "fear of miscoordination". Given the large number of possible equilibria in indefinitely repeated games, fear of miscoordination seems a particularly relevant concern and its effect may overshadow any potential effect of renegotiation. Intuitively, players will be less concerned about miscoordinating if they have the possibility to revise their strategy during the course of play. Hence fear of miscoordination delivers exactly the opposite intuition compared to renegotiation in terms of how revision opportunities should affect collusion.
We formalize this idea as follows. For a player i using repeated game strategy s*_i, the maximal regret possible for this strategy is given by

F^{NR}_i(s*_i) = Π_i(s*_i, s*_i) − min_{s_{−i}} Π_i(s*_i, s_{−i}).   (1)

That is, F^{NR} is constructed as the difference between the payoff i expects when choosing s*_i (and expecting her opponent to do the same), Π_i(s*_i, s*_i), and the worst possible payoff that she could obtain by choosing this strategy, namely min_{s_{−i}} Π_i(s*_i, s_{−i}). Notice that this definition is formulated from the perspective of a symmetric equilibrium, which is sufficient for the purpose of this study, but not crucial for the result we will state below. 23,24 Equation (1) describes the fear of miscoordination in cases where strategy revision is not possible.

To define fear of miscoordination if strategy revision is possible, define a sequence of pure strategies (s^τ_i)_{τ=0,...,∞} with the interpretation that at t the action prescribed by s^t_i given the history induced by (s^τ_i)_{τ=0,...,t−1} is chosen. Note that any fixed sequence of strategies can simply be expressed as (a potentially very complex) strategy itself and that constant sequences are possible as well. Denote by s a pair of such sequences s = ((s^τ_1)_{τ=0,...,∞}, (s^τ_2)_{τ=0,...,∞}) and by Π_i(s) player i's discounted average payoff under s. Fear of miscoordination F can then be defined as

F_i(s*_i) = Π_i(s*_i, s*_i) − min_{(s^τ_{−i})} max_{(s^τ_i) : s^0_i = s*_i} Π_i(s).

Of course the question arises why a player would ever want to revise their strategy. After all, any situation under which a player would want to revise can be encoded in initial strategies, as we have discussed in the introduction (Kuhn, 1953). Such strategies may be relatively complex, though, in the sense of being automata involving many states. Now, for any arbitrarily small but fixed cost of using automata with more states, agents will not want to use additional states to encode revisions for zero probability events (histories that they expect not to be reached).

Table 5 shows the level of fear of miscoordination for all prominent strategies and all our main treatments, where "prominent strategies" are those that are used in at least 5% of the cases in at least one treatment. As can be seen, fear of miscoordination is higher if there are fewer revision opportunities, and is higher for collusive strategies with complements than with substitutes. Fear of miscoordination hence creates a wedge in the relative attractiveness of collusive strategies between the two types of strategic interaction. 25 In our results, participants do indeed use more cooperative machines in the unilateral variation for strategic complements compared to the baseline variation (52% versus 33%; p-value = 0.019, see Table E.10). It is between these two cases that we observe the biggest difference in fear of miscoordination in Table 5. Thus, fear of miscoordination can explain why revision opportunities increase cooperation, and the incidence of cooperative strategies, under strategic complements, but matter much less under strategic substitutes.

22 When focussing only on the Nash reversion and static Nash strategies, there is only one difference between the strategic complement and strategic substitute games: the (A,C) payoff in the complements game is lower than that in the substitutes game (14 rather than 25). With a discount rate of 7/8 and the uniform prior as the belief of the opponent's strategy, this difference is too small to result in different predictions for the risk-dominance concept. One could consider allowing for a non-uniform prior. However, only a relatively small range of beliefs would result in the static Nash machine being selected in the complements game, whereas the Nash reversion machine is selected in the substitutes game. The weight on the opponent choosing the static Nash machine would need to be at least 77% and no more than 88%. There is no support in the data for such a range of beliefs.

23 Defining this notion for all equilibria would require knowledge about which equilibria can be supported by any given strategy.

24 Chassang (2010) has used fear of miscoordination in a weaker sense. He studies dynamic global games and refers to fear of miscoordination as the possibility of miscoordination arising from a lack of common knowledge and in particular arbitrarily small amounts of private knowledge. His characterization of sequentially rationalizable strategies is related to risk dominance in the one-shot game and thus quite different from ours.

25 The intuition is as follows. Π_i((s*_i), (s*_i)) is the same for both of our games, by construction. min_{(s_{−i})} Π_i((s*_i), (s_{−i})), however, will tend to be lower under complements. The reason lies in the sign of the cross-derivative. Note that the location of the collusive action as well as the size of the action set is the same in both our games. Since the sign of the cross-derivative is positive under complements, payoffs will decrease more rapidly when moving away from collusion under complements compared to substitutes.

Notes to Table 5: Maximal regret is obtained if the opponent plays D-DDDD. The second column, labeled "target", is the outcome on which the strategy aims to coordinate. The maximal regret is computed relative to this target.
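As a worked illustration of the computation behind Table 5 (a sketch in the notation above; the worst-case opponent D-DDDD follows the notes to Table 5, and the payoff symbols follow Appendix B), consider the Nash-reversion machine with initial choice A (A-ACCC) when no revisions are possible. On the target path both players earn π^{JPM} in every stage; against D-DDDD the player earns π_i(A, D) in the first stage and π_i(C, D) in every later stage, so in discounted-average terms:

```latex
F^{NR}_i(\text{A-ACCC})
  = \Pi_i(s^*_i, s^*_i) - \min_{s_{-i}} \Pi_i(s^*_i, s_{-i})
  = \pi^{JPM} - \Bigl[(1-\delta)\,\pi_i(A, D) + \delta\,\pi_i(C, D)\Bigr].
```

With δ = 7/8, most of the weight falls on the long-run term π_i(C, D), which is where the two game types differ in how steeply payoffs fall away from collusion.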

Conclusion
We have studied the effect of strategy revision opportunities on collusion in infinitely repeated games and found that, while revision opportunities do not affect collusion with strategic substitutes, they have a positive effect with strategic complements. The latter effect is strong enough that, although there is more collusion with substitutes if there are few or no revision opportunities, there is more collusion with complements if there are revision opportunities. These results provide insights into collusive behavior that have implications for applied questions, such as the optimal design of antitrust regulation. We find that revision opportunities can facilitate collusion in markets where competition exhibits strategic complementarity. In this case, an increase in cartel detection activity -which would complicate the process of renegotiation and hence make strategy revision harder -would have an additional preventative effect on collusion. In contrast, such detection efforts would only have the usual screening effect on cartels operating in a market of strategic substitutes. Finally, the implications of controlling the boundaries of strategic oversight within a corporation are only significant, from a repeated-interaction perspective, if firms' strategies can be characterized as complements.

A Stage game details
This section provides further details of the two differentiated-goods linear duopoly markets that underlie the two stage games implemented in the experiment. To provide a natural duopoly analogue for the strategic substitutes matrix, a discretized version of a differentiated-goods linear Cournot duopoly was used; for the strategic complements matrix, a discretized version of a differentiated-goods linear Bertrand duopoly. Note that, in order to ensure the incentives to cooperate are balanced across the games, it was necessary to choose different demand systems under price competition and under quantity competition. We started with the continuous strategy space versions of the games and calibrated the parameters so that the payoffs from three key outcomes were constant across the two duopoly markets: 26 the Nash equilibrium payoffs, the joint payoff maximizing payoffs and the payoffs from the optimal deviation against the co-player playing the joint payoff maximizing action. In each matrix, action A corresponds to the joint profit maximizing quantity/price, and action C to the Nash equilibrium quantity/price. In the substitutes game, action D corresponds to the optimal deviation to the other player choosing the joint payoff maximizing action, while in the complements game this is action B. To complete the action choices, action B in the substitutes game corresponds to the quantity at which, if both players chose this quantity, the payoff would be the same as the payoff in the Bertrand game when both players choose the optimal deviation price - i.e. the payoff when both players choose action B in the complements game. An analogous calculation is used to find the price that corresponds to action D in the complements game.
This calibration and selection procedure led to the two duopoly games implemented in the experiment. To implement the stage games in the laboratory, all the payoffs were first rounded to the nearest integer. After rounding, some payoffs were increased or decreased by one unit in order to avoid degeneracies that are caused by rounding. This was done in such a way that the games become even more similar: for instance, this led to the box formed by actions B and C and that formed by actions C and D being identical across games. The implemented stage games are repeated in Figure A.

The crucial difference between the two games is the location of the optimal deviation against the co-player playing the joint payoff maximizing action, which is action B with strategic complements and action D with strategic substitutes. In games of strategic complements, as my opponent "increases" her action, I would like to do the same. Consequently, the optimal deviation action is located between the collusive action (A) and the Nash action (C) in the complements game, whereas it is located beyond the Nash action in the substitutes game, where I would like to respond to an "increase" in my opponent's action with a "decrease" of my own. 27 This difference in the location of these actions is the primary difference between the games; a difference that will prove to have a significant interaction with the level of strategy revision opportunities. For convenience, we will refer to the actions A, B, C and D as Collusion, Dev.SC, Nash and Dev.SS, respectively. Note that our games are designed such that, theoretically, collusion can be sustained as an equilibrium for both game types in all treatment variations. The necessary and sufficient conditions on discount factors (continuation probabilities) for trigger strategies to support collusion are identical. 28 This is a consequence of the restrictions imposed when designing the games, namely that the joint payoff maximizing payoffs, Nash payoffs and optimal deviation payoffs are the same in both games.

27 In continuous market games, the type of strategic interaction is determined by the second cross-derivative of player i's payoff function with respect to the actions of i and −i. This type is one of complements (substitutes) if this cross-derivative is positive (negative). In our discretized versions, the positive (negative) cross-derivative for complements (substitutes) is reflected in the (myopic) best response to the collusive action being "close to" ("far from") the collusive action itself.

28 The same is also true for other collusive strategies, such as tit-for-tat. While such strategies are not subgame perfect, they can be implemented without one-shot deviations or machine changes.
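The calibration logic can be sketched numerically for the Cournot side. The code below uses a generic differentiated-goods linear demand p_i = a − q_i − b·q_j with hypothetical parameters (a = 10, b = 1/2, zero costs; these are illustrative values, not the parameters behind the experimental games) and recovers the three calibrated outcomes: the Nash equilibrium, the joint payoff maximizing (JPM) outcome, and the optimal deviation against a partner playing the JPM quantity.

```python
a, b = 10.0, 0.5  # hypothetical demand parameters (illustrative only)

def profit(q_own, q_other):
    # Differentiated linear Cournot: p_i = a - q_i - b * q_j, zero cost.
    return q_own * (a - q_own - b * q_other)

q_nash = a / (2 + b)         # symmetric Cournot-Nash quantity
q_jpm = a / (2 * (1 + b))    # symmetric joint-payoff-maximizing quantity
q_dev = (a - b * q_jpm) / 2  # best reply to a partner playing q_jpm

pi_nash = profit(q_nash, q_nash)
pi_jpm = profit(q_jpm, q_jpm)
pi_dev = profit(q_dev, q_jpm)

# The deviation is profitable against a colluding partner, and collusion
# beats Nash: pi_dev > pi_jpm > pi_nash.
print(round(pi_nash, 2), round(pi_jpm, 2), round(pi_dev, 2))
```

In the experiment, the analogous computations for the Bertrand market (with its own demand system) pin down the remaining actions so that these three payoffs coincide across the two games, as described above.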

B Extending the Trigger Strategy to the Elicitation Setting
This section considers formally extending the grim trigger strategy (i.e. permanent reversion to Nash) to the initial-choice-plus-dynamic-response elicitation setting of the experiment. The purpose is to illustrate that eliciting dynamic responses, and including small costs for one-shot deviations from recommended action choices, does not fundamentally change this standard strategy - one that is commonly used in the theory of repeated games to show that cooperation can be sustained as part of a subgame-perfect equilibrium. Indeed, the minimum discount factor needed to sustain cooperation using this natural extension of the grim trigger does not change. Furthermore, allowing machine changes, as in the unilateral and bilateral variations, does not change this conclusion. The same extension of the grim trigger supports collusion with the same minimum discount factor as in both the baseline experiment and the standard repeated-game setting. Thus, machine changes in the unilateral and bilateral treatments are not necessary to implement collusive strategies.

B.1 Extending the Set-up to the Elicitation Setting
In the baseline experiment, subjects are asked in the first round to choose an initial action and a dynamic response vector, where the latter determines their recommended action in all future rounds as a function of their partner's choice in the previous round. In all rounds after the first one, subjects must choose an action for that round, with all choices that do not correspond to their recommended action incurring a cost of 3 ECUs. Consequently, their action sets are A_1 = {A, B, C, D} × {A, B, C, D}^4 in the first round (an initial action together with a dynamic response vector) and A_t = {A, B, C, D} for all t > 1. 29 In the repeated game, the history at round t ≥ 1 is a list of all the choice pairs, (a^s_i, a^s_{−i}), that player i and their match, player −i, have made for all rounds s < t, with the understanding that all histories at round 1 are empty (null). Consequently, a strategy in the repeated game must specify an initial choice and dynamic response vector, plus an action choice for any possible history with t > 1.
Let r^t_i be the recommended action for player i in round t. Then the payoff from period t is given by the stage game outcome minus any one-shot deviation costs:

π^t_i = π_i(a^t_i, a^t_{−i}) − c^{one} · I(a^t_i ≠ r^t_i),

where I(a^t_i ≠ r^t_i) takes value one if the action choice is different from the recommended choice, and zero otherwise. In the experiment c^{one} = 3. Payoffs from the repeated game are evaluated at time t ≥ 1 using the discounted sum

Π^t_i = Σ_{s=t}^∞ δ^{s−t} π^s_i.

The natural way to define the grim trigger strategy in this setting is as follows: S^{GT} is the strategy that specifies

• in round 1: the initial action A together with the dynamic response vector ACCC (respond with A to the partner having played A, and with C otherwise);
• in round t > 1: a^t_i = A if (a^s_i, a^s_{−i}) = (A, A) for all s < t, and a^t_i = C otherwise.

Note that the last row implies that player i will pay c^{one} in the event that the recommended action is something other than C. This situation will only arise if either player chose something other than A in a round before the previous round, but the other player chose A in the previous round. The strategy says that player i will pay c^{one} to play C instead of the recommended A in such a circumstance.
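These definitions can be expressed in a minimal sketch (illustrative code; the history convention and the cost c^{one} = 3 follow the text above):

```python
C_ONE = 3  # one-shot deviation cost (in ECUs)

def grim_trigger(history):
    """Extended grim trigger S_GT for rounds t > 1.

    history is the list of past (own_action, other_action) pairs; the
    strategy plays A while the path consists only of (A, A) outcomes
    and C forever otherwise."""
    return "A" if all(pair == ("A", "A") for pair in history) else "C"

def period_payoff(stage_payoff, action, recommended):
    # Stage-game outcome minus the deviation cost whenever the chosen
    # action differs from the machine's recommendation.
    return stage_payoff - (C_ONE if action != recommended else 0)

# After any defection the trigger reverts to C, even in histories where
# the machine's recommendation is A and the cost c_one must be paid.
print(grim_trigger([("A", "A"), ("A", "D")]))  # C
```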

B.2 Incentive Compatibility in the Baseline Treatment
Next we check whether the pair (S^{GT}, S^{GT}) can form a subgame-perfect Nash equilibrium of the repeated game. In doing so, we will demonstrate that the minimum discount rate for cooperation to be sustainable using the grim trigger strategy is the same for the initial choice plus dynamic response game as it is for the standard repeated game. Suppose player −i is playing according to S^{GT}. Then we will show that playing S^{GT} is a best response for player i using the standard procedure of checking single deviations. Given the nature of the elicitation setting, incentive compatibility in round one needs to be checked separately from incentive compatibility in later rounds. This is because the optimal deviation in round one can involve not just changing the current choice, but also changing the dynamic response vector. For later rounds, the only difference to the standard setting is that the specified choice, or the optimal deviation, might involve paying the small one-shot deviation cost of going against the recommended strategy.
Step 1: Incentive compatibility in round one (t = 1). Given that −i plays according to S^{GT}, the optimal deviation is to choose D-CCCC in the substitutes game, and B-CCCC in the complements game. This gives a deviation payoff of

Π^1_i(dev) = π^{dev} + δπ^{Nash} + δ^2π^{Nash} + ⋯
Continuing with the strategy S^{GT} gives

Π^1_i(S^{GT}) = π^{JPM} + δπ^{JPM} + δ^2π^{JPM} + ⋯

Consequently, for (S^{GT}, S^{GT}) to be part of a subgame-perfect Nash equilibrium it must be that

δ(π^{JPM} − π^{Nash}) ≥ (1 − δ)(π^{dev} − π^{JPM}).

Solving for the smallest δ at which the above inequality holds with equality gives the same minimum discount rate - denoted δ_min - as in the usual repeated game environment.
Step 2: Incentive compatibility in later rounds (t > 1). This step is split into two cases that essentially correspond to a reward path and a punishment path.
• Reward path: Suppose the history is such that (a^s_i, a^s_{−i}) = (A, A) for all s < t. Then the recommendation for both players will be r^t_i = r^t_{−i} = A. The optimal deviation for player i is to pay c^{one} and play D in the substitutes game and B in the complements game. Thus, for incentive compatibility it must be that

π^{dev} + δπ^{Nash} + δ^2π^{Nash} + ⋯ − c^{one} ≤ π^{JPM} + δπ^{JPM} + δ^2π^{JPM} + ⋯
That is, it must be that
$$\delta (\pi_{JPM} - \pi_{Nash}) \geq (1 - \delta)(\pi_{dev} - c_{one} - \pi_{JPM}).$$
Given the additional cost $c_{one}$, this inequality holds strictly for any $\delta \geq \delta_{min}$.
• Punishment path: Suppose the history is such that $(a^s_i, a^s_{-i}) \neq (A, A)$ for some $s < t$. Here we need to check that it is incentive compatible for player $i$ to play $C$. The punishment path has two sub-cases to consider, depending on whether the recommended action is also to play $C$ or not. In either case, player $i$ knows that $a^t_{-i} = C$ and $(a^{t'}_i, a^{t'}_{-i}) = (C, C)$ for all $t' > t$, since player $-i$ is following $S_{GT}$ and player $i$ is only considering a single deviation today from $S_{GT}$.
1. Suppose $r^t_i = C$. Given that $C$ is the best response to $C$ in the one-shot game and there is no possibility to change future outcomes to anything other than repeated play of the one-shot Nash equilibrium, $a^t_i = C$ is player $i$'s best response for any $\delta$.

2. Suppose $r^t_i \neq C$. Here we are essentially checking whether it is worth paying $c_{one}$ to play $C$, as required by $S_{GT}$:
$$\pi_i(r^t_i, C) + \delta \pi_{Nash} + \delta^2 \pi_{Nash} + \cdots \leq \pi_{Nash} - c_{one} + \delta \pi_{Nash} + \delta^2 \pi_{Nash} + \cdots$$
That is, it must be that $c_{one} \leq \pi_{Nash} - \pi_i(r^t_i, C)$. Given that the only recommendation under $S_{GT}$ other than $C$ is $A$, this holds for any $\delta$ since $\pi_{Nash} - \pi_i(A, C) > c_{one}$.
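The later-round conditions can be verified numerically with the complements payoffs and the 3 ECU deviation cost from the instructions. This is an illustrative check, not part of the formal argument.

```python
# Numerical check of the later-round (t > 1) conditions, using the
# strategic-complements payoffs from Appendix C and c_one = 3 ECU.
pi_jpm, pi_nash, pi_dev, c_one = 43, 33, 54, 3
pi_A_vs_C = 14  # pi_i(A, C): payoff from following recommendation A against C

# Reward path: delta * (pi_jpm - pi_nash) >= (1 - delta) * (pi_dev - c_one - pi_jpm).
# The binding delta is strictly below delta_min from round one.
delta_min = (pi_dev - pi_jpm) / (pi_dev - pi_nash)
delta_reward = (pi_dev - c_one - pi_jpm) / (pi_dev - c_one - pi_nash)
assert delta_reward < delta_min

# Punishment path: paying c_one to play C is worthwhile iff
# c_one <= pi_nash - pi_i(A, C); here 3 <= 33 - 14 = 19, for any delta.
assert c_one <= pi_nash - pi_A_vs_C
print(round(delta_reward, 3), round(delta_min, 3))
```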
In summary, moving from the standard repeated-game set-up to the initial-choice-plus-dynamic-response set-up introduces two changes into the incentive-compatibility check:
1. Incentive compatibility needs to be checked separately for the initial round and for later rounds.
2. On the punishment path, players must be willing to play the Nash action even if they need to pay the one-shot deviation cost to do so.
The former has no implications for the minimal discount factor needed to sustain cooperation using the grim trigger. The latter introduces an additional condition requiring that the one-shot deviation cost not be too large; this condition is unrelated to the discount factor and is met in the stage games we implement in the experiments.

B.3 Accommodating Dynamic Response Changes
In the unilateral and bilateral treatments, subjects (may) have the opportunity to change their dynamic response during a match. Such changes take effect from the next period onwards (via a potentially different recommended action) and come at a small cost, $c_{mach}$. Consequently, to extend the above analysis to this case, the strategy must also specify what dynamic response the player will choose if they have the opportunity to change their machine and, in the bilateral treatment, whether they would permit the other player to change their machine. The per-period payoff function becomes the stage-game payoff, net of $c_{one}$ whenever the chosen action differs from the recommended one and net of $c_{mach}$ whenever the machine is changed. The grim trigger strategy is extended to the unilateral and bilateral settings by keeping the dynamic response unchanged throughout the match. Note that this strategy implies that the player will not implement a machine change even if given the opportunity to do so. Furthermore, the player is indifferent between allowing or not allowing the other player to make a strategy change in the bilateral treatment.
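A minimal sketch of this per-period payoff function follows. The function name is ours, and the value of $c_{mach}$ is an assumption for illustration; only the 3 ECU one-shot cost is given in the instructions.

```python
def period_payoff(stage_payoff, deviated_from_recommendation, changed_machine,
                  c_one=3, c_mach=3):
    """Stage-game payoff, net of c_one if the chosen action differs from the
    recommended one, and net of c_mach if the machine was changed.
    c_mach = 3 is an illustrative assumption, not a value from the paper."""
    cost = (c_one if deviated_from_recommendation else 0) \
         + (c_mach if changed_machine else 0)
    return stage_payoff - cost

print(period_payoff(43, False, False))  # 43: follow the machine, no costs
print(period_payoff(33, True, False))   # 30: pay c_one to deviate today
print(period_payoff(33, False, True))   # 30: pay c_mach to change the machine
```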
Checking incentive compatibility for the above versions of $S_{GT}$ follows very similar lines to those shown for the baseline case. As before, incentive compatibility should be checked separately for the initial choices, where there are no one-shot or machine-change costs, and for all later periods. Again, it is the initial choices that provide the binding constraints that define $\delta_{min}$.
The following observations ensure that the basic logic from the baseline case carries over to the unilateral and bilateral cases:
• On the initial or reward path, if $\delta \geq \delta_{min}$ then the payoff stream $(\pi_{JPM}, \pi_{JPM}, \ldots)$ is preferred to $(\pi_{dev}, \pi_{Nash}, \ldots)$. This is the case both today, when considering a $t = 1$ deviation in action or paying $c_{one}$ to deviate in action from a recommendation for $t > 1$, and tomorrow, when considering a $t = 1$ deviation in the dynamic response or paying $c_{mach}$ to deviate in the dynamic response at $t > 1$. Furthermore, this holds for any one-shot or machine-change costs, as long as these costs are greater than or equal to zero.
• On the punishment path, if the other player is playing $S_{GT}$ and we are only considering deviations today from $S_{GT}$, then there is no reason to pay $c_{mach} > 0$ to switch the dynamic response for tomorrow from $ACCC$ to $CCCC$ if the action $C$ is being played today. This is because the $S_{GT}$ strategies from tomorrow onwards will ensure that $C$ is played in all future periods. Furthermore, as long as $c_{one} \leq \pi_{Nash} - \pi_i(r^t_i, C)$ when $r^t_i \neq C$, it is preferable to pay the one-shot cost today to avoid the sucker payoff.
In summary, essentially the same grim trigger strategy can be used to support collusion in the unilateral and bilateral treatments, using the same minimum discount rate and without the need to change the dynamic response during a match.

C Example instructions: Baseline -strategic complements
Part 1 Welcome!
You are about to participate in a session on interactive decision-making. Thank you for agreeing to take part. The session should last 90 to 120 minutes.
You should have already turned off all mobile phones, smart phones, mp3 players and all such devices by now. If not, please do so immediately. These devices must remain switched off throughout the session. Place them in your bag or on the floor beside you. Do not have them in your pocket or on the table in front of you.
The entire session, including all interaction between you and other participants, will take place through the computer. You are not allowed to talk or to communicate with other participants in any other way during the session.
You are asked to abide by these rules throughout the session. Should you fail to do so, we will have to exclude you from this (and future) session(s) and you will not receive any compensation for this session.
We will start with a brief instruction period. Please read these instructions carefully.
They are identical for all participants in this session with whom you will interact. If you have any questions about these instructions or at any other time during the experiment, then please raise your hand. One of the experimenters will come to answer your question.

Compensation for participation in this session
In addition to the €3 participation fee, what you will earn from this session will depend on your decisions, the decisions of others and chance. In the instructions and all decision tasks that follow, payoffs are reported in Experimental Currency Units (ECUs). At the end of the experiment, the total amount you have earned will be converted into Euros using the following conversion rate: 1 ECU = 4 Eurocents.
The payment takes place in cash at the end of the experiment. Your decisions in the experiment will remain anonymous.

General instructions
The session is structured as follows:
1. This session consists of 10 matches. At the beginning of each match, you will be randomly paired with another participant.
2. During the match, you will interact repeatedly with this same participant for a number of rounds.
3. The number of rounds is randomly determined. After each round, there is an 87.5% chance that the match will continue for at least another round. This is as if we were to roll an 8-sided die and end if the number 1 came up and continue if 2 through 8 came up. Notice that, if you are in round 2, the probability that there will be a third round is 87.5% and if you are in round 9, the probability that there will be a tenth round is also 87.5%. That is, at any point in the match, the probability that there will be at least one more round is 87.5%. This means that, in expectation, another 8 rounds will follow, irrespective of the number of rounds you have just completed.
4. Once a match ends, you will be matched with a randomly drawn participant for the next match.
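The continuation rule in point 3 can be illustrated with a short simulation (the function name is ours; the 87.5% probability is from the instructions):

```python
import random

random.seed(1)  # fixed seed so the simulation is reproducible

def match_length(continue_prob=0.875):
    """Play round 1, then continue to another round with probability 87.5%."""
    rounds = 1
    while random.random() < continue_prob:
        rounds += 1
    return rounds

# The expected match length is 1 / (1 - 0.875) = 8 rounds, and because the
# rule is memoryless, the expected number of further rounds is the same at
# any point in the match.
lengths = [match_length() for _ in range(100_000)]
print(sum(lengths) / len(lengths))  # close to 8
```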
Description of a match
5. During a match you will repeatedly interact with the same participant for a number of rounds. Each round consists of the same decision situation.
6. In this decision situation, you will be asked to choose an action. There are four possible actions: A, B, C or D. The participant you are matched with will also be asked to choose an action. The set of possible actions to choose from is the same for both of you.
7. The table below shows the payoffs for each combination of actions. In each cell, the first number is your payoff and the second number is the other participant's payoff.

                  Other participant's action
                   a       b       c       d
    Your    A    43,43   23,54   14,52    7,47
    action  B    54,23   36,36   32,40   28,37
            C    52,14   40,32   33,33   31,32
            D    47,7    37,28   32,31   30,30

8. To summarize, in a match you interact repeatedly with the same participant for an unknown number of rounds in the decision situation described above. As described in point 3 above, after every round there is an 87.5% chance of another round in this match.
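For illustration, the payoff table above can be encoded as a lookup from action pairs to payoff pairs (the dictionary layout is our own):

```python
# The payoff table as a lookup: keys are (your action, other's action);
# values are (your payoff, other participant's payoff).
payoffs = {
    ('A', 'a'): (43, 43), ('A', 'b'): (23, 54), ('A', 'c'): (14, 52), ('A', 'd'): (7, 47),
    ('B', 'a'): (54, 23), ('B', 'b'): (36, 36), ('B', 'c'): (32, 40), ('B', 'd'): (28, 37),
    ('C', 'a'): (52, 14), ('C', 'b'): (40, 32), ('C', 'c'): (33, 33), ('C', 'd'): (31, 32),
    ('D', 'a'): (47, 7),  ('D', 'b'): (37, 28), ('D', 'c'): (32, 31), ('D', 'd'): (30, 30),
}
print(payoffs[('A', 'a')])  # (43, 43): both choose A
print(payoffs[('B', 'a')])  # (54, 23): you choose B against the other's A
```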

Your decisions (How actions are chosen)
At the beginning of a match
9. At the very beginning of every match, you will be asked to specify your initial action and to provide a plan of intended actions. The initial action is the action you choose in the first round of this match. The plan of intended actions determines, for each subsequent round, which action you intend to choose in response to each possible action choice of the other participant in the previous round.
10. The table below presents an example of a plan of intended actions, as it will be visualized on your screen. In this example, the plan prescribes you to take action D in all rounds immediately following one in which the other participant has taken action a (action D is checked in column a). In periods immediately following one in which the other participant has chosen action b, the plan prescribes you to take action B (action B is checked in column b) and so forth.
Notice that the table above is just one example of a plan. In the experiment you will be asked to design your own plan.
11. Since it will be costly (see point 15 below) to choose an action different from the one prescribed by your plan of intended actions, you are advised to think carefully about how to design your plan.
12. Once you and the participant you are matched with have made your choice of initial action and plan of intended actions, the first round of the sequence of decision situations described above will begin.
During round 1
13. In the first round, your action choice will be the initial action you just chose.

During later rounds
14. At the beginning of any subsequent round you will be told the prescribed action from your plan of intended actions.
15. You will then be asked to choose your action for the current round. It is possible to choose an action different from the one prescribed by your plan of intended actions. However, doing so will cost 3 ECU. Note also that you will need to select this action and click on the "OK" button within the time limit shown on your screen; otherwise your prescribed action will be chosen.
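The way a plan of intended actions prescribes behaviour can be sketched as follows. The entries for columns c and d are hypothetical; the a and b entries follow the example in point 10.

```python
# One plan of intended actions, represented as a map from the other
# participant's previous action to your prescribed action. Columns a and b
# follow the example plan in point 10; the c and d entries are hypothetical.
plan = {'a': 'D', 'b': 'B', 'c': 'C', 'd': 'C'}

def prescribed_action(plan, others_previous_action):
    """The action the plan prescribes for the round after the other
    participant played `others_previous_action`."""
    return plan[others_previous_action]

print(prescribed_action(plan, 'a'))  # D, as in the example plan
print(prescribed_action(plan, 'b'))  # B, as in the example plan
```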
At the end of each round
16. At the end of each round, you will receive feedback on your chosen action, the action chosen by the other participant and your payoffs, as well as any costs incurred for deviating from the plan of intended actions.
The end of the session
17. After a match is finished, you will be randomly paired for a new match. This session consists of 10 such matches.
18. In each of the 10 matches, your payoff starts at 0 and from there accumulates until the end of the match. At the end of the session (after the tenth match), one match will be selected at random. The payoff you gained during the selected match will be used to calculate your final payoff.

Control Questions
Please read through the following and answer the questions. When you have finished answering these questions, please raise your hand.
Assume you specified action A as initial action and the following plan of intended actions: Suppose that the other participant chooses action b in the first round.
1. What is your payoff in the first round?
2. What is the other participant's payoff in the first round?
3. Which action does your plan prescribe you to choose in the second round?
Assume that you choose the prescribed action in the second round. Suppose that the other participant chooses action d in the second round.
4. What is your payoff in the second round?
5. Which action will you be prescribed to choose in the third round?

True or False?
Please answer whether the following statements are true or false:
6. The longer a match has been going on, the more likely it is to end.
7. Each round I can choose the action I want.
8. I can modify my plan of intended actions after each round within a match.
9. I am matched with the same participant during the entire session.
10. I am matched with the same participant during each match.
3. In any round (except for the first round), you can either choose the prescribed action or choose another action. Choosing an action which is different from your prescribed action has a cost of 3 ECU.
4. The length of a match is randomly determined. After each round, there is an 87.5% chance that the match will continue for at least one more round. You will play with the same person for the entire match.
5. After a match is finished, you will be randomly paired for a new match. This session consists of 10 such matches.

D Controlling for path dependency
In order to isolate the impact of the responsive part of the strategies from initial choices, we compute the invariant distribution over actions of the Markov chain specified by the dynamic responses. The invariant distribution tells us with which probabilities the actions will be chosen in the long run if play continues as observed in our experimental sessions, regardless of the initial choices. We compute these invariant distributions using the transition probabilities as defined by (1) the average initial machine, (2) the average over chosen actions in response to the rival's actions, and (3) the average over chosen actions in response to outcomes. Note that by taking averages over the average behavior of an individual in a match (of the last four matches only), we disregard the heterogeneity that is clearly present in our data (and which is more pronounced between individuals than between matches). Although it would in principle be better to control properly for heterogeneity in behavior, doing so is computationally more involved and, we believe, not needed for our illustrative purposes. The invariant distributions obtained for the different treatments using the three different specifications of the Markov chain are presented in Table D.1. The regressions presented in Tables 2 and E.4 (see also Figures 2(a) and 3(a)) show that, under strategic complements, a higher level of collusion is obtained when unilateral modifications of the dynamic response are allowed. The invariant distributions show a consistent picture and indicate that this effect is not caused by initial choices alone (see also Figures 2(b) and 3(b)).
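A minimal sketch of this computation, using a hypothetical 4-state transition matrix over actions A to D (the numbers are illustrative, not our estimates):

```python
def invariant_distribution(P, iters=5000):
    """Power iteration: start from the uniform distribution and repeatedly
    apply the transition matrix until the action distribution converges."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical transition probabilities between actions A, B, C, D
# (each row sums to one); placeholders for the estimated dynamic responses.
P = [
    [0.70, 0.15, 0.10, 0.05],
    [0.20, 0.50, 0.20, 0.10],
    [0.10, 0.20, 0.50, 0.20],
    [0.05, 0.15, 0.30, 0.50],
]
pi = invariant_distribution(P)
print([round(p, 3) for p in pi])  # long-run weights on A, B, C, D
```

The same routine can be run with any of the three transition-probability specifications described above.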
Apart from Complements-Unilateral, the weight on the top two actions (A and B) is larger, and that on the lower two actions (C and D) smaller, using the realized actions than using the initial machines. This observation indicates that one-shot deviations and machine modifications are mainly used to escape profit-eroding states in favor of collusive ones. Table D.2 presents the persistence of the collusive outcome, the Nash outcome, the top two actions and the lower two actions. For the single-state outcomes, the persistence is the probability that the state is not left in the next stage, given that it is reached. For the sets of states, these probabilities are weighted by the invariant distribution of the Markov process over outcomes (to properly account for the fact that not all states within a set are equally likely to be reached). Again, we derive the probabilities in the three alternative ways as before. It is apparent that the persistence probabilities of the collusive outcome are substantially larger when applying the data on how outcomes are translated into actions. This shows that one-shot deviations and dynamic-response modifications are used to sustain cooperation. Applying the outcome-to-action data also increases the persistence probabilities of the static Nash outcome when dynamic responses can be revised, but not when only one-shot deviations are possible. This indicates that, in a setting with more revision opportunities, and hence better possibilities to renegotiate, punishments are sustained by means of one-shot deviations and dynamic-response modifications. If anything, this provides evidence against the idea of renegotiation playing an important role.

[Table: treatment-effect coefficients 0.36***, 0.32***, 0.24***, 0.18***, 0.12***, 0.17***, 0.12***, 0.16*** (p-values from 0.000 to 0.007 in parentheses); all specifications use matches 7-10 and stages 1-12.]
Notes: The baseline case is the baseline treatment.
Specifications 2 and 4 are the same as the reported specifications, 1 and 3, but without the match-stage composition dummies. *** 1%, ** 5%, * 10% significance.

Notes: The baseline case is the baseline treatment. Specifications 2 and 3 are the same as the reported specification, 1, except that they use the final-two-thirds and middle-third sub-samples. VCE clustered at the matching-group level. *** 1%, ** 5%, * 10% significance.

Notes: The baseline case is the baseline treatment. All regressions use data from matches 7-10 and stages 1-12 and include match-stage composition dummies. VCE clustered at the matching-group level. *** 1%, ** 5%, * 10% significance.

Notes: The baseline case is the baseline treatment with strategic substitutes. All regressions use data from matches 7-10 and stages 1-12 and include match-stage composition dummies. VCE clustered at the matching-group level. *** 1%, ** 5%, * 10% significance.

Notes: Distribution of machines in initial stages of matches 7-10 across treatments. For each category mentioned, machines (i) with Hamming distance at most 1 from the machine mentioned, (ii) that are not named explicitly in the second column (e.g. CCCC does not count towards DCCC) and (iii) that respond to collusion with the same action as the machine mentioned (e.g. ADDD does not count towards DDDD) are counted. Each machine is counted once; in case multiple categories apply, it is counted with equal weight in these categories. The category "Other" includes all machines satisfying none of the properties above.

Notes: The table uses data for matches 7-10. The statistics on the diagonal report the average efficiency for the respective treatment. The statistics in the upper diagonal give p-values for tests of the difference between the two treatments. The first p-value is based on a linear random-effects regression. The second p-value is the result of a ranksum non-parametric test on matching group averages.
The first three columns replicate the result that there is no effect of revision opportunities under substitutes. The last three columns replicate the result that collusion is significantly higher with unilateral revision opportunities under complements.

Notes: The table uses data for matches 7-10. The statistics in the "p-value" columns give p-values for tests of the difference between the two treatments. The first p-value is based on a linear random-effects regression on interaction-type treatment dummies. The second p-value is the result of a ranksum non-parametric test on matching group averages.

The first three columns show that, with no revision opportunities, there is significantly more collusion, at least in first-stage choices, under substitutes than under complements.

Notes: The table uses data for matches 7-10. The statistics on the diagonal report the average percentage of intended strategies for the respective treatment. The statistics in the upper diagonal give p-values for tests of the difference between the two treatments. The first p-value is based on a linear random-effects regression on revision-opportunities treatment dummies. The second p-value is the result of a ranksum non-parametric test on matching group averages.

Notes: The table uses data for matches 7-10. The statistics in the "p-value" columns give p-values for tests of the difference between the two treatments. The first p-value is based on a linear random-effects regression on interaction-type treatment dummies. The second p-value is the result of a ranksum non-parametric test on matching group averages.