Self-enforcing cooperation via strategic investment

We investigate how, in a situation with two players in which noncooperation is the only equilibrium, cooperation can be achieved via costly investment. We find that in the resulting equilibria, cooperation is an all-or-nothing outcome, that is, either there is full cooperation by both players, or no cooperation at all. The cost of investment is unrelated to the degree of cooperation that is ultimately achieved, unless the cost is too high, in which case investment cannot in any degree overcome the disincentive to cooperate. Moreover, the positive externalities that players have on each other in the course of play, although they affect investment, are ultimately irrelevant to the degree of cooperation achieved. We view our model as an explanation for the formation and stable existence of business alliances, where the players are firms forming a partnership defined and sustained by contractual agreements, but which is short of a merger or acquisition.


Introduction
We develop a simple two-firm game model to investigate how, given a situation with two players in which noncooperation is the only equilibrium, cooperation can be achieved via costly investment. We have in mind explaining the formation and self-enforcing sustainment of alliances between firms that are common in the real business world, but we believe our game is of independent theoretical interest.
Acknowledgment: This research was carried out in part during Bart Taub's stay at ICEF, Higher School of Economics, Moscow. He acknowledges financial support from the Ministry of Education and Science of the Russian Federation, Academic Excellence Project 5-100.
An alliance is a cooperative venture between firms that is typically formalized through a set of contracts, rather than via a joint venture that is formally a separate firm owned by the partner firms, and that stops short of a merger or acquisition. [1] A canonical example is that of a large pharmaceutical company allying with a biotechnology company to produce, at an industrial scale, an efficacious molecule invented by the biotech, using the complementary expertise of each firm for the profit of both. [2] Similarly, aircraft development and manufacture are undertaken by airframe manufacturers and numerous specialized alliance partners who jointly define product features and share development costs of modules, each subsequently undertaking responsibility for manufacturing the modules it has developed. [3] In our model, the cooperative technology driving the alliance needs complementary inputs from both firms: firm 1's investment and firm 2's cooperative action subsequent to the investment together determine firm 1's cooperative benefits (and symmetrically for firm 2). Think of firm 1 training some of its assets (machines, workers) to be compatible with firm 2's technology, and firm 2 sending some of its workers to implement the beneficial transfers.
We express these ideas with a game in two stages, an investment stage and an execution stage. To capture the complementarities, we structure the execution stage as a prisoner's dilemma. We restrict the payoffs in the prisoner's dilemma to have a separable linear structure so as to rule out direct technological complementarities. The benefit of defection then comes directly and only from each player's own action. While seemingly lacking in generality, ruling out direct technological complementarities allows us to focus solely on the consequences of strategic behavior on outcomes that would otherwise be masked.
What is key is that the costly investments in the first stage alter the payoffs of the subsequent prisoner's dilemma game. The modified payoffs resulting from the investment achieve cooperation via the folk theorem in the second stage, cooperation that would not occur without the investment.
What is more surprising is that, because firms are looking ahead, they have an incentive to invest not just to achieve cooperation per se in the second stage, but to achieve full cooperation uniquely. [4] Thus, we go beyond the folk theorem, which only yields a superset of this equilibrium.
[1] In contrast with alliances, mergers and acquisitions (where the merged entity can force cooperation) entail substantial up-front fixed costs (such as legal fees or, with airline mergers, repainting the airplanes of one firm), distinct from the cost of investment, that can be avoided via an alliance. If the marginal cost of investment is too high to support an alliance, it might then be justifiable to undertake an acquisition or merger in order to take advantage of the firms' technological complementarities, provided the fixed cost does not eliminate all potential profit.
[2] See, for example, Lerner and Merges (1998). For a history of the development of artificial human insulin via an alliance between the biotechnology firm Genentech and the pharmaceutical company Eli Lilly, see Hall (2002).
[3] See Dussauge and Garrette (1995).
The cooperation result does depend on investment costs not being too high. If they are too high, there is no investment and no cooperation at all, so our theory also explains the failure of the firms to successfully form alliances: it is all or nothing. Our theory, therefore, has empirical relevance and has the potential to expand the theory of the firm in general.

Literature
There are several papers that construct theoretical models intended to capture features specific to alliances. Habib and Mella-Barral (2007) build a continuous-time model in which firms entering into the alliance acquire "know-how" from their participation in the alliance project, modeled as a kind of capital accumulation rather than as learning in the sense of Bayesian updating. The thrust of the article is that the alliance terminates ("dissolution") via a Texas shoot-out mechanism at some finite but stochastic stopping time, when joint participation in the project is no longer cost-effective, that is, when operation by one of the firms singly is more efficient; at that point that firm buys out the other via the mechanism. As with our own model, the participating firms provide effort to the alliance project. However, the effort is predetermined, that is, it is not continuously re-determined once the alliance is formed, so strategic punishments for reduced effort are not an element of the model. A paper by Robinson (2008) similarly models alliances as entailing contributions of effort by both firms and also entailing capital investment, but in his case the capital is reallocated from internal projects rather than formed anew. As with Habib and Mella-Barral's model, the firms contribute effort, but again the effort is chosen ex ante, and given that it is a two-period model, strategic noncooperative punishments do not figure in; rather, the focus is on the decision to form an alliance after the productivity of alternative projects is revealed; the potential to form alliances ex post results in greater willingness to undertake risky ("long shot") projects.
Thus, relative to this literature, we model the crucial element of self-enforcement, which the previous literature simply assumes. Our key point is that this self-enforcement is driven by the investment that is the main ingredient for real-world alliances, and the forward-looking character of the firms leads them to make the "just right" quantity of investment needed to support this self-enforcement.
Our model also has some similarities with Panico's (2017) model of alliances. There are two stages, an initial investment stage followed by a repeated non-cooperative game with the payoff structure determined by the investment in the initial stage. Panico's model differs from ours in that there are exogenous complementarities, expressed via cross-multiplication terms in the payoff functions. Our model is also different in that firms' investments in the first stage influence the payoffs in the second stage, and any bargaining power arising from investment stems from the interaction of the payoffs, that is, we completely endogenize the bargaining power of the firms.
Our model has a number of similarities with the model of duopoly by Kreps and Scheinkman (1983): there are two stages of the game, with costly investment taking place in the first stage, and with a Bertrand game played in the second stage that is conditional on the investments that occurred in the first stage.
Kreps and Scheinkman establish that the revenue functions that emerge from the structure of the second stage are unique and have properties that allow the first stage to be analyzed. They then find that investment in the first stage always leads to the Cournot equilibrium in the second stage, not the Bertrand equilibrium that would result from the firms investing up to the competitive level. One might interpret the division of the game into stages as fostering limited cooperation between the firms, and we have a similar result.
In our alliance interpretation of the game, the challenge facing the firms is to ensure their ongoing cooperation in a self-enforcing fashion, given the difficulty of anticipating all contingencies in contracts and the costliness of enforcing contracts. As in Kreps and Scheinkman's model, firms make costly investments in the first stage, but our second stage is different in that the firms play a repeated prisoner's dilemma, specialized further in a manner we describe below.

Contents of the paper
In Sect. 2, we describe the model, which has two stages. In the first stage, each firm invests $I_i$ at cost $cI_i$. The second-stage game is a Prisoner's Dilemma with payoffs linear in the levels of "cooperative action" $y_i$ of both players, where $y_i$ is interpretable as a mixed-strategy probability of cooperation; the payoffs are also determined by the first-stage investments. A monitoring device detects defection by either player with some probability 1 − δ, and defection is then punished in grim trigger fashion.
In Sect. 3, we state and prove our main theorem, i.e., cooperation is sustained if and only if δ, the complement of the monitoring accuracy, is larger than the marginal cost of investment; we also propose a geometric interpretation of the result, and further connect the result with bargaining theory. A conclusion follows.

Definition 1 Given a (complete information) two-person game in normal form
The classic interpretation of a grim-trigger equilibrium comes from the infinite repetition of game G when both players have the discount factor 1−δ (see for instance Friedman 1986).
The following formally equivalent interpretation is better suited to our model. The players agree to play $y^*$; the agreement is self-enforcing because a unilateral deviation by player $i$ is detected by a monitoring device with probability 1 − δ, then revealed to player $j$, who in turn punishes player $i$ in grim trigger fashion.
Here is the precise strategic scenario. Each player $i$ publicly reports, to the other player and to the machine, the strategy $y_i^*$ she agrees to use; then the players simultaneously report their actual strategies $y_1, y_2$ to the machine (but not to each other). With probability δ the machine does nothing and these strategies are final; the final payoffs are $u_i(y_1, y_2)$. With probability 1 − δ, the machine compares the agreed-upon choices with the actual ones: if $y_i \neq y_i^*$ while $y_j = y_j^*$, the machine reveals this to player $j$, then gives both players a chance to pick a new strategy; these last choices are simultaneous.
In a δ-grim-trigger equilibrium, the player $j$ who abides by the agreement but observes that player $i$ does not is committed to play a strategy ensuring that player $i$ can at most reach her min-max utility, i.e., $x_j$ solves $\min_{x_j} \max_{x_i} u_i(x_i, x_j)$.
Naturally, the game $G$ may well have a large set of δ-gt-equilibria, and this will be the case in our model.
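The monitoring scenario above suggests the following form for the equilibrium condition. This is a sketch inferred from the verbal description, not necessarily the exact statement of inequality (1); the strategy sets $X_i$ and the symbol $\underline{u}_i$ for the min-max utility are notational assumptions introduced here.

```latex
% A unilateral deviation by player i stays undetected with probability \delta
% and is punished down to the min-max utility with probability 1-\delta.
\[
  u_i(y^*) \;\ge\; \delta \max_{x_i \in X_i} u_i(x_i, y_j^*) \;+\; (1-\delta)\,\underline{u}_i ,
  \qquad
  \underline{u}_i \;=\; \min_{x_j \in X_j} \max_{x_i \in X_i} u_i(x_i, x_j),
  \qquad i = 1, 2,\; j \neq i .
\]
```

Under the repeated-game reading with discount factor 1 − δ, the same inequality compares the normalized value of cooperating forever with a one-period deviation followed by permanent punishment.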

The investment game
We next define the two-person multi-stage game Γ .
There are two players $i = 1, 2$. The three positive exogenous parameters of the model are the common cost $c$ per unit of investment, the individual cost $\alpha_i$ per unit of cooperative action, and the error parameter δ of the monitoring device (equivalently, the common discount factor 1 − δ in the repetition of the Stage 2 game); also 0 < δ < 1.
Stage 1: each firm picks an investment level $I_i$ at cost $cI_i$ (there is no restriction on the size of $I_i$); this defines the game $G(I_1, I_2)$, in which the firms choose a level of cooperation $y_i \in [0, 1]$.
Stage 2: $G(I_1, I_2)$ is played: the firms agree on a joint cooperative action $y^*$ and select simultaneously and independently their actual levels of cooperation $y_1, y_2$; the monitoring device is activated, which could result in a Stage 3 where both firms pick their final level of cooperation, as described above.
(In the repeated game interpretation, Stage 2 is simply an infinite repetition of $G(I_1, I_2)$ with common discount factor 1 − δ.)
As noted in the introduction, the simple linear structure of the payoffs rules out direct technological complementarities between the players' actions $y_i$. Excluding these seemingly makes the case for cooperation harder: if no investment takes place, the game $G(0, 0)$ allows no cooperation at all.
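The claim that $G(0, 0)$ allows no cooperation can be checked numerically. The sketch below assumes a payoff form, $u_i = I_i y_j - \alpha_i y_i - cI_i$, chosen to be consistent with the separable linear structure, the externality terms $I_i y_j$, and the min-max payoff $-cI_i$ mentioned in the text; it is an illustration, not the paper's own payoff display. The function names and parameter values are hypothetical.

```python
def payoff(i, y, I, alpha, c):
    """Assumed stage payoff of player i: I_i*y_j - alpha_i*y_i - c*I_i."""
    j = 1 - i
    return I[i] * y[j] - alpha[i] * y[i] - c * I[i]


def is_gt_equilibrium(y, I, alpha, c, delta, tol=1e-9):
    """Check the grim-trigger condition for both players at the profile y."""
    for i in (0, 1):
        on_path = payoff(i, y, I, alpha, c)
        # Best deviation: withhold all cooperation (own payoff falls in y_i).
        deviation = list(y)
        deviation[i] = 0.0
        best_deviation = payoff(i, deviation, I, alpha, c)
        # Min-max payoff: the partner withholds cooperation, leaving -c*I_i.
        min_max = -c * I[i]
        if on_path < delta * best_deviation + (1 - delta) * min_max - tol:
            return False
    return True


# Illustrative parameters (assumptions, not values from the paper).
alpha, c, delta = (1.0, 2.0), 0.3, 0.5
grid = [k / 4 for k in range(5)]
profiles = [(a, b) for a in grid for b in grid]

# With no investment, only the non-cooperative profile survives.
sustainable = [y for y in profiles if is_gt_equilibrium(y, (0.0, 0.0), alpha, c, delta)]
print(sustainable)
```

Passing a positive investment pair to `is_gt_equilibrium` shows how investment enlarges the sustainable set under the same assumed payoffs.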

Equilibrium selection
Equilibrium selection rests on the following behavioral assumption in the game $\Gamma$: the players anticipate that in the second stage a δ-g.t. equilibrium will emerge; moreover, they evaluate the result of the second stage as their own worst undominated δ-g.t. equilibrium.
In words: the players are confident that the cooperative opportunities afforded by the monitoring and commitment device will be exploited, but they make the worst case prediction that the other player will retain full bargaining power, that is, they are pessimistic.
Definition 2 A pessimistic equilibrium of the game $\Gamma$ is a pair $(I_1^*, I_2^*)$ of investment strategies such that:
(i) there is a unique δ-g.t. equilibrium $y^* = (y_1^*, y_2^*)$ in the game $G(I_1^*, I_2^*)$, with utilities $(U_1^*, U_2^*)$;
(ii) for every $I_i \geq 0$, the worst δ-g.t. equilibrium for player $i$ in $G(I_i, I_j^*)$ gives him at most utility $U_i^*$.

The main result
Theorem 1 The null investment $I = 0$, with corresponding null payoffs, is always an equilibrium of the game $\Gamma$. If δ < c, it is its unique pessimistic equilibrium. If δ > c, there is another, welfarewise superior, pessimistic equilibrium $I^*$: players fully cooperate in $G(I^*)$ (that is, $y^* = (1, 1)$), with investments and payoffs for each $i = 1, 2$ given by Eq. (3). If δ = c, the null investment $I = 0$ and $I^*$ are both equilibria, both with zero payoffs.
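To see where the comparison between δ and c can come from, here is a worked sketch under an assumed payoff form, $u_i = I_i y_j - \alpha_i y_i - cI_i$, with min-max payoff $-cI_i$ and best deviation $y_i = 0$. The payoff form is an assumption made for illustration, so the expressions below should be read as a plausible reconstruction rather than as the paper's Eq. (3).

```latex
\begin{align*}
\text{Grim-trigger condition at } y:\quad
  & I_i y_j - \alpha_i y_i - cI_i \;\ge\; \delta\,(I_i y_j - cI_i) + (1-\delta)(-cI_i) \\
  \iff\quad & (1-\delta)\, I_i\, y_j \;\ge\; \alpha_i\, y_i . \\[4pt]
\text{Full cooperation } y = (1,1):\quad
  & I_i \;\ge\; \frac{\alpha_i}{1-\delta} . \\[4pt]
\text{Payoff at the minimal investment:}\quad
  & I_i - \alpha_i - cI_i \;=\; \frac{(\delta - c)\,\alpha_i}{1-\delta} .
\end{align*}
```

Under these assumptions the minimal investment does not depend on $c$, and the resulting payoff is positive, zero, or negative according to whether δ is above, equal to, or below $c$, which matches the three cases of the theorem.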

Interpretation
The most striking feature of the result is that, whether cooperation is sustainable or not, we select a unique pessimistic equilibrium outcome. That cooperation is full, $y^* = (1, 1)$, or null, $y^* = (0, 0)$, is not surprising when the payoffs are linear in strategies.
The key comparison is between the cost per unit of investment c and the error parameter δ of the monitoring device: full cooperation is feasible if and only if c < δ. The individual cost $\alpha_i$ of the cooperative action plays no role. Unsurprisingly, a low investment cost makes cooperation easier to achieve, but as is evident in the formula for investment in Eq. (3), the total quantity of investment is completely unrelated to the cost, as long as the marginal cost is below the threshold, that is, c < δ.
That a more accurate (smaller δ) monitoring device makes cooperation harder to enable seems counter-intuitive. It is important that the δ > c requirement applies to the first stage investment, not the second stage repeated game. The cost parameter c is the marginal cost of investment; correspondingly, δ is the marginal payoff from that investment in benefit. What is the benefit? It is the potential gain from defection, weighted by the probability of not being monitored; see Eq. (6). A larger δ increases the probability that a defector can successfully defect, conditional on the other player playing cooperate, that is, $y_j = 1$. However, we also see from examining the left-hand side of (6) that for cooperation to be possible, the right-hand side must be positive, that is, δ > c is necessary for cooperation. The payoffs are then the net marginal benefit of investment multiplied by the degree of investment, as is evident in Eq. (3).
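The statement that payoffs equal the net marginal benefit of investment times the degree of investment can be illustrated numerically. The sketch below takes the payoff as (δ − c) times the investment, as described above, and assumes, for illustration only, that the minimal investment supporting full cooperation is $\alpha_i / (1 - \delta)$; that threshold formula, the function names, and the parameter values are assumptions rather than quotations of Eq. (3).

```python
def threshold_investment(alpha_i, delta):
    """Assumed minimal investment: smallest I_i with (1 - delta) * I_i >= alpha_i."""
    return alpha_i / (1.0 - delta)


def net_payoff(alpha_i, delta, c):
    """Net marginal benefit (delta - c) times the threshold investment."""
    return (delta - c) * threshold_investment(alpha_i, delta)


# Illustrative values: the investment level ignores c, the payoff's sign does not.
delta, alpha_i = 0.5, 1.0
for c in (0.2, 0.5, 0.8):
    investment = threshold_investment(alpha_i, delta)
    print(f"c={c}: investment={investment}, payoff={net_payoff(alpha_i, delta, c):+.2f}")
```

The loop shows the all-or-nothing pattern: the required investment is the same for every c, while the payoff from investing changes sign exactly at c = δ.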
Once the δ > c threshold is satisfied, investment takes place in the first stage that is sufficient for cooperation to occur, but no more than that. Thus, the cone that we describe in Lemma 1 collapses to a ray in equilibrium; investment has taken place just sufficient for this, that is, for the folk theorem to apply. [5]

Proof
We begin with a preliminary lemma: given an investment pair (I 1 , I 2 ), what are the undominated equilibria of the second-stage game? If there is insufficient investment then the non-cooperative equilibrium is the only equilibrium, but with sufficient investment there is a line of Pareto-dominant equilibria which in general is not a singleton. At a threshold pair of investments, this continuum of Pareto-dominant equilibria reduces to a point.

(and a symmetric statement by exchanging the role of the players).
Proof of the lemma. In $G(I)$, the min-max payoff of each player is $-cI_i$, because player $i$ cannot get a positive payoff if $y_j = 0$, and the dominant defection is to provide no cooperation at all. Therefore, inequality (1) for player $i$ reads as inequality (6). If $\gamma_1 \gamma_2 < 1$, these two inequalities together, for $i$ and for $j$, allow only $y_1 = y_2 = 0$, which proves statement 1).
For player 1, unless $y = 0$, inequality (6) implies that his payoff increases when $y$ increases along the ray $[0, y]$; by a similar argument, player 2's payoff increases too. We conclude that the undominated δ-gt-equilibria of $G(I)$ are exactly on the intersection of $D$ with the North East frontier of $[0, 1]^2$ (equivalently, the NE frontier of $D \cap [0, 1]^2$). If $\gamma_1 \geq 1$ and $\gamma_2 \geq 1$, this frontier contains the full cooperation outcome $y^*$ and is given by statement 2). And if $\gamma_2 < 1 < \gamma_1$, this frontier avoids $y^*$ and is contained inside the face $y_1 = 1$ of $[0, 1]^2$, as described in statement 3).
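The three cases in the proof can be sketched as a small classifier. It assumes that the set $D$ is the cone $\{y : y_1 \leq \gamma_1 y_2,\ y_2 \leq \gamma_2 y_1\}$, which is an inference from the proof and not a quotation of the lemma, and it returns the North East frontier of $D \cap [0, 1]^2$ as a list of segments given by their endpoints. The function name and the segment representation are hypothetical.

```python
def undominated_frontier(g1, g2):
    """Segments (start, end) of the NE frontier of D within the unit square.

    Assumes D = {y : y1 <= g1*y2 and y2 <= g2*y1} for positive g1, g2.
    """
    if g1 * g2 < 1:
        # The two cone constraints are incompatible away from the origin.
        return [((0.0, 0.0), (0.0, 0.0))]
    if g1 >= 1 and g2 >= 1:
        # Two segments, on the faces y1 = 1 and y2 = 1, meeting at (1, 1).
        return [((1.0, 1.0 / g1), (1.0, 1.0)), ((1.0 / g2, 1.0), (1.0, 1.0))]
    if g2 < 1 < g1:
        # Frontier lies on the face y1 = 1 and stays below full cooperation.
        return [((1.0, 1.0 / g1), (1.0, g2))]
    # Symmetric case g1 < 1 < g2: frontier lies on the face y2 = 1.
    return [((1.0 / g2, 1.0), (g1, 1.0))]


print(undominated_frontier(0.5, 1.5))  # insufficient investment
print(undominated_frontier(2.0, 2.0))  # frontier through full cooperation
print(undominated_frontier(1.0, 1.0))  # threshold: frontier is the single point (1, 1)
print(undominated_frontier(2.0, 0.8))  # player 2 cooperates only partially
```

At the threshold $\gamma_1 = \gamma_2 = 1$ both segments degenerate to the point $(1, 1)$, which is the collapse of the continuum of equilibria to a single outcome described before the lemma.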
We can now prove the theorem.
Proof The first statement is clear: if the other player does not invest, a player cannot get a positive payoff in $G(I)$ and will get a strictly negative one if he invests.
Next, for each of the three types of investment profiles I discussed in Lemma 1, we check for possible Nash equilibria.
If $\gamma_1(I_1) \geq 1$ and $\gamma_2(I_2) \geq 1$, and at least one of these inequalities is strict, say $\gamma_1(I_1) > 1$, then $U_1^-(I; \delta)$ decreases strictly in $I_1$ (see (4)), so this cannot be an equilibrium. The only possibility is $\gamma_1(I_1) = \gamma_2(I_2) = 1$, which means that $I = I^*$ and the payoffs are given by (3). This cannot be an equilibrium if δ < c because each player is better off defecting and guaranteeing a non-negative payoff. But if δ ≥ c, we now check that it is a Nash equilibrium. By (4), if player 1 raises his investment to $I_1 > I_1^*$, his payoff $U_1^-(I_1, I_2^*)$ falls. If he lowers $I_1$ below $I_1^*$, he gets $U_1^-(I_1, I_2^*) = -cI_1$, a much sharper loss. Note that if δ = c, the payoffs are both zero at $I^*$.
It remains to check the case of a profile $I$ such that $\gamma_2(I_2) < 1 < \gamma_1(I_1)$ and $\gamma_1(I_1)\gamma_2(I_2) \geq 1$. We claim $I$ cannot be a Nash equilibrium. If δ > c, by (5) $U_2^-(I)$ increases strictly in $I_2$, so player 2 is not best replying. If δ ≤ c, then a computation using (5) again proves the claim.

Geometric interpretation
Our second-stage game is equivalent to a prisoner's dilemma in which the game frontier of the mixed-strategy payoffs forms a parallelogram; Taub and Kibris (2004) demonstrate the basic properties of this game. In the model here, investment by player 1 shifts the right facet of the game frontier to the right, leaving the left facet unaffected. The rightward shift of the right facet also flattens the upper and lower facets, but leaves in place the payoffs of the partner player, and also preserves the parallelogram structure. [6] Viewing the players as firms forming an alliance, this corresponds crudely to firm 1's ability to increase the output capacity of its own factory whilst leaving the output of the partner player's factory unaffected.
The $I^*$ equilibrium achieved here corresponds to the apex of a rectangle that is traced out by all investment pairs $(I_1, I_2)$ that minimally achieve cooperation, so that the Pareto dominant equilibrium set consists of a single point (see the orange dashed rectangle in Figs. 1a, b), as described in point (3) of Lemma 1. Only the parameters $\alpha_1$ and $\alpha_2$ are needed to describe it; the positive externality terms from the investments, $I_i y_j$, are rendered irrelevant after the investment stage has attained the apex, as long as the cost is below the threshold. The content of Theorem 1 is that Fig. 1a, in which player 2 is not fully cooperating, is not an equilibrium of the game $\Gamma$: in Fig. 1b, both players are fully cooperating on the Pareto frontier, and investment takes place sufficient for this to be the only outcome.
To gain intuition about the rectangle, starting from the no-investment state, begin with a box with apexes at $(0, 0)$ and $(-\alpha_1, -\alpha_2)$ (shaded in green in Fig. 1a, b). The first key observation is that this box is invariant with respect to all subsequent investment such that minimal cooperation is maintained, as these investments do not affect the $\alpha_i$. Now notice that the box can be transferred to the apex of the no-investment game, and to any intermediate game in which investment achieves minimal cooperation, as illustrated in panel (a). The slope of the diagonal of the box is equal to the ratio of the marginal payoffs from a player playing Defect, starting from a position of (Cooperate, Cooperate). In this sense, the apex solution $I^*$ reflects the bargaining power of the players that is determined by their potential gains from defection and is not affected by investment. (Its exact locus also depends on δ; however, the coefficient $\frac{\delta}{1-\delta}$ is simply the conversion of the stock value of the equilibrium payoff into a flow value and as such is only a scaling coefficient.)

Correspondence with a bargaining solution
The uniqueness of the apex outcome and the irrelevance of the positive externality terms $I_i y_j$ and of the cost parameter once the minimum-cost threshold is crossed suggest that the apex solution is also a bargaining solution.
Bargaining is a formalization of a cooperative game, typically with two players, in which there is a trade-off between the rewards the players can receive. By agreeing on axioms that must hold for the decision on how to trade off rewards, a unique outcome can be achieved. The most well-known bargaining solutions include the Nash solution and the Kalai-Smorodinsky solution, which has a variant known as the Kalai-Rosenthal (Kalai and Rosenthal 1978) solution. Details about these and other bargaining solutions can be found in Osborne and Rubinstein (1990) and Friedman (1986, especially pp. 205 ff). These solutions have geometric representations and the fact that our game has a geometric interpretation leads to the connection with bargaining theory: the box with apexes at $(0, 0)$ and $(-\alpha_1, -\alpha_2)$ that we previously pointed out in Fig. 1a is in some sense the mirror image of the geometric construction of the Kalai-Rosenthal solution, in which one would draw a box from the same starting points extending into the positive quadrant rather than the negative quadrant. [7] At the $I^*$ solution, the apex point $\frac{\delta}{1-\delta}(\alpha_1, \alpha_2)$ is the Kalai-Rosenthal solution by construction: the slope of the diagonal in Fig. 1b is the slope of the ray to the solution.
The rectangle with apex $\frac{\delta}{1-\delta}(\alpha_1, \alpha_2)$ traced out by the investment pairs that achieve minimal cooperation (the orange rectangle in Fig. 1a) can be viewed as the bargaining set available to the players before they undertake investment. Assuming that they coordinate investment to just achieve minimal cooperation, it is only points on this rectangle that are available to them as equilibria from an ex ante point of view. (This is analogous to Kreps and Scheinkman's R functions.) The apex of this rectangle trivially satisfies both the Kalai-Smorodinsky and the Kalai-Rosenthal constructions. In this sense, our model is equivalent to a bargaining solution.

Conclusion
We have an all-or-nothing outcome: either agents fully cooperate if the marginal cost of investment is below the monitoring threshold or they do not cooperate at all.
The folk theorem predicts cooperation, but rests on a requirement of patience that does not seem to have much of an empirical correlate. Our model bridges this gap in that, with the interpretation of 1 − δ as equivalent to a discount factor, or as the monitoring precision of a coordinating device, cooperation emerges regardless of patience or precision, as long as it is not too costly, in the sense of the requirement that c ≤ δ.
We believe that our model explains the existence of alliances. Alliances are a significant part of the business landscape: they typically entail joint investment and the cooperation they embody is long lasting. We also explain the failure of alliances to form, and both the existence and failures of alliances rest on the interaction of costs and patience (or, alternatively, interpreting δ as the monitoring and enforcement structure), which do have empirical correlates.