Research and Development Cooperatives and Market Collusion: A Global Dynamic Approach

We present a continuous-time generalization of the seminal research and development model of d’Aspremont and Jacquemin (Am Econ Rev 78(5):1133–1137, 1988) to examine the trade-off between the benefits of allowing firms to cooperate in research and the corresponding increased potential for product market collusion. We show the existence of a solution to the optimal investment problem using a combination of results from viscosity theory and the theory of planar dynamical systems. In particular, we show that there is a critical level of marginal cost at which firms are indifferent between doing nothing and starting to develop the technology. We find that colluding firms develop further a wider range of initial technologies, pursue innovations more quickly, and are less likely to abandon a technology. Product market collusion could thus yield higher total surplus.


Introduction
An important reason for allowing firms to set up research and development (R&D) cooperatives is that these "organizations, jointly controlled by at least two participating entities, whose primary purpose is to engage in cooperative R&D" [1] internalize technological spillovers, that is, the free flow of knowledge from the knowledge creator to its competitors. Indeed, Bloom et al. [2] estimate that a 10% increase in a competitor's R&D is associated with up to a 3.8% increase in a firm's own market value. The exemption for R&D cooperatives in anti-cartel legislation is thus perceived to diminish the failure of the market for R&D. However, as Scherer [3] observes: "the most egregious price fixing schemes in American history were brought about by R&D cooperatives", an observation that constitutes the classic counterargument to a permissive antitrust treatment of R&D markets [4–6]. For instance, Goeree and Helland [7] find that in the US the probability that firms join an R&D cooperative has dropped due to a revision of antitrust leniency policy in 1993. This revision is perceived as making collusion less attractive. They conclude that "Our results are consistent with RJVs [research joint ventures] serving, at least in part, a collusive function." The laboratory experiments of Suetens [8] also show that members of an RJV are more likely to collude on price. At the same time, it is quite well established that the prospect of future market power enhances a firm's incentives to invest in R&D [9]. As Greenspan [10] puts it: "No one will ever know what new products, processes, machines, and cost-saving mergers failed to come into existence, killed by the Sherman Act before they were born. No one can ever compute the price that all of us have paid for that Act which, by inducing less effective use of capital, has kept our standard of living lower than would otherwise have been possible."
In this paper, we develop a dynamic model of R&D that considers explicitly the cost of "new … processes" that "failed to come into existence … before they were born" because of the ban on price-fixing agreements.
The channels through which cooperation in R&D facilitates product market collusion have been examined in a number of theoretical studies [11–15]. According to Fisher [16, p. 194]: "… [firms] cooperating in R&D will tend to talk about other forms of cooperation. Furthermore, in learning how other firms react and adjust in living with each other, each cooperating firm will get better at coordination. Hence, competition in the product market is likely to be harmed."
In the short run, the reduced intensity of product market competition is likely to hurt consumers. At the same time, it could enhance the functioning of an R&D cooperative. For instance, Geroski [17] argues that it is the feedback from product markets that directs research toward profitable tracks and that, therefore, for an innovation to be commercially successful, there must be strong ties between marketing and development of new products. And Jacquemin [18] puts forward that R&D cooperatives are fragile and unstable. He reasons that when there is no cooperation in the product market, there exists a continuous fear that one partner in the R&D cooperative may be strengthened in such a way that it will become too strong a competitor in the product market. Preventing firms from collaborating in the product market may therefore destabilize R&D cooperatives or prevent their formation in the first place. Our focus is on private incentives to develop cost-saving technologies over time. In particular, we show that if firms collude in the product market, a wider range of technologies is fully developed. We also show that firms competing in the product market realize an inferior productive efficiency. We thus identify situations where product market collusion increases total surplus.
Dynamic models of R&D were first introduced to study patent races whereby successful innovators capture the entire market. This literature starts with Loury [19] and Lee and Wilde [20] ([21] surveys the early contributions). Patent race models examine, in essence, the time it takes for a cost-saving innovation to be completed. R&D investments reduce this completion period. Because in these models the R&D process itself cannot fail, the R&D-investment decision is transformed into a static one. Meanwhile, a large literature has developed on the relation between intellectual property rights and antitrust policies. For instance, Quirmbach [22] finds that there is an optimal level of collusion that is in-between perfect competition and full collusion. And Green and Scotchmer [23] show that it is optimal to allow for collusion through sequential licensing in case the next innovation is a truly new application of existing patents. More recently, another strand of dynamic R&D models has developed: continuous-time generalizations of strategic R&D models. These models allow for "smoothing the investment efforts over a long time" [24], a type of investment behavior that is observed in practice and that constitutes a key feature of continuous-time models. Cellini and Lambertini [24] is the first continuous-time generalization of the seminal analysis of d'Aspremont and Jacquemin [25]. In the duopoly game of d'Aspremont and Jacquemin [25], firms first invest in cost-reducing R&D and then play a Cournot game in the product market. In the continuous-time version of Cellini and Lambertini [24], both firms start from an initial technology (that is, a level of marginal cost) and invest continuously in R&D. This gradually reduces the initial level of marginal cost toward the steady-state level.
In contrast to the static generalization of d'Aspremont and Jacquemin [25] by Hinloopen [26], Cellini and Lambertini [24] find that the aggregate level of R&D is monotonically increasing in the number of independent competitors.
We also consider a continuous-time generalization of d'Aspremont and Jacquemin [25]. There are two distinguishing features of our analysis. First, we consider all possible initial marginal cost levels, including those exceeding the choke price (the lowest price for which there is no demand). Especially in the early stages of development, it is quite likely that the cost of a new technology (the cost, say, to develop a prototype) exceeds the highest willingness to pay in the market. We characterize situations where such initial technologies are only developed if firms collude in the product market. Indeed, excluding initial marginal costs that are above the choke price ignores " … new … processes … [that] failed to come into existence, [as they are] killed by the Sherman Act before they were born." These instances constitute a direct welfare gain of product market collusion.
Second, in addition to near-equilibrium paths, we consider all trajectories that are candidates for an optimal solution. This global analysis yields a bifurcation diagram that indicates for every possible parameter combination the qualitative features of any market equilibrium as well as of the transient dynamics toward it. We thus identify critical parameter values: points in parameter space at which the optimal investment function changes qualitatively. In particular, we determine the value of marginal costs for which R&D investments are terminated, and for which they are not initiated at all. We prove that these critical cost levels are affected by firm conduct. Therefore, extending the R&D cooperative to product market collusion can lead to qualitatively different long-run solutions, in spite of starting from an identical initial technology.
The related literature [24,27–30] has not considered initial marginal cost levels that exceed the choke price, nor has it carried out a global analysis. In all these papers, any of the initial (permissible) technologies will be developed to full materialization; technologies that are only developed under specific regimes (i.e., product market collusion) remain hidden. The only exception is Hinloopen et al. [31], who characterize the equilibria of a continuous-time dynamic monopoly with R&D investments. We expand their analysis in three directions. First, we consider a duopoly rather than a monopoly. Second, we examine two different scenarios: one in which firms cooperate in R&D and compete in the product market (labeled "partial collusion"), and one in which firms cooperate both in R&D and in setting price (labeled "full collusion"). Indeed, comparing the two scenarios allows us to examine the effects of extending cooperation in R&D toward collusion in the product market. And third, rather than relying on numerical simulations, we prove a set of propositions that characterize the dynamics of the model throughout the entire parameter space.
Our framework yields four possible outcomes for any initial draw of a new technology (cf. [31]). First of all, a "Promising Technology" arrives, whereby the initial technology is developed through continuous R&D investments. This can occur for initial cost levels both below and above the choke price. In the latter case, production starts only after some time, because early R&D efforts have to bring down marginal cost below the choke price. Second, a "Strained Market" arises: initial marginal cost is below the choke price and firms invest in R&D, but the technology is not likely to be developed to full materialization. In case of an "Uncertain Future," the third situation, it is not immediately clear whether the long-run steady state will be reached, or that it is optimal to gradually leave the market. Only time will tell. Fourth, an "Obsolete Technology" can emerge: whatever the initial marginal cost, the technology is either not developed or developed only to be taken off the market in due time. The long-run steady state will not be reached in either case.
All four technologies can emerge under both partial collusion and full collusion. Comparing the two scenarios throughout the entire parameter space, we find that if firms collude in the product market (i) it is more likely that an initial technology qualifies as a "Promising Technology," and if so, that it is more likely to be developed further, (ii) it is less likely that an initial technology qualifies as an "Obsolete Technology," and if so, it is more likely that firms invest in R&D, albeit temporarily, and (iii) if an initial technology causes a "Strained Market" or if it induces an "Uncertain Future," it is less likely that it will be taken off the market in due time. Put differently, due to product market collusion it is more likely that firms invest in R&D, and that these investments eventually lead to a steady state with positive production.
Our analysis qualifies the per se prohibition of collusion in product markets for high-tech industries. A higher total surplus obtains if colluding firms develop an initial technology and arrive at the saddle-point steady state while firms that compete in the product market would not develop the technology at all. We show that this is more likely to happen if new technologies arrive in circumstances that offer a high profit potential (that is, large markets and efficient R&D processes). Under these circumstances, product market collusion can also yield higher total surplus if competing firms would develop the new technology as well, be it to take it off the market in due time, or to arrive at the saddle-point steady state. And insofar as higher R&D investments as such are desirable (as suggested in the endogenous growth literature; see, e.g., [32,33]), the case for prohibiting collusion per se is further weakened. On the other hand, colluding firms tend to hold on longer to technologies that are destined to leave the market. This is not desirable from a social welfare point of view if that prevents the development of new, superior technologies.
A particularly difficult situation arises when the initial technology is above the choke price and will be developed only if firms collude in the product market. The welfare cost of prohibiting firms from colluding then remains hidden because no production is affected by this prohibition. There is no production yet, and because collusion is prohibited, there will be no production in the future. Put differently, no production will be taken off the market if firms are prohibited from colluding in the product market, leaving the welfare cost unnoticed. Our analysis thus offers a first glance at "new … processes … [that] failed to come into existence, killed by the Sherman Act before they were born."
The remainder of the paper is organized as follows. The basics of the model are introduced in Sect. 2. In Sect. 3, the necessary conditions for optimal production and investment schedules are derived under partial collusion and full collusion. The corresponding bifurcation diagrams are derived in Sect. 4 and the two scenarios are compared in Sect. 5. Section 6 concludes. Appendices contain the proofs of all propositions.

The Model
Our present model is an extension of the global monopoly framework of Hinloopen et al. [31] to two firms, and it builds on Cellini and Lambertini [27]. Note that Smrkolj and Wagener [34] show that the equilibrium considered in [27] is not subgame perfect. Time $t$ is continuous: $t \in [0, \infty[$. There are two a priori fully symmetric firms that both produce a homogeneous good at constant marginal costs $c_i(t)$. At every instant, the market price $p(t)$ is given as
$$p(t) = \bar{A} - Q(t),$$
where $Q(t) = q_1(t) + q_2(t)$, with $q_i(t)$ the quantity produced by firm $i$ at time $t$, and where $\bar{A}$ is the choke price.
Each firm $i$ can reduce its marginal cost $c_i(t)$ by investing in R&D. In particular, when firm $i$ exerts R&D effort $k_i(t)$, its marginal cost evolves as
$$\dot{c}_i(t) = c_i(t)\left(-k_i(t) - \bar{\beta}\,k_j(t) + \bar{\delta}\right),$$
where $k_j(t)$ is the R&D effort exerted by its rival and where $\bar{\beta} \in [0, 1]$ measures the degree of spillover. Note that efficiency of production is assumed to decrease at a constant rate, as captured by $\bar{\delta} > 0$. This depreciation is due to (exogenous) aging of technology and organizational forgetting [28,35]. As Benkard [36, p. 590] observes: "… an aircraft producer's stock of production experience is constantly being eroded by turnover, lay offs and simple losses of proficiency at seldom repeated tasks. When producers cut back output, this erosion can even outpace learning, causing the stock of experience to decrease." In our model, R&D investment yields know-how gains but the logic of the argument is the same. For instance, complementary inputs that are typically purchased also constitute a fraction of production cost. Incorporating these inputs becomes ever more costly due to their inherent evolution over time, especially for firms that are relatively sluggish in R&D, as R&D efforts also determine any firm's "absorptive capacity" [37].
A non-positive depreciation rate yields trivial equilibria. Every initial technology will be developed in case $\bar{\delta}$ is negative, as there is an exogenous reduction in marginal cost over time. For $\bar{\delta} = 0$, consider $\bar{\delta}$ to be marginally positive. In that case, the value of initial marginal cost that would make it optimal not to invest in R&D is far above the choke price because only an infinitesimally small investment in R&D is then needed to reduce marginal cost over time.
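The role of the depreciation rate can be illustrated with a small numerical sketch. It assumes the multiplicative law of motion dc/dt = c (delta - k_own - beta * k_rival), which matches the verbal description above but is an assumption here, and it holds both R&D efforts constant so that the cost path has a closed form. All function and argument names are illustrative.

```python
import math

def marginal_cost(c0, k_own, k_rival, beta, delta, t):
    """Marginal cost at time t under constant R&D efforts.

    Assumes dc/dt = c * (delta - k_own - beta * k_rival); this functional
    form and every name used here are illustrative assumptions.
    """
    growth = delta - k_own - beta * k_rival
    return c0 * math.exp(growth * t)

# Without R&D the technology decays: marginal cost grows at rate delta.
no_rd = marginal_cost(c0=1.2, k_own=0.0, k_rival=0.0, beta=0.5, delta=0.1, t=10.0)

# With enough joint effort the cost falls, yet it stays strictly positive.
with_rd = marginal_cost(c0=1.2, k_own=0.2, k_rival=0.2, beta=0.5, delta=0.1, t=10.0)
```

With delta equal to zero, any positive effort, however small, lowers marginal cost over time in this sketch, which is the sense in which a non-positive depreciation rate makes the investment decision trivial.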
Both firms are endowed with a given identical initial technology $c_i(0) = c_j(0) = c_0$, which represents the state of the technology at the moment of the invention of the product. Per unit of time, the costs of R&D efforts are
$$\bar{b}\,k_i(t)^2,$$
where $\bar{b} > 0$ is inversely related to the cost-efficiency of the R&D process. The R&D process is thus assumed to exhibit decreasing returns to scale ([38]; see also the discussion in [31]). Both firms discount the future with the same constant rate $\bar{\rho} > 0$. Either firm's instantaneous profit therefore equals
$$\pi_i(t) = \left(p(t) - c_i(t)\right) q_i(t) - \bar{b}\,k_i(t)^2,$$
with corresponding total discounted profit
$$\Pi_i = \int_0^\infty \pi_i(t)\, e^{-\bar{\rho} t}\, dt.$$
The model has five parameters: $\bar{A}$, $\bar{\beta}$, $\bar{b}$, $\bar{\delta}$, and $\bar{\rho}$. To simplify the analysis, we rescale the model such that it has only three parameters. Rescaling is done by choosing "natural units" for the problem; it does not involve making special parameter choices. Rather, each choice of parameters in the original model corresponds to a choice of parameters in the rescaled model. The complexity reduction obtained by the scaling is a consequence of the fact that in the original parameters, many choices give rise to mathematically equivalent models. In mathematical terms, we embed the given five-parameter family of models in a six-parameter family. We then show that the scaling transformations we consider allow us to choose three parameter values to be equal to 1, effectively reducing the number of parameters to three.

Lemma 2.1 The following equations define new variables
and new parameters $\phi = \bar{A}/(\bar{\delta}\sqrt{\bar{b}})$ and $\tilde{\rho} = \bar{\rho}/\bar{\delta}$. In the new variables, the model takes the form: where $\tilde{c}_i \ge 0$, and with the control restrictions $\tilde{q}_i \ge 0$ and $\tilde{k}_i \ge 0$.
The proof of the lemma is given in "Appendix A."

Remark 2.1
Rescaling the model as in Lemma 2.1 introduces a new parameter: $\phi$. It is one-to-one related to the profit potential of a technology. Higher potential revenues come with a higher $\bar{A}$, and each unit of R&D effort costs more if $\bar{b}$ increases, while it reduces marginal cost by less the higher $\bar{\delta}$ is. In sum, a lower (higher) $\phi$ corresponds to a lower (higher) profit potential.
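As a rough illustration of the rescaling, the sketch below maps a set of original parameter values to the rescaled ones. The expression for phi is the one stated in the lemma; the expression rho_tilde = rho / delta is an assumption, motivated by measuring time in units of 1/delta. The function name and the numerical values are hypothetical.

```python
import math

def rescaled_parameters(A, b, delta, rho):
    """Map original parameters to the rescaled pair (phi, rho_tilde).

    phi = A / (delta * sqrt(b)) follows the lemma; rho_tilde = rho / delta
    is an assumption based on measuring time in units of 1 / delta.
    """
    if min(A, b, delta, rho) <= 0:
        raise ValueError("all parameters must be positive")
    phi = A / (delta * math.sqrt(b))
    rho_tilde = rho / delta
    return phi, rho_tilde

# Hypothetical values: a larger market (A) raises phi, whereas costlier
# R&D (b) or faster decay (delta) lowers it.
phi, rho_tilde = rescaled_parameters(A=10.0, b=4.0, delta=0.5, rho=0.05)
```

The spillover parameter is left out because it is dimensionless and unaffected by the choice of units.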

Remark 2.2
In mathematical terms, the original model is a specimen of the six-parameter model given by with parameter values The (equivalent) model in new variables is an instance of the same six-parameter model, but with parameter values We can, and will, without loss of generality drop the tildes from the "new" variables, the bars from the parameters, and take $A = 1$, $b = 1$ and $\delta = 1$.

Partial Collusion and Full Collusion
In this section, we derive the necessary conditions for optimal production and investment schedules in case firms cooperate in R&D but compete in the product market (Sect. 3.1), and in case firms cooperate in R&D and collude in the product market (Sect. 3.2).

Partial Collusion
Both firms operate their own R&D laboratory and production facility. They select their output levels non-cooperatively and adopt a strictly cooperative behavior in determining their R&D efforts so as to maximize joint profits. These assumptions amount to imposing a priori the symmetry condition
$$k_i(t) = k_j(t) = k(t).$$
It may seem reasonable to assume that when firms cooperate in R&D, they also fully share information, that is, to assume the level of spillover to be at its maximum ($\beta = 1$; see [39]). For the sake of generality, we do not a priori fix the value of $\beta$ at its maximal value. There are also intuitive arguments for not doing so as there might still be some ex post duplication and/or substitutability in R&D outputs if firms operate separate laboratories (see the discussion in [40]). The instantaneous profit of firm $i$ is
$$\pi_i = (1 - Q - c)\,q_i - k^2,$$
with $Q = q_1 + q_2$, yielding its total discounted profit over time
$$\Pi_i = \int_0^\infty \pi_i\, e^{-\rho t}\, dt.$$
As firms jointly decide on their R&D efforts, the only independent decisions are those of production. However, as quantity variables do not appear in the equation for the state variable (9), production feedback strategies of a dynamic game are simply static Cournot-Nash strategies of each corresponding instantaneous game. Maximizing $\pi_i$ over $q_i \ge 0$ gives us standard Cournot best-response functions for the product market
$$q_i = \max\left\{\tfrac{1}{2}\left(1 - c - q_j\right),\, 0\right\}.$$
Note that the constraint $q_i \ge 0$ is binding when $q_j \ge 1 - c$. Solving for Cournot-Nash production levels, we obtain
$$q_1 = q_2 = \begin{cases} (1 - c)/3 & \text{if } c < 1, \\ 0 & \text{if } c \ge 1. \end{cases}$$
Consequently, the instantaneous profit of each firm is
$$\pi = \begin{cases} (1 - c)^2/9 - k^2 & \text{if } c < 1, \\ -k^2 & \text{if } c \ge 1. \end{cases}$$
We assume that firms face no financial constraints; they can invest in R&D prior to production. Indeed, credit rationing would impose an upper limit on the value of an indifference point; qualitatively it would not change our conclusions, however. Also, for a sample of Italian manufacturing firms, Piga and Atzeni [41] find that credit constraints are negligible for R&D intensive firms. And Bond et al. [42] find no significant relationship between the level of R&D investments and cash flow for German and UK firms, while Harhoff [43] finds a weak but statistically significant relationship for both small and large German firms. The sensitivity of R&D investments to cash flow fluctuations seems to be stronger for US firms (e.g., [44,45]), but by and large, the literature on the importance of financial constraints for R&D investment is inconclusive (see [46] for an overview).
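The static Cournot-Nash outcome of the instantaneous game can be checked numerically. The sketch below assumes the rescaled inverse demand p = 1 - Q and a common marginal cost c, and finds the equilibrium by iterating best responses; the iteration is only an illustrative device, not the solution method of the paper, and all names are hypothetical.

```python
def best_response(q_rival, c):
    """Cournot best response under the assumed inverse demand p = 1 - Q."""
    return max((1.0 - c - q_rival) / 2.0, 0.0)

def cournot_nash(c, iterations=200):
    """Approximate the symmetric Cournot-Nash output by iterating best responses."""
    q = 0.0
    for _ in range(iterations):
        q = best_response(q, c)
    return q

def cournot_profit(c):
    """Per-firm product-market profit (before R&D costs) at the Nash outputs."""
    q = cournot_nash(c)
    price = max(1.0 - 2.0 * q, 0.0)
    return (price - c) * q

q_star = cournot_nash(0.4)       # close to (1 - 0.4) / 3
profit = cournot_profit(0.4)     # close to (1 - 0.4) ** 2 / 9
q_idle = cournot_nash(1.3)       # no production once c is at or above 1
```

Because each best response halves the distance to the fixed point, the iteration settles quickly on (1 - c)/3 for c < 1 and stays at zero for c ≥ 1.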
The dynamic optimization problem of the R&D cooperative boils down to finding an R&D effort schedule $k^*$ for either firm that maximizes the total discounted joint profit, taking into account the state Eq. (9), the initial condition $c(0) = c_0$, and the control constraint $k(t) \ge 0$, which must hold at all times. Note that according to (9), if $c_0 > 0$, then $c(t) > 0$ for all $t$. The natural state space of this problem would be the interval $]0, \infty[$ of positive marginal cost levels, but for mathematical convenience, we extend this to $\mathbb{R}$ by specifying that In order to close the model, we have to specify the set of admissible effort schedules $k(t)$.

Definition 3.1 An R&D effort schedule is admissible if it is a bounded nonnegative measurable function.
To solve this problem, we introduce the current-value Pontryagin function (also called the un-maximized Hamilton or pre-Hamilton function), whereby we omit a factor 2 for joint profits to obtain the solution expressed in per-firm values (due to symmetry, maximizing per-firm total profit corresponds to maximizing joint total profit) where $\lambda$ is the current-value costate variable of a firm in the R&D cooperative. The costate (or shadow value) measures the marginal worth of the increment in the state $c$ for each firm at time $t$ when moving along the optimal path. As we expect an increase of the marginal costs to entail lower profits for the firm, we expect the shadow value to be non-positive, that is, $\lambda(t) \le 0$, along optimal trajectories. We use Pontryagin's maximum principle to obtain the solution to our optimization problem. Maximizing over the control $k \ge 0$ yields The maximum principle states further that the optimizing trajectory necessarily corresponds to the trajectory of the state-costate system where $k$ is replaced by its maximizing value. For $\lambda \le 0$, relation (16) gives a one-to-one correspondence between the costate $\lambda$ and the control $k$. We use this relation to transform the state-costate system into a state-control system which an optimizing trajectory has to satisfy necessarily as well. This system consists of two regimes (following the two-part composition of the Pontryagin function). The first one corresponds to $c < 1$ and positive production ($q = (1 - c)/3$). The second one corresponds to $c \ge 1$ and zero production. Note that in the non-rescaled model, the analogous conditions for positive and zero production are $c(t) < \bar{A}$ and $c(t) \ge \bar{A}$, respectively. The state-control system with positive production consists of the following two differential equations: The state-control system with zero production is given by

Full Collusion
Under full collusion, firms determine jointly their R&D efforts and their output levels. This amounts to imposing a priori the symmetry conditions $k_i(t) = k_j(t) = k(t)$ and $q_i(t) = q_j(t) = q(t)$. Equation (8) reads again as Eq. (9). The profit of each firm at every instant is with corresponding total discounted profit The optimal control problem of the two colluding firms is to find controls $q^*$ and $k^*$ that maximize the profit functional $\Pi$ subject to the state Eq. (9), the initial condition $c(0) = c_0$, and two control constraints that must hold at all times: $q \ge 0$ and $k \ge 0$. Notice, again, that according to (9), if $c_0 > 0$, then $c(t) > 0$ for all $t$.
The current-value Pontryagin function in case of full collusion reads as: where $\lambda$ is the current-value costate variable. It now measures the marginal worth at time $t$ of an increment in the state $c$ for a colluding firm when moving along the optimal path. The necessary conditions for the solution to the dynamic optimization problem consist again of a state-control system which has two regimes. As in the partial collusion case, the first regime corresponds to $c < 1$ and positive production ($q = (1 - c)/4$), while the second regime corresponds to $c \ge 1$ and zero production.
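The production levels quoted for the two scenarios can be compared with a short sketch. It assumes the rescaled inverse demand p = 1 - Q and symmetric outputs, and it looks only at product-market profit before R&D costs. The function names are hypothetical.

```python
def cournot_output(c):
    """Per-firm output when firms compete in quantities: (1 - c) / 3 for c < 1."""
    return max((1.0 - c) / 3.0, 0.0)

def collusive_output(c):
    """Per-firm output when firms maximize joint profit: (1 - c) / 4 for c < 1."""
    return max((1.0 - c) / 4.0, 0.0)

def per_firm_profit(q, c):
    """Product-market profit of one firm when both firms produce q,
    under the assumed inverse demand p = 1 - Q."""
    return (1.0 - 2.0 * q - c) * q

c = 0.4
profit_cournot = per_firm_profit(cournot_output(c), c)      # (1 - c) ** 2 / 9
profit_collusion = per_firm_profit(collusive_output(c), c)  # (1 - c) ** 2 / 8
```

For any c below 1, the colluding firms restrict output and earn a higher instantaneous profit, so a given cost reduction is worth more to them; this is one informal way to read the stronger investment incentives discussed later in the paper.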
The state-control system in the region with positive production reads as whereas the state-control system with zero production is
Depending on the value of D, there are three different situations.
A. If D > 0, the state-control system with positive production (23) has three steady states: is the unique steady state of the state-control system with positive production, which is unstable.
The system consequently exhibits a saddle-node bifurcation at D = 0.

Remark 4.1
The stable manifold of the saddle-point steady state is one of the candidates for an optimal solution. However, as neither the Mangasarian nor the Arrow concavity conditions are satisfied, the stable manifold is not necessarily optimal. Proposition 4.1 already implies that there should be other candidates for optimality as there is a parameter region for which there is no saddle point and hence no stable manifold to it.
The following result clarifies ("Appendix C" contains the proof). Fig. 1. The thick black lines $W^s_-$ and $E$ indicate optimal solutions. The dotted vertical line $c = 1$ separates the region with zero production from the region with positive production. We label the trajectory $E$ the "exit trajectory", as following this trajectory implies that firms eventually leave the region with positive production. Proposition 4.2 only reduces the set of trajectories by applying necessary conditions for optimality, but there is no guarantee that an optimal solution exists. The next proposition summarizes when an optimal solution exists.

Proposition 4.3
For all admissible values of the parameters, and all initial points, the optimal control problem has at least one solution, which is among the candidates The proof is in "Appendix D." To assess the dependence of the solution structure on the model parameters, we carry out a bifurcation analysis. This consists of identifying those parameter values for which the qualitative structure of the optimal dynamics changes. These "bifurcating" values bound open parameter regions such that the optimal dynamics are qualitatively identical for all parameter values in a region (see [47,48]). Put differently, for all points in a region, a sufficiently small change in parameter values will not lead to a qualitative change of the optimal dynamics; regions characterize stable types of dynamics.
System (25) has four distinct stable dynamics types (cf. [31]). These are illustrated in Fig. 2 in case of partial collusion. Note that the same types emerge under full collusion (the stable dynamics types are compared across scenarios in Sect. 5). The first type is a "Promising Technology." In this case, there exists an initial technology $\hat{c} > 1$ that is an indifference threshold: a point in state space where the decision maker is indifferent between two optimal trajectories that have distinct long-term limit behavior. In particular, for $0 < c_0 \le \hat{c}$ it is optimal to start developing the initial technology, ending up in the saddle-point steady state in the region with positive production. If $1 < c_0 \le \hat{c}$, initially firms invest only in R&D; production begins whenever $c(t) < 1$. If $c_0 \ge \hat{c}$, it is optimal not to initiate R&D efforts; potential future profits do not suffice to compensate for losses that would be incurred in the initial periods during which firms would invest in R&D but would not produce yet (note that for $c_0 = \hat{c}$, there are two distinct R&D investment trajectories, which are, nevertheless, both optimal; see also Proposition 4.3).
The second type corresponds to a "Strained Market," where there is an indifference threshold below the choke price (that is, in the region with positive production): $0 < \hat{c} < 1$. In this case, if $0 < c_0 < \hat{c}$, the initial technology will be developed toward the saddle-point steady state. If $\hat{c} < c_0 < 1$, the exit trajectory applies; R&D investments only serve to slow down the technological decay.
In a small part of the parameter space, the third type arises: an "Uncertain Future." Initial technologies (states) are now divided by a repelling steady state (rather than an indifference point). If the system starts exactly at the repelling point $c_R$, it stays there indefinitely; when it starts close to it, it stays there for a long period of time, after which it converges to one of the two attractors: the steady state or the exit trajectory.
The fourth type typifies the dynamics of an "Obsolete Technology." Whatever the initial technology, the firms (eventually) leave the market; R&D investments are only used to slow down the technical decay.
The four different dynamics types are grouped conveniently in a bifurcation diagram (see Fig. 3): the graph that indicates for every possible parameter combination the qualitative features of any market equilibrium as well as the transient dynamics toward them. In Fig. 3, the uppermost curve represents parameter values for which the indifference point is exactly at $c = 1$. At the saddle-node curve (SN), an optimal repeller and an optimal attractor collide and disappear. The curve SN' corresponds to saddle-node bifurcations in the state-control system that do not correspond to optimal dynamics. At the indifference-attractor bifurcations (IA), an indifference point collides with an optimal attractor and both disappear. Finally, at an indifference-repeller bifurcation (IR), an indifference point turns into an optimal repeller. The central indifference-saddle-node (ISN) bifurcation point at $(\rho, \phi(1 + \beta)) \approx (2.14, 8.78)$ organizes the bifurcation diagram. The curve representing indifference points at $c = 1$ obtains a value of $\phi(1 + \beta) \approx 2.998$ for $\rho = 1 \times 10^{-5}$.

Collusion and the Incentives to Innovate
In this section, we compare the global optimum of the two scenarios. For a welfare comparison, we introduce total discounted values of profits ($\Pi$), consumer surplus (CS), and total surplus (TS): where at time $t = 0$ firms start with $c_0$ and then invest along the optimal trajectory. We first formally establish that the two scenarios yield different (optimal) trajectories. In Fig. 4, the bifurcation diagrams of both scenarios are superimposed. There are significant quantitative differences between the two diagrams, as reflected by the different locations of the curves that divide the parameter space. Let $I_i, II_i, \ldots$, $i = 1, 2$, denote regions $I, II, \ldots$ under scenario $i$, with $i = 1$ ($2$) corresponding to partial (full) collusion. The following then holds (see "Appendix E" for the proof).

Proposition 5.1
The following inclusions hold: The first inclusion of Proposition 5.1 implies that the "Promising Technology" region is larger if firms collude in the product market; due to collusion, the situation where firms first invest in R&D, and only after some initial development period start producing, is more likely to occur. From the third inclusion follows that the "Obsolete Technology" region is smaller if firms collude; firms that collude are less likely either not to develop an initial technology or to invest in R&D only to abandon the technology in time.

R&D Investment Incentives
In line with much of the related literature [9], Proposition 5.1 suggests that colluding firms have in general a stronger incentive to invest in R&D. This turns out to be the case, as the next proposition formally shows (see "Appendix F.1" for the proof).

Third, firms that collude in the product market abandon obsolete technologies at a lower pace. This implication is in a similar vein to the argument of Arrow [49] that a monopolist has less incentive to invest in R&D than an otherwise identical but perfectly competitive market, because by doing so the monopolist replaces current monopoly profits with future (higher) monopoly profits. Here, the alternative for colluding firms is to exit the market more quickly (rather than staying in the market as a monopolist, as in Arrow [49]), an alternative that is not optimal for them (see Fig. 5).
The difference in R&D intensity across the two scenarios is also reflected in the type of trajectories that firms select. To characterize this difference for all possible situations, it is convenient to define the threshold level of initial marginal cost $\hat c$ between "eventual exit" and "eventual positive production." Formally, set $\hat c = 0$ in the "Obsolete Technology" region and let $\hat c_1$ and $\hat c_2$ denote the threshold levels for the partial collusion and the full collusion scenarios, respectively. We can then state the following (see "Appendix F.2" for the proof).

The implications of Proposition 5.3 are twofold. First, if firms collude in the product market, the set of initial technologies that are developed toward the saddle-point steady state is larger (see Fig. 6). In particular, if the initial technology $c_0$ falls in the nonempty interval $]\hat c_1, \hat c_2[$, it will only be brought to full materialization if firms collude in the product market. Second, the set of initial technologies that triggers no investment in R&D at all or that induces firms to select the exit trajectory is smaller if firms collude in the product market. Figure 7 illustrates this for a "Strained Market." The strained investment circumstances induce partially colluding firms to exit the market in due time for all $c_0 > \hat c_1$. In contrast, fully colluding firms exit the market only for $c_0 > \hat c_2$. Initial technologies $c_0$ in the interval $]\hat c_1, \hat c_2[$ are therefore only brought to full maturation by firms that collude in the product market.
We can conclude that due to collusion in the product market (i) it is more likely that an initial technology qualifies as a "Promising Technology," and if so, that it is more likely to be developed further, (ii) it is less likely that an initial technology qualifies as an "Obsolete Technology," and if so, it is more likely that firms invest in R&D, albeit temporarily, and (iii) if an initial technology causes a "Strained Market" or if it induces an "Uncertain Future," it is less likely that it will be taken off the market in due time. Put differently, due to product market collusion it is more likely that firms invest in R&D, and that these investments eventually lead to a steady state with positive production.

Total Surplus
We next consider the effect of product market collusion on total surplus. Obviously, collusion in the product market yields higher total surplus if colluding firms develop an   Figure 8 illustrates some comparative statics of the indifference points in this case. Indeed, these points are positively related to market size and R&D efficiency. Note, however, that $\Delta\hat c = \hat c_2 - \hat c_1$ also increases if the R&D process becomes more efficient and/or if the market size becomes larger, the more so the lower the discount rate (in Fig. 8, a lower discount rate corresponds to a larger slope of the convex curves). Because future mark-ups are positively related to both market size and R&D efficiency, an increase in either of these two has a larger (positive) effect on future profits if firms collude in the product market. These future benefits feature more prominently in total discounted profits if the discount rate is lower. Therefore, indifference points correspond to lower marginal cost values if the discount rate goes up, all else equal (cf. the relative location of $C_1$ and $C_2$ in Fig. 8).
Product market collusion can also yield higher total surplus if colluding firms arrive at the saddle-point steady state while firms that compete in the product market would select the exit trajectory. In these cases, firms that compete in the product market temporarily produce more. This is offset by the added benefits of sustained R&D investments under full collusion if the discount rate is sufficiently small (see Fig. 7).
Finally, collusion in the product market can also yield a higher total surplus if under both scenarios firms would select the trajectory toward the saddle-point steady state: in Fig. 9, for all $c_0 \in\, ]c, \hat c_2[$, total surplus is higher if firms collude in the product market. In this example, the discount rate is high: ρ = 10, which corresponds, for instance, to a non-rescaled δ of 0.01 and a non-rescaled discount rate of 0.1. Also, the initial marginal costs are sufficiently high. In such an environment, the higher R&D investments and the reduced importance that is attached to future surplus work in favor of product market collusion: under this scenario firms reach the production stage more quickly, a benefit that more than offsets the welfare loss of future increased mark-ups. The rescaled discount rate is the ratio of the non-rescaled discount rate to the non-rescaled δ, so a higher rescaled rate implies either a higher non-rescaled discount rate or a lower non-rescaled δ. With a lower δ, cost reductions take longer, such that the time difference in reaching the production stage between the two scenarios becomes more pronounced.

Conclusions
Schumpeter's famous observation continues to challenge the design of optimal competition policies for high-tech sectors. The classic rationale for competition policies is rooted in their effect on total surplus. Typically, product market collusion transfers consumer surplus to firm profits, resulting in a net loss of total surplus. To date, the literature considers this result to be robust to the increased incentive to invest in R&D that comes with collusion in the product market. Our analysis shows that it actually fails this robustness check if the phase of development prior to production is taken into account and/or if all possible R&D investment trajectories are considered. According to our analysis, extending an R&D cooperative agreement to collusion in the product market is welfare enhancing if the market size is large and/or the R&D process is efficient, given a relatively modest discount rate. The profit potential of a new technology is then relatively large. As a result, firms that collude in the product market bring more initial technologies to full materialization.
A particularly disturbing situation arises when the initial draw $c_0$ out of $]\hat c_1, \hat c_2[$ is above the choke price ($c_0 > 1$). The welfare cost of prohibiting firms from colluding in the product market then remains hidden because no production is affected by this prohibition. There is no production yet, and because collusion is prohibited, there will be no production in the future. Put differently, no production will be taken off the market if firms are prohibited from colluding in the product market.
Our analysis thus signals a potential problem for antitrust policy, as it shows that a per se prohibition of collusion in the product market is not unambiguously welfare enhancing. It also shows that the associated welfare costs might not surface. Further research is needed to substantiate our qualification of the per se prohibition of collusion, including the development of richer models that allow for learning by doing, stochastic R&D, and asymmetries between firms.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A: Proof of Lemma 2.1
We shall refer to the original variables t, q i , . . . as the "old" variables, and to the variablest,q i , . . . as the "new" variables.
In the new variables, the left- and right-hand sides of equation (2) take the form

Equation (2) then simplifies to
as claimed in the lemma. Writing the total discounted profit in the new variables yields

Appendix B: Proof of Proposition 4.1
Second rescaling of the problem. Recall the dynamic optimization problem: to maximize subject to the dynamic restriction $\dot c = (1 - \phi(1 + \beta)k)\,c$. This problem is rewritten by introducing constants and the variable $u = k/K$. It is then seen to be equivalent to the problem to maximize subject to the dynamic restriction $\dot c = (1 - u)\,c$ and the control restriction $u \ge 0$. The Pontryagin function of this problem is which is maximized at This yields the Hamilton function If $\lambda \le 0$, the associated state-costate equations read as whereas if $\lambda > 0$, they simplify to For $\lambda \le 0$, relation (32) defines a variable transformation that puts the system into state-control form Note that this system is only valid for $u \ge 0$, as for $\lambda > 0$, the relation between u and λ fails to be one-to-one. For later use, we note that in (c, u) variables, the Hamilton function takes the form

B.1 Steady States
To determine the steady states of the state-control system (36), we solve the equations $\dot c = 0$, $\dot u = 0$. It is immediate that this system has no solutions if c > 1. If 0 ≤ c ≤ 1, the equation $\dot c = 0$ is satisfied if c = 0 or u = 1. Substitution of the former into $\dot u = 0$ yields the steady state (c, u) = (0, 0). Substitution of the latter leads to the quadratic equation $c^2 - c + 1/(4\mu) = 0$, which can be written as

$$\left(c - \tfrac12\right)^2 = D, \qquad D = \frac14 - \frac{1}{4\mu}.$$

Note that D < 1/4, as all parameters are assumed to have positive values. For D > 0, the quadratic equation has two real solutions

$$c_\pm = \tfrac12 \pm \sqrt{D},$$

both satisfying $0 < c_\pm < 1$; for D = 0, there is a single real solution c = 1/2, while for D < 0, there is no real solution. Summarizing, in the region 0 ≤ c ≤ 1, there is always the steady state $(c, u) = e_0 = (0, 0)$. If D = 0, there is the additional steady state $e_b = (1/2, 1)$, and if D > 0, there are $e_\pm = (c_\pm, 1)$. These are all the steady states of the state-control system (36).
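As a worked example of the case distinction above, take the illustrative value μ = 4/3 (chosen for round numbers, not taken from the paper) and assume $D = 1/4 - 1/(4\mu)$, which is what completing the square in $c^2 - c + 1/(4\mu) = 0$ gives:

```latex
\begin{align*}
D &= \tfrac14 - \tfrac{1}{4\mu} = \tfrac14 - \tfrac{3}{16} = \tfrac{1}{16}, \\
c_\pm &= \tfrac12 \pm \sqrt{D} = \tfrac12 \pm \tfrac14
  \quad\Longrightarrow\quad c_- = \tfrac14,\; c_+ = \tfrac34, \\
c_-^2 - c_- + \tfrac{1}{4\mu} &= \tfrac{1}{16} - \tfrac{4}{16} + \tfrac{3}{16} = 0 .
\end{align*}
```

Both roots lie in the open interval (0, 1), and they merge at c = 1/2 as μ decreases to 1.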

B.2 Stability
To analyze stability, we have to determine the eigenvalues of the Jacobian matrix $DF$ at the steady states $e_0$, $e_+$ and $e_-$. As $DF(e_0)$ has eigenvalues ρ and 1, the point $e_0$ is always an unstable node. Denote the eigenvalues of the matrix $DF(e_\pm)$ by $\lambda^i_\pm$, $i = 1, 2$. They satisfy $\lambda^1_\pm + \lambda^2_\pm = \operatorname{trace} DF(e_\pm) = \rho$ and $\lambda^1_\pm \lambda^2_\pm = \det DF(e_\pm) = \pm 8\rho\mu c_\pm \sqrt{D}$. We have seen before that $c_\pm > 0$ whenever it is real. If D > 0, it follows that the eigenvalues $\lambda^1_-$, $\lambda^2_-$ have opposite sign, and $e_-$ is a saddle, whereas $\lambda^1_+$ and $\lambda^2_+$ have the same sign and positive sum, implying that $e_+$ is an unstable node.
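The trace and determinant conditions above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that on 0 ≤ c ≤ 1 the state-control system has the form ċ = (1 − u)c, u̇ = ρ(u − 4μc(1 − c)), a reconstruction that reproduces the stated trace ρ and determinant ±8ρμc±√D. The function names and the parameter values in the test are hypothetical.

```python
import math

def steady_states(mu):
    """Steady states (c, u) with 0 <= c <= 1 of the ASSUMED system
    c' = (1 - u) c,  u' = rho (u - 4 mu c (1 - c))."""
    D = 0.25 - 1.0 / (4.0 * mu)
    states = [(0.0, 0.0)]                      # e_0 always exists
    if D > 0:
        r = math.sqrt(D)
        states += [(0.5 - r, 1.0), (0.5 + r, 1.0)]   # e_- and e_+
    elif D == 0:
        states.append((0.5, 1.0))              # e_b at the bifurcation
    return states

def jacobian(c, u, mu, rho):
    """Jacobian of the assumed vector field at (c, u)."""
    return ((1.0 - u, -c),
            (-4.0 * mu * rho * (1.0 - 2.0 * c), rho))

def classify(c, u, mu, rho):
    """Classify a steady state by the sign of det and trace of the Jacobian."""
    (a, b), (d, e) = jacobian(c, u, mu, rho)
    tr, det = a + e, a * e - b * d
    if det < 0:
        return "saddle"          # eigenvalues of opposite sign
    if det > 0 and tr > 0:
        return "unstable"        # both eigenvalues have positive real part
    return "degenerate or stable"
```

With μ = 4/3 and ρ = 1, `steady_states` returns the origin together with c = 1/4 and c = 3/4, and `classify` labels the lower one a saddle and the upper one unstable, in line with the statement above.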
Expressing these results in the original variables, we obtain the results announced in the proposition.

B.3 Bifurcation Analysis
It remains to prove the occurrence of a saddle-node bifurcation. If μ = μ b = 1, then D = 0 and the point e b = (c b , u b ) = (1/2, 1) is a steady state with eigenvalues 0 and ρ, respectively.
We use a result from Sotomayor [50] (quoted as Theorem 3.4.1 in Guckenheimer and Holmes [51]), which for planar dynamical systems states that if the family $\dot x = F(x; \mu)$, parametrized by μ, satisfies the following three conditions:
A. $D_x F(x_0; \mu_0)$ has a simple eigenvalue 0 with right eigenvector v and left eigenvector w;
B. $w^T D_\mu F(x_0; \mu_0) \neq 0$;
C. $w^T D_x^2 F(x_0; \mu_0)(v, v) \neq 0$;
then the family undergoes a non-degenerate saddle-node bifurcation at $(x_0; \mu_0)$. At $(e_b; \mu_b)$, the vectors $v^T = (1, 0)$ and $w^T = (2, 1)$ are, respectively, right and left eigenvectors associated with the eigenvalue 0. Furthermore, conditions B and C hold. We conclude that a non-degenerate saddle-node bifurcation occurs in the system at μ = 1. This completes the proof of Proposition 4.1.

Appendix C: Proof of Proposition 4.2
As in the proof of Proposition 4.1, introduce the constants as well as the rescaled control variable u = k/K . The state-control system then takes the formċ Recall also the notations for the three steady states of the system, and introduce e 1 = (1, 0). To prove the proposition, the state-control space is partitioned into four subsets, R 1 , R 2 , R 3 and E. Of these, the sets R 3 and E are independent of the values of the system parameters. They are given as R 3 = {(c, u) : 0 < c < 1, u = 0} and E = {(c, u) : c ≥ 1, u = 0}. Let U = {(c, u) : u > 0} be the upper half plane. Given the set R 1 , the set R 2 is equal to R 2 = U \R 1 .
It remains to specify R 1 , which is the first step in the proof. Then it is shown that no trajectory in either R 2 or R 3 can be optimal. The next step is to demonstrate that of the trajectories in R 1 , only those can be optimal which converge either to a steady state in R 1 , necessarily a saddle, or which end up in the "exit trajectory" E. Then it has to be shown that the trajectories that are not excluded up to this point, the candidate trajectories, "cover" the state space; that is, for every initial state c 0 , there is at least one candidate trajectory passing through the line c = c 0 . Using parts of the remaining candidate trajectories, we construct a viscosity solution of the Hamilton-Jacobi equation, which is then necessarily the value function. This shows the optimality of the remaining trajectories.
We claim that τ is finite. Arguing by contradiction, assume that τ = −∞. Then for all t < 0 we have c(t) > 1, and Eq. (42) implies that $u(t) = u_0 e^{\rho t}$ for all t < 0. In particular, there is a $t_1 < 0$ such that $u(t) < u_0 e^{\rho t_1} =: K_1 < 1$ for all $t < t_1$. But for those values of t, it follows that $\dot c \ge (1 - K_1)\,c$, and hence, by Gronwall's lemma, $c(t) \le c(t_1)\, e^{(1 - K_1)(t - t_1)}$. But for t sufficiently small, this is smaller than 1, contradicting the hypothesis that τ = −∞. Hence τ is finite.
Introduce u τ by the equation γ (τ ) = (1, u τ ). The set R 1 is defined as follows: it is the open region bounded by the concatenation of the curve γ taken between t = 0 and t = τ , connecting (1, u 0 ) and (1, u τ ), the vertical line segment connecting (1, u τ ) to e 1 , the horizontal segment connecting e 1 to e 0 , the vertical segment connecting e 0 to (0, u 0 ), and the horizontal segment connecting (0, u 0 ) to (1, u 0 ). See Fig. 10 for the possible shapes of R 1 .

C.2 Trajectories in R 2 Cannot Be Optimal
In the second step of the proof, the transversality condition is used to show that any trajectory that passes through points in $R_2$ cannot be optimal. Beginning with $R_2$, we note that the subset $R_2^{(1)}$ of $R_2$ is a forward trapping region: once a trajectory of (42) is inside $R_2^{(1)}$, it remains there. Actually, we can make the sharper statement that if $u > u_0$, then

To show that no trajectory that enters $R_2$ can be maximizing, pick an arbitrary trajectory γ such that $\gamma(t_0) \in R_2^{(1)}$ at a given time $t_0$. By the Poincaré-Bendixson theorem, γ(t) is either unbounded, or its ω-limit set is a steady state, or a limit cycle. The latter possibility is excluded, as the state-costate system, which is in one-to-one relation with the state-control system, has constant positive divergence everywhere (see [47]). There are no steady states in $R_2^{(1)}$. Hence there is a sequence $t_0, t_1, \ldots$ such that $|\gamma(t_i)| \to \infty$. In particular, there is $\bar t > t_0$ such that $u(\bar t) > 2u_0$. But then u(t) is monotonically increasing toward infinity for $t > \bar t$, as a consequence of (43).

Consequently, if $t \ge \bar t$, then $\dot c \le (1 - 2u_0)\,c \le -c$. By Gronwall's lemma it follows that Likewise, if $t \ge \bar t$, then $u(t) > 2u_0$ and $\dot u \ge \rho(u - \mu)$. Gronwall's lemma implies then that If the trajectory γ(t) = (c(t), u(t)) is optimal, then by the Hamilton-Jacobi equation (see, e.g., Wagener [47]), the total profit Π takes the value Michel's transversality condition (Michel [52]) states that along a maximizing trajectory the relation $\lim_{t\to\infty} \Pi(c(t))\,e^{-\rho t} = 0$ holds. Combining (46) and (37) yields Using that the first term between brackets is always nonnegative, and taking into account (45) yields that As $2u_0 - \mu \ge \mu > 0$, it follows that the right-hand side of this inequality tends to infinity as t → ∞. But then $\lim_{t\to\infty} \Pi(c(t))\,e^{-\rho t} = \infty$, and γ cannot be a maximizing trajectory.

It remains to show that no trajectory passing through $R_2^{(2)} = R_2 \setminus R_2^{(1)}$, the complement of $R_2^{(1)}$ in $R_2$, can be optimal. Consider therefore a trajectory γ such that $\gamma(t_0) \in R_2^{(2)}$ for some $t_0$. As in the definition of the region $R_1$, using Gronwall's lemma it can be shown that there is some $t_1 > t_0$ such that $u(t_1) > 1$, and some $t_2 > t_1$ such that $u(t_2) > 1$ and $c(t_2) = 1$. But then γ enters the trapping region $R_2^{(1)}$, and we have already seen that such trajectories cannot be optimal.
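The two Gronwall estimates invoked in this step can be spelled out as follows. This is a sketch under the assumption that the differential inequalities $\dot c \le -c$ and $\dot u \ge \rho(u - \mu)$ hold for all $t \ge \bar t$, where $\bar t$ (our notation) denotes the time after which $u > 2u_0$; the constants need not coincide with those in the original displays.

```latex
\begin{align*}
\dot c \le -c \ \text{ for } t \ge \bar t
  &\;\Longrightarrow\; \frac{d}{dt}\bigl(c\,e^{t}\bigr) \le 0
  \;\Longrightarrow\; c(t) \le c(\bar t)\, e^{-(t-\bar t)}, \\
\dot u \ge \rho\,(u-\mu) \ \text{ for } t \ge \bar t
  &\;\Longrightarrow\; \frac{d}{dt}\bigl((u-\mu)\,e^{-\rho t}\bigr) \ge 0
  \;\Longrightarrow\; u(t) - \mu \ge \bigl(u(\bar t)-\mu\bigr)\, e^{\rho (t-\bar t)} .
\end{align*}
```

Since $u(\bar t) - \mu > 2u_0 - \mu \ge \mu > 0$, the second bound shows that u(t) grows at least like $e^{\rho(t - \bar t)}$, while the first shows that c(t) decays exponentially.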

C.3 Trajectories Intersecting R 3 Cannot Be Optimal
If a trajectory intersects R 3 , the state-control representation breaks down, and we have to switch to the state-costate representation.

C.4 Trajectories in R 1 with Wrong Limit Behavior Cannot Be Optimal
As the set $R_1$ is bounded, by the Poincaré-Bendixson theorem trajectories in $R_1$ can either converge to a steady state or leave $R_1$ (cf. the argument in Sect. C.2). Those entering either $R_2$ or $R_3$ have already been shown to be suboptimal. The remaining possibility is to leave $R_1$ through the point $e_1$ and enter the line segment E; these trajectories remain candidates for optimality. Trajectories remaining in $R_1$ have to converge to a steady state. From Proposition 4.1, we learn that $e_0$ and $e_+$ are unstable nodes, to which no trajectory can converge as t → ∞. The only remaining candidate is then the saddle $e_-$, if μ > 1, or the bifurcating point $e_b$ if μ = 1.
This completes the proof of Proposition 4.2.
Again we have to distinguish between the situations μ < 1 and μ ≥ 1.

D.1.1 No Steady States in R 1
If μ < 1, the only steady state of (42) is the origin e 0 , which is an unstable node. Therefore, the only candidate optimizer is the trajectory γ (t) passing through the point γ (0) = e 1 ; see Fig. 12a. Note that a corollary of the analysis performed above is that the set R 1 is a backward trapping region: if a trajectory is in R 1 for some time, it is in R 1 for all previous times, and it necessarily converges to the origin as t → −∞.
Write $\gamma(t) = (c_\gamma(t), u_\gamma(t))$. The fact that $\gamma(t) \in R_1$ for all t < 0 implies that $\dot c_\gamma > 0$ for all t < 0 (recall that $R_1$ is open). Moreover, as u(t) = 0 for all t ≥ 0, it follows that $\dot c_\gamma > 0$ for all t as well, and that the map $c_\gamma$ : Then the image of the curve $\gamma : \mathbb{R} \to \mathbb{R}^2$ is equal to the graph of the function for all t.

D.1.2 μ > 1: Construction of the Region S 1
If μ > 1, though R 1 is still a backward trapping region, there are at least two steady states in R 1 : apart from the origin e 0 , which is in the boundary of R 1 , we have e − and e + in the interior of R 1 . As seen before, if D > 0, the first is a saddle and the second a repeller; if D = 0, and hence μ = 1, these two points coincide in e b . Introduce the curve segments δ i , i = 1, . . . , 4, as follows: δ 1 is the part of the parabola u = 4μc(1 − c) connecting e 0 to e − , δ 2 the segment of the line u = 1 connecting e − to e + , δ 3 that part of the same parabola connecting e + to e 1 , and δ 4 the segment of the line u = 0 connecting e 1 to e 0 . All curves δ i are taken without their endpoints. Let finally S 1 ⊂ R 1 be the open subregion of R 1 that is bounded by the curves δ i , i = 1, . . . , 4. See Fig. 11.
Let, as before, γ (t) = (c(t), u(t)) be the trajectory of (42) satisfying γ (0) = e 1 . As the open set S 1 is bounded, the trajectory γ either converges to a steady state on the boundary of S 1 , or it enters S 1 for the last time by crossing one of the curves δ i . We analyze the possibilities one by one.

D.1.3 Invoking the Poincaré-Bendixon Theorem
We classify the possible limit behavior of the trajectory γ(t) that satisfies $\gamma(0) = e_1$ as t → −∞. The region $R_1$ being a bounded backward trapping region, $\gamma(t) \in R_1$ for all t < 0. The Poincaré-Bendixson theorem (cf. [53], p. 29) asserts that, asymptotically, γ(t) converges either to a steady state, a limit cycle, or a heteroclinic cycle. Since the state-control system (36) is diffeomorphic to the state-costate system (34), and since the latter has positive divergence everywhere, the existence of limit cycles or heteroclinic cycles is ruled out (cf. [47]). Therefore γ(t) can converge either to $e_0$, or $e_-$, or $e_+$, as t → −∞.
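The divergence claim can be checked directly. The following is a sketch, assuming that the state-costate system has the standard current-value form generated by a twice continuously differentiable Hamilton function H(c, λ); the subscripts denote partial derivatives and are our notation, not the paper's.

```latex
\begin{align*}
\dot c &= H_\lambda(c,\lambda), \qquad
\dot\lambda = \rho\,\lambda - H_c(c,\lambda), \\
\operatorname{div}(\dot c,\dot\lambda)
  &= \frac{\partial}{\partial c}\, H_\lambda
   + \frac{\partial}{\partial\lambda}\bigl(\rho\lambda - H_c\bigr)
   = H_{\lambda c} + \rho - H_{c\lambda} = \rho > 0 .
\end{align*}
```

By Green's theorem, the integral of the divergence over the region enclosed by a closed curve made up of orbits would have to vanish, which is impossible when the divergence equals ρ > 0 throughout; this is the Bendixson criterion that excludes closed orbits and cycles of connecting orbits in the plane.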
Looking more precisely at the behavior of γ (t) in S 1 , we claim that either of the following possibilities holds: A. γ (t) ∈ S 1 for all t and γ (t) → e − as t → −∞; B. γ (t) ∈ S 1 for all t and γ (t) → e + as t → −∞; C. there is a largest value t 1 of t such that γ (t) ∈ S 1 and γ (t 1 ) ∈ δ 1 ; D. there is a largest value t 1 of t such that γ (t) ∈ S 1 and γ (t 1 ) ∈ δ 2 . This is equivalent to stating that if γ (t) ∈ S 1 for all t, it cannot tend to e 0 as t → −∞.
We proceed by analyzing these situations one by one.

D.1.4 The Trajectory γ Remains in S 1 and Tends to e −
Reasoning as in the situation D < 0, we obtain a policy function u (1) This function is however not defined for all c > 0. To construct a policy function for 0 < c < c − , we take a trajectory γ s on the left half of the stable manifold of the saddle e − .
We claim that this part of the stable manifold is contained in its entirety in the region S 2 that is bounded by δ 1 , the segment of u = 1 connecting e − to the point (0, 1), and the segment of the line c = 0 connecting the point (0, 1) to e 0 . It is straightforward to show that S 2 is a backward trapping region; consequently, every trajectory in S 2 converges to the unstable node e 0 as t → −∞.
The stable manifold of e − is tangent to the stable eigenspace of In particular, if c 0 < c − is sufficiently close to c − , then dw s /dc > 0 for all c ∈ [c 0 , c − ].
As in S 2 , we haveċ > 0 everywhere, and we construct as above a policy function u (2) f : ]0, c − [→ R, with lim c↑c − u (2) f (c) = u − = 1. It follows that the function is a continuous policy function that is defined for all c > 0.

D.1.5 The Trajectory γ Remains in S 1 and Tends to e +
As before, we can construct a policy function u (1) f : ]c + , ∞[ → R, with lim c↓c + u (1) f (c) = u + = 1. The remaining part of the policy function has to be furnished by the stable manifold of e − .
As above, the left half of this stable manifold furnishes a policy function $u_f^{(2)} : \,]0, c_-[\, \to \mathbb{R}$, with $\lim_{c \uparrow c_-} u_f^{(2)}(c) = u_- = 1$. We turn to the right half of the stable manifold. For values of $c_0$ larger than but close to $c_-$, the point $(c_0, u_0) = (c_0, w^s(c_0))$ on the stable manifold is contained in the bounded open region $S_3$ that is bounded by the line u = 1 and the parabola $u = 4\mu c(1 - c)$. In this region, $\dot c < 0$ and $\dot k < 0$. Fix $(c_0, u_0)$ and consider the trajectory γ of (42) such that $\gamma(0) = (c_0, u_0)$. This trajectory enters $S_3$ through the part of the parabola connecting its vertex (1/2, μ) with the point $e_+$. It enters from the region $S_4$ that is bounded by that same part of the parabola, the line $u = u_+$ and the boundary of $R_1$. In that region, $\dot c < 0$, but $\dot k > 0$. It follows that the trajectory has to enter $S_4$ through the line segment of $c = c_+$ connecting $e_+$ and $(c_+, \mu)$, or through one of the endpoints.
If $\gamma(t) \to e_+$ as t → −∞, then its graph defines a policy function u A continuous policy function is then given by

Otherwise, there is a time $t_1 < 0$ such that $c(t_1) = c_+$ and $u(t_1) > u_+$. As in this case γ(t) does not tend to a steady state in the boundary of $S_4$, it has to enter $S_4$ for some $t_2 < t_1$; the only possibility for this is through the line u = 1. We therefore have

For fixed values of c, the function $H_{\mathrm{control}}(c, u)$ is minimal at u = 1. Hence the policy $u_f^{(3)}$ is superior to $u_f^{(1)}$ at $c = c_+$, in the sense that $\Pi^{(3)}(c_+) > \Pi^{(1)}(c_+)$, since $u_f^{(1)}(c_+) = 1$. In the same manner, it follows that $\Pi^{(3)}$ As both functions are continuous, there is a value $c = \hat c$ such that $\Pi^{(3)}(\hat c) = \Pi^{(1)}(\hat c)$. This is an indifference point, as the manager is indifferent between two policies at this state. A policy function, which is at one point two valued, is then given by

As the instantaneous profit $\pi = \alpha(1 - c)^2 - k^2$ is bounded, it is not hard to show that if the initial condition is larger than some value $\bar c$, the optimal action is to set k = 0, which results in Π(c) = 0 for $c > \bar c$. We conclude then that Π(c) is Lipschitz continuous everywhere.

D.1.6 The Trajectory γ Enters S 1 for the Last Time Through δ 1
The next situation to be investigated is that the trajectory γ satisfying $\gamma(0) = e_1$ enters $S_1$ through $\delta_1$ at some time $t_1 < 0$, and remains in $S_1$ for all $t_1 < t < 0$: see Fig. 12c.
Since γ (t 1 ) ∈ δ 1 , it follows that γ (t) ∈ S 2 for all t close to, but smaller than t 1 . As S 2 is a backward trapping region, γ (t) ∈ S 2 for all t < t 1 , and necessarily γ (t) converges to e 0 as t → −∞. Moreover,ċ > 0 in both S 1 and S 2 , and we can construct a policy function that is differentiable for all c > 0 exactly as in the situation that the trajectory remains in S 1 for t < 0 and converges to e 0 .

D.1.7 The Trajectory γ Enters S 1 for the Last Time Through δ 2
Finally, consider the situation that the trajectory γ that passes through e 1 at t = 0 enters S 1 through δ 2 for some t 1 < 0, and remains in S 1 for all t 1 < t < 0: see Fig. 12d.
Introduce $c_m$ by setting $\gamma(t_1) = (c_m, 1)$. Since $\gamma(t_1) \in \delta_2$, we have $c_- < c_m < c_+$. As $\dot c(t) > 0$ for $t_1 < t < 0$ as well as for t ≥ 0, we can construct a continuous policy function u (1) (1, 0). That is, e b is a stable node. The trajectory γ(t) for which $\gamma(0) = e_1$ can either cross the parabola $u = 4c(1 - c)$ for a value 0 < c < 1/2 or it can remain in $R_1$ for all t < 0 and tend to $e_b$. In the first situation, a policy function can be constructed exactly as in the situation that μ > 1 and γ intersects $\delta_1$; see Sect. D.1.6.
In the second situation, γ coincides with the unstable manifold of $e_b$. There is a unique trajectory $\tilde\gamma$ connecting $e_0$ with $e_b$, which is the limit of similar trajectories that connect $e_0$ with the stable manifold of $e_-$ for μ > 1. As $\tilde\gamma$ lies entirely in $S_2$, we have $\dot c(t) > 0$ there. Proceeding as usual, we can find a continuous policy function $u_f(c)$ such that its graph coincides for 0 < c < 1/2 with the curve $\tilde\gamma$, while for c > 1/2, it coincides with γ. Note that $u_f(c)$ is continuously differentiable everywhere except at c = 0 and c = 1/2.

D.1.9 Summary
For all parameters, we have constructed a policy function u f : ]0, ∞[ → R, which is single valued except at most at one pointĉ, the indifference point. Moreover, the values of the two trajectories originating at an indifference point are the same, and u f (c) = 0 if c is sufficiently large.

D.2 Policy Functions Generate Viscosity Solutions of the Hamilton-Jacobi Equation
Using relation (46), we have that is well defined at c =ĉ, continuous and continuously differentiable at all points c > 0 exceptĉ. Moreover, the value of the total profit (30) along a trajectory γ of the state-control system (36) We shall argue that V (c) is a viscosity solution-a term which we shall define shortly-of the Hamilton-Jacobi equation of our optimization problem, that the value function is another viscosity solution and that viscosity solutions of our problem are unique. From this argument, it follows that V (c) is the value function of our problem and that u(t) = u f (c(t)) is the optimizing control.

D.2.1 Viscosity Solutions
We quote the definition of viscosity sub-and supersolutions from Bardi where DW (x) is the gradient of W in x.
Hence V is a viscosity supersolution.
Consider now the situation that v is a continuously differentiable function such that V − v takes a local maximum atĉ.
which is a contradiction. There is no differentiable function such that V − v takes a local minimum; but then for all such functions, the inequality (49) holds atĉ, and V is a viscosity subsolution, and therefore a viscosity solution.

D.2.3 Uniqueness of Viscosity Solutions
We shall state a theorem that is a direct corollary of Theorem III.2.12 from Bardi and Capuzzo-Dolcetta to show that the value function V is the unique viscosity solution of the Hamilton-Jacobi equation (53).
The theorem depends on four assumptions. The first three concern the controlled dynamicsẏ = f (y, a). Proof The assumptions imply assumptions A 0 -A 4 in Chapter III of Bardi and Capuzzo-Dolcetta [54]. When formulating the problem as a minimization problem, is replaced by − , v by −v and λ by −λ; this shows that the definition of Hamiltonian in statement of the theorem is equivalent to that in equation (2.9) in Chapter III of Bardi and Capuzzo-Dolcetta. The result follows then from their Theorem III.2.12. As u f (c) < M, this control is admissible. We have already shown that the function V (c) is a viscosity solution of the Hamilton-Jacobi equation; it therefore coincides with the value function, which by the theorem is the unique viscosity solution. But that implies that the controls generated by the policy function are optimal.

D.2.4 Finishing the Proof
$= \rho\,\bigl(u_2 - u_1 u_2 - 4c(1 - c)\chi(\mu_2 - u_1\mu_2)\bigr)$

Consider first the situation that there is a value $0 \le \bar c \le 1$ such that for all $c \in\, ]\bar c, 1]$ the optimal trajectories for both the partial and the full collusion case leave the production region through $e_1$. We know that trajectories through $e_1$ can be optimal only if they have not crossed the line u = 1 yet: this is a consequence of the argument given in Sect. D.1.7. The term b of (56) therefore satisfies $b(c) < 0$ for $\bar c < c < 1$. If $c_0 = 1$, then $\Delta u(c_0) = 0$, and the first term in (56) vanishes. Then $b(c) \le 0$ implies that $\Delta u(c)$ is decreasing. This implies for values of c smaller than 1 that $\Delta u(c)$ is positive, in particular $\Delta u(c) > 0$ for all $\bar c < c \le 1$. Hence, R&D effort under full collusion is always larger than R&D effort under partial collusion if both lead to eventually leaving the market.

Next, we consider the situation that there is some $\bar c > 0$ such that for all $c \in\, ]0, \bar c[$, the optimal trajectories for both the partial and the full collusion case converge to their respective steady states $e_-^1 = (c_-^1, 1)$ and $e_-^2 = (c_-^2, 1)$. As $\mu_2 > \mu_1$, it follows that $0 < c_-^2 < c_-^1 \le 1/2$. The stable manifold tending to $e_-^2$ can only leave the region bounded by the parabola $u = 4\mu_2 c(1 - c)$ and the lines u = 1 and c = 1/2 through the line segment connecting the points (1/2, 1) with $(1/2, \mu_2)$. It follows that necessarily $u_2(c_-^1) > u_1(c_-^1)$, or, equivalently, that $\Delta u(c_-^1) > 0$. We have already established that trajectories tending to either $e_-^1$ or $e_-^2$ can only be optimal if they do not cross the line u = 1. Therefore if $0 < c < \bar c$, and the variation of constants formula implies $\Delta u(c) > 0$ for all $c_-^1 \le c < \bar c$. Moreover $u_1(c) < 1$ if $0 < c < c_-^1$, implying that $b(c) < 0$ there. Again using the variation of constants formula, we obtain $\Delta u(c) > 0$ for all $0 < c \le c_-^1$ as well.
Finally, if the optimal trajectory of the full collusion case converges to e 2 − , whereas the optimal trajectory of the partial collusion case exits the production region through e 1 , we have that the former satisfies u ≥ 1 and the latter u ≤ 1.
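The ordering of the two saddle locations used in this proof can be illustrated numerically. The sketch below is an illustration rather than part of the proof: it assumes the saddle cost level is the smaller root of c² − c + 1/(4μ) = 0, takes the ratio μ₂ = (9/8)μ₁ that links the full and partial collusion problems, and uses the hypothetical value μ₁ = 1.2; the function name `saddle_cost` is ours.

```python
import math

def saddle_cost(mu):
    """Smaller root c_- = 1/2 - sqrt(1/4 - 1/(4 mu)) of c^2 - c + 1/(4 mu) = 0.

    Assumes mu >= 1, so that the discriminant is nonnegative.
    """
    D = 0.25 - 1.0 / (4.0 * mu)
    if D < 0:
        raise ValueError("no interior steady state for mu < 1")
    return 0.5 - math.sqrt(D)

mu1 = 1.2                 # hypothetical partial-collusion value
mu2 = 9.0 / 8.0 * mu1     # corresponding full-collusion value
c1 = saddle_cost(mu1)     # saddle location under partial collusion
c2 = saddle_cost(mu2)     # saddle location under full collusion
print(f"c_-^1 = {c1:.4f}, c_-^2 = {c2:.4f}")
```

Because D increases with μ, the smaller root decreases with μ, so the full collusion saddle lies at a lower marginal cost than the partial collusion saddle whenever both exist.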

F.2 Proof of Proposition 5.3
To prove Proposition 5.3, we again use the fact that the value of the integral Π over a trajectory starting at a point (c, u) equals If $c = \hat c$ is an indifference point, there are values $\hat u^{(1)} < \hat u^{(2)}$ such that the trajectories starting at $(\hat c, \hat u^{(i)})$, for i = 1, 2, are both optimal and both have the same value. Note that the trajectory through $(\hat c, \hat u^{(1)})$ goes to the right, and that through $(\hat c, \hat u^{(2)})$ goes to the left. As $\Pi(\hat c, \hat u^{(1)}) = \Pi(\hat c, \hat u^{(2)})$,

Consider a fixed value of ρ and two values $\mu_1$, $\mu_2$ of μ such that $\mu_2 = (9/8)\mu_1$; that is, $(\mu_1, \rho)$ describes a partial collusion situation, and $(\mu_2, \rho)$ is the corresponding full collusion situation. Assume first that there is an indifference point in the partial collusion problem; denote this point as $\hat c_1$, and the corresponding values of u as $\hat u_1^{(1)}$ and $\hat u_1^{(2)}$. We have seen in the proof of Proposition 5.2 that if a partial collusion and a full collusion trajectory either both go to $e_1$ or go to $e_-^1$ and $e_-^2$, respectively, the full collusion trajectory intersects a line c = constant at a larger u-value than the partial collusion one. Denote the intersection of the full collusion trajectory going to $e_-^2$ with the line $c = \hat c_1$ by $(\hat c_1, \hat u_2^{(2)})$. We have that $\hat u_2^{(2)} > \hat u_1^{(2)}$, and therefore also

We argue by contradiction. Assume that the threshold $\hat c_2$ in the full collusion case exists and is below the threshold in the partial collusion case. Then the full collusion trajectory going right, that is, to $e_1$, has to intersect the line $c = \hat c_1$ in a point $(\hat c_1, \hat u_2)$. Moreover, this trajectory has to be optimal at $\hat c_1$. Using (57), this implies that Finally, the full collusion trajectory has to be above the partial collusion trajectory going to $e_1$, implying Combining inequalities (58)-(61) implies But this is a contradiction. The proof in the situation that the threshold is a repeller is similar and will be omitted.