Numerical Analysis of Markov-Perfect Equilibria with Multiple Stable Steady States: A Duopoly Application with Innovative Firms

This paper presents a numerical method for the characterization of Markov-perfect equilibria of symmetric differential games exhibiting coexisting stable steady states. The method relying on the calculation of ‘local value functions’ through collocation in overlapping parts of the state space, is applicable for games with multiple state variables. It is applied to analyze a piecewise deterministic game capturing the dynamic competition between two oligopolistic firms, which are active in an established market and invest in R&D. Both R&D investment and an evolving public knowledge stock positively influence a breakthrough probability, where the breakthrough generates the option to introduce an innovative product on the market. Additionally, firms engage in activities influencing the appeal of the established and new product to consumers. Markov-perfect equilibrium profiles are numerically determined for different parameter settings and it is shown that for certain constellations the new product is introduced with probability one if the initial strength of the established market is below a threshold, which depends on the initial level of public knowledge. In case, the initial strength of the established market is above this threshold, and the R&D effort of both firms quickly goes to zero and with a high probability the new product is never introduced. Furthermore, it is shown that after the introduction of the new product the innovator engages in activities weakening the established market, although it is still producing positive quantities of that product.


Introduction
Since the seminal contributions of Sethi [19], Skiba [20] and Dechert and Nishimura [9], it has been shown that rational planning over an infinite planning horizon can go along with outcomes that crucially depend on initial conditions. Such outcomes are denoted as history-dependent solutions, and the basins of attraction of the different long-run equilibria are separated by Skiba thresholds. Apart from environmental and resource economics, where such path dependencies have received a lot of attention (e.g., [22]), Skiba phenomena have also been identified in different dynamic industrial organization problems (e.g., [3,4,17]). Due to the nonlinear structure underlying such problems with multiple long-run equilibria, which are mostly represented by stable steady states, (partial) analytical characterizations of optimal dynamics in Skiba scenarios can typically be given only for dynamic optimization models with one-dimensional state spaces. Therefore, different numerical approaches have been developed to characterize optimal solutions in problems with multiple stable steady states. These methods rely on the numerical calculation of stable paths in the state-costate system derived from the Maximum Principle [13,15], nonlinear model predictive control [14], or collocation [4].
In spite of this rich literature, relatively little attention has so far been paid to Skiba phenomena arising in dynamic models with multiple decision makers, where strategic interactions occur. Dechert and O'Donnellv [10] show path dependency of the state trajectory in a Markov-perfect equilibrium of a stochastic discrete time shallow lake game. Dockner and Wagener [12] obtain a similar insight in a deterministic and continuous time version of the game. Both of these contributions deal with games with one state variable. To the best of our knowledge, for no differential game with more than one state variable has a Markov-perfect equilibrium exhibiting multiple locally stable steady states been calculated.
The aim of this paper is to fill this gap. The difficulty is that numerical methods relying on the canonical system derived from the Maximum Principle typically cannot be used to characterize Markov-perfect equilibria, since knowledge about the derivatives of the feedback strategies is needed to formulate the costate equations. Instead, we employ Hamilton-Jacobi-Bellman equations to characterize the value functions of the players and the feedback strategies. The technical challenge in this respect is that in scenarios with coexisting stable steady states the feedback strategies of the players typically exhibit jumps, which result in kinks in the value functions so that they are not everywhere differentiable. Standard collocation methods based on polynomial approximations of the value functions are not able to capture such kinks. Therefore, we propose an adaptation of a collocation algorithm, where 'local value functions' on parts of the state space are generated and the actual value function of the game is found as the upper envelope of these local value functions.
We use our method to solve a problem in the area of Industrial Organization where incumbent firms have an option to innovate. In particular, we consider a duopoly in which the two firms compete on a homogenous established product market, while at the same time they are involved in an innovation race. The one who obtains the innovation breakthrough first gains the opportunity to introduce a new product on the market. The new product is a strategic substitute of the established one, and it is better than the old one. Due to, e.g., patent protection, the innovation laggard will never innovate, so it will keep on operating just on the established product market.
It is an investment problem where initially the firms can invest in means (e.g., public relations, lobbying) to increase the reservation price of the established product. At the same time, they can invest in R&D to raise the innovation probability. R&D investments of both firms have an additional effect in that they increase a public knowledge stock, which in turn has a positive effect on the innovation hazard rate. After the innovation breakthrough, the winner of the innovation race can invest to increase the reservation price of the new product.
As such, this technically leads to a multi-mode differential game (see Dockner et al. [11]). Mode 1 describes the situation before the innovation breakthrough where we have two state variables, namely the reservation price of the established product and the public knowledge stock. Mode 1 passes into mode 2 at the uncertain point in time where firm 1 wins the innovation race. The two state variables in mode 2 are the reservation prices of the established and the new product. In case, firm 2 wins the race, mode 1 is followed by mode 3, which has the same state variables as mode 2 with the difference that now firm 2 controls the development of the reservation price of the new product.
Essentially, three different solutions will prevail. First, for high R&D costs, either innovation will not occur, or the innovation probability will be very low in the extraordinary case of a very high initial public knowledge stock. Second, in the opposite case, thus where R&D costs are low, either one of the firms will eventually innovate with probability one. Third, in case R&D costs are neither low nor high, technically the most interesting situation arises where history-dependent equilibria occur. In a state plane with the established product reservation price on the horizontal axis and the public knowledge stock on the vertical axis, an upwardsloping Skiba curve separates the initial situations leading to different solutions. Below the Skiba curve, the initial knowledge stock is too low to guarantee a profitable innovation policy. Therefore, the firms refrain from innovating and the resulting trajectory converges to a steady state where nobody innovates and both firms are active on the established product market. Above the Skiba curve, the firms are engaged in an innovation race where the resulting trajectory converges to a steady state with a relatively high public knowledge stock. Since both firms keep on investing in R&D, with probability one either one of the firms will eventually innovate.
If we review the existing literature from an application point of view, thus considering dynamic innovative duopolies, two papers have to be mentioned. First, we note that Breton et al. [2] also analyze a setup with a public knowledge stock. Their result is similar to our Skiba solution in that initial public knowledge should be high enough to guarantee that the innovation will be obtained in the long run. The difference is that in Breton et al. [2] the R&D decision is binary, they consider process innovation instead of product innovation, and they have a discrete-time model, whereas our model is in continuous time. Employing a piecewise deterministic dynamic oligopoly game like the present paper, is also done in Haurie and Roche [16]. However, where we consider Markov-perfect equilibria, they develop Open-loop Nash equilibria with jumps that relate to market size, rather than designing a new product as in our case. An important observation at this point is that, to our knowledge, our paper is the first to develop history-dependent equilibria in a piecewise deterministic game.
The paper is organized as follows. Section 2 develops the model, whereas Sect. 3 describes the derivation of the Markov-perfect equilibria. Section 4 provides a detailed account of the numerical approach used in the paper. Solutions with a unique steady state, thus where either R&D investment costs are high or low, are discussed in Sect. 5. The conceptual and numerical challenges associated with the Skiba scenario are treated in Sects. 6 and 7 presents the economic insights obtained using the methodology developed in the previous section. Section 8 concludes.

The Model
We consider the interaction of two firms producing a homogeneous established product based on an old technology. The inverse demand for the established product at any time t is given by where q f o (t) is the output of firm f = 1, 2 of the old product. The reservation price α o (t), which is influenced by the public acceptance of the old product and its production technology as well as by potential market obstacles due to regulatory measures, evolves over time with At the same time both firms engage in R&D investments I f r ≥ 0, f = 1, 2 in order to develop a new product based on a cleaner technology. For simplicity, it is assumed that the firm which develops the new product first will be able to prevent the competing firm from entering that market due to patent protection or other technological reasons. The hazard rate of firm f is given by where z(t) is the public knowledge stock. This formulation of the hazard rate captures that the firms need to invest in own R&D in order to transform the publicly available knowledge into an innovation breakthrough. Furthermore, we assume that there are spillovers from the firms' R&D investment to the public knowledge stock. Hence, the dynamics of the knowledge stock readsż wherez would be the stationary level of the public knowledge stock in the absence of firms' R&D.
Once one of the firms has reached the innovation breakthrough it is able to offer a new cleaner product, which is horizontally differentiated from the old product and has a reservation price α n . The inverse demand system now reads where it is assumed that firm f ∈ {1, 2} is the innovator and η ∈ (−1, 1) denotes the horizontal differentiation parameter. For positive values of η the established and the new product are substitutes and in what follows we will focus on such a scenario. Similar to the established market, also the reservation price of the new product evolves over time and can be influenced by activities of the firm producing the new product, denoted by I in ∈ R. In particular, we haveα and α n (0) = α ini n <α n . The new product is assumed to be based on a more appealing and cleaner technology, which implies that in the absence of any firm activities in the long run the reservation price for the new product would be higher than that of the established product. Hence, we assumeα n >α o . Marginal production costs are assumed to be constant, symmetric across firms and identical for both products. To save notation, we normalize them to zero.
We model the interaction between the firms as a multi-mode differential game (see e.g., Dockner et al. [11] or Dawid et al. [8]). The initial mode m 1 denotes the time period before the new product is introduced, whereas the mode m 2 corresponds to scenario where firm 1 has innovated (and firm 2 therefore will stick to the old product forever) and mode m 3 to the analogous case with firm 2 as the innovator. In each mode quantities are determined by Cournot competition and straightforward calculations yield for mode m 1 for quantities and market profits. Furthermore, in mode m 2 we obtain, under the assumption that all quantities are positive, Prices are obtained by inserting these expressions into (3) and the market profits read π m 2 1 = q m 2 1o p o q m 2 1o , q m 2 2o , q m 2 1n +q m 2 in p n q m 2 1o , q m 2 2o , q m 2 1n , π m 2 2 = q m 2 2o p o q m 2 1o , q m 2 2o , q m 2 1n . (7) Analogous results arise in mode m 3 with firm 2 as the innovator. Concerning the costs of the different firm activities, it should first be noted that R&D investments are always nonnegative, whereas the activities I f o , I f n influencing the general acceptance of a product could in principle be negative, in particular, for a firm that is active on both markets. Hence, we assume quadratic costs for such activities, where cost functions are symmetric across firms, i.e., ξ h = ν h 2 I 2 f h with h ∈ {o, n}, whereas R&D costs are of the form ξ r = μ r I rh + ν r 2 I 2 f r . In this formulation, for simplicity, the parameters associated with the linear lobbying cost are normalized to zero. Whereas this normalization does not affect the qualitative properties of the solution, such a normalization is not possible for the R&D costs. If R&D costs are purely quadratic then optimal R&D investments are always positive, which precludes the existence of a steady state with zero hazard rate. This would be a substantial alteration to the model, of which the potential existence of such a steady state is an important characteristic.
The instantaneous payoff of firm f in the different modes is therefore given by The objective of firm f is to maximize its infinite horizon expected discounted payoff under discount rate r > 0, subject to the state dynamics (1), (2) and (4), the mode process m(t) as well as the control constraints I f r ≥ 0, f = 1, 2 and These constraints capture that a firm can influence the reservation price of a product only if it is active on the corresponding market.

Characterization of Equilibria
In order to analyze the investment behavior of the firms in this model, we consider stationary Markov-perfect equilibria (MPE) of the game. A stationary Markovian strategy of firm f is given by a triple (φ f o , φ f r , φ f n ), where each of these feedback strategies has the form The upper bounds of the state variables,ᾱ o ,z,ᾱ n are assumed to be sufficiently large to ensure that the stable steady states characterized in the following analysis are interior. Although we write the feedback in this general form, clearly some components are irrelevant in some modes. In particular, due to (8), φ f n = 0 has to hold in mode m 1 and for the non-innovator also in mode m 2 (respectively, m 3 ). Furthermore, φ f r = 0 in modes m 2 and m 3 since no more innovations are possible. Also, the state variable α n is constant in mode m 1 and the knowledge stock z irrelevant in modes m 2 and m 3 . To ease notation in what follows we will drop these arguments in the corresponding modes. In accordance with the literature (see Dockner et al. [11]), we consider only non-anticipating strategies, i.e., strategies where firms cannot condition their actions on realizations on the time of mode transitions which lie in the future.

Post-Innovation Phase
In order to characterize the equilibrium strategy profile and the induced dynamics, we start by considering modes m 2 and m 3 . Since these modes are structurally symmetric, we can restrict attention to m 2 . No transition out of this mode is possible, which means that the firms are essentially engaged in an infinite horizon time-autonomous deterministic differential game. Taking into account that the state z is irrelevant in m 2 , the value functions of the two firms can therefore be written as V m 2 f (α o , α n ). The Hamilton-Jacobi-Bellman (HJB) equations for the two firms read Maximization of the right hand sides of these two HJB equations gives The interpretation of these terms is standard and straightforward. Due to the quadratic cost functions, the optimal effort in changing the reservation prices is proportional to the marginal value of such a change in terms of future discounted profits for the considered firm. Taking into account (7), it is easy to see that the game in mode m 2 has a linear-quadratic structure and we therefore consider value functions that are quadratic polynomials of the states, i.e., Inserting this expression into (11) and the HJB Eqs. (9) and (10), and comparing the coefficients of the different terms on the left and the right hand side of the HJBs, generates a system of 12 nonlinear equations for the 12 unknown parameters in the value function. This system can be easily solved numerically using standard algorithms like a Newton method. Similar to the case of capital accumulation games with asymmetric product ranges (see Dawid et al. [7]), typically there are multiple solutions among which only a single one induces dynamics with a single steady state and therefore implies that the transversality conditions for the firms' optimization problems are satisfied. The equilibrium dynamics in mode m 2 is then characterized by a unique stable steady state and in what follows we will only consider parameter constellations where both firms produce positive quantities of all the products they are offering in that steady state. The analysis of mode m 3 is analogous to that of mode m 2 with the roles of the two firms inverted.

Pre-Innovation Phase
In mode m 1 the two firms are competing to generate the breakthrough to the new product and technology. Hence, in this mode the situation of the two firms is symmetric and we will restrict attention to symmetric equilibria where In such a scenario, the two firms also have symmetric value functions and denoting this value function by V m 1 1 = V m 1 2 = V m1 we obtain the following HJB equation in mode m 1 : The feedback functions in mode m 1 , therefore, are given by The above expression for φ r highlights that the optimal level of R&D investment is determined by three effects. First, R&D carried out by the firm increases the public knowledge stock thereby influencing the future value of the game for the firm. Second, own R&D positively affects the probability of a breakthrough at time t, which would make the firm monopolist thereby inducing a jump in the value function of . Third, investment is (negatively) influenced by the R&D cost parameters ν r and μ r .

Numerical Method
Inserting the optimal feedback strategies into (12) yields after minor transformations the following formulation of the HJB equation, which is the basis for the numerical characterization of the equilibrium outcome: This is a nonlinear partial differential equation for V m 1 , and it is easy to see that contrary to the situation in mode m 2 the general form of the solution of this equation cannot be guessed and a closed form solution seems infeasible. Hence, we rely on a numerical collocation method to calculate an approximate solution for (12), see e.g., Dawid et al. [4] or Vedenov and Miranda [21]. In particular, taking as state space where T i (x) denotes the i-th Chebyshev polynomial and n o , n z are parameters determining the number of basis functions. We denote the number of basis functions by n = n o n z . The value function is then approximated bŷ which can be rewritten in a more compact way asV Here and in what follows all vectors are column vectors, v T denotes the transpose of the vector v and · the matrix product.
In order to determine the weights c of the base functions in this approximation, we require thatV m1 satisfies the HJB Eq. (12) on a set of nodes 1 Overall, the requirement that (15) holds at all nodes in N generates n nonlinear equations, which are used to determine the weights c.
More formally, we first introduce the three matrices B, B α o and B z , which contain the values of all base functions and their partial derivatives at all nodes in N : . . , n z . Using this notation, the values of the feedback strategy at all nodes for a given vector of approximation weights c can be expressed as where . . , n and the maximum in the second line is applied to each entry of the matrix. Using these vectors, the firm's instantaneous payoff, the state dynamics and the discount rate, adjusted for the transition rate to mode m 2 , at all nodes in N can be directly calculated. We collect them in the vectors f , g o , g z , r , which are defined as follows We can now write the system of equations derived from the condition that (15) has to hold on all nodes in N , as Here, diag(v) denotes the diagonal matrix of vector v and I n the n-dimensional vector of 1s. It should be noted that I o as well as I r , and therefore also f , g o , g z and r depend on c, which implies that (18) is nonlinear with respect to c. In order to find a vector c satisfying (18), we use an iterative algorithm, with the properties that in each step only a linear system has to be solved and the fixed point of the iteration is a solution of (18). In particular, we consider a sequence of vectors c(s), s = 0, 1, . . . and adjust the notation developed above by denoting the vector of feedback controls which are calculated based on c(s) as I o (s), I r (s).
Analogously for the vectors f , g o , g z , r . In each step of the iteration, c(s) is determined as the solution to the following linear system of equations: with good convergence and approximation properties. For higher dimensional state spaces, the use of sparse grids methods, like Smolyak bases and nodes (e.g., [18]), is required in order to obtain acceptable approximations of the value functions in MPEs of similar games, see Dawid et al. [5].
Intuitively, in each iteration first, the feedback controls of the players at all nodes N , based on the previous approximation of the value function, are determined and then the value function which results from these strategies is calculated. Clearly, if the value function calculated in this way coincides with the approximation on which the determination of the control values was based then this value function satisfies (15) in all nodes in N . More formally, any fixed point c of (19) is a solution to (18). Overall, the numerical algorithm can be summarized as follows: (i) Choose n o , n z and calculate the nodes in N . Choose the stopping condition .  (13) and (14) using this approximation. (vii) Check (e.g., by a plot of the vector field) that the state dynamics resulting from these feedback functions is inward pointing at the entire boundary of the state space.
If the algorithm converges, the quality of the obtained solution is judged by considering the maximum of the relative deviation in (15) from zero, i.e., res = max where (α o , z) denotes the absolute value of the right hand side in the HJB Eq. (15) for V m 1 =V m 1 . Since the value function is in our game strictly positive on the entire state space, this indicator is well defined. For all results presented in this paper, we have < 10 −3 . The aim of the algorithm is to find (approximate) value and feedback functions which satisfy the HJB equation on the entire state space, whereas the algorithm itself only guarantees that the HJB equation holds (approximately) on the set of nodes N . Therefore, it is essential to check the residual of the HJB equation on the entire state space rather than only on N . A small residual on the entire state space ensures that the value function used by the firm to determine its feedback strategies is indeed very close to the expected discounted profit of the firm generated by the profile of feedback strategies used by both firms, no matter what the initial condition is. It should also be noted that if point (vii) in the description of the algorithm holds, then this implies that the transversality conditions of both players are satisfied.
In general, there is no guarantee for the convergence of this algorithm and the choice of the number of nodes and of the vector c(0) determining the initial approximation are crucial in this respect. Concerning the number of nodes, there is not only a tradeoff between the quality of the approximation and the computation time, but an extensive number of experiments also shows that the stability of the algorithm, in the sense that it converges for reasonably well chosen initial conditions, deteriorates as the number of nodes increases. Unfortunately at this point, there does not seem to be a rigorous mathematical analysis available, which characterizes the relationship between the value of n and the stability properties of the algorithm. In light of this tradeoff, the values n o = n z = 10 have been chosen for the numerical analyses in this paper, since this number of nodes generates solutions with very small residuals in the HJB equation and at the same time the stability properties of the algorithm are still very good for the model considered here. In principle, alternative approaches, like a Newton algorithm, could also be applied to find a solution of the nonlinear system of equations (18), however our numerical experiments indicate that for this class of problems the chosen approach has better stability properties, in a sense that convergence of the iterative algorithm can be obtained for a larger set of parameters and initial values c(0). The number of iterations needed till convergence varies strongly depending on the parameter setting and the initialization. Whereas in most of the scenarios considered below convergence was obtained in less than 10 iterations, also scenarios with approximately 100 iterations till convergence have been encountered.
With respect to the choice of c(0), the most crucial task is to come up with an initial vector which generates convergence for the default parameter setting(s). Analyses of the implications of variations in the parameters can then be carried out using a continuation method, in which the parameter under consideration is changed in small steps and in each step the previous solution is used as the initial value when the algorithm is applied to the new parameter constellation. However, a continuation method can also be applied to generate the initial value c(0) for the default parameter setting. In this respect, it is useful to first consider the problem for a variation of the default setting, in which the discount rate is replaced by a much larger value. This essentially static problem can be solved easily, and then the discount rate can be stepwise reduced until its default value is reached. Figure 1 illustrates the results generated with the method described in the previous section by showing the value function in mode m 1 , the equilibrium feedback function for R&D investment in mode m 1 , as well as the dynamics in the state space for a base parameter setting 2 and two different values of the R&D cost parameter μ r . Whereas from a theoretical perspective, there is no guarantee that the equilibria shown here are unique for each of the parameter constellations, and our numerical explorations (e.g., with alternative initializations c(0) of the algorithm) have not generated any alternative equilibria.

Scenarios with Unique Steady States in Mode m 1
Considering the R&D investment functions and the state dynamics, it can be clearly seen that qualitatively different equilibrium behavior emerges under these two levels of R&D costs. For low values of μ r (left panels) firms invest in R&D for almost all levels of the public knowledge stock and public R&D investment alone is sufficient to move the level of the knowledge stock into the regions where firms choose a positive level of R&D. Accordingly, the unique stable steady state under this MPE profile is characterized by positive R&D investments of both firms, which also implies a positive hazard rate. Hence, with probability one the innovation breakthrough will eventually be reached by one of the firms, and a transition to mode m 2 or m 3 occurs. A substantially different picture emerges for high levels of R&D costs (right panels). In such a setting, the minimal knowledge stock needed to induce positive R&D investments of firms is so large that it is never reached in the absence of firms' R&D. Furthermore, even if the initial stock would be above this threshold the firms R&D investment would be too small compared to the speed by which knowledge becomes obsolete such that the knowledge stock would still decrease. Hence, in the unique steady state both firms abstain from R&D investment, which implies that, although the public knowledge stock converges to a positive levelz, the hazard rates of both firms are zero. This means 2 The base parameter setting has been chosen in such a way such that the different assumptions made are  that, whenever the initial value of the knowledge stock is in the part of the state space where φ m 1 r = 0, with probability one the breakthrough is never reached and both firms stay only active on the established product market.
Comparing the value functions for the two values of μ r , one can make the at first sight the observation that for high values of the acceptance of the established product (α o ) and low levels of the public knowledge stock (z) an increase in the R&D cost parameter μ r induces an increase in the value function of the firms. For such values of the state variables, it is profitable for the firms to coordinate on not engaging in the R&D race. However, for low values of μ r it is rational for each firm to invest in R&D regardless of whether the competitor pursues the innovation breakthrough or not. A high value of the R&D cost parameter then allows the firms to coordinate in equilibrium on not investing, which increases their profit compared to a situation where both engage in R&D. The situation is different if the acceptance of the established product is low and the public knowledge stock is high. In such a scenario, firms gain substantially from introducing the new product and at the same time the hazard rates are relatively large, which implies that the expected duration of R&D expenditures till the breakthrough are relatively small. Here, the expected gain from innovation outweighs the costs and hence the value for the firms is larger if the R&D costs are so small that in equilibrium there is positive R&D investment and transition to modes m 2 or m 3 occurs with probability one. Figure 1 as well as the intuitive discussion above shows that the firms' incentive to invest in R&D is an increasing function of the public knowledge stock as well as a decreasing function of the acceptance of the old product. Considering now values of the R&D cost parameter between the two used in Fig. 1, one would expect that for some values of μ r firms have no incentive to invest in R&D if the knowledge stock is close to the levelz, but in equilibrium engage with sufficient effort in R&D for large values of z such that a knowledge stock substantially abovez can also be sustained in the long run. For such values of μ r , the MPE of the game in mode m 1 exhibits coexisting stable steady states. Although such phenomena so far have hardly been treated in the framework of MPEs in differential games, the extensive literature on coexisting stable steady states in dynamic optimization problems (e.g., [3,4,15]) implies that generically the actions of the players exhibit a jump along the curve separating the basins of attraction of the two stable steady states, which in the literature is referred to as Skiba curve. In general, this insight generates conceptual and numerical problems for the analysis. First, there is a conceptual problem, since the jump in the action of the opponent along a curve may induce a discontinuity of the value function of a player. In particular, this happens if the player chooses values of the own controls such that the state does not cross the curve. For asymmetric games, this implies that generically the player has an incentive to keep the state on one side of the curve, namely the side where her value function is strictly larger. This implies that in such a game an MPE inducing two coexisting stable steady states separated by a Skiba curve can only exist under rather restrictive conditions. In particular, a Skiba curve can exist if there is a controllability problem in the sense that the player cannot choose the own controls such that the state moves from the part of the state space associated with the lower value to the other part associated with the higher value. This issue is discussed in more detail in Dawid et al. [6].

Theoretical Considerations
In the framework of a symmetric game, like the one considered here, the existence of a symmetric MPE with coexisting stable steady states is less problematic, because in such an equilibrium the value functions of both players coincide. If the boundary between the basins of attraction of the two stable steady states is determined by the intersection of the 'local value functions' around the steady states (which are identical for both players), none of the players has a jump in the value function, and the problem sketched above for asymmetric games does not arise. The local value functions capture the value of the game under trajectories which result from optimal behavior of both players under the constraint that the state converges to a certain (locally stable) steady state. More formally, let us assume that the state space in mode m 1 can be separated into two closed regions S a , S b such that S a ∪ S b = [α l o , α h o ] × [z l , z h ] and the intersection S = S a ∩ S b is a connected curve in the state space. Assume further that there exist functions V m 1 a , V m 1 b which are continuous on S a (respectively, S b ) and are continuously differentiable in the interior of these regions. Furthermore, V m 1 a (V m 1 b ) satisfies the HJB equation (12) on the interior of S a (S b ) and we have V m 1 a = V m 1 b along the curve S. Finally, we define φ m 1 hx , h = o, r, x = a, b as the feedback function resulting from inserting V m 1 x into (13) [respectively, (14)]. The fact that the two feedback functions exhibit jumps along the curve S in the state space raises technical problems when considering the dynamic optimization problem of firm f that results from inserting the feedback function of the other firm into the state dynamics and the transition rates between modes. After this insertion, neither the state dynamics nor transition rates between modes are continuous on the entire state space and therefore the assumptions required for the standard result that the value function is the unique solution (in the viscosity sense) of the HJB equation (see e.g., Theorems 2.8 and 2.12 in Bardi and Capuzzo-Dolcetta [1]) do no longer hold. Actually, without these continuity assumptions in general the dynamic optimization problem might not even be well defined, since for an own control path inducing that the set of points in time at which the state crosses S has positive measure, the solution to the state dynamics would not be well defined. Addressing this general technical issue, which arises quite naturally in differential games if the feedback strategies are not restricted to functions that are continuous with respect to the state is beyond the scope of this paper. Hence, in what follows we restrict attention to strategy profiles, where for each initial condition the set of points in time where the firms' controls jump along the induced state trajectory have measure zero. As will become clear from our numerical analysis below, in the game considered here this restriction of the strategy space is not restrictive.
Given that we consider this strategy space, the following proposition shows that the combination of the two local profiles described above generates a Markov-perfect equilibrium.

Proposition 1 The symmetric profile
Proof Consider the optimization problem of firm 1 assuming that firm 2 is choosing the considered strategy (φ o , φ r ). Since V m 1 x is a smooth function satisfying the HJB equation of firm 1 in the region S x , x = a, b and trajectories stay bounded in S x if firm 1 uses the feedback function (φ ox , φ r x ), it follows from standard arguments that no other feedback strategy can generate a larger expected value for firm 1 as long as the state trajectory does not leave the region S x . Consider now an initial condition (α o , z) ini in the region S x , x = a, b and assume that there is an alternative feedback function (φ o ,φ r ) generating a strictly higher expected value for firm 1 than (φ o , φ r ) does. Due to the argument given above it must be that the state trajectory induced by the profile (φ o ,φ r ), (φ o , φ r ) crosses at least once the boundary S between the two regions. In what follows we show that this assumption results in a contradiction, where we first deal with the case where the induced trajectory exhibits a finite number of jumps and, as a second step demonstrate that this implies also a contradiction for all trajectories which cross S infinitely often.
First, we consider the case with a finite number of intersections with S. Denote by (α o ,ẑ) the last point where the trajectory hits S. And assume without loss of generality that the trajectory is in region S a before it hits (α o ,ẑ). ,ẑ), the expected discounted payoff under the trajectory after hitting (α o ,ẑ) must be V m 1 a (α o ,ẑ). Since V m 1 a is the value function for firm 1 for the problem on S a this implies that if (α 0 ,z) is any state on the trajectory between (α o ,ẑ) and the previous point where S is hit, the expected value in that state cannot be larger than V m 1 a (α 0 ,z). The same argument can then be applied to the previous part of the trajectory running in S b and so on until the initial condition is reached. Hence the value at (α o , z) ini cannot be larger than V m 1 x , which yields a contradiction. Second, assume that the induced trajectory crosses S infinitely often. Denote this trajectory by (α(t),z(t)). Given the assumption that this trajectory yields a higher value for firm 1 compared to V x ((α o , z) ((α o , z) ini )), whereṼ denotes the value generated by trajectory (α(t),z(t)). Furthermore, we consider the trajectory (α(t),z(t)) given by for some given T such that (α(T ),z(T )) / ∈ S. We denote the value generated by this trajectory byṼ . Due to the exponential discounting in the objective function of firm 1 and the fact that the instantaneous profit function is bounded from above, we have Therefore, there exists aT such thatṼ ((α o , z) The assumption that the set of times where (α(t),z(t)) intersects S has measure zero, implies that (α(t),z(t)) intersects with C only finitely often. Hence, we obtain a contradiction to the statement which was shown in the first part of this proof.
Proposition 1 implies that pasting in a continuous way two functions, which satisfy the mode m 1 HJB equation on parts of the state space, yields the symmetric value function in this mode for a symmetric Markov-perfect equilibrium, provided that the two regions are invariant under the induced equilibrium dynamics. 3 Although in the context of the considered model we show this property only for a scenario with two invariant parts of the state space, our argument also holds in scenarios with a larger number of invariant subspaces as long as the local value functions coincide at the boundaries between the associated regions. 4

Numerical Approach
Whereas Proposition 1 provides a theoretical foundation for the characterization of symmetric MPEs with multiple stable steady states, also the numerical computation of the value function in such scenarios is not straight forward. A jump of the firms' actions along the Skiba curve implies that the value function V m 1 exhibits a kink along this curve. The fact that the collocation method relies on a polynomial approximation of the value function implies that the approximate value functionV m 1 by definition is smooth on the considered state space and therefore cannot exhibit a kink. Hence, the procedure described in Sect. 4 has to be adjusted if we deal with parameter constellations under which the MPE feedback strategies induce state dynamics with multiple stable steady states.
In order to generate the 'local value functions' around the two coexisting steady states, we first generate two overlapping regions of the state space on which the local value functions are calculated such that each region contains only one steady state. The curve S is then determined as the intersection of the two local value functions. It is obvious that at a potential steady state with no R&D investment we must have z =z. On the other hand, Fig. 1 and the consideration of the complementarity between knowledge stock and R&D investment suggests that the steady state with positive R&D is characterized by a knowledge stock substantially abovez. Hence, we choose the two overlapping regions of the state space where α l o , α h o , z l , z h are given according to our base parameter setting and we have z b l < z a h . For values of the R&D costs μ r between the values considered in Fig. 1 the collocation algorithm is applied separately to these two regions. It is then checked whether each of the two regions is invariant under the induced state dynamics, and, if this is the case, the (approximate) value functions are appropriate local value functions on the considered part of the state space. 5 In order to foster the generation of such appropriate local value functions, which induce dynamics under which the region is invariant, it is useful to carefully select the initial guess of the value function for the iteration in the collocation algorithm. To this end, when considering the region surrounding the low investment equilibrium (i.e., R a ), we initially calculate the value function for a value of μ r , which is sufficiently large such that the zero R&D steady state is the only stable fixed point in the whole state space (e.g., μ r = 1, see Fig. 1). The initial guess c(0) for the collocation for the actual value of μ r is then obtained by a continuation method by decreasing μ r in small steps and always recalculating the approximate value function with the initial guess in the collocation given by the value function of the previous step. Alternatively, a homotopy method could be used. However in multi-mode problems, in which typically numerically determined value functions from other modes appear in the HJB equation, the derivative of the right hand side of that equation with respect to the changing parameter, which is needed for the homotopy, is often not available. Similarly, the approximate value function on R b is obtained through a continuation method increasing μ r from a value where the unique stable steady state in the state space exhibits positive R&D investment.
To determine the boundary between the basins of attraction of the two locally stable steady states, the difference between the two local value functions R a and R b is considered in the z) = 0 holds on a closed curve S such that on both sides of S the state dynamics, generated by the feedback strategies based on the corresponding local value function, points inward, then Proposition 1 can be used to conclude that the combination of the feedback strategies gives a Markov-perfect equilibrium profile. Due to the fact that R a (α, z) − R b (α, z) is a polynomial in (α, z) the line S can be easily calculated, if such a line exists. In the following section we use this approach to analyze the firms' equilibrium investment behavior in scenarios with two stable steady states.

Pre-Innovation Phase
Applying the procedure described in the previous section allows us to compute the value functions and corresponding feedback functions of an MPE for parameter settings of the model where two stable steady states exist in the pre-innovation mode. Figure 2 shows the result of these calculations for μ r = 0.27 and z l b = 0.35, z h a = 0.6. The local value function on R a is depicted in red, whereas that on R b is blue. The Skiba curve S, determined by the intersection of the two functions, can be clearly seen and the value function V m 1 is the upper envelope of the two local value functions.
AS argued above, it follows from Proposition 1 that the value function V m 1 corresponds to a symmetric Markov-perfect equilibrium in mode m 1 and in Fig. 3 we depict the equilibrium feedback functions for activities building up acceptance of the established product and for R&D investment. We observe that both feedback functions exhibit jumps along the Skiba curve S. The state dynamics induced by this symmetric equilibrium strategy profile is shown in Fig. 4. It can be clearly seen that indeed two locally stable steady states exist and their basins of attraction are separated by the Skiba curve S indicated as a black line. We denote Considering again the equilibrium feedback functions it can be observed that, when moving from the basin of attraction of (α o , z) h * m 1 to that of (α o , z) l * m 1 , not only the investment in R&D exhibits a downward jump, but also the firms' activities for strengthening the old market jumps upward. This is quite intuitive since in the basin of attraction of (α o , z) l * m 1 , and there is a positive probability that the new product is never introduced and therefore the expected future returns of increasing the reservation price for the established product are larger compared to the scenario where the state converges to (α o , z) h * m 1 . This is also reflected in the observation that Turning to the firms' equilibrium investment in R&D, we observe the same qualitative properties as in the two cases depicted in Fig. 1. Due to the complementarity between public knowledge and firms' R&D, these investments increase with z. Furthermore, a large reservation price on the established market reduces the firms' incentive to invest in R&D. Together, these two properties imply that the Skiba curve, along which the firms' strategies jump, is upward sloping in the state space. This in turn implies that for a given level of the public knowledge stock the initial strength of the established market (captured by α o ) can determine whether in equilibrium there is persistent investment in the development of the new product, which would mean that the product is introduced with probability one. Such a scenario arises only if the initial strength of the established market is sufficiently small (i.e., (α o , z) is left of the Skiba curve in Fig. 4). If the established market is strong, then, apart from a short transient phase, firms abstain from investment in R&D, which implies that the probability that the new product reaches the market is close to zero.
These insights have important implications for the understanding of the introduction of a new product which is (abstracting from the firms' activities to influence acceptance of the products) more attractive for consumers than the established one. The result suggests that under certain constellations of the R&D cost parameters and the level of public knowledge stock, the strength of the established market might prevent the development and the introduction of the new product. This insight has clear policy implications, which are discussed in more detail in Sect. 8. Our results are also interesting from a technical perspective, since to our knowledge this is the first instance of an MPE with coexisting stable steady states in a differential game in which the dimension of the state space is larger than one and also the first instance of such a phenomenon in a multi-mode differential game.

Post-Innovation Phase
After one firm has introduced the new product, the market dynamics and the incentives of the competitors to invest in activities strengthening the established (as well as the new) market, change significantly. Without loss of generality, we focus on the case of mode m 2 where firm 1 has been successful in introducing the new product. According to our assumption that the new market for technical reasons or due to patent protection is characterized by such high entry costs for firm 2 that this firm never enters, we have an asymmetric scenario, in which firm 1 can influence the development of the strength of both markets, whereas the activities of firm 2 are restricted to the established market. Figure 5 shows the feedback functions of both firms in mode m 2 for our base parameter setting. Considering firm 2, which is only active on the established market, it can be observed that investments in strengthening that market are positively affected by the size of that market (like in mode m 1 ) and negatively affected by an increase in α n . The intuition for this effect is similar to the one discussed for m 1 , namely that an increase in α n reduces the quantity firm 2 sells on the established market, which reduces the incentive to invest in an increase in the (reservation) price of the established product. The monotonicity properties of firm 1's activities for influencing α o are the same as those of the investment of firm 2, however, Fig. 5 shows that it can actually be optimal for firm 1 to engage in costly activities to reduce the strength of the established market although the firm is still active on that market. In particular, this is true if the strength of the new market is above a certain threshold, where, as is to be expected, this threshold is an increasing function of the strength of the established market. The activities of firm 1 with respect to the strength of the new market are always directed toward an increase in the size of that market. Analogous to the established market the level of these activities are positively influenced by the strength of the market itself and negatively affected by the size of the other market. Figure 6 shows the evolution of the reservation prices in the two markets and the corresponding trajectories of investments in both modes under the assumption that the realization of stochastic innovation time is τ = 40. This means that we consider a situation where the size of the established market has more or less reached the steady state value of the pre-innovation phase before the innovation occurs. It can be clearly seen that the innovation leads to a downward jump in the investment of firm 1 in the established market and at the same time to an upward jump of the activities of firm 2 on that market. The effort of firm 1 to strengthen the new market after its emergence are substantially larger than both the firms' efforts on the established market prior to innovation and also those of firm 2 on the established market after the innovation. This is due to the large market power of firm 1 with respect to the new product, which results in large output quantities. Shortly after the innovation, the investments of firm 1 with respect to the evolution of α o become negative and stay negative in the long run. Therefore, the size of the established market shrinks to a value which is only slightly above the levelα o which would emerge in the long run without effort by any of the firms to influence the reservation price on that market. The reservation price of the new product in the long run is larger than that of the established product, which does not only reflect the higher basic attractiveness of that market (captured byα n >α o ) but also the substantially larger (net) investments in activities boosting the acceptance of that product and its underlying technology. The dynamics of output quantities corresponding to this scenario (see Fig. 7) highlights the close connection between the firms investments in the evolution of the different markets and their output quantities on these markets. Contrary to the investments, the output of firm 2 on the established market exhibits however no upward jump after the innovation. The effects of the downward jump in the output of firm 1 on the established market is exactly neutralized by the positive quantity of the new product, such that initially the quantity of firm 2 remains unchanged after the innovation and only eventually decreases as the reservation price on the old market goes down. Furthermore, the figure shows that for the considered parameter setting the innovator over time reduces its output on the established market essentially to zero thereby realizing a full transition from the old product and technology to the new one. It should however be noted that firm 1 starts engaging in activities decreasing the size of the established market at a point in time where it still produces substantial positive quantities of that product. Overall, we end up in a scenario where each of the two firms completely focuses on one of the two markets.

Conclusions
The most important contribution of this paper is technical in the sense that, to our belief, this is the first paper generating a history-dependent solution in the setting of a Markov-perfect equilibrium of a differential game with more than one state variable. In a model with two state variables, a Skiba curve separates the basins of attraction of the different locally stable steady states. We design an adaptation of a collocation algorithm to develop the numerical solution.
We apply our method to a duopoly model where the two identical incumbent firms both have an option to carry out a product innovation. In a state plane with public knowledge on the vertical axis and the reservation price of the established product on the horizontal axis, the Skiba curve is upward sloping. If the initial values of the state variables are such that this point is situated below the Skiba curve, the firms do not innovate. On the other hand, if this point is located above the Skiba curve, it follows that with probability one the new product will eventually be introduced on the market.
The location of a Skiba curve forms an important input for the policy maker. Consider a situation where the new product is more environmental friendly and the initial values of the state variables are such that it is located below the Skiba curve. Then, the market outcome will be that the cleaner new product will not be invented. Essentially the policy maker has two methods to still make innovation work. The first method is to move the initial point in the state plane in such a way that it enters the desired area, which is above the Skiba curve. To do so, the policy maker could either move the point of the initial states upward by increasing public knowledge, for instance by subsidizing universities to do research in this area. Or the policy maker could move this point to the left by taxing the use of the more dirty established product, which reduces the reservation price.
The second method is to enlarge the basin of attraction of the innovation steady state, thus moving the Skiba curve downward. This can be done by subsidizing R&D investments.
It is important to note that the Skiba curve is only an input for the policy maker: it just shows what is needed to change an (undesired) market outcome. However, to determine whether it is in fact optimal to do so requires a richer setting in which welfare should be optimized such that, besides firm profits, also costs of the specific policy measure and consumer surplus are taken into consideration.