1 Introduction

Price stickiness is a way to model non-instantaneous price adjustment. This market imperfection is an important topic in macroeconomics, with many papers documenting it in various types of data (e.g. Anderson et al. 2015; Lünnemann and Wintr 2011; Gorodnichenko and Weber 2016), but it is also worth considering in microeconomic problems like the market equilibrium in an oligopoly. Such a market can be modelled as a dynamic game—a differential game. The first such formulation of a model with sticky prices was introduced by Simaan and Takayama (1978). The theoretical model has been further developed e.g. by Fershtman and Kamien (1987, 1990), Tsutsui and Mino (1990), Piga (2000), Dockner and Gaunersdorfer (2001), Cellini and Lambertini (2004, 2007), Benchekroun et al. (2006), Colombo and Labrecciosa (2021), Hoof (2021), Wiszniewska-Matyszkiel et al. (2015), Wang and Huang (2015, 2018, 2019), Valentini and Vitale (2021) and Liu et al. (2017). There are also extensions of such models that additionally consider advertising (e.g. Lu et al. 2018; Raoufinia et al. 2019), as well as applications of such a model adjusted to suit refineries (Tominac and Mahalec 2018). For more exhaustive reviews of the subject see e.g. Dockner (2000) and Colombo and Labrecciosa (2017).

However, all of the models mentioned above focus on the market structure of a Cournot oligopoly, in which firms produce a homogeneous product and have entirely symmetric information. The analogous model with information asymmetry, in which one of the firms (the leader) takes into account how the other firm (the follower) reacts to its strategy, is called the Stackelberg model.

In static games, both the idea and the solution of the Stackelberg problem are relatively simple, and the informational advantage can be easily interpreted as a generalized first mover advantage: either the sequence of moves with the leader as the first mover, or a binding declaration of the leader about his choice of strategy before the choice of the follower (in which case the actual sequence of moves does not matter). Moreover, as a sequential optimization, the problem has a solution under only upper semicontinuity and compactness assumptions, unlike the Nash equilibrium problem, which also requires the existence of a fixed point. In differential games, and, more generally, dynamic games, this generalized first mover advantage may either be required at each stage of the game, or it may concern the declaration of the leader's strategy for the entire game before the first move; depending on the information structure of the game, additional assumptions may be required. The situation is simple for the open loop information structure, when the strategies of the players are functions of time only, because then the standard definition applies. Conversely, a problem appears for the feedback information structure, with strategies (called feedback or Markov perfect) dependent on the current state. In the latter case, there are two extensions of the Stackelberg equilibrium. One of those concepts corresponds to the leader being the first mover at each stage. The other one, called the global Stackelberg equilibrium, describes the situation in which the leader declares his feedback strategy before the game and the follower best responds to it. This approach is either equivalent to using threat strategies in order to enforce the global maximum of the leader's payoff, or it requires imposing additional assumptions on the leader's strategy. The reason is that calculating the best response of the follower to every possible strategy of the leader, and then optimizing the leader's payoff given the resulting follower's best response, is ill-posed, since the best response to discontinuous strategies ceases to exist even in nice problems. Therefore, some a priori constraints on the class of the leader's strategies are imposed: it is usually assumed that the leader's strategy is linear; more generally, it is enough to consider a class of functions defined by several real parameters. The resulting problem of the leader, however, is not a standard optimal control problem any more, but a finite-dimensional optimization problem, and the resulting strategy may be suboptimal if the leader's optimal control after the declaration does not belong to the assumed class of functions, so the solution may not be time-consistent. For deeper insight, see e.g. Başar and Olsder (1998) or Haurie et al. (2012) for general theory, and Martín-Herrán and Rubio (2021) for rare cases of coincidence of those two classes of Stackelberg equilibria with state-dependent information structure.

Similarly, the open loop Stackelberg equilibria, which are simpler to derive, are usually not subgame-perfect, and it often turns out that the leader has an incentive to change the declared strategy after the follower chooses his strategy as the best response to it.

Generally, solving the feedback Stackelberg problem is analytically very complicated, and restricting attention to time-consistent, subgame-perfect solutions makes it substantially more complicated, especially if the strategy sets are constrained, which even in linear quadratic problems with linear constraints leads to only piecewise linear solutions. It can be expected that the best response of the follower to a leader's strategy that is piecewise linear with \(k\) pieces is piecewise linear with more than \(k\) pieces. This makes the leader's optimization problem piecewise-linear-quadratic with more than \(k\) pieces.

The class of linear quadratic problems with linear constraints has been extensively studied for resource extraction problems for common or interrelated renewable resources sold at a common market, known also as productive asset oligopolies. In linear quadratic problems, inherent constraints, like nonnegativity of the state variable and control, or the constraint resulting from the admissible amount of the resource, may lead to numerous problems for Nash equilibria. Examples of such problems are as follows: the value function is piecewise quadratic with infinitely many pieces for some parameters (Singh and Wiszniewska-Matyszkiel 2018), the problem is intractable in the standard way and the solution is not even piecewise linear (Singh et al. 2020), or all the symmetric Nash equilibria are discontinuous (Singh and Wiszniewska-Matyszkiel 2019). Some difficulties may appear even in analogous optimal control problems, like e.g. in Singh and Wiszniewska-Matyszkiel (2020), where the solution is piecewise linear with infinitely many pieces and the standard undetermined coefficients method returns a control far from the unique optimum. Nevertheless, such complications do not always happen in this kind of problems: there is a sequence of works with piecewise linear dynamics in which this problem does not appear at a Nash equilibrium: Benchekroun (2008), Benchekroun et al. (2020), Vardar and Zaccour (2020), or at a Stackelberg equilibrium: Colombo and Labrecciosa (2019).

It is worth emphasizing that, as proven in Wiszniewska-Matyszkiel et al. (2015), in the Cournot oligopoly with sticky prices, the strategies of the players at a feedback equilibrium are only piecewise linear, with two pieces, and the same applies to best responses to linear feedback strategies. So, as can be expected, the typical way of defining the global feedback Stackelberg equilibrium, in which calculating the best response of the follower is restricted to linear strategies of the leader only and the leader's equilibrium strategy is indeed linear, cannot lead to a time-consistent global feedback Stackelberg equilibrium. Moreover, assuming a two-piece linear strategy of the leader results in dynamics in the follower's problem that are only piecewise linear. Thus, a three-piece linear best response can be expected, and it cannot be a priori excluded that the best response of the leader has more than two pieces. So, the global Stackelberg problem becomes extremely complicated.

Therefore, various simplifications of the dynamic Stackelberg equilibrium are considered. One of them is a model in which the leader’s informational advantage is increased by the fact that the less informed follower is also myopic.

Various models with myopia of at least one of two players, usually Stackelberg follower, are examined in marketing channels models, e.g. Taboubi and Zaccour (2002), Benchekroun et al. (2009), Liu et al. (2016), Martín-Herrán et al. (2012) and Wang et al. (2019), and in environmental problems, e.g. Hämäläinen et al. (1986) or Crabbé and Van Long (1993).

The first work attempting to capture sticky price dynamics in a market with asymmetric information of this type is Fujiwara (2006), which proposes a Stackelberg duopoly model with a myopic follower who expects immediate price adjustment. To the best of our knowledge, the subject has not been continued in the published literature. In Fujiwara (2006), the calculations are restricted to finding the steady state of the open loop equilibrium (i.e. the information structure in which the strategies are functions of time only, not of price) and the results are not fully proven. So, a natural step is to complete that analysis.

In this paper, we perform a complete analysis of both the open loop and the feedback form of the leader's strategies in the model proposed by Fujiwara, and we obtain interesting phenomena.

We compare our results with those for the analogous market with a Cournot duopoly structure derived in Wiszniewska-Matyszkiel et al. (2015).

2 Formulation of the model

We consider a differential game with 2 players, producers of the same good. The products of both producers are perceived by consumers as identical. Each of the firms has the same quadratic cost function

$$\begin{aligned} C(q_{i})=cq_{i}+\frac{1}{2}q_{i}^{2} \quad \text {for } i=1,2, \end{aligned}$$
(1)

where \(c\) is some positive constant and \(q_i \ge 0\) denotes the production of the \(i\)-th player.

The market is described by the inverse demand function

$$\begin{aligned} p^{\text {E}} = A - (q_1 + q_2). \end{aligned}$$

However, the price does not adjust immediately, but its behaviour is defined by a differential equation

$$\begin{aligned} {\dot{p}}(t)=\frac{dp}{dt}=s(p^{\text {E}}(t)-p(t))=s(A-(q_{1}(t)+q_{2}(t))-p(t)) \ , \ p(0) = p_0, \end{aligned}$$
(2)

where \(s > 0\) measures the speed of adjustment and \(A\) is some positive constant, substantially greater than \(c\), which can be interpreted as the market capacity.
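To illustrate the adjustment mechanism, the following minimal Python sketch (our own illustration, not part of the model analysis; the constant production levels are hypothetical) evaluates the exact solution of Eq. (2) for constant productions, showing exponential convergence of the price, at rate \(s\), to the level given by the inverse demand function.

```python
# A minimal sketch (ours) of the sticky price dynamics (2) for constant
# productions; all numerical values below are hypothetical.
import numpy as np

A, s, p0 = 10.0, 0.5, 1.1    # market capacity, adjustment speed, initial price
q1, q2 = 2.0, 2.5            # constant production levels, for illustration only

pE = A - (q1 + q2)           # the price under immediate adjustment

t = np.linspace(0.0, 20.0, 5)
p = pE + (p0 - pE) * np.exp(-s * t)   # exact solution of (2) for constant q1+q2
print(p)                     # monotone adjustment from p0 towards pE
```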

So, it is natural to consider the resulting problem as a differential game with players maximizing \(\varPi _i(q_1, q_2) = \int _0^{\infty } {{\,\mathrm{e}\,}}^{-rt} \big ( p(t) q_i(t) - C(q_i(t)) \big ) dt\), where \(q_i(t)\) is the decision at time \(t\). The above problem has been extensively studied in the literature (see the introduction). In this paper, we want to consider a serious asymmetry between the players. Only the leader (player 1) is far-sighted: he knows the dynamics of the price, and his aim is to maximize

$$\begin{aligned} \varPi _1(q_1, q_2) = \int _0^{\infty } {{\,\mathrm{e}\,}}^{-rt} \big ( p(t) q_1(t) - C(q_1(t)) \big ) dt, \end{aligned}$$
(3)

where \(r > 0\) is the discount rate, while \(p\) is defined by (2).

The follower (player 2) is assumed to be myopic, and at each time instant he behaves as in the static Stackelberg duopoly. Hence, given a decision of the leader \(q_1(t)\), he chooses \(q_2(t)\) maximizing

$$\begin{aligned} \pi _2(q_1(t), q_2(t))=(p^{\text {E}}(q_{1}(t), q_{2}(t))-c)q_2(t)-\frac{1}{2}q_2^2(t), \end{aligned}$$
(4)

as in Fujiwara (2006).

There may be several reasons for the myopia of the less sophisticated player. The two most obvious ones are related to the stronger position of the leader. The first one is when the leader is an established firm at the market and there are unrelated follower entrants at separate time instants, each of the entrants existing for one time instant only. The same applies if there is only one follower firm, but one that is not sure whether it is going to exist in the future. This encompasses, among many other cases, the asymmetry between a fashion firm and a counterfeiter or, in a slightly different approach, a company with fishing rights and a poacher. The other obvious explanation assumes that the leader is the one who dictates prices and the follower just does not know the pricing rules of the leader—so there is partly a problem with distorted information as in Wiszniewska-Matyszkiel (2016) and Wiszniewska-Matyszkiel (2017).

We return to those interpretations in Sect. 5 after stating the results.

There is also one more explanation, already examined in the literature: being myopic may be a behavioural choice as in e.g. Benchekroun et al. (2009), which has already been studied in papers on sticky prices (Liu et al. 2016; Liu et al. 2017).

We end the formulation of the problem by recalling that the leader knows the way the follower behaves.

We would like to mention that although we write \(q_1(t)\) and \(q_2(t)\) while defining \(\varPi _1\) and \(\pi _2\), we just do it in order to have a concise notation at this stage, while in the sequel, we consider not only open loop strategies of the leader, but also feedback strategies (dependent on current price only) and strategies of the follower at each stage being a function of the current decision of the leader.

2.1 The behaviour of the follower, the implications for the leader and the static model

Let us consider a time instant \(t\). If we solve the optimization problem of the follower given the decision of the leader \(q_1(t)\), we get the best response of the follower

$$\begin{aligned} q_2(q_1(t)) = \frac{A - c - q_1(t)}{3} \end{aligned}$$
(5)

whenever it is positive, which, as we shall see, holds for all reasonable levels of the leader’s production.

This best response is known to the leader and is therefore taken as an input into his optimization problem. So, the optimization problem of the leader reduces to the maximization of

$$\begin{aligned} J(q_1) := \varPi _1(q_1, q_2(q_1)) \end{aligned}$$
(6)

given by (3) with \(p\) defined by

$$\begin{aligned} {\dot{p}}(t) = \frac{s(2A + c - 3p(t) - 2q_1(t))}{3} \ , \ p(0) = p_0. \end{aligned}$$
(7)

We also need the static Stackelberg model with immediate adjustment of prices for comparison with the results of our dynamic game. In the static Stackelberg model, the leader also maximizes \(\pi _1\) defined analogously to Eq. (4), and the only difference is in information: the leader knows that the strategy of the follower is the best response to \(q_1\), given by (5). So, the leader's optimization problem is to maximize \({\pi _1(q_1, q_2(q_1)): = (A - q_1 - q_2(q_1))q_1 - C(q_1)}\), where \(q_2(q_1)\) is the best response of the follower. This results in the static Stackelberg equilibrium

$$\begin{aligned} p^{SB} = \frac{10A + 11c}{21}, \; q_1^{SB} = \frac{2(A - c)}{7}, \; q_2^{SB} = \frac{5(A - c)}{21}. \end{aligned}$$
(8)

For comparison, the results for the static Cournot-Nash equilibrium are

$$\begin{aligned} p^{CN} = \frac{A + c}{2}, \; q_i^{CN} = \frac{A - c}{4} \text { for } i = 1, 2. \end{aligned}$$
(9)
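The static benchmarks (5), (8) and (9) can be re-derived mechanically. The following sympy sketch (our own, not part of the original analysis) performs the computation from the first-order conditions.

```python
# A sympy sketch (ours) re-deriving the follower's best response (5) and
# the static Stackelberg (8) and Cournot-Nash (9) equilibria.
import sympy as sp

A, c, q1, q2 = sp.symbols('A c q1 q2', positive=True)

# Follower's static payoff (4): (p^E - c) q2 - q2^2/2 with p^E = A - q1 - q2.
pi2 = (A - q1 - q2 - c) * q2 - q2**2 / 2
br2 = sp.solve(sp.diff(pi2, q2), q2)[0]
print(br2)                       # (A - c - q1)/3, i.e. Eq. (5)

# Leader's static Stackelberg problem: substitute the best response.
pi1 = (A - q1 - br2 - c) * q1 - q1**2 / 2
q1_SB = sp.solve(sp.diff(pi1, q1), q1)[0]
q2_SB = sp.simplify(br2.subs(q1, q1_SB))
p_SB = sp.simplify(A - q1_SB - q2_SB)
print(q1_SB, q2_SB, p_SB)        # 2(A-c)/7, 5(A-c)/21, (10A+11c)/21

# Static Cournot-Nash: simultaneous first-order conditions.
pi1_CN = (A - q1 - q2 - c) * q1 - q1**2 / 2
print(sp.solve([sp.diff(pi1_CN, q1), sp.diff(pi2, q2)], [q1, q2]))
# q1 = q2 = (A - c)/4
```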

3 The myopic-follower Stackelberg equilibrium for open loop strategies of the leader

We start the analysis with the open loop strategies of the leader, i.e., the strategies of the leader being measurable functions \(q_1 :{\mathbb {R}}_+ \rightarrow {\mathbb {R}}_+\), directly dependent on time, without any dependence on price. The set of such strategies is denoted by \({\mathbb {Q}}_{OL}\). If a discontinuity appears, the price adjustment Eq. (7) is required to hold almost everywhere. The reaction of the follower is given by Eq. (5).

We apply the necessary conditions given by Theorem 11.

Lemma 1

For the current value Hamiltonian

$$\begin{aligned} H^{CV}(p, q_1, \lambda ) = p q_1 - c q_1 - \frac{1}{2}q_1^2 + \lambda \frac{s(2A + c - 3p - 2q_1)}{3}, \end{aligned}$$

the following properties hold.

If \(q_1^* \in \mathop {{\mathrm{Argmax}}} _{q_1\in {\mathbb {Q}}_{OL}} J(q_1)\) and \(p\) is the corresponding trajectory of price, then there exists an absolutely continuous costate trajectory \(\lambda :{\mathbb {R}}_+ \rightarrow {\mathbb {R}}\) such that for a.e. \(t\)

$$\begin{aligned}&{\dot{\lambda }}(t)= \lambda r -\frac{\partial H^{\text {CV}}(p(t), q_1(t), \lambda (t))}{\partial p}, \end{aligned}$$
(10)
$$\begin{aligned}&q_1(t) \in \mathop {{\mathrm{Argmax}}}\limits _{q_1 \in {\mathbb {R}}_+} H^{\text {CV}}(p(t), q_1, \lambda (t)), \end{aligned}$$
(11)
$$\begin{aligned}&\lim _{t \rightarrow \infty } \lambda (t) {{\,\mathrm{e}\,}}^{-rt}=0 \end{aligned}$$
(12)

and

$$\begin{aligned} \text {for every } t \ge 0, \ \lambda (t) > 0. \end{aligned}$$
(13)

With the derived transversality condition (12), the costate trajectory is calculated backwards, as is usual in reasoning based on the Pontryagin maximum principle. In the sequel, in Theorem 4, we transform the conditions (12) and (13) into an initial condition, which is unique given \(p_0\), analogously to the technique of proof used in Wiszniewska-Matyszkiel et al. (2015).

Proof

The assumptions of Theorem 11 are fulfilled (see appendix A.3). Applying the relations of Theorem 11 yields formulae (10) and (11). By the terminal condition given in Theorem 11, the integral \(\int _t^{\infty } {{\,\mathrm{e}\,}}^{-r w} {{\,\mathrm{e}\,}}^{-sw} q_1^*(w) dw\) converges absolutely and \(\lambda\) fulfils

$$\begin{aligned} \lambda (t) {{\,\mathrm{e}\,}}^{-r t} = {{\,\mathrm{e}\,}}^{s t} \int _t^{\infty }{{\,\mathrm{e}\,}}^{-(r + s) w} q_1(w)dw . \end{aligned}$$
(14)

As proven in Appendix A.3, the set of control parameters that can appear in the optimal control is bounded, so

$$\begin{aligned} \lambda (t) {{\,\mathrm{e}\,}}^{-r t} \le {{\,\mathrm{e}\,}}^{s t} \int _t^{\infty } {{\,\mathrm{e}\,}}^{-(r + s) w} q_{\max } dw, \end{aligned}$$

and

$$\begin{aligned} \lambda (t) {{\,\mathrm{e}\,}}^{-r t} = {{\,\mathrm{e}\,}}^{s t} \int _t^{\infty } {{\,\mathrm{e}\,}}^{-(r +s) w} q_1(w) dw \ge 0. \end{aligned}$$

Thus, \(\lambda (t) {{\,\mathrm{e}\,}}^{-r t} \rightarrow 0\) as \(t \rightarrow \infty\) and it is nonnegative.

Suppose that \(\lambda ({\hat{t}}) = 0\) for some \({\hat{t}} > 0\). Since the integral of a nonnegative function can be zero only if the function is 0 almost everywhere, without loss of generality, the optimal control fulfils \(q_1(w) = 0\) for all \(w \ge {\hat{t}}\).

First, we check the case when \(p(w) > c\) for some \(w \ge {\hat{t}}\). Then, by continuity of trajectories, there exist \(\epsilon ,\delta >0\) such that increasing \(q_1\) to \(\epsilon\) on some small interval \([w, w + \delta ]\) (on which the corresponding price trajectory \(p_{\epsilon ,\delta }\) fulfils \(p_{\epsilon ,\delta }(t)>c\)) would increase the payoff. Indeed,

$$\begin{aligned} \varPi _1(q_1,q_2(q_1))= & {} \int \limits _0^w{{\,\mathrm{e}\,}}^{-rt}(p(t)q_1(t)-C(q_1(t)))dt+\int \limits _{w}^{\infty }{{\,\mathrm{e}\,}}^{-rt}(p(t)q_1(t)-C(q_1(t)))dt\\= & {} \int \limits _0^w{{\,\mathrm{e}\,}}^{-rt}(p(t)q_1(t)-C(q_1(t)))dt+0\\< & {} \int \limits _0^w{{\,\mathrm{e}\,}}^{-rt}(p(t)q_1(t)-C(q_1(t)))dt+\int \limits _{w}^{w+\delta }{{\,\mathrm{e}\,}}^{-rt}(p_{\epsilon ,\delta }(t)\epsilon -C(\epsilon ))dt. \end{aligned}$$

This leads to a contradiction with optimality of the leader’s strategy.

Next, we assume that \(p(w)>c\) does not hold for any \(w\ge {{\hat{t}}}\). So, \(p(w) \le c\) for all \(w \ge {\hat{t}}\).

We recall that the optimal control fulfils \(q_1(w)=0\) for all \(w\ge {{\hat{t}}}\) and note that, by the fact that \(A>c\), Eq. (7) with \(q_1(w)=0\) for \(w \ge {\hat{t}}\) implies \({\dot{p}}(w) > 0\) and the unique steady state of Eq. (7) for such \(q_1\) is greater than \(c\). So, the price corresponding to such a \(q_1\) grows to this steady state and the trajectory exceeds \(c\) at some finite time, which leads to a contradiction.

Therefore \(\lambda (t) > 0\) for all \(t\). \(\square\)

To maximize the current value Hamiltonian with respect to \(q_1\), we calculate its zero derivative point and we obtain \(q_{1}(t)= p(t) - c - \frac{2}{3}s\lambda (t).\) Taking into account the nonnegativity constraints, this implies

$$\begin{aligned} q_{1}(t)= \left\{ \begin{array}{ll} p(t) - c - \frac{2}{3}s\lambda (t) &{} \quad \text {if }(\lambda (t),p(t))\in \varOmega _{2},\\ 0 &{} \quad \text {if }(\lambda (t),p(t))\in \varOmega _{1}, \end{array} \right. \end{aligned}$$
(15)

where

$$\begin{aligned} \varOmega _{1}=\left\{ (\lambda ,p)\ : \ \lambda>0, \ p>0, \ p\le \frac{2s}{3}\lambda + c\right\} \end{aligned}$$
(16)

and

$$\begin{aligned} \varOmega _{2}=\left\{ (\lambda ,p)\ : \ \lambda>0, \ p>0, \ p> \frac{2s}{3}\lambda + c\right\} . \end{aligned}$$
(17)

Applying Lemma 1 to our problem yields

$$\begin{aligned} \begin{aligned} {\dot{\lambda }}(t) = (s + r) \lambda (t) - q_1(t) . \end{aligned} \end{aligned}$$
(18)

Substituting the follower’s best response (5) yields

$$\begin{aligned} \begin{aligned} {\dot{p}} = \frac{s(2A + c - 3p - 2q_1)}{3} \text { .} \end{aligned} \end{aligned}$$
(19)

Therefore, the optimality of the leader’s strategy implies that the state and the costate variables must fulfil the following system of ODEs.

$$\begin{aligned} \begin{aligned} {\dot{\lambda }}&= \left\{ \begin{array}{l l} \left( \frac{5}{3}s+r\right) \lambda -p+c &{} \quad \text {for }(\lambda ,p)\in \varOmega _{2},\\ (s+r)\lambda &{} \quad \text {for }(\lambda ,p)\in \varOmega _{1}, \end{array} \right. \\ {\dot{p}}&= \left\{ \begin{array}{l l} \frac{4}{9}s^{2}\lambda - \frac{5}{3}sp + s\frac{2A+3c}{3} &{} \quad \text {for }(\lambda ,p)\in \varOmega _{2},\\ -sp + s\frac{2A+c}{3} &{} \quad \text {for }(\lambda ,p)\in \varOmega _{1}. \end{array} \right. \end{aligned} \end{aligned}$$
(20)
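Before analysing (20) formally, it may be helpful to look at it numerically. The following sketch (our own illustration, using the parameter values that appear later in the figures) integrates the piecewise system (20) with scipy; started from a grid of initial points, it reproduces the phase portrait of Fig. 1, and started from the steady state derived in Theorem 2 below, it stays put.

```python
# A numerical sketch (ours) of the piecewise costate-state system (20),
# with the parameter values of the figures.
import numpy as np
from scipy.integrate import solve_ivp

A, c, r, s = 10.0, 1.0, 0.15, 0.5

def rhs(t, y):
    lam, p = y
    if p > 2 * s * lam / 3 + c:          # Omega_2: interior production
        dlam = (5 * s / 3 + r) * lam - p + c
        dp = 4 * s**2 * lam / 9 - 5 * s * p / 3 + s * (2 * A + 3 * c) / 3
    else:                                # Omega_1: q_1 = 0
        dlam = (s + r) * lam
        dp = -s * p + s * (2 * A + c) / 3
    return [dlam, dp]

# The steady state derived in Theorem 2 below; a trajectory started there
# should remain there up to numerical error.
lam_star = 2 * (A - c) / (5 * r + 7 * s)
p_star = (3 * r * (2 * A + 3 * c) + s * (10 * A + 11 * c)) / (3 * (5 * r + 7 * s))
sol = solve_ivp(rhs, (0.0, 30.0), [lam_star, p_star], rtol=1e-10)
print((lam_star, p_star), sol.y[:, -1])
```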

Again, \(\lambda\) has the terminal condition given by Eq. (12), while \(p\) has the initial condition given as in Eq. (7). So, we have a backward-forward ODE system with a mixed terminal-initial condition, which we are going to transform into a joint initial condition. As the first step, we formulate the following theorem (Fig. 1).

Fig. 1

The phase diagram of Eq. (20). The green line with horizontal bars denotes the \(p\)-null-cline, the red line with vertical bars denotes the \(\lambda\)-null-cline. The blue dashed line splits the costate-state space into the sets \(\varOmega _1\) and \(\varOmega _2\). The black solid line represents the stable manifold of the steady state, while the light brown line the unstable manifold

Theorem 2

Let \((\lambda (t),p(t))\) be a solution to Eq. (20) with an initial value \((\lambda _0,p_0)\). Then \(\lambda (t) {{\,\mathrm{e}\,}}^{-r t}>0\) and it converges to 0 as \(t\rightarrow \infty\) if and only if \((\lambda _0,p_0)\in \Gamma\) , where \(\Gamma\) is the stable manifold of the steady state \((\lambda ^*, \mathbf{p }^*_{OL})\) of Eq. (20).

The point \((\lambda ^*, \mathbf{p }^*_{OL})\in \varOmega _2\) (for \(\varOmega _1, \varOmega _2\) defined in (16) and (17)) and

$$\begin{aligned} \lambda ^{*} = \frac{2(A - c)}{5r + 7s}, \quad \mathbf{p }^*_{OL} = \frac{3r(2A+3c)+s(10A+11c)}{3(5r+7s)}. \end{aligned}$$
(21)

The corresponding steady state production of the leader is

$$\begin{aligned} \mathbf{q }^*_{1,OL}=\frac{2(r+s)(A-c)}{5r+7s}. \end{aligned}$$
(22)

Moreover, the curve \(\Gamma\) intersects the line \(p = \frac{2}{3}s\lambda +c\) (the boundary between \(\varOmega _1\) and \(\varOmega _2\)) at the point \(({{\bar{\lambda }}}, {{\bar{p}}})\) with

$$\begin{aligned} {\bar{p}}&= \frac{(3r(2A+3c)+s(10A+11c))\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}}-{\bar{p}}_1}{3\big [(5r+7s)\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}} -3(14s^{2}+17sr+5r^{2})\big ]},\nonumber \\ {{\bar{p}}}_1&= 42(2A+c)s^{2} + 9(10A+7c)sr + 9(2A+3c)r^{2}, \end{aligned}$$
(23)
$$\begin{aligned} {\bar{\lambda }}&= \frac{(A-c)\big [(5s+3r)\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}} -3(14s^{2}+15sr+3r^{2})\big ]}{s\big [(5r+7s)\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}} -3(14s^{2}+17sr+5r^{2})\big ]}. \end{aligned}$$
(24)

The stable manifold \(\Gamma\) consists of the steady state \(\{ (\lambda ^*, \mathbf{p }^*_{OL}) \}\) and

$$\begin{aligned} \begin{aligned} \Gamma _1&= \Bigg \{ (\lambda ,p) = \Big ( {\bar{\lambda }}{{\,\mathrm{e}\,}}^{(r+s)\zeta } , \frac{2A + c}{3} + ({\bar{p}} - \frac{2A + c}{3}){{\,\mathrm{e}\,}}^{-s\zeta } \Big ) \text { : } \zeta \in \Bigg (\frac{1}{s} \ln \Bigg ( 1 - \frac{3{\bar{p}}}{2A + c} \Bigg ),0\Bigg ] \Bigg \} \subset \varOmega _1,\\ \Gamma _2&= \Big \{ (\lambda ,p) = \Big ( \lambda ^{*} - 3\zeta \Big ( 3r + 10s - \sqrt{3}\sqrt{3r^2 + 20rs + 28s^2} \Big ), \mathbf{p }^*_{OL} - 8\zeta s^{2} \Big ) \text { : } \zeta > 0 \Big \} \cap \varOmega _2,\\ \Gamma _3&= \Big \{ (\lambda ,p) = \Big ( \lambda ^{*} - 3\zeta \Big ( 3r+10s-\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}} \Big ), \mathbf{p }^*_{OL} - 8\zeta s^{2} \Big ) \text { : } \zeta < 0 \Big \} \subset \varOmega _2. \end{aligned} \end{aligned}$$

Proof

First, we analyse the phase portrait, presented in Fig. 1, to determine the existence of solutions. We can see that for each variable the null-clines are as follows:

$$\begin{aligned} {\dot{p}}= & {} 0 \quad \Longleftrightarrow \quad {\left\{ \begin{array}{ll} p=\frac{4}{15}s\lambda + \frac{2A+3c}{5} &{} \text {for } \lambda < \frac{A-c}{s},\\ p=\frac{2A+c}{3} &{} \text {for } \lambda \ge \frac{A-c}{s}, \end{array}\right. }\\ {\dot{\lambda }}= & {} 0 \quad \Longleftrightarrow \quad {\left\{ \begin{array}{ll} p=\bigg ( \frac{5}{3}s+r \bigg ) \lambda + c &{} \text {for } p > c, \\ \lambda =0 &{} \text {for } p \le c. \end{array}\right. } \end{aligned}$$

As the \(p\)-null-cline has a slope smaller than that of the line dividing the \((\lambda, p)\) space into \(\varOmega _1\) and \(\varOmega _2\), there exists exactly one steady state in the positive quadrant, and it corresponds to positive leader's production. It is easy to calculate (21) and then (22) by substituting into (15).

In \(\varOmega _{1}\), the solution of (20) has the form

$$\begin{aligned} \lambda (t)=\lambda _{0} {{\,\mathrm{e}\,}}^{(r+s)t} \ , \; p(t)=\frac{2A+c}{3} + \bigg (p_{0}-\frac{2A+c}{3}\bigg ) {{\,\mathrm{e}\,}}^{-st}. \end{aligned}$$

In \(\varOmega _{1}\), to the right of the stable manifold, as \(t\rightarrow \infty\), \(\lambda (t)\) behaves asymptotically as \(c_2 {{\,\mathrm{e}\,}}^{(r+s)t}\) with \(c_2>0\), while \(p(t)\) converges to \(\frac{2A+c}{3}\) at the rate \({{\,\mathrm{e}\,}}^{-st}\). Therefore, \(\lim _{t\rightarrow \infty }{{\,\mathrm{e}\,}}^{-rt}\lambda (t) \ne 0\).

We can see from the phase diagram that each solution with the initial condition to the right of the stable manifold eventually enters \(\varOmega _1\), so the above reasoning applies also to the other trajectories to the right of the stable manifold.

We can also see that for every trajectory with the initial condition to the left of the stable manifold, \(\lambda (t) \le 0\) from some time instant on.

In \(\varOmega _2\), Eq. (20) reduces to:

$$\begin{aligned} \begin{bmatrix} {\dot{\lambda }}\\ {\dot{p}} \end{bmatrix} =B \begin{bmatrix} \lambda \\ p \end{bmatrix} + C, \end{aligned}$$
(25)

where

$$\begin{aligned} B= \begin{bmatrix} r+\frac{5}{3}s &{} -1 \\ \frac{4}{9}s^{2} &{} -\frac{5}{3}s \\ \end{bmatrix} \quad \text {and} \quad C= \begin{bmatrix} c \\ \frac{s(2A+3c)}{3} \\ \end{bmatrix}. \end{aligned}$$

The determinant \(\det B = -s \big ( \frac{5r+7s}{3} \big ) < 0\), which confirms that the steady state is a saddle point. The eigenvalues of \(B\) are

$$\begin{aligned} \begin{aligned} \mu _{1}&=\frac{1}{6}(3r+\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}}),\\ \mu _{2}&=\frac{1}{6}(3r-\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}}), \end{aligned} \end{aligned}$$
(26)

while the corresponding eigenvectors are

$$\begin{aligned} \begin{aligned} v_{1}&= \begin{bmatrix} 3(3r+10s+\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}})\\ 8s^{2} \end{bmatrix} ,\\ v_{2}&= \begin{bmatrix} 3(3r+10s-\sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}})\\ 8s^{2} \end{bmatrix}. \end{aligned} \end{aligned}$$

This implies that the part of \(\Gamma\) lying in \(\varOmega _2\) consists of \(\Gamma _2\) and \(\Gamma _3\).

Analogously, to get \(\Gamma _1\), we solve Eq. (20) for \((\lambda , p)\) in \(\varOmega _1\) with the condition that for some \(t\), (\(\lambda (t),p(t))= ({{\bar{\lambda }}},{{\bar{p}}})\). \(\square\)
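As a cross-check of the above computations, the following numpy sketch (our own, with sample parameter values) compares the closed-form eigenvalues (26) and the stable eigenvector \(v_2\) with a numerical eigen-decomposition of \(B\).

```python
# A sketch (ours) cross-checking the eigen-decomposition of B in (25)-(26)
# against numpy, for hypothetical sample parameter values.
import numpy as np

r, s = 0.15, 0.5
B = np.array([[r + 5 * s / 3, -1.0],
              [4 * s**2 / 9, -5 * s / 3]])

root = np.sqrt(3) * np.sqrt(3 * r**2 + 20 * r * s + 28 * s**2)
mu1 = (3 * r + root) / 6             # unstable eigenvalue, Eq. (26)
mu2 = (3 * r - root) / 6             # stable eigenvalue, Eq. (26)

eigvals, eigvecs = np.linalg.eig(B)
print(sorted(eigvals), sorted([mu1, mu2]))   # should agree

# The stable eigenvector, collinear with v_2 = (3(3r+10s-root), 8 s^2):
v2 = np.array([3 * (3 * r + 10 * s - root), 8 * s**2])
print(B @ v2 - mu2 * v2)             # ~ zero vector
```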

Lemma 3

The leader’s optimal strategy in the open loop form \(\mathbf{q }_{1,OL}\) fulfils for a.e. \(t\)

$$\begin{aligned} \mathbf{q }_{1, OL}(t) = {\left\{ \begin{array}{ll} 0 &{} \quad \text {if } p(t) \le {\bar{p}},\\ p(t) - c - \frac{2}{3} s \lambda (t) &{} \quad \text {otherwise,} \end{array}\right. } \end{aligned}$$
(27)

where \({{\bar{p}}}\) is given by (23).

Proof

Immediate by Lemma 1 and Theorem  2. \(\square\)

Theorem 4

There is a unique leader’s optimal strategy in the open loop form.

The equilibrium production is given by

$$\begin{aligned} \begin{aligned} \mathbf{q }_{1, OL}(t)&= {\left\{ \begin{array}{ll} 0 &{} \text {for } 0 \le t< {\bar{t}} \text { and } p_{0}< {\bar{p}}, \\ \mathbf{q }^{*}_{1,OL} + \Big ( {\bar{p}} - c - \frac{2}{3}s{\bar{\lambda }} - \mathbf{q }^{*}_{1,OL} \Big )\cdot {{\,\mathrm{e}\,}}^{\mu _2(t-{\bar{t}})} &{}\text {for } t \ge {\bar{t}} \text { and } p_{0} < {\bar{p}}, \\ \mathbf{q }^{*}_{1,OL} + \Big ( p_{0} - c - \frac{2}{3}s\lambda _{0} - \mathbf{q }^{*}_{1,OL} \Big )\cdot {{\,\mathrm{e}\,}}^{\mu _2 t} &{} \text {for } p_{0} \ge {\bar{p}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(28)

where

$$\begin{aligned} \lambda _{0} = \lambda ^{*} - \frac{3}{8s^2}(\mathbf{p }^* - p_0)(3r + 10s - \sqrt{3}\sqrt{3r^{2} + 20rs + 28s^{2}}), \ {\bar{t}} = \frac{1}{s} \ln \Bigg ( \frac{3p_0 - (2A + c)}{3{\bar{p}} - (2A + c)} \Bigg ), \end{aligned}$$
(29)

\(\mu _2\) is given by (26), \({\bar{p}}\) and \({\bar{\lambda }}\) are given by (23) and (24) and \(\lambda ^*\) , \(\mathbf{p }^*_{OL}\) and \(\mathbf{q }^*_{1,OL}\) by (21) and (22).

The equilibrium price level is given by

$$\begin{aligned} \begin{aligned} \mathbf{p }_{OL}(t)&= {\left\{ \begin{array}{ll} \frac{2A + c}{3}+\Big ( p_{0} - \frac{2A + c}{3} \Big )\cdot {{\,\mathrm{e}\,}}^{-st} &{} \text {for } 0 \le t< {\bar{t}} \text { and } p_{0}< {\bar{p}}, \\ \mathbf{p }^*_{OL} + \Big ( {\bar{p}} - \mathbf{p }^*_{OL} \Big )\cdot {{\,\mathrm{e}\,}}^{\mu _2(t-{\bar{t}})} &{} \text {for } t \ge {\bar{t}} \text { and } p_{0} < {\bar{p}}, \\ \mathbf{p }^*_{OL} + \Big ( p_{0} - \mathbf{p }^*_{OL} \Big )\cdot {{\,\mathrm{e}\,}}^{\mu _2 t} &{} \text {for } p_{0} \ge {\bar{p}}. \end{array}\right. } \end{aligned} \end{aligned}$$
(30)

The steady state \((\mathbf{p }^*_{OL},\mathbf{q }^*_{1,OL})\) is stable with respect to changes of \(p_0\).

Proof

We use Theorem 2 and Lemma 3 and solve the set of equations (20) along the stable manifold of the steady state \(\Gamma\).

\(\lambda _0\) corresponding to \(p_0\) is uniquely defined by the terminal condition and the condition \(\lambda (t)>0\), and it is such that \((\lambda _0,p_0)\) belongs to the stable manifold of the steady state. \(\square\)

We would like to emphasize that, although the steady state \((\lambda ^*,\mathbf{p }^*_{OL})\) is a saddle point, \(\mathbf{p }^*_{OL}\) and \(\mathbf{q }^*_{1,OL}\) are stable with respect to changes of the initial condition \(p_0\). This holds because the terminal condition for \(\lambda\) at infinity, together with the positivity condition, implies a unique initial condition \(\lambda _0\) corresponding to \(p_0\) such that the trajectory is in the stable manifold of the steady state. Besides, the costate variable \(\lambda\) is only an auxiliary variable that has to exist, and it should not be treated in the same way as the actual state variable \(p\).
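The closed-form solution of Theorem 4 is straightforward to evaluate. The following sketch (our own, with the parameter values of the figures) computes \({\bar{p}}\), \({\bar{t}}\) and the equilibrium price trajectory (30), exploiting the fact, proven above, that \(({\bar{\lambda }}, {\bar{p}})\) lies on the switching line \(p = c + \frac{2}{3}s\lambda\).

```python
# A sketch (ours) evaluating the closed-form open loop equilibrium price (30)
# of Theorem 4, with the hypothetical parameters used in the figures.
import numpy as np

A, c, r, s, p0 = 10.0, 1.0, 0.15, 0.5, 1.1

root = np.sqrt(3) * np.sqrt(3 * r**2 + 20 * r * s + 28 * s**2)
mu2 = (3 * r - root) / 6
p_star = (3 * r * (2 * A + 3 * c) + s * (10 * A + 11 * c)) / (3 * (5 * r + 7 * s))

# bar_lambda from (24); bar_p via the switching line p = c + (2/3) s lambda.
D = (5 * r + 7 * s) * root - 3 * (14 * s**2 + 17 * s * r + 5 * r**2)
bar_lam = (A - c) * ((5 * s + 3 * r) * root
                     - 3 * (14 * s**2 + 15 * s * r + 3 * r**2)) / (s * D)
bar_p = c + 2 * s * bar_lam / 3

t_bar = np.log((3 * p0 - (2 * A + c)) / (3 * bar_p - (2 * A + c))) / s

def p_OL(t):
    """Equilibrium price (30), here for the case p0 < bar_p."""
    if t < t_bar:
        return (2 * A + c) / 3 + (p0 - (2 * A + c) / 3) * np.exp(-s * t)
    return p_star + (bar_p - p_star) * np.exp(mu2 * (t - t_bar))

print(bar_p, t_bar, p_OL(0.0), p_OL(50.0))   # p_OL(50) ~ p_star
```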

4 Feedback strategies

The next problem we want to solve is the optimization of the leader for the feedback information structure, i.e. the problem in which the set of controls of the leader is the set of functions \(q_1:{\mathbb {R}} \rightarrow {\mathbb {R}}_+\), with price as the argument, such that Eq. (7) with \(q_1(t)\) replaced by \(q_1(p(t))\) has a unique absolutely continuous solution. In the feedback approach, the problem is solved more generally for arbitrary values of the initial condition.

We recall that, for the Cournot duopoly, considering the feedback information structure leads to results which are not equivalent to those for the open loop information structure; even the steady states differ (see Cellini and Lambertini 2004; Fershtman and Kamien 1987; Wiszniewska-Matyszkiel et al. 2015).

To calculate the myopic-follower Stackelberg equilibrium assuming the feedback form of leader's strategies, we use the standard sufficient condition based on the Bellman or Hamilton-Jacobi-Bellman (HJB) equation (see e.g. Dockner 2000; Fleming and Soner 2006; Zabczyk 2009), which returns the value function, i.e., a function \(W:{\mathbb {R}}_+\rightarrow {\mathbb {R}}\) such that for every \(p\), \(W(p)\) is the optimal payoff of the leader if the initial price is \(p\).

In our case, the sufficient condition for a continuously differentiable function \(W\) to be the value function is the Bellman equation for every price \(p\in {\mathbb {R}}_+\)

$$\begin{aligned} rW(p) = \sup \limits _{q_{1}\ge 0} \Bigg \{ pq_{1}-cq_{1}-\frac{1}{2}q_{1}^{2}+ \frac{\partial W(p)}{\partial p} \frac{s(2A+c-3p-2q_1)}{3} \Bigg \} \end{aligned}$$
(31)

with the terminal condition \(\limsup \limits _{t\rightarrow \infty }{{\,\mathrm{e}\,}}^{-rt}W(p(t)) = 0\) for every admissible price trajectory \(p\).

If \(W\) is the value function, then every \(\mathbf{q }_{1,F}\) that maximizes the right-hand side of the Bellman equation, i.e., that for every price \(p\in {\mathbb {R}}_+\), fulfils

$$\begin{aligned} \mathbf{q }_{1,F}(p) \in \mathop {{\mathrm{Argmax}}}\limits _{q_1\ge 0} \Bigg \{ pq_{1}-cq_{1}-\frac{1}{2}q_{1}^{2}+ \frac{\partial W(p)}{\partial p} \frac{s(2A+c-3p-2q_1)}{3} \Bigg \}, \end{aligned}$$
(32)

is an optimal control.

Theorem 5

The value function of this optimization problem is defined by

$$\begin{aligned} \begin{aligned} W(p)&= {\left\{ \begin{array}{ll} \frac{\alpha }{2}p^{2} + \beta p + \gamma &{} \text {for } p \ge {\tilde{p}}, \\ \Big ( \frac{2A+c}{3} - p \Big )^{-\frac{r}{s}} \Big ( \frac{2A+c}{3} - {\tilde{p}} \Big )^{\frac{r}{s}} \Big ( \frac{\alpha }{2}{\tilde{p}}^{2} + \beta {\tilde{p}} + \gamma \Big ) &{} \text {for } p < {\tilde{p}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(33)

where

$$\begin{aligned} \begin{aligned} \alpha&=\frac{3(3r+10s - \sqrt{3}\sqrt{3r^{2}+20rs+28s^{2}})}{8s^{2}}>0,\\ \beta&= \frac{3(s(2A+3c)\alpha -3c)}{9r+s(15-4s\alpha )},\\ \gamma&= \frac{9c^{2}+2s\beta (3(2A+3c)+2s\beta )}{18r},\\ {\tilde{p}}&= \frac{3c+2s\beta }{3-2s\alpha }>c, \end{aligned} \end{aligned}$$
(34)

and it is nonnegative, increasing, continuous and continuously differentiable.

The feedback optimal solution is defined by

$$\begin{aligned} \begin{aligned}&\mathbf{q }_{1,F}(p)= {\left\{ \begin{array}{ll} \Big ( 1-\frac{2}{3}s\alpha \Big ) p - c - \frac{2}{3}s\beta & \quad \text {for } p > {\tilde{p}}, \\ 0 & \quad \text {for } p \le {\tilde{p}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(35)

strictly increasing whenever positive. The corresponding price trajectory is defined by

$$\begin{aligned} \begin{aligned}&\mathbf{p }_{F}(t)= {\left\{ \begin{array}{ll} \frac{2A+c}{3}+\Big ( p_{0} - \frac{2A+c}{3} \Big )\cdot {{\,\mathrm{e}\,}}^{-st} &{} \text {for } 0 \le t< {\tilde{t}} \text { and } p_{0}< {\tilde{p}}, \\ \frac{3(2A+3c)+4s\beta }{15-4s\alpha } + \Big ( {\tilde{p}} - \frac{3(2A+3c) +4s\beta }{15-4s\alpha } \Big )\cdot {{\,\mathrm{e}\,}}^{\frac{s}{9}(4s\alpha -15)(t-{\tilde{t}})} &{} \text {for } t \ge {\tilde{t}} \text { and } p_{0} < {\tilde{p}}, \\ \frac{3(2A+3c)+4s\beta }{15-4s\alpha } + \Big ( p_{0} - \frac{3(2A+3c) +4s\beta }{15-4s\alpha } \Big )\cdot {{\,\mathrm{e}\,}}^{\frac{s}{9}(4s\alpha -15)t} &{} \text {for } p_{0} \ge {\tilde{p}}, \end{array}\right. } \end{aligned} \end{aligned}$$
(36)

where

$$\begin{aligned} {\tilde{t}}=\frac{1}{s} \ln \Bigg ( \frac{3p_{0}-(2A+c)}{3{\tilde{p}}-(2A+c)} \Bigg ). \end{aligned}$$
(37)

The price \({{\tilde{p}}}\) is the price at which the leader starts production, while the time instant \({{\tilde{t}}}\) is the time instant at which the optimal price trajectory originating from \(p_0\) attains the level \({{\tilde{p}}}\).

Proof

The methodology of finding the value function is analogous to Wiszniewska-Matyszkiel et al. (2015). First, we solve the analogous dynamic optimization problem without the constraint on \(q_1\). We assume that

$$\begin{aligned} W(p) = W^+(p) := \frac{\alpha }{2}p^{2} + \beta p + \gamma \text { .} \end{aligned}$$
(38)

for some \(\alpha\), \(\beta\), \(\gamma\). We substitute the leader's production maximizing the right-hand side of (31)

$$\begin{aligned} q_1 = p-c-\frac{2s}{3} \frac{\partial W}{\partial p} \end{aligned}$$
(39)

and we obtain

$$\begin{aligned} rW = \frac{1}{2} (p-c)^{2} + \frac{s}{3}(2A+3c-5p) \frac{\partial W}{\partial p} +\frac{2s^{2}}{9}\Bigg ( \frac{\partial W}{\partial p} \Bigg )^{2}. \end{aligned}$$
(40)

This leads to solving the following equation

$$\begin{aligned} \begin{aligned}&\Bigg [ -\frac{1}{2} + \frac{3r+10s}{6} \alpha - \frac{2s^{2}}{9}\alpha ^{2} \Bigg ] \cdot p^{2} \\&\quad + \Bigg [ c - \frac{s}{3}(2A+3c) \alpha + \Big ( r + \frac{5s}{3} - \frac{4s^{2}}{9}\alpha \Big )\beta \Bigg ] \cdot p\\&\quad + r\gamma -\frac{1}{2}c^{2} - \frac{s}{3}(2A+3c)\beta - \frac{2s^{2}}{9}\beta ^{2} = 0. \end{aligned} \end{aligned}$$
(41)

Simple calculations yield (34) and another solution, differing by a plus sign before the square root in \(\alpha\). We choose \(\alpha\) with the minus sign, because the other solution results in \(q_1\) decreasing in price and negative above \({{\tilde{p}}}\), so we cannot treat it as a good candidate for the optimal strategy.
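These calculations can be reproduced symbolically; the sketch below (our own, not part of the original proof) collects the coefficients of (41) as a polynomial in \(p\) and solves for \(\alpha\), \(\beta\), \(\gamma\), returning both roots.

```python
# A sympy sketch (ours): solving the coefficient system (41) for
# alpha, beta, gamma; (34) corresponds to the root with the minus sign.
import sympy as sp

A, c, r, s, p = sp.symbols('A c r s p', positive=True)
al, be, ga = sp.symbols('alpha beta gamma')

W = al * p**2 / 2 + be * p + ga
Wp = sp.diff(W, p)
# r W = (1/2)(p-c)^2 + (s/3)(2A+3c-5p) W' + (2 s^2 / 9) (W')^2, Eq. (40)
eq = sp.expand(r * W - (sp.Rational(1, 2) * (p - c)**2
                        + s * (2 * A + 3 * c - 5 * p) * Wp / 3
                        + 2 * s**2 * Wp**2 / 9))

coeffs = sp.Poly(eq, p).all_coeffs()      # three equations in alpha, beta, gamma
sols = sp.solve(coeffs, [al, be, ga], dict=True)
for sol in sols:                          # two roots; compare alpha with (34)
    print(sp.simplify(sol[al]))
```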

Next, we return to the initial problem. Up to this point, we have not considered the constraint \(q_1 \ge 0\). We calculate the price \({\tilde{p}}\) at which \(q_1\) given by (39) is equal to zero.

We replace \(q_1(p)\) by 0 for \(p \le {\tilde{p}}\) and we proceed to prove that the modified strategy is optimal.

The resulting trajectory of price for \(p_0 < {\tilde{p}}\), as long as \(p\) stays in this region, is as follows.

$$\begin{aligned} p(t) = \frac{2A+c}{3} + \Bigg ( p_{0} - \frac{2A+c}{3} \Bigg ) {{\,\mathrm{e}\,}}^{-st} \quad \text {for } t \text { such that } p(t) \le {\tilde{p}}. \end{aligned}$$
(42)

Let \({\tilde{t}}\) be the time needed for the price described by Eq. (42) to reach the level \({\tilde{p}}\). Such \({\tilde{t}}\) exists because, for \(p<{\tilde{p}}\), the price is increasing (\({\dot{p}}(t)>0\)), and an immediate transformation of (42) proves that \({\tilde{t}}\) is as in (37).

If the modified \(q_1\) given by (35) is the optimal control and the value function is as in (38) for \(p \ge {\tilde{p}}\), then the value function for \(p<{\tilde{p}}\) is \(W({\tilde{p}})\) discounted from the time instant \({\tilde{t}}\) to time instant 0.

$$\begin{aligned} W(p) = W^-(p)&:= \int \limits _{0}^{\infty }{{\,\mathrm{e}\,}}^{-rt} \big [(p(t)-c)q_{1}(p(t))-\frac{1}{2}q_{1}^{2}(p(t)) \big ]dt \\&= \int \limits _{0}^{{\tilde{t}}} {{\,\mathrm{e}\,}}^{-rt} \big [(p(t)-c)q_{1}(p(t))-\frac{1}{2}q_{1}^{2}(p(t)) \big ]dt\\&\quad +\int \limits _{{\tilde{t}}}^{\infty }{{\,\mathrm{e}\,}}^{-rt} \big [(p(t)-c)q_{1}(p(t))-\frac{1}{2}q_{1}^{2}(p(t)) \big ]dt \\&= \int \limits _{0}^{{\tilde{t}}}0dt +\int \limits _{{\tilde{t}}}^{\infty }{{\,\mathrm{e}\,}}^{-rt} \big [(p(t)-c)q_{1}(p(t))-\frac{1}{2}q_{1}^{2}(p(t)) \big ]dt \\&= {{\,\mathrm{e}\,}}^{-r{\tilde{t}}}W^+({\tilde{p}}). \end{aligned}$$

Substitution of \({{\tilde{t}}}\) yields \(W^-(p)=\Big ( \frac{2A+c}{3} - p \Big )^{-\frac{r}{s}} \Big ( \frac{2A+c}{3} - {\tilde{p}} \Big )^{\frac{r}{s}} \Big ( \frac{\alpha }{2}{\tilde{p}}^{2} + \beta {\tilde{p}} + \gamma \Big ) .\)

In order to check sufficiency, besides checking the Bellman equation and the terminal condition, we have to prove that \(W(p)= {\left\{ \begin{array}{ll} W^+(p) &{} \text {for } p \ge {\tilde{p}}, \\ W^-(p) &{} \text {otherwise}, \end{array}\right. }\) is continuously differentiable.

Continuity of \(W\) is straightforward as \(W^+({\tilde{p}}) = W^-({\tilde{p}})\). We focus on proving the continuity of the derivative, which is not obvious at \({\tilde{p}}\).

Since the Bellman equation is fulfilled for \({\tilde{p}}\) and \(W\) is continuous, we have

$$\begin{aligned} (W^+)'({\tilde{p}}) = \frac{r}{s} \cdot \frac{3}{2A+c-3{\tilde{p}}}W^+({\tilde{p}}) = \frac{r}{s} \cdot \frac{3}{2A+c-3{\tilde{p}}}W^-({\tilde{p}}) = (W^-)'({\tilde{p}}). \end{aligned}$$

To prove monotonicity and nonnegativity, we first prove that \((W^+)'\) is positive for \(p\ge {{\tilde{p}}}\). Since \(\alpha >0\), \((W^+)'\) is strictly increasing, so it is enough to check it at \({{\tilde{p}}}\). \(W^+({{\tilde{p}}})\) is positive, since it is equal to its derivative at this point multiplied by the positive constant \(\frac{s}{r} \cdot \frac{2A+c-3{{\tilde{p}}}}{3}\), and \(W^-\) is positive since it is a discounted value of \(W^+({{\tilde{p}}})\). Analogously, \((W^-)'({{\tilde{p}}})\) is positive, since it is equal to \(W^-({{\tilde{p}}})\) multiplied by the positive constant \(\frac{r}{s} \cdot \frac{3}{2A+c-3{{\tilde{p}}}}\).

Consider an arbitrary control \(q_1\) and the corresponding trajectory \(p\). Nonnegativity of \(W\) implies that \(\limsup \limits _{t\rightarrow \infty } W(p(t)){{\,\mathrm{e}\,}}^{-rt}\ge 0\). Noting that if \(p(t)>A\), then \({\dot{p}}(t)<0\) until \(p(t)=A\), we conclude that \(p\) is eventually bounded from above by \(\max \{p_0, A\}\), so \(\limsup \limits _{t\rightarrow \infty } W(p(t)){{\,\mathrm{e}\,}}^{-rt}\le 0\), which completes the proof that \(W\) is the value function.

Since \({\mathbf {q}}_{1,F}\) is defined as the zero-derivative point of the maximized function on the right-hand side of the Bellman equation whenever that point is nonnegative, and zero otherwise, and the maximized function is strictly concave in \(q_1\), it is the optimal control.

Next, we calculate the trajectory corresponding to the optimal control.

The equation defining it whenever \(p(t)\ge {\tilde{p}}\) is

$$\begin{aligned}&{\dot{p}}(t)=\frac{s(2A+c-3p(t)-2\mathbf{q }_{1,F}(p(t)))}{3} =\frac{s}{3} \Big (\frac{4}{3}s\alpha - 5 \Big ) p(t) + \frac{s}{3} \Big ( 2A + 3c + \frac{4}{3}s\beta \Big ). \end{aligned}$$

So, if \(p_{0}>{\tilde{p}}\), then the whole trajectory is given by

$$\begin{aligned} p(t)=\frac{3(2A+3c)+4s\beta }{15-4s\alpha } + \Bigg ( p_{0} - \frac{3(2A+3c)+4s\beta }{15-4s\alpha } \Bigg ) {{\,\mathrm{e}\,}}^{\frac{s}{9}(4s\alpha -15)t}, \end{aligned}$$
(43)

while if we consider a trajectory with \(p_0 <{{\tilde{p}}}\) which reaches \({{\tilde{p}}}\) at time \({{\tilde{t}}}\), then for \(t\ge {{\tilde{t}}}\)

$$\begin{aligned} p(t)=\frac{3(2A+3c)+4s\beta }{15-4s\alpha } + \Bigg ( {{\tilde{p}}} - \frac{3(2A+3c)+4s\beta }{15-4s\alpha } \Bigg ) {{\,\mathrm{e}\,}}^{\frac{s}{9}(4s\alpha -15)(t-{{\tilde{t}}})}. \end{aligned}$$
(44)

For \(p_{0} \le {\tilde{p}}\) and \(t < {{\tilde{t}}}\)

$$\begin{aligned} {\dot{p}}(t) = -sp(t) + \frac{s}{3} \Big ( 2A+c \Big ), \end{aligned}$$

which implies

$$\begin{aligned} p(t)=\frac{2A+c}{3}+\Big ( p_{0} - \frac{2A+c}{3} \Big )\cdot {{\,\mathrm{e}\,}}^{-st}. \end{aligned}$$
(45)

\(\square\)
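As a numerical sanity check of Theorem 5 (our own sketch, with the parameter values used in the figures), one can evaluate the constants (34) and verify that the Bellman equation (31) holds with the strategy (35) for prices above \({\tilde{p}}\):

```python
# A numerical sketch (ours) of the feedback constants (34) and a check of the
# Bellman residual at a few sample prices; parameters as in the figures.
import numpy as np

A, c, r, s = 10.0, 1.0, 0.15, 0.5

root = np.sqrt(3) * np.sqrt(3 * r**2 + 20 * r * s + 28 * s**2)
alpha = 3 * (3 * r + 10 * s - root) / (8 * s**2)
beta = 3 * (s * (2 * A + 3 * c) * alpha - 3 * c) / (9 * r + s * (15 - 4 * s * alpha))
gamma = (9 * c**2 + 2 * s * beta * (3 * (2 * A + 3 * c) + 2 * s * beta)) / (18 * r)
p_tilde = (3 * c + 2 * s * beta) / (3 - 2 * s * alpha)

def W(p):       # value function (33), quadratic branch (valid for p >= p_tilde)
    return alpha * p**2 / 2 + beta * p + gamma

def q1F(p):     # feedback strategy (35)
    return max((1 - 2 * s * alpha / 3) * p - c - 2 * s * beta / 3, 0.0)

# Bellman residual r W(p) - max_q {...} should vanish for p >= p_tilde.
for p in [p_tilde, 5.0, 7.0]:
    q = q1F(p)
    Wp = alpha * p + beta
    rhs = p * q - c * q - q**2 / 2 + Wp * s * (2 * A + c - 3 * p - 2 * q) / 3
    print(p, r * W(p) - rhs)        # ~ 0.0
```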

5 Open loop and feedback myopic-follower equilibria, limitations of the model and self-verification of the follower’s false beliefs in the game

After solving both problems, we present the solutions graphically.

In Fig. 2, we present the productions of both firms, compared to their static Stackelberg equilibrium strategies and the static Cournot-Nash equilibrium production level, while in Fig. 3, we present the equilibrium price compared to the static Stackelberg and Cournot-Nash prices.

Fig. 2

Production trajectory for the model parameters \(A=10\), \(c=1\), \(r=0.15\), \(s=0.5\) and \(p_0=1.1\). The red solid line corresponds to the leader, while the blue dashed line to the follower. The dashed horizontal lines correspond to static equilibria productions—from the bottom: the Stackelberg follower, a Cournot competitor, the Stackelberg leader. The dashed vertical line corresponds to the moment at which the leader starts production, \({{\bar{t}}}={{\tilde{t}}}\)

Fig. 3

Price trajectory for the model parameters \(A=10\), \(c=1\), \(r=0.15\), \(s=0.5\) and \(p_0=1.1\)—the solid line. The horizontal lines correspond to the following levels of price, from the bottom: \({{\bar{p}}}={{\tilde{p}}}\), static Stackelberg equilibrium, static Cournot equilibrium. The dashed vertical line corresponds to the moment at which the leader starts production, \({{\bar{t}}}={{\tilde{t}}}\)

As we can see, the open loop and feedback solutions coincide. This is not only a property of a specific set of parameters, but a general principle. This is different from the situation observed for the analogous dynamic Cournot-Nash equilibrium, in which the feedback strategies along the corresponding trajectory are larger than the open loop strategies, with the opposite inequality for prices, as proven in Wiszniewska-Matyszkiel et al. (2015). This coincidence in our model is a result of the myopia of the follower. We can formally state the following theorem.

Theorem 6

The open loop and feedback myopic-follower Stackelberg equilibrium trajectories coincide, the open loop optimal strategy of the leader coincides with his optimal feedback strategy along this trajectory, and the same applies to the follower's best response. Moreover,

$$\begin{aligned} {{\tilde{p}}}={{\bar{p}}}< \mathbf{p }^{*}_{F}=\mathbf{p }^{*}_{OL}<p^{SB} . \end{aligned}$$
(46)

Proof

By substitution of the constants from Eqs. (34) and (37) into \({\mathbf {p}}_{F}\) from Eq. (36) and meticulous simplification, and then by substitution of those constants and \({\mathbf {p}}_{F}\) into \({\mathbf {q}}_{1,F}\) and simplification.

The last inequality in (46) simplifies to \(A>c\), the first one has already been proven in Theorem 2. \(\square\)
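Theorem 6 can also be verified numerically. The sketch below (our own, with the parameters of the figures) checks that \({\bar{p}}={\tilde{p}}\), that the open loop and feedback steady states coincide, and that the convergence rates \(\mu _2\) in (30) and \(\frac{s}{9}(4s\alpha -15)\) in (36) agree.

```python
# A quick numerical check (ours) of Theorem 6 for sample parameter values.
import numpy as np

A, c, r, s = 10.0, 1.0, 0.15, 0.5
root = np.sqrt(3) * np.sqrt(3 * r**2 + 20 * r * s + 28 * s**2)

# Feedback constants (34) and switching price tilde_p.
alpha = 3 * (3 * r + 10 * s - root) / (8 * s**2)
beta = 3 * (s * (2 * A + 3 * c) * alpha - 3 * c) / (9 * r + s * (15 - 4 * s * alpha))
p_tilde = (3 * c + 2 * s * beta) / (3 - 2 * s * alpha)

# Open loop switching price bar_p via (24) and the switching line.
D = (5 * r + 7 * s) * root - 3 * (14 * s**2 + 17 * s * r + 5 * r**2)
bar_lam = (A - c) * ((5 * s + 3 * r) * root
                     - 3 * (14 * s**2 + 15 * s * r + 3 * r**2)) / (s * D)
bar_p = c + 2 * s * bar_lam / 3

# Steady states and convergence rates of (30) and (36).
p_star_OL = (3 * r * (2 * A + 3 * c) + s * (10 * A + 11 * c)) / (3 * (5 * r + 7 * s))
p_star_F = (3 * (2 * A + 3 * c) + 4 * s * beta) / (15 - 4 * s * alpha)
mu2 = (3 * r - root) / 6
rate_F = s * (4 * s * alpha - 15) / 9

print(bar_p - p_tilde, p_star_OL - p_star_F, mu2 - rate_F)   # all ~ 0
```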

As we can see from Fig. 2, for small prices, until \({{\bar{t}}}\), the leader does not produce, waiting for the price to grow. On the same time interval, the follower has maximal production. Afterwards, the leader's production continuously increases, while the follower's production decreases. They intersect, and they converge to their steady states, with the steady state of the leader above his static Stackelberg equilibrium level, and the steady state of the follower below his static Stackelberg equilibrium level. When prices are considered, the steady state of the dynamic equilibrium price is below the static Stackelberg equilibrium price, which can also be confirmed by analytic calculations.

Another interesting thing that can be seen from Fig. 2 is that the model does not behave as one would expect for \(p\) close to and below \(c\) (to make it visible, we started from \(p_0\) close to the minimal marginal cost \(c\)). While the leader does not produce, the follower, observing the leader's production, has a large constant level of production.

This leads us to the concept of self-verification. The fact that, instead of the total payoff in the dynamic game, the follower maximizes only the expected current payoff may be caused by two different reasons.

  1.

    The less realistic explanation is that the leader has already been at the market, while there are multiple unrelated follower firms—entrants—who do not know the leader's pricing strategy, i.e., sticky prices with speed of price adjustment \(s\). Each of those follower firms exists for only one time instant, with at most one at each time instant. After obtaining a profit lower than expected, each follower firm resigns. And the profit is lower than expected, because if the leader offers a lower price, then the entrant has to decrease his price, too.

  2.

    The follower is not conscious that his current choice influences the future price. In this case, we have a game with distorted information, in which players have some beliefs on how their current decision influences future aggregates and values of the state variable, not necessarily consistent with reality. Depending on whether the beliefs are deterministic (realizations regarded as possible versus those impossible) or stochastic (a probability distribution on future realizations), there are two corresponding concepts of belief distorted Nash equilibrium, introduced in Wiszniewska-Matyszkiel (2016) and Wiszniewska-Matyszkiel (2017), respectively. A part of those concepts is self-verification of the equilibrium profiles, which, briefly speaking, means that the beliefs influence the behaviour of the players in such a way that the beliefs cannot be falsified by subsequent play. Moreover, the correct current value of the state variable and the opponent's behaviour are a part of both equilibrium concepts.

For steady-state initial prices, the follower's belief of no influence on future prices is self-verifying, and the current leader's behaviour and price are guessed correctly. For lower prices, the beliefs do not have this property, which is especially visible for initial prices close to \(c\). This suggests that the analysis of a dynamic optimization model with sticky prices cannot be restricted to the steady state only, and that further studies are required to derive a model that behaves as expected also for small values of the initial price.

Moreover, we would like to point out a limitation of sticky price models that is usually not emphasized, and therefore not perceived, since the focus is usually on their nice mathematical behaviour. The economic justification for introducing the first sticky price models was the fact that prices below the static equilibrium level are often observed at real world markets, and in such models this is obtained by a gradual increase of prices. Such a situation in a model can happen only if the initial market price does not exceed the steady state price (which in our case is slightly below the static Stackelberg equilibrium).

This assumption is not needed in any of the mathematical results and their proofs, which hold for an arbitrary positive initial price.

Nevertheless, in economics, the reverse situation is unrealistic. As we can see from the dynamics of price for the equilibrium strategies in various sticky price models, above the steady state price there is a permanent excess supply. The sticky prices approach is related to the behaviour of a producer facing excess demand but constrained by e.g. menu costs. In reality, permanent excess supply and the resulting need to dispose of the excess amount of product would cause a qualitative change in the behaviour of the producers, e.g. an immediate reduction of the price in order to sell the excess product.

So, if the initial price is above the steady state, which can happen if e.g. an entrant suddenly appears at a previously monopolistic market, an immediate reduction of the price by the ex-monopolist can be expected. After the reduction, there will be no excess supply, and the new initial price for the sticky price dynamics will not exceed the steady state price.

6 Dependence on the speed of price adjustment

An interesting question is how the equilibria depend on the speed of price adjustment \(s\). In Fig. 4, we compare production levels for two different values of \(s\). We can see that increasing \(s\) results in the leader switching on production earlier, and in faster growth of production at the beginning, but in convergence to a lower steady state later on. The opposite relations apply to the production of the follower. An analogous comparison of the price for various \(s\) in Fig. 5 reveals that this anomaly of the production trajectories is not strong enough to cause anomalies in prices—the price at each time instant is a strictly increasing function of \(s\). A numerical illustration of the switching time is sketched below.
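The following sketch (our own, with the parameters of Figs. 4 and 5) computes the switching time \({\tilde{t}}\) of (37) for two values of \(s\), confirming that a larger \(s\) makes the leader start production earlier.

```python
# A sketch (ours) of how the switching time tilde_t in (37) varies with the
# adjustment speed s, for the parameters of Figs. 4-5.
import numpy as np

A, c, r, p0 = 10.0, 1.0, 0.15, 1.1

def t_switch(s):
    root = np.sqrt(3) * np.sqrt(3 * r**2 + 20 * r * s + 28 * s**2)
    alpha = 3 * (3 * r + 10 * s - root) / (8 * s**2)
    beta = (3 * (s * (2 * A + 3 * c) * alpha - 3 * c)
            / (9 * r + s * (15 - 4 * s * alpha)))
    p_tilde = (3 * c + 2 * s * beta) / (3 - 2 * s * alpha)  # switching price (34)
    return np.log((3 * p0 - (2 * A + c)) / (3 * p_tilde - (2 * A + c))) / s

print(t_switch(0.25), t_switch(1.0))   # larger s: the leader starts earlier
```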

Fig. 4

Dependence of productions on \(s\) for the model parameters \(A=10\), \(c=1\), \(r=0.15\) and \(p_0=1.1\). From left: \(s=0.25\), \(s=1\). The red solid line corresponds to the leader, while the blue dashed line to the follower. The dashed horizontal lines correspond to static Stackelberg equilibrium productions—from the bottom: the follower, the leader

Fig. 5

Price trajectory for model parameters \(A=10\), \(c=1\), \(r=0.15\) and different values of \(s\). From the bottom \(s=0.25\), \(s=0.5\), \(s=1\)

7 The asymptotic values of the equilibria

In many previous works, e.g. Fershtman and Kamien (1987), Cellini and Lambertini (2004) and Wiszniewska-Matyszkiel et al. (2015), it has been proven that the feedback Cournot-Nash equilibrium does not converge to the static Cournot-Nash equilibrium when \(s\rightarrow \infty\), which corresponds to immediate price adjustment. So, an interesting question is what happens in our model as \(s\) tends to its limits, especially when \(s\rightarrow \infty\).

First, we recall the form of the steady states for the feedback equilibrium.

$$\begin{aligned} \mathbf{p }^{*}_{F}&=\frac{3(2A+3c)+4s\beta }{15-4s\alpha }=\frac{3r(2A+3c) +s(10A+11c)}{3(5r+7s)}=\mathbf{p }^{*}_{OL},\\ \mathbf{q }^{*}_{1,F}&=\Big ( 1 - \frac{2}{3}s\alpha \Big ) \frac{3(2A+3c) +4s\beta }{15-4s\alpha }-c-\frac{2}{3}s\beta =\frac{2(r+s)(A-c)}{5r+7s} =\mathbf{q }^{*}_{1,OL},\\ \mathbf{q }^{*}_{2,F}&=\frac{(A-c)}{3}-\frac{1}{3}\mathbf{q }^{*}_{1,F} =\frac{(3r+5s)(A-c)}{3(5r+7s)}=\mathbf{q }^{*}_{2,OL}. \end{aligned}$$

Next, we compute their limits as \(s\rightarrow 0\):

$$\begin{aligned} \lim \limits _{s\rightarrow 0}\mathbf{p }^{*}_{F}&=\lim \limits _{s\rightarrow 0}\mathbf{p }^{*}_{OL} =\lim \limits _{s\rightarrow 0}\frac{3r(2A+3c)+s(10A+11c)}{3(5r+7s)}=\frac{2A+3c}{5},\\ \lim \limits _{s\rightarrow 0}\mathbf{q }^{*}_{1,F}&=\lim \limits _{s\rightarrow 0}\mathbf{q }^{*}_{1,OL} =\lim \limits _{s\rightarrow 0}\frac{2(r+s)(A-c)}{5r+7s}=\frac{2(A-c)}{5},\\ \lim \limits _{s\rightarrow 0}\mathbf{q }^{*}_{2,F}&=\lim \limits _{s\rightarrow 0}\mathbf{q }^{*}_{2,OL} =\lim \limits _{s\rightarrow 0}\frac{(3r+5s)(A-c)}{3(5r+7s)}=\frac{A-c}{5}. \end{aligned}$$

Finally, we compute their limits as \(s\rightarrow \infty\):

$$\begin{aligned} \lim \limits _{s\rightarrow \infty }\mathbf{p }^{*}_{F}&=\lim \limits _{s\rightarrow \infty }\mathbf{p }^{*}_{OL} =\lim \limits _{s\rightarrow \infty }\frac{3r(2A+3c)+s(10A+11c)}{3(5r+7s)}=\frac{10A+11c}{21}=p^{SB},\\ \lim \limits _{s\rightarrow \infty }\mathbf{q }^{*}_{1,F}&=\lim \limits _{s\rightarrow \infty }\mathbf{q }^{*}_{1,OL} =\lim \limits _{s\rightarrow \infty }\frac{2(r+s)(A-c)}{5r+7s}=\frac{2(A-c)}{7}=q_{1}^{SB},\\ \lim \limits _{s\rightarrow \infty }\mathbf{q }^{*}_{2,F}&=\lim \limits _{s\rightarrow \infty }\mathbf{q }^{*}_{2,OL} =\lim \limits _{s\rightarrow \infty }\frac{(3r+5s)(A-c)}{3(5r+7s)}=\frac{5(A-c)}{21}=q_{2}^{SB}. \end{aligned}$$

For all finite \(s\), \(\mathbf{p }^{*}_{OL}=\mathbf{p }^{*}_{F} < p^{SB}\), \(\mathbf{q }^{*}_{1,OL}=\mathbf{q }^{*}_{1,F}>q_{1}^{SB}\) and \(\mathbf{q }^{*}_{2,OL}=\mathbf{q }^{*}_{2,F}<q_{2}^{SB}\).

As we can see, all the values converge to their static Stackelberg analogues as the speed of adjustment tends to infinity, i.e. to immediate adjustment.
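The limits above can also be confirmed symbolically, e.g. with the following sympy sketch (our own):

```python
# A sympy sketch (ours) confirming the limits of the steady states, as
# functions of s, at 0 and infinity, cf. the computations above.
import sympy as sp

A, c, r, s = sp.symbols('A c r s', positive=True)

p_star = (3 * r * (2 * A + 3 * c) + s * (10 * A + 11 * c)) / (3 * (5 * r + 7 * s))
q1_star = 2 * (r + s) * (A - c) / (5 * r + 7 * s)
q2_star = (3 * r + 5 * s) * (A - c) / (3 * (5 * r + 7 * s))

for f in (p_star, q1_star, q2_star):
    print(sp.limit(f, s, 0), sp.limit(f, s, sp.oo))
# (2A+3c)/5 -> (10A+11c)/21,  2(A-c)/5 -> 2(A-c)/7,  (A-c)/5 -> 5(A-c)/21
```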

In Fig. 6, we present the steady state productions of both firms, while in Fig. 7, the steady state price. As we can see, the steady state production of the leader is decreasing in \(s\) and converges to the static Stackelberg leader production from above, with the opposite relations for the follower, while the steady state price is increasing in \(s\) and converges to the static Stackelberg price from below.

Fig. 6

Asymptotic production levels as functions of the speed of price adjustment \(s\) for the model parameters \(A=10\), \(c=1\), and \(r=0.15\). The red solid line corresponds to the leader, while the blue dashed line to the follower. The dashed horizontal lines correspond to static equilibria productions—from the bottom: the follower, a Cournot competitor, the leader

Fig. 7

Asymptotic price level as a function of the speed of price adjustment \(s\) for the model parameters \(A=10\), \(c=1\) and \(r=0.15\). The dashed horizontal lines correspond to static equilibrium prices—from the bottom: Stackelberg, Cournot

8 Comparison to the Cournot model

Last, but not least, we want to compare our results with those for the Cournot oligopoly case. The complete results for the Cournot model with sticky prices have been derived in Wiszniewska-Matyszkiel et al. (2015). We do not cite the exact values of the constants; we only present the comparison graphically in Figs. 8 and 9, with a zoomed view of an initial time interval for better readability.

Fig. 8

The myopic-follower Stackelberg leader optimal production compared to the open loop and feedback equilibrium production of each player in the Cournot duopoly with the parameters \(A=10\), \(c=1\), \(r=0.15\), \(s=0.5\) and \(p_0=1.01\). The red solid line corresponds to the leader, the blue dashed line to the open loop Cournot case, the blue dotted line to the feedback Cournot case. A zoomed view at the right

Fig. 9

The myopic-follower Stackelberg equilibrium price compared to the open loop and feedback equilibrium Cournot duopoly price with the parameters \(A=10\), \(c=1\), \(r=0.15\), \(s=0.5\) and \(p_0=1.01\). The red solid line corresponds to the myopic-follower Stackelberg case, the blue dashed line to the open loop Cournot case, the blue dotted line to the feedback Cournot case. A zoomed view at the right

As we can see, at the myopic-follower Stackelberg equilibrium, the leader starts production later than the Cournot competitors in the feedback case and slightly before the Cournot competitors in the open loop case. Afterwards, his production first grows more slowly than in both Cournot cases, then faster, and, after intersecting the open loop equilibrium strategy twice and the feedback equilibrium strategy once, it converges to a larger steady state. The myopic-follower Stackelberg price first grows more slowly, but afterwards it intersects the feedback Cournot price trajectory and converges to a steady state between the steady states of the feedback and open loop Cournot equilibrium prices.

9 Conclusions

In this paper, we have extensively studied the model of a dynamic Stackelberg type duopoly at a market with price stickiness in which the follower is myopic, first proposed and partially studied by Fujiwara (2006), called the myopic-follower Stackelberg model. We have analysed it with both the open loop and the feedback information structure of the leader. In this model, we have obtained convergence to a stable steady state with the price and the follower's production below, and the leader's production above, their static Stackelberg levels. However, an interesting result can be observed for low initial prices, when the leader's production is below the myopic follower's production, and, if the initial price is low enough, the leader initially waits in order to increase it, while the follower produces maximally. This waiting period is longer than for the feedback Cournot equilibrium. Interesting anomalies can be observed as the speed of adjustment changes, but the limits as it converges to infinity are equal to the static Stackelberg counterparts. Besides, unlike in the Cournot model, the open loop and feedback solutions coincide.

The results of this paper concerning the behaviour of the follower for small prices show that the analysis of a dynamic game model with sticky prices cannot be restricted to the steady state only, and they suggest that further studies are required to derive a model that behaves as expected also for small values of the initial price.

Therefore, an analogous analysis with a different model of the follower's behaviour, observing the price rather than the leader's behaviour, is an obvious future continuation of this paper. In such a case, we can introduce more myopic followers, being price takers, which results in "a cartel and a fringe" models (see e.g. Groot et al. 2003 or Benchekroun and Withagen 2012 for applications of such differential game models).