1 Introduction

In this paper, we develop an economic model of a commodity market where a representative producer interacts with a representative processor who buys the commodity and transforms it into a final product sold to the retail market (think of crude oil transformed into gasoline, or wheat transformed into bread). For the sake of simplicity, the processor will be referred to as the consumer from now on. In our model, the production and consumption rates are described as Itô processes, each driven by an independent Brownian motion, whose coefficients are controlled by the producer and the consumer, respectively. We stress that in our model the producer can control, in particular, the volatility of the production rate (by investing in devices making the production more reliable), and similarly the consumer can control that of the consumption rate (by investing in storage devices, for instance). Further, the players are risk-averse (see below for details) and they are linked by a financial derivative on the commodity, a plain forward agreement on price and volume exchanged. For some motivation on the control of volatility, we refer the reader to the paper by Aïd et al. [2], which focuses on the interaction between a producer controlling the drift of the spot price and a trader controlling the volatility, and exchanging a quadratic derivative. In that paper, it was shown that when the trader is short in the derivative, he increases the volatility of the spot price in order to get a higher price for the derivative sold to the producer. In the present setting, we are interested in the joint effect of the costs of controlling the volatility of production or consumption rates and the players’ risk aversion parameters on the “agreement indifference price”. Indeed, when only one player has market power, the effect of the parameters on the forward price is clear. On the other hand, when the two players interact, the joint effect is not obvious.
Our goal is therefore to determine how the relative risk aversions and the volatility control costs of the producer and the consumer combine to shape the forward price.

Both players have market power on the spot price of the commodity: the spot price depends linearly on production and consumption rates, so that the higher the rate of production, the lower the spot price, and the higher the rate of consumption, the higher the spot price. Furthermore, they agree to exchange a forward contract with finite maturity T over a certain quantity \(\lambda \) of the commodity that will be determined at equilibrium together with its price F. This setting is inspired by the seminal papers of Allaz [3] and Allaz and Villa [4], where the authors establish the mitigating effect of forward agreements on the exercise of producers’ market power.

In our framework, since production and consumption rates are driven by two independent Brownian motions and there is only one tradable risky asset, i.e. the commodity spot price, the market is incomplete. Therefore, we define the forward price in the spirit of the indifference pricing approach (see the paper [18] for an overview and [7] for an application to power markets). The players’ goal is to maximize their respective objective functionals, which are expectations of the following main components: the profit from selling, the sourcing costs (only for the consumer), the costs from exerting the controls, the forward contract payoff and, finally, the integrated variance of the market price of the derivative.

The latter component describes the risk aversion both players have towards their financial position. More precisely, in this context where the agents can control the volatility of their state variable, modelling their risk aversion with utility functions (e.g. exponential utility) would lead to nonlinear PDEs which are difficult to handle. Hence, for technical convenience we turn to a sort of dynamic mean-variance criterion leading to the objective functionals described above. Mathematically speaking, we are dealing with a two-player stochastic differential game with objective functionals of McKean–Vlasov type, i.e. depending on the laws of the state variables. Economically speaking, it means that both players act as speculators on the forward market, as they disconnect their forward position from their production or transformation profit. Although this feature of our model originates from a computational limitation induced by the linear-quadratic McKean–Vlasov game setting, there exists some evidence, documented by a stream of the economic literature, that large commodity players can act as speculators on their markets (see [11] for such evidence and references on the subject of financialisation of commodity markets).

This modeling approach for the risk aversion has already been investigated and used for portfolio selection by Zhou and Li [31] and more recently by Ismael and Pham [23] and Lefebvre et al. [28]. Moreover, due to the fast development of mean-field games as a new framework to study stochastic differential games for a large number of players since the seminal papers by Lasry and Lions [25,26,27] and Huang et al. [22] (see also [8] for a survey), there has been renewed interest in control problems of McKean–Vlasov dynamics. The latter, also known as mean field control, corresponds in some way to the limit of a sequence of stochastic control problems for a regulator willing to optimize the average expected payoff of a group of agents interacting through the empirical distribution of their states (see [24] and the two-volume book [9]). In particular, the linear-quadratic case has been treated in Graber [17], Bensoussan et al. [6] and Basei and Pham [5]. Recently, stochastic differential games with both state dynamics and objective functionals of McKean–Vlasov type have been addressed in, e.g., Miller and Pham [30], Cosso and Pham [12] and also Fu and Horst [16] for a Stackelberg game arising from an optimal portfolio liquidation problem. Although a large number of applications in economics and finance have been developed with mean field games and mean field control, the applications in economics of games with finitely many players and McKean–Vlasov dynamics and objective functionals are much more recent, hence less developed (see, e.g., Aïd et al. [1]).

We will analyse the model along the following program: first we will find a Nash equilibrium for a fixed quantity \(\lambda \) of the commodity exchanged through the forward contract with fixed price F; second, we will compute the indifference prices of the forward contract for the two players separately (they are going to depend on \(\lambda \)); third, we will compute the quantity \(\lambda \) such that the two prices are equal, hence making the exchange compatible with the equilibrium found in the first step. This price will be called agreement indifference price.

This framework makes it possible to analyse the formation of the risk premium, defined as the difference between the (unitary) agreement indifference price and the expected spot price of the commodity. The question of the determinants of the risk premium on commodity markets goes back (at least) to Keynes’s Treatise on Money (1930). Keynes formulated the normal backwardation theory, i.e. the claim that forward prices should be lower than expected spot prices because risk-averse producers are willing to accept a discount when selling forward in order to avoid price risk. Presently, the hedging pressure theory (see [14, 19,20,21]) provides an explanation of the sign of the risk premium depending on the relative size of population types in the market (producers, storers, speculators) and their risk aversion (see [15] for a complete equilibrium model with mean-variance utility players explaining the different possible signs of the premia).

Mathematical results The main mathematical contribution of the paper (see Theorem 3.1) consists in a complete description of a Nash equilibrium in open loop strategies of the two-player stochastic differential game arising from the interaction model described above. In more detail, we adopt the following resolution approach: first, we prove a suitable version of a verification theorem exploiting the weak martingale optimality principle; second, the verification theorem and the linear-quadratic structure of the game allow us to derive a semi-explicit form for the best response map; third, a Nash equilibrium is found as a fixed point of the best response map, with closed-form expressions for the equilibrium strategies and payoffs of both players up to solving numerically a Riccati system of ODEs. Once we have a Nash equilibrium at our disposal, computing the corresponding agreement indifference price together with the exchanged quantity at equilibrium is a fairly straightforward task.

Economic insights First, we find that the forward agreement indifference price is higher (resp. lower) than the expected spot price when the producer is more (resp. less) risk-averse than the consumer. Because in our model the players act as speculators on the forward market, a seller requires a higher forward price to enter into the agreement and a buyer asks for a lower price. The presence of market power of both players allows for the formation of an equilibrium. In that sense, our model is consistent with the economic intuition of the hedging pressure theory applied to a market populated with producers and consumers acting as speculators. Second, we observe that producers can achieve the same agreement indifference price and the same trading volume either by having a high risk aversion and a low volatility control cost, or a low risk aversion and a high volatility control cost. This effect manifests itself whatever the relative risk aversion of the producer and the consumer or the relative costs of volatility control. Nevertheless, it is more apparent when the volatility control costs are low. Thus, to the list of determinants of the sign of the risk premium of forward commodity prices, one could add the costs of reducing the production uncertainty. For commodities such as electricity, where storage is extremely costly, reducing production uncertainty is highly costly and thus leads to a higher risk premium.

Organization of the paper The paper is organised in the following way. The model is described in Sect. 2.1 together with the definition of a forward agreement indifference price and quantity in Sect. 2.2. The main result on the existence of a Nash equilibrium is given in Sect. 3. The proof of the main result is given in Sect. 4. Numerical results on the comparative statics of the risk premium and the joint effect of risk aversion and volatility control costs are given in Sect. 5.

Notations We denote by \({\mathbb {R}}_+\) (respectively \({\mathbb {R}}_-\)) the closed semi-interval \([0, + \infty )\) (respectively \((-\infty , 0]\)). Given a function \(f:{\mathbb {R}} \rightarrow S\), with S a regular space, we denote its first derivative by \(f'.\) The expected value of a random variable X will be equivalently denoted by \({\mathbb {E}}[X]\), as usual, or by \(\bar{X}\), for brevity. Let \((\Omega ,{\mathcal {F}},{\mathbb {P}})\) be a probability space. Given a positive integer d, a strictly positive time horizon T and a filtration \({\mathbb {F}}:=({\mathcal {F}}_t)_{t \in [0,T]}\), we set

$$\begin{aligned} \begin{aligned}&L^2([0,T], {\mathbb {R}}^d) :=\left\{ \varphi : [0,T] \rightarrow {\mathbb {R}}^d,\text { s.t. } \varphi \text { is measurable and }\int _0^T|\varphi _t|^2 dt< \infty \right\} , \\&L^{\infty }([0,T], {\mathbb {R}}^d) :=\left\{ \varphi : [0,T] \rightarrow {\mathbb {R}}^d,\text { s.t. } \varphi \text { is measurable and }\sup _{t \in [0,T]}|\varphi _t|< \infty \right\} , \\&L^2_{{\mathcal {F}}_T}(\Omega , {\mathbb {R}}^d) :=\left\{ \psi : \Omega \rightarrow {\mathbb {R}}^d,\text { s.t. } \psi \text { is } {\mathcal {F}}_T\text {-measurable and }{\mathbb {E}}\left[ |\psi |^2 \right]< \infty \right\} ,\\&L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^d) :=\left\{ \eta : \Omega \times [0,T] \rightarrow {\mathbb {R}}^d,\text { s.t. } \eta \text { is } {\mathbb {F}}\text {-adapted and }{\mathbb {E}}\left[ \int _0^T|\eta _t|^2 dt \right]< \infty \right\} ,\\&S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^d) :=\left\{ \eta : \Omega \times [0,T] \rightarrow {\mathbb {R}}^d,\text { s.t. } \eta \text { is } {\mathbb {F}}\text {-adapted and }{\mathbb {E}}\left[ \sup _{t \in [0,T]}|\eta _t|^2 \right] < \infty \right\} . \end{aligned} \end{aligned}$$

2 The Model

We consider a stochastic game between a representative producer and a representative consumer. While the producer produces a good at a certain rate, the consumer buys the commodity and transforms it into a final good sold in the retail market.

2.1 Market Model

We consider a finite time window [0, T] and a probability space \((\Omega , {\mathcal {F}}, {\mathbb {P}})\) endowed with a two-dimensional Brownian motion \((W,B)=\{(W_t,B_t)\}_{t \in [0,T]}\) and its natural filtration \({\mathbb {F}}=({\mathcal {F}}_t)_{t \in [0,T]}\) augmented with the \({\mathbb {P}}\)-null sets in \({\mathcal {F}}\). The production rate of the producer \(\{q_t\}_{t \in [0,T]}\) evolves according to a dynamics given by

$$\begin{aligned} dq_t = u_t dt+ z_t dW_t, \quad q_0>0, \end{aligned}$$

where \(\{u_t\}_{t \in [0,T]}\) and \(\{z_t\}_{t \in [0,T]}\) are the producer’s strategies. The associated instantaneous costs are \(\frac{k_p}{2}u_t^2\) and \(\frac{\ell _p}{2}(z_t-\sigma _p)^2\), respectively, with \(k_p, \ell _p \ge 0\) and where \(\sigma _p >0\) represents the nominal uncertainty in production without dedicated effort of the producer to reduce it. In a similar way, the consumption rate (or selling rate to the retail market) of the consumer, \(\{c_t\}_{t \in [0,T]}\), has dynamics given by

$$\begin{aligned} dc_t = v_t dt + y_t dB_t, \quad c_0>0. \end{aligned}$$

Here, \(\{v_t\}_{t \in [0,T]}\) and \(\{y_t\}_{t \in [0,T]}\) are the consumer’s strategies, and the associated instantaneous costs are, respectively, \(\frac{k_c}{2}v_t^2\) and \(\frac{\ell _c}{2}(y_t-\sigma _c)^2\), with \(k_c, \ell _c \ge 0\) and \(\sigma _c >0\). We assume a linear impact on the observed market price, \(\{S_t\}_{t \in [0,T]}\), namely \(\{S_t\}_{t \in [0,T]}\) evolves according to

$$\begin{aligned} S_t := s_0 - \rho _p q_t +\gamma \rho _c c_t, \qquad s_0>0 \end{aligned}$$

with \(\rho _p, \rho _c >0\) and \(\gamma >0\) (the role of \(\gamma \) will be clear in a few lines). The instantaneous profits at time t of the producer \(\pi ^p_t\) and of the consumer \(\pi ^c_t\) are given by:

$$\begin{aligned} \pi ^p_t:&= q_t S_t - \frac{k_p}{2}u_t^2 - \frac{\ell _p}{2}(z_t-\sigma _p)^2, \\ \pi ^c_t:&= c_t ( p_0 + p_1 S_t ) - \gamma c_t ( S_t + \delta ) - \frac{k_c}{2}v_t^2-\frac{\ell _c}{2}(y_t-\sigma _c)^2 , \end{aligned}$$

where \(c_t ( p_0 + p_1 S_t )\) is the income from selling the quantity \(c_t\) at the retail price \(p_0 + p_1 S_t\), a linear function of the commodity price, with \(p_0, p_1 > 0\), and \(\gamma c_t (S_t + \delta )\) represents the sourcing cost of buying the quantity \(\gamma c_t\) (which is used to obtain \(c_t\) to be sold) at price \( S_t\) plus the transformation cost \(\delta \), with \(\gamma , \delta > 0\). We assume \(\gamma > p_1\) to ensure the concavity of the objective functional of the consumer (i.e. the processor cannot charge increasing prices to final consumers without seeing the demand decrease).
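
To make the state dynamics concrete, the following Python sketch simulates \(q_t\), \(c_t\) and \(S_t\) with an Euler–Maruyama scheme and evaluates the instantaneous profits \(\pi ^p_t\) and \(\pi ^c_t\). It is only an illustration under stated assumptions: the controls are held constant (they are not equilibrium strategies), and all numerical values and function names are hypothetical choices made for the example, with \(\gamma > p_1\) as required above.

```python
import math
import random

def simulate_market(T=1.0, n_steps=250, q0=1.0, c0=1.0, s0=10.0,
                    u=0.0, z=0.2, v=0.0, y=0.2,
                    rho_p=1.0, rho_c=1.0, gamma=1.5, seed=0):
    """Euler-Maruyama paths of (t, q, c, S) under constant controls.

    All default values are illustrative assumptions, not calibrated data.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    q, c = q0, c0
    path = [(0.0, q, c, s0 - rho_p * q + gamma * rho_c * c)]
    for i in range(1, n_steps + 1):
        dW = rng.gauss(0.0, math.sqrt(dt))  # noise driving production
        dB = rng.gauss(0.0, math.sqrt(dt))  # independent noise driving consumption
        q += u * dt + z * dW                # dq = u dt + z dW
        c += v * dt + y * dB                # dc = v dt + y dB
        S = s0 - rho_p * q + gamma * rho_c * c  # linear price impact
        path.append((i * dt, q, c, S))
    return path

def producer_profit_rate(q, S, u, z, k_p=1.0, ell_p=1.0, sigma_p=0.2):
    """Instantaneous profit of the producer."""
    return q * S - 0.5 * k_p * u**2 - 0.5 * ell_p * (z - sigma_p)**2

def consumer_profit_rate(c, S, v, y, p0=5.0, p1=1.0, gamma=1.5, delta=0.5,
                         k_c=1.0, ell_c=1.0, sigma_c=0.2):
    """Instantaneous profit of the consumer: retail income minus sourcing and control costs."""
    income = c * (p0 + p1 * S)
    sourcing = gamma * c * (S + delta)
    return income - sourcing - 0.5 * k_c * v**2 - 0.5 * ell_c * (y - sigma_c)**2

path = simulate_market()
t_end, q_end, c_end, S_end = path[-1]
```

With zero volatilities and zero drifts the rates stay at their initial values, so the price remains at \(s_0 - \rho _p q_0 + \gamma \rho _c c_0\), which gives a quick sanity check of the implementation.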

Remark 2.1

Our producer and consumer are large players as their actions have an effect on market prices. This is the reason why we did not impose any constraint on the relation between consumption and production: there could be other small producers and consumers present and so the consumption \(c_t\) might, in principle, be greater than \(q_t\). Moreover, we consider a commodity for which storage has little effect on the price and, for technical reasons, we include neither capacity constraints nor consumption/production constraints in our framework.

The producer and the consumer exchange a forward contract on \(\lambda \) units of the commodity for a fixed amount of money \(F \in {\mathbb {R}}\). Both players aim at maximizing their respective objective functionals, which have two components: an expected profit term and a penalisation term modelling the player’s risk aversion (more comments below). In formulae, they are given by

$$\begin{aligned} J_p^{\lambda , F} (u, z;v,y)&:={\mathbb {E}} \big [ P^p_T \big ] - \eta _p \int _0^T {\mathbb {V}} \left[ \lambda S_t \right] dt, \qquad \eta _p >0, \end{aligned}$$
(2.1)
$$\begin{aligned} J_c^{\lambda , F}(v,y; u, z)&:={\mathbb {E}} \big [ P^c_T \big ] - \eta _c \int _0^T {\mathbb {V}} \left[ \lambda S_t \right] dt, \qquad \eta _c >0, \end{aligned}$$
(2.2)

where \({\mathbb {V}}\) stands for the variance and the process \(P^p_T\) (resp. \(P^c_T\)) represents the cumulative profit over the time period [0, T] of the producer (resp. the consumer), i.e.

$$\begin{aligned} P_T^p : = \int _0^T \pi _t^p dt + F - \lambda S_T, \quad P_T^c := \int _0^T \pi _t^c dt - F + \lambda S_T. \end{aligned}$$
(2.3)

The set of admissible strategies for the players is given by \({\mathcal {A}}^2:= {\mathcal {A}} \times {\mathcal {A}}\), where \({\mathcal {A}} = L_{{\mathbb {F}}}^2(\Omega \times [0,T], {\mathbb {R}}^2)\).

The way risk aversion is modelled and the choice of the derivative require two comments. First, a more standard way to take into account the players’ risk aversion would consist in using utility functions. In our case, where players can control the volatility of their production and consumption rates, this approach with an exponential utility function would lead to Monge-Ampère PDEs, which are difficult to handle. For this reason, we turn to a different way to model risk aversion, which is reminiscent of what is done in mean-variance optimal dynamic portfolio choice (see Zhou and Li [31] and, more recently, Ismael and Pham [23] and Lefebvre et al. [28]). A similar approach was also previously used for distributed renewable energy development in Aïd et al. [1]. Second, we observe that the variance penalisation term involves only the derivative and not the profit from production or transformation. As already stated in the introduction, this representation of risk aversion transforms players into speculators on the forward market. Indeed, players only care about the variance of their financial position \(\lambda S_t - F\), not about their production or consumption profits. This modeling is motivated by the desire to remain in a framework where tractable solutions can be exhibited. Its sole consequence would be to reverse the sign of the risk premium: producers wish to sell at a lower price than the expected spot price whereas speculators want to sell at a higher price. For the sake of simplicity, we have chosen to consider only a static hedging position with a simple forward contract in order to analyse the risk premium between the forward “agreement indifference price” and the expected price at maturity (see Sect. 2.2 for a definition of the forward agreement indifference price).

To sum up, we deal with a two-player stochastic differential game of McKean–Vlasov linear-quadratic type. Hence, it is natural to look for Nash equilibria according to the following definition.

Definition 2.2

We call the couple \(\left( (u^*,z^*)^\top ,(v^*, y^*)^\top \right) \in {\mathcal {A}} \times {\mathcal {A}}\) a Nash equilibrium if

$$\begin{aligned}&J_p^{\lambda , F}(u^*,z^*;v^*,y^*)\ge J_p ^{\lambda , F}(u,z;v^*,y^*), \qquad \text { for all } (u,z)^\top \in {\mathcal {A}},\end{aligned}$$
(2.4)
$$\begin{aligned}&J_c^{\lambda , F}(v^*,y^*;u^*,z^*)\ge J_c^{\lambda , F}(v,y;u^*,z^*), \qquad \text { for all } (v,y)^\top \in {\mathcal {A}}. \end{aligned}$$
(2.5)

2.2 Equilibrium Forward Agreement

For a Nash equilibrium \((v^*,y^*;u^*,z^*)\), we denote by

$$\begin{aligned} J^* _c (\lambda , F)=J_c^{\lambda , F}(v^*,y^*;u^*,z^*), \qquad J^* _p (\lambda , F)=J_p^{\lambda , F}(u^*,z^*;v^*,y^*), \end{aligned}$$

the corresponding equilibrium payoffs of consumer and producer, respectively. They depend on the number of units \(\lambda \), on which the forward contract is written, and the respective forward price F. Both players determine their prices using the indifference pricing approach, namely the consumer computes \(F_c ^{\lambda ,*}\) as solution of \(J^* _c (\lambda , F) = J^* _c (0,0)\) and analogously for the producer, leading to a price \(F_p ^{\lambda ,*}\) as a solution of \(J^* _p (\lambda , F) = J^* _p (0,0)\). By linearity of the payoffs with respect to F, we get

$$\begin{aligned} J^* _c(\lambda ,F)= J^* _c(\lambda ,0) - F \quad \text {and} \quad J^* _p(\lambda ,F)= J^* _p(\lambda ,0) + F, \end{aligned}$$

yielding

$$\begin{aligned} F^{\lambda ,*} _c= J^* _c(\lambda ,0) - J^*_c (0,0), \quad \text {and} \quad F^{\lambda ,*} _p= J^* _p(0,0) - J^* _p (\lambda ,0). \end{aligned}$$

Thus, \( F_c ^{\lambda ,*}\) represents the maximum amount the consumer is willing to pay, while \(F_p ^{\lambda ,*}\) is the minimum amount the producer is willing to accept for selling a forward contract on \(\lambda \) units of the underlying. As a consequence, trading is possible if and only if

$$\begin{aligned} F_p ^{\lambda ,*} \le F_c ^{\lambda ,*}. \end{aligned}$$
(2.6)

We conclude this section with the definition of agreement indifference price.

Definition 2.3

Let \(\lambda ^*\) be the number of units of the underlying for which the two parties agree on the forward price, namely \( F_p ^{\lambda ^*,*} = F_c ^{\lambda ^*,*} \). We define the agreement indifference price as

$$\begin{aligned} {F}_{{\lambda }^*}^*:=F_p ^{\lambda ^*,*}=F_c ^{\lambda ^*,*}. \end{aligned}$$

In Sect. 5, we will provide some numerical illustrations on how the risk aversion parameters and the volatility control costs of the players might affect the quantity \(\lambda ^*\) as well as the corresponding agreement indifference price \({F}_{{\lambda }^*}^*\).
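
The definitions of this section can be sketched numerically as follows. The Python code below computes the two indifference prices from user-supplied maps \(\lambda \mapsto J^*_p(\lambda ,0)\) and \(\lambda \mapsto J^*_c(\lambda ,0)\), and then searches for \(\lambda ^*\) by bisection on the gap \(F_c^{\lambda ,*}-F_p^{\lambda ,*}\). The quadratic functions `toy_J_p0` and `toy_J_c0` are hypothetical placeholders and not the equilibrium payoffs of the model, and bisection is merely one possible choice of root finder. Note that \(\lambda = 0\) always gives \(F_p^{0,*}=F_c^{0,*}=0\), so the search bracket should exclude it.

```python
def indifference_prices(J_p0, J_c0, lam):
    """Return (F_p, F_c) given J_p0(lam) = J_p^*(lam, 0) and J_c0(lam) = J_c^*(lam, 0)."""
    F_p = J_p0(0.0) - J_p0(lam)  # minimum amount the producer accepts
    F_c = J_c0(lam) - J_c0(0.0)  # maximum amount the consumer pays
    return F_p, F_c

def agreement_quantity(J_p0, J_c0, lo, hi, tol=1e-10, max_iter=200):
    """Bisection on gap(lam) = F_c - F_p over [lo, hi]; needs a sign change."""
    def gap(lam):
        F_p, F_c = indifference_prices(J_p0, J_c0, lam)
        return F_c - F_p

    g_lo, g_hi = gap(lo), gap(hi)
    if g_lo * g_hi > 0:
        raise ValueError("gap has the same sign at both ends of the bracket")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = gap(mid)
        if g_mid == 0.0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi, g_hi = mid, g_mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

# Hypothetical toy payoffs, for illustration only.
def toy_J_p0(lam):
    return 1.0 - 2.0 * lam - 0.5 * lam**2

def toy_J_c0(lam):
    return 2.0 + 3.0 * lam - lam**2

lam_star = agreement_quantity(toy_J_p0, toy_J_c0, lo=0.1, hi=2.0)
F_p_star, F_c_star = indifference_prices(toy_J_p0, toy_J_c0, lam_star)
```

For these toy payoffs the gap equals \(\lambda (1 - \tfrac{3}{2}\lambda )\), so the nontrivial agreement quantity is \(\lambda ^* = 2/3\) and trading is possible, in the sense of (2.6), for \(\lambda \in [0, 2/3]\).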

3 Nash Equilibrium

In this section we state and comment on the main result of the paper. In particular, we show that a Nash equilibrium exists and we characterize the corresponding strategies and payoffs in a semi-explicit way. Its proof will be given in full detail in the next section.

3.1 Main Result

Let us start with some useful notation: for \(t \in [0,T]\),

$$\begin{aligned}&K_p(t)=-\frac{k_p}{2}\sqrt{\frac{2(\rho _p+\eta _p \lambda ^2 \rho _p^2)}{k_p}}\tanh \left( \sqrt{\frac{2(\rho _p+\eta _p \lambda ^2 \rho _p^2)}{k_p}} (T-t)\right) , \end{aligned}$$
(3.1)
$$\begin{aligned}&\Lambda _p(t)= -\frac{k_p}{2}\sqrt{\frac{2\rho _p}{k_p}}\tanh \left( \sqrt{\frac{2 \rho _p }{k_p}} (T-t)\right) , \end{aligned}$$
(3.2)
$$\begin{aligned}&K_c(t)=-\frac{k_c}{2}\sqrt{\frac{2(\gamma \rho _c(\gamma -p_1)+\eta _c \lambda ^2 \gamma ^2\rho _c^2)}{k_c}}\nonumber \\&\qquad \qquad \qquad \times \tanh \left( \sqrt{\frac{2(\gamma \rho _c(\gamma -p_1)+\eta _c \lambda ^2 \gamma ^2\rho _c^2)}{k_c}} (T-t)\right) , \end{aligned}$$
(3.3)
$$\begin{aligned}&\Lambda _c(t)=-\frac{k_c}{2}\sqrt{\frac{2\gamma \rho _c(\gamma -p_1)}{k_c}}\tanh \left( \sqrt{\frac{2 \gamma \rho _c(\gamma -p_1) }{k_c}} (T-t)\right) , \end{aligned}$$
(3.4)
$$\begin{aligned} \Xi&=\begin{pmatrix} 0 &{} -\rho _p\gamma \rho _c \eta _p \lambda ^2-\frac{\gamma \rho _c}{2} \\ -\rho _p\gamma \rho _c \eta _c \lambda ^2 -\frac{\rho _p(\gamma -p_1)}{2}&{} 0 \end{pmatrix}, \\ \widehat{\Xi }&= \begin{pmatrix} 0 &{} -\frac{\gamma \rho _c}{2} \\ -\frac{\rho _p(\gamma -p_1)}{2}&{} 0 \end{pmatrix}, \quad R =\begin{pmatrix} -\frac{2}{k_p} &{} 0 \\ 0 &{} -\frac{2}{k_c} \end{pmatrix},\\ \Phi (t)&= \begin{pmatrix} -\frac{2}{k_p}K_p(t) &{} 0 \\ 0 &{} -\frac{2}{k_c}K_c(t) \end{pmatrix},\quad \widehat{\Phi }(t) = \begin{pmatrix} -\frac{2}{k_p}\Lambda _p(t) &{} 0 \\ 0 &{} -\frac{2}{k_c}\Lambda _c(t) \end{pmatrix},\\ \Psi&= \begin{pmatrix} -s_0/2 \\ -\frac{p_0+p_1 s_0-\gamma (\delta +s_0)}{2} \end{pmatrix}. \end{aligned}$$

Furthermore, let us introduce the following system of ODEs defined on \(t \in [0,T]\):

$$\begin{aligned} \begin{aligned} \left\{ \begin{array}{ll} \pi '(t)=\Xi +\Phi (t) \pi (t)+\pi (t)\Phi (t)+\pi (t)R\pi (t),&{} \pi (T)=0,\\ \widehat{\pi }'(t)=\widehat{\Xi }+\widehat{\Phi }(t) \widehat{\pi }(t)+ \widehat{\pi }(t)\widehat{\Phi }(t)+\widehat{\pi }(t)R\widehat{\pi }(t),&{} \widehat{\pi }(T)=0, \end{array}\right. \end{aligned} \end{aligned}$$
(3.5)
$$\begin{aligned} \begin{aligned} \begin{array}{ll} dh(t)=\Big \{\big [\widehat{\pi }(t)R+\widehat{\Phi }(t)\big ]h(t)+\Psi \Big \}dt,&h(T)=\frac{1}{2}\lambda (\rho _p,\gamma \rho _c)^\top , \end{array} \end{aligned} \end{aligned}$$
(3.6)

and let us denote by \(T_{max}\) the right end of the maximal interval where the system (3.5) admits a unique solution according to the Picard-Lindelöf theorem (see, e.g., Coddington and Levinson [13, Ch. I, Theorem 2.3], which can be applied by standard time-inversion).
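
As a rough illustration of how the first equation in (3.5) can be handled numerically, the Python sketch below evaluates the closed forms (3.1) and (3.3) and integrates the matrix Riccati equation for \(\pi \) backward from \(\pi (T)=0\) with a hand-written fourth-order Runge–Kutta scheme. The choice of solver, the step count and all parameter values are assumptions made for this example and are not taken from the paper; the code further assumes \(k_p, k_c > 0\) and that \(T\) lies inside the interval of existence, since no blow-up detection is implemented. The equation for \(\widehat{\pi }\) can be treated in the same way with \(\widehat{\Xi }\) and \(\widehat{\Phi }\).

```python
import math

def riccati_tanh(k, a, T, t):
    """Common shape of (3.1)-(3.4): -(k/2) w tanh(w (T - t)), w = sqrt(2 a / k), k > 0."""
    w = math.sqrt(2.0 * a / k)
    return -0.5 * k * w * math.tanh(w * (T - t))

def mat_mul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)] for i in range(2)]

def mat_add(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(2)] for i in range(2)]

def mat_scale(s, A):
    return [[s * A[i][j] for j in range(2)] for i in range(2)]

def solve_riccati(Xi, Phi, R, T, n_steps=2000):
    """RK4 for pi' = Xi + Phi(t) pi + pi Phi(t) + pi R pi, run backward from pi(T) = 0.

    Returns a list of (t, pi(t)) ordered from t = T down to t = 0.
    """
    def rhs(t, P):
        Ph = Phi(t)
        return mat_add(Xi, mat_mul(Ph, P), mat_mul(P, Ph), mat_mul(mat_mul(P, R), P))

    h = -T / n_steps  # negative step: integrate backward in time
    t, P = T, [[0.0, 0.0], [0.0, 0.0]]
    out = [(t, P)]
    for _ in range(n_steps):
        k1 = rhs(t, P)
        k2 = rhs(t + h / 2, mat_add(P, mat_scale(h / 2, k1)))
        k3 = rhs(t + h / 2, mat_add(P, mat_scale(h / 2, k2)))
        k4 = rhs(t + h, mat_add(P, mat_scale(h, k3)))
        incr = mat_add(k1, mat_scale(2.0, k2), mat_scale(2.0, k3), k4)
        P = mat_add(P, mat_scale(h / 6, incr))
        t += h
        out.append((t, P))
    return out

def model_coefficients(lam, T, k_p=1.0, k_c=1.0, rho_p=0.5, rho_c=0.5,
                       gamma=1.5, p1=1.0, eta_p=1.0, eta_c=1.0):
    """Build Xi, Phi(t), R for illustrative (assumed) parameter values."""
    a_p = rho_p + eta_p * lam**2 * rho_p**2
    a_c = gamma * rho_c * (gamma - p1) + eta_c * lam**2 * gamma**2 * rho_c**2
    Xi = [[0.0, -rho_p * gamma * rho_c * eta_p * lam**2 - gamma * rho_c / 2.0],
          [-rho_p * gamma * rho_c * eta_c * lam**2 - rho_p * (gamma - p1) / 2.0, 0.0]]
    R = [[-2.0 / k_p, 0.0], [0.0, -2.0 / k_c]]

    def Phi(t):
        return [[-2.0 / k_p * riccati_tanh(k_p, a_p, T, t), 0.0],
                [0.0, -2.0 / k_c * riccati_tanh(k_c, a_c, T, t)]]
    return Xi, Phi, R

Xi, Phi, R = model_coefficients(lam=0.5, T=1.0)
pi_path = solve_riccati(Xi, Phi, R, T=1.0)
t0, pi0 = pi_path[-1]  # numerical approximation of pi(0)
```

One can check by differentiation that a function of the form used in `riccati_tanh` solves the scalar Riccati equation \(K'(t) = a - \frac{2}{k}K(t)^2\), \(K(T)=0\), which provides a simple test of the closed forms and of the integrator.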

Theorem 3.1

Assume that the following conditions hold:

(A1):

\(T < T_{max}\),

(A2):

\(\ell _p-2(K_p(t)+\pi _{11}(t)) >0\) and \(\ell _c-2(K_c(t)+\pi _{22}(t))>0\), for all \(t \in [0,T]\).

Then,

  1.

    There exists a Nash equilibrium \(((u^*,z^*)^{\top },(v^*,y^*)^\top ) \in {\mathcal {A}}^2\) in the following feedback form

    $$\begin{aligned} u^*_t= & {} \frac{2}{k_p} \Big [(K_p(t)+\pi _{11}(t))(q_t-{\bar{q}}_t)+\pi _{12}(t)(c_t-{\bar{c}}_t)+(\Lambda _p(t)\nonumber \\&+\widehat{\pi }_{11}(t)){\bar{q}}_t+\widehat{\pi }_{12}(t){\bar{c}}_t +h_1(t)\Big ],\nonumber \\ z^*(t)= & {} \frac{\sigma _p \ell _p}{\ell _p-2(K_p(t)+\pi _{11}(t))},\nonumber \\ v^*_t= & {} \frac{2}{k_c} \Big [(K_c(t)+\pi _{22}(t))(c_t-{\bar{c}}_t)+\pi _{21}(t)(q_t-{\bar{q}}_t)+(\Lambda _c(t)\nonumber \\&+\widehat{\pi }_{22}(t)){\bar{c}}_t+\widehat{\pi }_{21}(t){\bar{q}}_t +h_2(t)\Big ],\nonumber \\ y^*(t)= & {} \frac{\sigma _c \ell _c}{\ell _c-2(K_c(t)+\pi _{22}(t))}. \end{aligned}$$
    (3.7)
  2.

    The equilibrium payoffs satisfy

    $$\begin{aligned} J_p^*(\lambda ,F)&= \Lambda _p(0) q_0^2 + 2 {\bar{Y}}^p_0 q_0 + R_p(0) + F - \lambda s_0 - \frac{1}{2} \ell _p \sigma _p^2 T, \end{aligned}$$
    (3.8)
    $$\begin{aligned} J_c^*(\lambda ,F)&= \Lambda _c(0) c_0^2 + 2 {\bar{Y}}^c_0 c_0 + R_c(0) - F + \lambda s_0 - \frac{1}{2} \ell _c \sigma _c^2 T, \end{aligned}$$
    (3.9)
    $$\begin{aligned} {\bar{Y}}^p_t&= \widehat{\pi }_{11} {\bar{q}}_t + \widehat{\pi }_{12} {\bar{c}}_t + h_1, \quad {\bar{Y}}^c_t = \widehat{\pi }_{22} {\bar{c}}_t + \widehat{\pi }_{21} {\bar{q}}_t + h_2, \end{aligned}$$
    (3.10)

    where

    $$\begin{aligned} R_p(0)&=R^{(\lambda )}_p(0)= \int _0^T \left[ \frac{2}{k_p} \mathbb {E}[(Y^p_u)^2] -\eta _p \lambda ^2 \gamma ^2\rho _c^2 \mathbb {V}[c_u]\right. \nonumber \\&\quad \left. +\frac{2\big ( \pi _{11}(u) z^*_u + \frac{\ell _p \sigma _p}{2}\big )^2}{\ell _p-2K_p(u)} \right] du - \lambda \gamma \rho _c \mathbb {E}[c_T], \nonumber \\ R_c(0)&=R^{(\lambda )}_c(0)= \int _0^T \left[ \frac{2}{k_c} \mathbb {E}[(Y^c_u)^2] -\eta _c \lambda ^2 \rho _p^2 \mathbb {V}[q_u]\right. \nonumber \\&\quad \left. + \frac{2\big ( \pi _{22}(u) y^*_u + \frac{\ell _c \sigma _c}{2}\big )^2}{\ell _c-2K_c(u)} \right] du - \lambda \rho _p \mathbb {E}[q_T]. \end{aligned}$$
    (3.11)

See Appendix 1 for the details on the computations of the quantities involved in the definition of \(R_p\) and \(R_c\).

3.2 Comments

  1.

    Although our model is close to the one presented in Miller and Pham [30], it is not possible to directly exploit their results, since their hypotheses (H2)(a) and (H2)(d) are not satisfied in our case. Therefore, in order to be self-contained, we decided to prove a suitable verification theorem from scratch.

  2.

    We observe that the functions \(\Lambda _{i}\), \(i \in \{ p, c \}\), do not depend on \(\lambda \). This is also the case for the functions \(\widehat{\pi }_{ij}\). Furthermore, the functions \(h_i\), \(i=1,2\), are linear in \(\lambda \) because they depend on it only through their terminal conditions. Besides, they are also nondecreasing in \(\lambda \). Thus, the average production and consumption rates, \({\bar{q}}_t\) and \({\bar{c}}_t\), which satisfy

    $$\begin{aligned} d{\bar{q}}_t&= {\bar{u}}^*_t dt = \frac{2}{k_p} \Big [ \big ( \Lambda _p(t) +\widehat{\pi }_{11}(t) \big ) {\bar{q}}_t + \widehat{\pi }_{12}(t) {\bar{c}}_t + h_1(t) \Big ]dt , \\ d{\bar{c}}_t&= {\bar{v}}^*_t dt = \frac{2}{k_c} \Big [ \big ( \Lambda _c (t) +\widehat{\pi }_{22} (t) \big ) {\bar{c}}_t + \widehat{\pi }_{21} (t) {\bar{q}}_t + h_2 (t) \Big ]dt, \end{aligned}$$

    are increasing in \(\lambda \). As the terminal conditions of \(h_{i}\), \(i=1,2\), depend only on \(\lambda \) and on the parameters \(\rho _p\) and \(\gamma \rho _c\), the resulting effect on the average spot price \({\bar{S}}_t = s_0 - \rho _p {\bar{q}}_t + \gamma \rho _c {\bar{c}}_t\) only depends on the relative market power of the producer and the consumer. Thus, if \(\gamma \rho _c < \rho _p\) (resp. \(\rho _p < \gamma \rho _c\)), when the quantity \(\lambda \) of the commodity sold forward by the producer increases, the average spot price decreases (resp. increases).

  3.

    The functions \(\Lambda _p\), \(\Lambda _c\) and the \(\widehat{\pi }_{ij}\) do not depend on the risk aversion parameters \(\eta _p\) and \(\eta _c\), therefore the average production and consumption rates do not depend on them either, as one could expect. Regarding the volatilities, while it is clear from (3.1) and (3.3) that \(K_p\) and \(K_c\) are nonincreasing in \(\eta _p\) and \(\eta _c\), respectively, it is not so obvious what to expect for \(\pi _{11}\) and \(\pi _{22}\), and thus to deduce the effect of risk aversion on \(z^*\) and \(y^*\). However, one can find numerically that the higher the risk aversions of the players, the lower the volatilities, even in the absence of forward agreement. Nevertheless, it is possible to provide more insight on this issue when the producer has no market power, i.e. \(\rho _p=0\), and the consumer does have some, i.e. \(\gamma \rho _c>0\). In this case, the price process appears as exogenously driven for the producer and as a controlled variable for the consumer. Hence \(K_p = \Lambda _p = 0\) and \(K_c <0\), \(\Lambda _c<0\). Further, if \(\rho _p=0\), then \(\pi _{21} = 0\), leading to \(\pi _{11} = 0\) due to \(K_p = 0\), and it holds also that \(\pi _{22}=0\) and \(\widehat{\pi }_{11}= \widehat{\pi }_{21}= 0\). Thus, \(z^*= \sigma _p\) and the producer does not reduce her volatility. On the other hand, production does covary with consumption. Indeed, in Theorem 3.1, the Nash equilibrium strategies of both players depend on the state variables only via \(c_t - {\bar{c}}_t\) and \({\bar{c}}_t\):

    $$\begin{aligned}&u^*_t = \frac{2}{k_p}\left\{ \pi _{12}(t) (c_t - {\bar{c}}_t ) + \widehat{\pi }_{12} (t) {\bar{c}}_t + h_1 (t) \right\} , \quad z^*_t = \sigma _p, \\&v^*_t = \frac{2}{k_c} \left\{ K_c (t) (c_t - {\bar{c}}_t) + \Lambda _c (t){\bar{c}}_t + h_2 (t) \right\} , \quad y^*_t = \frac{\sigma _c \ell _c}{\ell _c - 2 K_c(t)} < \sigma _c. \end{aligned}$$

    Finally, since \(K_c(t)\) is nonincreasing in \(\lambda \), the higher the exposure to the financial risk coming from the forward contract, the more the consumer reduces his volatility, as the intuition predicts.

  4.

    Exploiting Theorem 3.1-2., we can specify more precisely the nonlinear equations giving the forward agreement values \(F^*_{{\lambda }^*}\) and \(\lambda ^*\). Indeed, it holds that (see equations (3.8) and (3.9))

    $$\begin{aligned}&J_p^*(\lambda ,F) = \Lambda _p(0) q_0^2 + 2 {\bar{Y}}^{p (\lambda )}_0 q_0 + R_p^{(\lambda )}(0) + F - \lambda s_0 - \frac{1}{2} \ell _p \sigma _p^2 T, \\&J_c^*(\lambda ,F) = \Lambda _c(0) c_0^2 + 2 {\bar{Y}}^{c(\lambda )}_0 c_0 + R_c^{(\lambda )}(0) - F + \lambda s_0 - \frac{1}{2} \ell _c \sigma _c^2 T, \end{aligned}$$

    where the superscript \((\lambda )\) is used to emphasize the dependency on the number of options traded. We can isolate the parts \(j_p(\lambda ,F)\) and \(j_c(\lambda ,F)\) depending on \(\lambda \) and F, defined as

    $$\begin{aligned}&j_p(\lambda ,F) = 2 h_1^{(\lambda )}(0) q_0 + R^{(\lambda )}_p(0) + F - \lambda s_0, \\&j_c(\lambda ,F) = 2 h_2^{(\lambda )}(0) c_0 + R^{(\lambda )}_c(0) - F + \lambda s_0. \end{aligned}$$

    Thus, for a fixed \(\lambda \) the indifference prices \(F_p^{\lambda ,*}\) and \(F_c^{\lambda ,*}\) are given by

    $$\begin{aligned}&2 h_1^{(0)}(0) q_0 + R^{(0)}_p(0) = 2 h_1^{(\lambda )}(0) q_0 + R^{(\lambda )}_p(0) + F_p^{\lambda ,*} - \lambda s_0, \\&2 h_2^{(0)}(0) c_0 + R^{(0)}_c(0) = 2 h_2^{(\lambda )}(0) c_0 + R^{(\lambda )}_c(0) - F_c^{\lambda ,*} + \lambda s_0. \end{aligned}$$

    Thus, if it exists, the equilibrium price should be given by \(F_p^{\lambda ^*,*} = F_c^{\lambda ^*,*}={F}_{{\lambda }^*}^*\), i.e.,

    $$\begin{aligned}&2 (h_1^{(0)}(0) -h_1^{(\lambda ^*)}(0)) q_0 + R^{(0)}_p(0) - R^{(\lambda ^*)}_p(0) \\&\quad = 2 (h_2^{(\lambda ^*)}(0) - h_2^{(0)}(0) ) c_0 + R^{(\lambda ^*)}_c(0) - R^{(0)}_c(0), \end{aligned}$$

    or, equivalently,

    $$\begin{aligned}&2 h_1^{(0)}(0) q_0 + 2 h_2^{(0)}(0) c_0 + R^{(0)}_c(0) + R^{(0)}_p(0) \nonumber \\&\quad = 2 h_1^{(\lambda ^*)}(0) q_0 + 2 h_2^{(\lambda ^*)}(0) c_0 + R^{(\lambda ^*)}_c(0) + R^{(\lambda ^*)}_p(0), \end{aligned}$$
    (3.12)

    with \(R^{(\lambda ^*)}_p(0)\) and \(R^{(\lambda ^*)}_c(0)\) defined in Equation (3.11) and \(h^{(\lambda ^*)}\) in Equation (3.6).

The last remark considerably speeds up the computations for the plots that appear in Sect. 5. Indeed, all the quantities that we need to compute can be obtained by numerically solving the ODEs presented in Appendix 1.
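To illustrate how the linear relations above can be used once the ODE outputs are available, the following Python sketch solves each indifference equation for \(F\) and evaluates the difference \(F_p^{\lambda ,*}-F_c^{\lambda ,*}\), whose zero in \(\lambda \) corresponds to Equation (3.12). The function names and the numerical inputs are hypothetical placeholders and not values from this paper; the quantities \(h_i^{(\lambda )}(0)\) and \(R_i^{(\lambda )}(0)\) are assumed to have been computed beforehand.

```python
def indifference_prices(h1_0, R_p_0, h1_lam, R_p_lam,
                        h2_0, R_c_0, h2_lam, R_c_lam,
                        q0, c0, s0, lam):
    """Solve the two indifference equations for F_p and F_c at a fixed lambda.

    Arguments with suffix _0 are the quantities for lambda = 0, those with
    suffix _lam are the same quantities for the contract size lam.
    """
    # Producer: value without contract = value with contract (solved for F).
    F_p = 2.0 * (h1_0 - h1_lam) * q0 + (R_p_0 - R_p_lam) + lam * s0
    # Consumer: same relation, with the opposite sign on F and lam * s0.
    F_c = 2.0 * (h2_lam - h2_0) * c0 + (R_c_lam - R_c_0) + lam * s0
    return F_p, F_c


def equilibrium_residual(*args):
    """Return F_p - F_c; it vanishes exactly when (3.12) holds."""
    F_p, F_c = indifference_prices(*args)
    return F_p - F_c


# Hypothetical inputs, for illustration only.
example = (1.0, 2.0, 0.5, 1.0,   # h1^(0), R_p^(0), h1^(lam), R_p^(lam)
           0.2, 0.3, 0.7, 1.8,   # h2^(0), R_c^(0), h2^(lam), R_c^(lam)
           2.0, 1.0, 10.0, 0.1)  # q0, c0, s0, lam
F_p_example, F_c_example = indifference_prices(*example)
residual_example = equilibrium_residual(*example)
```

In practice one would wrap `equilibrium_residual` in a one-dimensional root finder over \(\lambda \), recomputing the ODE outputs for each candidate value.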

4 Proof of Theorem 3.1

4.1 The Solution Approach

We prove Theorem 3.1 following a methodology based on a combination of a suitable Verification Theorem and of the weak Martingale Optimality Principle. As already stressed in the first comment below Theorem 3.1, although our model is very close to the class of games studied in Miller and Pham [30], their results cannot be applied directly here, so we had to adapt the methodology to our framework. We proceed through the following steps:

  1. 1)

    we compute the best response maps of both players;

  2. 2)

    we check that the system coming from the best response computations has a unique solution;

  3. 3)

    we get a Nash equilibrium as a fixed point of the best response map;

  4. 4)

    we verify that there exists a unique solution to the system characterizing the fixed point found in step 3).

4.2 Preliminary Reformulation of the Problem

For convenience, we introduce the following vector notation for the players’ strategies:

$$\begin{aligned}&\alpha =\left( (\alpha ^p)^\top ,(\alpha ^c)^\top \right) ^\top \in {\mathcal {A}}^2, \quad \alpha ^p:=\begin{pmatrix} u \\ z \end{pmatrix} =\left\{ \begin{pmatrix} u_t \\ z_t \end{pmatrix}\right\} _{t \in [0,T]}\\&\quad \text { and } \quad \alpha ^c:= \begin{pmatrix} v \\ y \end{pmatrix} = \left\{ \begin{pmatrix} v_t \\ y_t \end{pmatrix}\right\} _{t \in [0,T]}, \end{aligned}$$

so that the dynamics of the state variables can be rewritten as

$$\begin{aligned}&dq_t=e_1^{\top } \alpha ^p_t dt + e_2^\top \alpha ^p_t dW_t,\end{aligned}$$
(4.1)
$$\begin{aligned}&dc_t=e_1^\top \alpha ^c_t dt + e_2^\top \alpha ^c_t dB_t, \qquad t \in [0,T], \end{aligned}$$
(4.2)

with \( e_1^\top = (1, 0) \text { and } e_2^\top = (0, 1). \)
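As an illustration of the dynamics (4.1) and (4.2), the following Python sketch simulates one path of \((q_t, c_t)\) with an Euler–Maruyama scheme. The function name, the step count and the representation of the controls as callables of \((t, q, c)\) are assumptions made for this example only; in the model the controls are general adapted processes.

```python
import math
import random


def simulate_states(q0, c0, u, z, v, y, T=1.0, n_steps=200, seed=0):
    """Euler-Maruyama path of dq = u dt + z dW and dc = v dt + y dB.

    u, z, v, y are callables (t, q, c) -> float (a simplifying assumption).
    W and B are simulated as independent Brownian motions.
    Returns a list of tuples (t, q_t, c_t).
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    q, c = q0, c0
    path = [(0.0, q, c)]
    for step in range(n_steps):
        t = step * dt
        dW = rng.gauss(0.0, sqrt_dt)
        dB = rng.gauss(0.0, sqrt_dt)
        # Evaluate both increments at the current state before updating.
        dq = u(t, q, c) * dt + z(t, q, c) * dW
        dc = v(t, q, c) * dt + y(t, q, c) * dB
        q, c = q + dq, c + dc
        path.append((t + dt, q, c))
    return path


# Hypothetical constant controls: drift rates and volatilities.
demo_path = simulate_states(
    1.0, 2.0,
    u=lambda t, q, c: 0.5, z=lambda t, q, c: 0.2,
    v=lambda t, q, c: -0.25, y=lambda t, q, c: 0.1,
)
```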

The following identity is exploited to get a suitable reformulation of our problem: using the dynamics of \(S_t\) and applying Fubini’s theorem, it is easy to see that

$$\begin{aligned} \int _0^T {\mathbb {V}} \left[ S_t \right] dt&= {\mathbb {E}} \left[ \int _0^T \left\{ \rho _p^2 (q_t - {\mathbb {E}} [q_t])^2 + \gamma ^2\rho _c^2 (c_t - {\mathbb {E}} [c_t])^2\right. \right. \nonumber \\&\quad \left. \left. -2 \rho _p \gamma \rho _c (q_t - {\mathbb {E}} [q_t])(c_t - {\mathbb {E}} [c_t]) \right\} dt\right] . \end{aligned}$$
(4.3)

Rearranging the terms in the expression of the producer's objective functional, we obtain

$$\begin{aligned} \begin{aligned} J_p^{\lambda , F}(u, z; v, y)&= \widetilde{J}_p^{\lambda }(u,z;v,y) +F -\lambda s_0 -\frac{\ell _p \sigma _p^2 T}{2}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \widetilde{J}_p^{\lambda }(u,z;v,y) :=&\, {\mathbb {E}} \Bigg [ \int _0^T\bigg ( -(\rho _p +\eta _p \lambda ^2 \rho _p^2)(q_t-{\mathbb {E}} [q_t])^2 - \rho _p {\mathbb {E}} [q_t]^2+[s_0+\gamma \rho _c c_t\\&+2\rho _p \gamma \rho _c \eta _p\lambda ^2 (c_t- {\mathbb {E}} [c_t])] q_t - \frac{k_p}{2}u_t^2-\frac{\ell _p}{2}z_t^2\\&+ \ell _p \sigma _p z_t -\eta _p\lambda ^2 \gamma ^2\rho _c^2 (c_t -{\mathbb {E}} [c_t])^2 \bigg ) dt \\&+\lambda \rho _p q_T -\lambda \gamma \rho _c c_T \Bigg ]. \end{aligned} \end{aligned}$$
(4.4)

Then, neglecting the constant terms, we can study without loss of generality the equivalent formulation in which the producer aims at maximizing \(\widetilde{J}_p^{\lambda }(u,z;v,y) \).

Remark 4.1

Fixing a strategy \(\alpha ^p\) for the producer (resp. \(\alpha ^c\) for the consumer) is equivalent, from the perspective of the competitor, to fixing the corresponding state \(q^{\alpha ^p}\) (resp. \(c^{\alpha ^c}\)). Thus, with some abuse of notation we will write simply q (resp. c) when the strategy used is clear from the context. Moreover, to ease the notation, we will also omit the dependence on \({\bar{c}}\) and \({\bar{q}}\).

For a given consumption process \(\{c_t\}_{t \in [0,T]}\), we write

$$\begin{aligned} \begin{aligned} {\widetilde{J}}_p^\lambda (\alpha ^p;\alpha ^c)&={\widetilde{J}}_p^\lambda (\alpha ^p;c)\\&:={\mathbb {E}}\left[ \int _0^T f_p(t,q_t,{\mathbb {E}} [q_t], \alpha ^p_t, {\mathbb {E}}[\alpha ^p_t];c) dt + g_p(q_T,{\mathbb {E}} [q_T];c) \right] , \text { with }\\ f_p(t,q,\bar{q}, a_p, \bar{a}_p;c)&= Q_p (q-\bar{q})^2 + (Q_p+{\widetilde{Q}}_p)\bar{q}^2\\&\quad +2 M^{p}(c)_t q+a_p^\top N_p a_p +2H_p^\top a_p +T^{p}(c)_t, \\ g_p(q,\bar{q};c)&= 2L_pq + {\widetilde{T}}^{p}(c) , \end{aligned} \end{aligned}$$
(4.5)

where

$$\begin{aligned} \begin{aligned} Q_p&:= -\rho _p - \eta _p \lambda ^2 \rho _p^2, \quad {\widetilde{Q}}_p:= \eta _p \lambda ^2 \rho _p^2, \\ M^{p}(c)_t&:= \frac{s_0}{2} + \frac{\gamma \rho _c}{2}c_t + \rho _p \gamma \rho _c \eta _p \lambda ^2 (c_t - {\mathbb {E}} [c_t]),\\ N_p&:= \begin{pmatrix} -\frac{k_p}{2} &{} 0\\ 0 &{} -\frac{\ell _p}{2} \end{pmatrix}, H_p := \begin{pmatrix} 0 \\ \frac{\sigma _p \ell _p}{2} \end{pmatrix}, \\ T^{p}(c)_t&:= -\eta _p \lambda ^2 \gamma ^2\rho _c^2 (c_t-{\mathbb {E}}[c_t])^2, L_p := \frac{\rho _p \lambda }{2},\\ \text { and }\quad {\widetilde{T}}^{p}(c)&:=-{\lambda \gamma \rho _c}c_T. \end{aligned} \end{aligned}$$
(4.6)
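As a sanity check of the reformulation, the following Python sketch builds the coefficients in (4.6) and verifies pointwise that the running reward \(f_p\) in (4.5) coincides with the integrand of (4.4). The dictionary layout and all parameter values are hypothetical and chosen only for illustration; \(N_p\) is stored through its diagonal since it is a diagonal matrix.

```python
def producer_coefficients(p):
    """Build the constants and maps of (4.6) from a parameter dict p."""
    a = p["eta_p"] * p["lam"] ** 2  # recurring factor eta_p * lambda^2
    g = p["gamma"] * p["rho_c"]     # recurring factor gamma * rho_c
    return {
        "Q": -p["rho_p"] - a * p["rho_p"] ** 2,
        "Q_tilde": a * p["rho_p"] ** 2,
        "N": (-p["k_p"] / 2.0, -p["ell_p"] / 2.0),  # diagonal of N_p
        "H": (0.0, p["sigma_p"] * p["ell_p"] / 2.0),
        "L": p["rho_p"] * p["lam"] / 2.0,
        "M": lambda c, c_bar: p["s0"] / 2.0 + g * c / 2.0
        + p["rho_p"] * g * a * (c - c_bar),
        "T": lambda c, c_bar: -a * g ** 2 * (c - c_bar) ** 2,
    }


def f_p(co, q, q_bar, u, z, c, c_bar):
    """Running reward written in the quadratic form (4.5)."""
    quad = co["N"][0] * u * u + co["N"][1] * z * z   # a^T N_p a
    lin = 2.0 * (co["H"][0] * u + co["H"][1] * z)    # 2 H_p^T a
    return (co["Q"] * (q - q_bar) ** 2
            + (co["Q"] + co["Q_tilde"]) * q_bar ** 2
            + 2.0 * co["M"](c, c_bar) * q + quad + lin + co["T"](c, c_bar))


def integrand_44(p, q, q_bar, u, z, c, c_bar):
    """Integrand of (4.4), written term by term."""
    a = p["eta_p"] * p["lam"] ** 2
    g = p["gamma"] * p["rho_c"]
    return (-(p["rho_p"] + a * p["rho_p"] ** 2) * (q - q_bar) ** 2
            - p["rho_p"] * q_bar ** 2
            + (p["s0"] + g * c + 2.0 * p["rho_p"] * g * a * (c - c_bar)) * q
            - p["k_p"] / 2.0 * u * u - p["ell_p"] / 2.0 * z * z
            + p["ell_p"] * p["sigma_p"] * z
            - a * g ** 2 * (c - c_bar) ** 2)


# Hypothetical parameter values, for illustration only.
params = {"rho_p": 0.4, "rho_c": 0.5, "gamma": 1.2, "eta_p": 0.8,
          "lam": 0.3, "k_p": 2.0, "ell_p": 1.5, "sigma_p": 0.25, "s0": 10.0}
coeffs = producer_coefficients(params)
point = (1.3, 1.1, 0.2, 0.15, 0.9, 1.0)  # q, q_bar, u, z, c, c_bar
gap_44_45 = f_p(coeffs, *point) - integrand_44(params, *point)
```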

Now, let us turn to the objective functional of the consumer. From (2.2) and (4.3), we have

$$\begin{aligned} \begin{aligned} {J}_c^{\lambda , F}&(v, y; u, z) = \widetilde{J}_c^{\lambda }(v,y;u,z)-F + \lambda s_0 -\frac{\ell _c \sigma _c^2T}{2}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned}&\widetilde{J}_c^{\lambda }(v,y; u,z)\\&\quad := {\mathbb {E}}\Bigg [ \int _0^T\bigg ( -[\gamma \rho _c(\gamma -p_1)+\eta _c \lambda ^2 \gamma ^2\rho _c^2](c_t-{\mathbb {E}} [c_t])^2 - \gamma \rho _c(\gamma -p_1){\mathbb {E}}[c_t]^2 \\&\qquad +[ (p_0+s_0 p_1 -\gamma \delta -\gamma s_0) + \rho _p(\gamma -p_1) q_t+2 \rho _p \gamma \rho _c \eta _c \lambda ^2 (q_t-{\mathbb {E}} [q_t]) ] c_t \\&\qquad - \frac{k_c}{2}v_t^2-\frac{\ell _c}{2}y_t^2 + \sigma _c \ell _c y_t - \eta _c \lambda ^2 \rho _p^2 (q_t - {\mathbb {E}} [q_t])^2 \bigg ) dt -\lambda \rho _p q_T +\lambda \gamma \rho _c c_T\Bigg ]. \end{aligned} \end{aligned}$$
(4.7)

As above, let \(\{q_t\}_{t \in [0,T]}\) be a given production rate. We can write

$$\begin{aligned} \begin{aligned} {\widetilde{J}}_c^\lambda (\alpha ^c; \alpha ^p)&={\widetilde{J}}_c^\lambda (\alpha ^c; q) :={\mathbb {E}}\left[ \int _0^Tf_c\big (t,c_t,{\mathbb {E}} [c_t], \alpha ^c_t, {\mathbb {E}}[\alpha ^c_t];q\big ) dt\right. \\&\quad \left. + g_c(c_T,{\mathbb {E}} [c_T];q) \right] , \text { with }\\ f_c(t,c,\bar{c}, a_c, \bar{a}_c;q)&= Q_c (c-\bar{c})^2 + (Q_c+{\widetilde{Q}}_c)\bar{c}^2\\&\quad +2 M^{c}(q)_t c+a_c^\top N_c a_c +2H_c^\top a_c + T^{c}(q)_t, \\ g_c(c,\bar{c};q)&= 2L_cc + {\widetilde{T}}^{c}(q) , \end{aligned} \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} Q_c&:=-\gamma \rho _c(\gamma -p_1)-\eta _c \lambda ^2\gamma ^2 \rho _c^2, \quad {\widetilde{Q}}_c:= \eta _c \lambda ^2 \gamma ^2 \rho _c^2, \quad N_c := \begin{pmatrix} -\frac{k_c}{2} &{} 0\\ 0 &{} -\frac{\ell _c}{2} \end{pmatrix},\\ M^{c}(q)_t&:= \frac{p_0+p_1 s_0-\gamma (s_0+\delta )}{2}+ \frac{\rho _p(\gamma -p_1)}{2}q_t+\rho _p \gamma \rho _c \eta _c \lambda ^2 (q_t - {\mathbb {E}} [q_t]), \\ H_c&:= \begin{pmatrix} 0 \\ \frac{\sigma _c \ell _c}{2} \end{pmatrix}, \quad T^{c}(q)_t:= -\eta _c \lambda ^2 \rho _p^2 (q_t-{\mathbb {E}}[q_t])^2, \quad L_c := \frac{ \lambda \gamma \rho _c}{2},\\ \text { and }\quad {\widetilde{T}}^{c}(q)&:=-{\lambda \rho _p}q_T. \end{aligned} \end{aligned}$$
(4.8)

Finally, we set

$$\begin{aligned}&V_p^\lambda (\alpha ^c) :=\sup _{\alpha ^p \in {\mathcal {A}}}{\widetilde{J}}_p^\lambda (\alpha ^p; \alpha ^c), \qquad \alpha ^c \in {\mathcal {A}}, \end{aligned}$$
(4.9)
$$\begin{aligned}&V_c^\lambda (\alpha ^p) :=\sup _{\alpha ^c \in {\mathcal {A}}}\widetilde{J}_c^\lambda (\alpha ^c; \alpha ^p), \qquad \alpha ^p \in {\mathcal {A}}. \end{aligned}$$
(4.10)

4.3 First Step: Computation of the Best Response Maps

The first step is focused on the computation of the best response map of each player. This is done by exploiting the following version of the Verification Theorem:

Theorem 4.2

(Verification Theorem) Fix a couple of strategies \(\beta ^p, \beta ^c \in {\mathcal {A}}\) for the producer and the consumer, respectively. Let \({\mathcal {W}}^{p,\alpha ^p}_t\) and \({\mathcal {W}}^{c,\alpha ^c}_t\) be defined as

$$\begin{aligned}&{\mathcal {W}}^{p,\alpha ^p}_t= w_t^p(q_t^{\alpha ^p}, {\mathbb {E}}[q_t^{\alpha ^p}]) , \quad {\mathcal {W}}^{c,\alpha ^c}_t= w_t^c(c_t^{\alpha ^c}, {\mathbb {E}}[c_t^{\alpha ^c}]), \quad t \in [0,T], \quad \alpha ^p, \alpha ^c \in {\mathcal {A}}, \end{aligned}$$
(4.11)

where the \({\mathbb {F}}\)-adapted random fields \(\{w_t^p(q,{\bar{q}}), t \in [0,T], q, {\bar{q}} \in {\mathbb {R}}\}\) and \(\{w_t^c(c,{\bar{c}}), t \in [0,T], c, {\bar{c}} \in {\mathbb {R}}\}\) satisfy the following growth conditions: \(\text { for all } t \in [0,T], \text { for all } x,\bar{x} \in {\mathbb {R}},\)

$$\begin{aligned}&|w_t^p(x,{\bar{x}})| \le C_p(\nu _t^p+|x|^2+|\bar{x}|^2), \qquad |w_t^c(x,{\bar{x}})| \le C_c(\nu _t^c+|x|^2+|\bar{x}|^2), \end{aligned}$$
(4.12)

for some constants \(C_p, C_c > 0\) and for some non-negative processes \(\nu ^p\) and \( \nu ^c\) such that

$$\begin{aligned} \sup _{t \in [0,T]}{\mathbb {E}}\left[ \nu _t^p+\nu _t^c\right] < \infty . \end{aligned}$$

Furthermore, we assume that the following conditions are fulfilled:

  1. (i)

    \({\mathbb {E}}[w_T^p(q_T^{\alpha ^p},\bar{q_T}^{\alpha ^p})]= {\mathbb {E}}[g_p(q_T^{\alpha ^p},\bar{q_T}^{\alpha ^p};c^{\beta ^c})]\) and \({\mathbb {E}}[w_T^c(c_T^{\alpha ^c},\bar{c_T}^{\alpha ^c})]= {\mathbb {E}}[g_c(c_T^{\alpha ^c},\bar{c_T}^{\alpha ^c};q^{\beta ^p})]\), for any \(\alpha ^p, \alpha ^c \in {\mathcal {A}}\).

  2. (ii)

    The map \([0,T] \ni t \mapsto {\mathbb {E}}[{\mathcal {S}}^{p,{\alpha ^p}}_t] \Big (\text {resp. } {\mathbb {E}}[{\mathcal {S}}^{c,{\alpha ^c}}_t]\Big )\) is well-defined and nonincreasing, for any \( \alpha ^p \in {\mathcal {A}}\) (resp. for any \( \alpha ^c \in {\mathcal {A}}\)), where:

    $$\begin{aligned} \begin{aligned}&{\mathcal {S}}^{p,{\alpha ^p}}_t = {\mathcal {W}}^{p,\alpha ^p}_t + \int _0^t f_p(s,q_s^{\alpha ^p},{\bar{q}}_s^{\alpha ^p}, \alpha ^p_s, \bar{\alpha }^p_s; c^{\beta ^c} )ds,\\&{\mathcal {S}}^{c,{\alpha ^c}}_t = {\mathcal {W}}^{c,\alpha ^c}_t + \int _0^t f_c(s,c_s^{\alpha ^c},{\bar{c}}_s^{\alpha ^c}, \alpha ^c_s, \bar{\alpha }^c_s; q^{\beta ^p} )ds. \end{aligned} \end{aligned}$$
    (4.13)
  3. (iii)

    For some \(\alpha ^{p,\star } \in {\mathcal {A}}\) and \(\alpha ^{c,\star } \in {\mathcal {A}}\), the map \([0,T] \ni t \mapsto {\mathbb {E}}[{\mathcal {S}}^{p,\alpha ^{p,\star }}_t] \Big (\text {resp. } {\mathbb {E}}[{\mathcal {S}}^{c,\alpha ^{c,\star }}_t]\Big )\) is constant.

Then, the control \(\alpha ^\star =(\alpha ^{p,\star },\alpha ^{c,\star })\) is the best response to the control \((\beta ^p,\beta ^c)\) meaning that

$$\begin{aligned}&\alpha ^{p,\star }= \mathbf {B}_p(\beta ^c):= {{\,\mathrm{arg\,max}\,}}_{\alpha ^p \in {\mathcal {A}}}{\widetilde{J}}_p^\lambda (\alpha ^p;\beta ^c), \nonumber \\&\quad \alpha ^{c,\star }= \mathbf {B}_c(\beta ^p):= {{\,\mathrm{arg\,max}\,}}_{\alpha ^c \in {\mathcal {A}}}{\widetilde{J}}_c^\lambda (\alpha ^c;\beta ^p), \end{aligned}$$
(4.14)

and

$$\begin{aligned} \widetilde{J}_p^\lambda (\alpha ^{p,\star };c^{\beta ^c}) =V^\lambda _p(\beta ^c) = {\mathbb {E}}[{\mathcal {W}}_0^{p, \alpha ^{p,\star }}] \quad \text {and} \quad \widetilde{J}_c^\lambda (\alpha ^{c,\star };q^{\beta ^p}) =V^\lambda _c(\beta ^p) = {\mathbb {E}}[{\mathcal {W}}_0^{c, \alpha ^{c,\star }}]. \end{aligned}$$

Finally, if \(\widetilde{\alpha }= (\widetilde{\alpha }^p, \widetilde{\alpha }^c)\) is another best response to the control \((\beta ^p,\beta ^c)\), then condition iii) holds also for \(\widetilde{\alpha }^p\) and \(\widetilde{\alpha }^c.\)

We define the best response map \({\textbf {B}}: {\mathcal {A}}^2 \rightarrow {\mathcal {A}}^2\) as \({\textbf {B}}:=({\textbf {B}}_p, {\textbf {B}}_c)\). The Nash equilibrium we find will be a fixed point of this map.

Once we have fixed the strategies \(\beta ^p \) and \(\beta ^c\) in \({\mathcal {A}}\), the first step can be divided into four sub-steps:

1.1:

Since the players' objective functionals are quadratic, we propose a suitable candidate \(({\mathcal {W}}^{p,\alpha ^p}_t,{\mathcal {W}}^{c,\alpha ^c}_t)\) in feedback form.

1.2:

Applying Itô’s formula, we compute \(\frac{d}{dt} {\mathbb {E}} [{\mathcal {S}}^{p,\alpha ^p}_t]\) and \(\frac{d}{dt} {\mathbb {E}} [{\mathcal {S}}^{c,\alpha ^c}_t]\) corresponding to the candidate \(({\mathcal {W}}^{p,\alpha ^p}_t,{\mathcal {W}}^{c,\alpha ^c}_t)\).

1.3:

We postulate that the conditions of Theorem 4.2 are satisfied and get a system of backward SDEs involving the coefficients of the candidate \(({\mathcal {W}}^{p,\alpha ^p}_t,{\mathcal {W}}^{c,\alpha ^c}_t)\).

1.4:

We compute each player’s best response by looking for strategies cancelling the expectation of the drifts of the processes \({\mathcal {S}}^{p,\alpha ^p}_t\) and \({\mathcal {S}}^{c,\alpha ^c}_t\).

Sub-step 1.1

Given the quadratic nature of our objective functional, it seems natural to look for a family of processes \(({\mathcal {W}}^{p,\alpha ^p}_t,{\mathcal {W}}^{c,\alpha ^c}_t)_{t \in [0,T]}\) of the following form: \({\mathcal {W}}^{p,\alpha ^p}_t= w^p_t(q_t^{\alpha ^p}, {\mathbb {E}}[q_t^{\alpha ^p}])\) and \({\mathcal {W}}^{c,\alpha ^c}_t= w^c_t(c_t^{\alpha ^c}, {\mathbb {E}}[c_t^{\alpha ^c}])\), for some parametric adapted random field \(\{w_t^i(x, {\bar{x}}), t \in [0,T], x, {\bar{x}} \in {\mathbb {R}}\}, i \in \{p,c\},\) such that

$$\begin{aligned} w_t^i(x, {\bar{x}})=K_i(t)(x-{\bar{x}})^2+\Lambda _i(t){\bar{x}}^2+2Y^i_t x+ R_i(t), \end{aligned}$$

with \((K_i,\Lambda _i,Y^i,R_i) \in L^{\infty }([0,T], {\mathbb {R}}_-)^2\times S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})\times L^{\infty }([0,T], {\mathbb {R}})\), \( i \in \{p,c\}\), solving the systems of ODEs and SDEs:

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{lll} dK_p(t)&{}=K_p'(t)dt, &{} K_p(T)=0, \\ d\Lambda _p(t)&{}=\Lambda _p'(t)dt,&{} \Lambda _p(T)=0, \\ dY^p_t&{}=Y^p_t{}' dt+Z^{p,B}_tdB_t+Z^{p,W}_t dW_t,&{} Y^p_T=\frac{\lambda \rho _p}{2}, \\ dR_p(t)&{}=R_p'(t)dt, &{} R_p(T)=-\lambda \gamma \rho _c {\mathbb {E}} [c_T], \end{array}\right. \end{aligned} \end{aligned}$$
(4.15)
$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{lll} dK_c(t)&{}={K}_c'(t)dt, &{} K_c(T)=0, \\ d\Lambda _c(t)&{}={\Lambda }_c'(t)dt,&{} \Lambda _c(T)=0, \\ dY^c_t&{}={Y^c_t}' dt+Z^{c,B}_tdB_t+Z^{c,W}_tdW_t,&{} Y^c_T=\frac{\lambda \gamma \rho _c}{2}, \\ dR_c(t)&{}={R}_c'(t)dt, &{} R_c(T)= -\lambda \rho _p {\mathbb {E}} [q_T], \end{array}\right. \end{aligned} \end{aligned}$$
(4.16)

for some deterministic processes \(K_i', \Lambda _i', R_i'\) and for some \({\mathbb {F}}\)-adapted processes \({Y^i}', Z^{i,W}, Z^{i,B}\), \(i \in \{p, c\}\).

Sub-step 1.2 For the sake of simplicity, from now on, we explicitly develop only the producer case. The consumer problem can be studied in the same way. Let \(t \in [0,T]\) and \(\alpha ^p \in {\mathcal {A}}.\) As in (4.13) in Theorem 4.2 (Verification Theorem), we set

$$\begin{aligned} {\mathcal {S}}^{p,\alpha ^p}_t= w_t^p(q_t^{\alpha ^p}, {\mathbb {E}}[ q_t^{\alpha ^p}])+ \int _0^t f_p(u,q_u^{\alpha ^p}, {\mathbb {E}}[{q_u^{\alpha ^p}}], \alpha _u^p, {\mathbb {E}} [{\alpha _u^p}]; c^{\beta ^c}) du. \end{aligned}$$

In the following, we write simply c instead of \(c^{\beta ^c}\) (resp. q instead of \(q^{\beta ^p}\)), when the strategies are clear from the context (see Remark 4.1). After some computations (see Appendix 1 for details), we obtain

$$\begin{aligned} \begin{aligned} \frac{d}{dt}{\mathbb {E}}[{\mathcal {S}}^{p,\alpha ^p}_t]&={\mathbb {E}} \Big [( {K}_p'(t) + Q_p)(q_t- {\mathbb {E}}[q_t])^2\\&\quad +( {\Lambda }_p'(t) +Q_p+ {\widetilde{Q}}_p){\mathbb {E}}[q_t]^2\\&\quad +2( {Y^p_t}' + M^{p}(c)_t)q_t\\&\quad + {R}_p'(t) + T^{p}(c)_t+ \chi ^p_t(\alpha ^p_t) \Big ], \end{aligned} \end{aligned}$$
(4.17)

where, for any \( t \in [0,T],\) we have set

$$\begin{aligned} \left\{ \begin{array}{l} \chi ^p_t(\alpha ^p_t):= (\alpha ^p_t)^\top S_p(t)\alpha ^p_t +2[U_p(t)(q_t-{\mathbb {E}}[q_t])+V_p(t){\mathbb {E}}[q_t]+\xi ^p_t-\bar{\xi }^p_t+O_p(t)]^\top \alpha ^p_t, \\ S_p(t):=N_p+e_2K_p(t)e_2^\top , \\ U_p(t):=K_p(t)e_1,\\ V_p(t):=\Lambda _p(t)e_1,\\ O_p(t):=H_p+ e_1{\mathbb {E}}[Y^p_t]+e_2{\mathbb {E}}[Z^{p,W}_t],\\ \xi ^p_t:= H_p+ e_1 Y^p_t+e_2 Z^{p,W}_t,\\ \bar{\xi }^p_t:= H_p+ e_1{\mathbb {E}}[Y^p_t]+e_2{\mathbb {E}}[Z^{p,W}_t], \end{array}\right. \end{aligned}$$
(4.18)

where \(Q_p, \widetilde{Q}_p, M^{p}(c), N_p, H_p\) and \(T^{p}(c)\) are defined in Equation (4.6).

Sub-step 1.3 Now, we find conditions granting that assumptions i), ii) and iii) of Theorem 4.2, involving \({\mathcal {S}}^{p,\alpha ^p}\), hold. Suppose that the matrix \(S_p(t)\) is negative definite and thus invertible. We check this later, verifying that \(K_p(t) \le 0, \text { for all } t \in [0,T]\) (see Remark 4.5). Completing the squares, we rewrite Eq. (4.17) as

$$\begin{aligned} \begin{aligned} \frac{d}{dt}{\mathbb {E}}[{\mathcal {S}}^{p,\alpha ^p}_t]&={\mathbb {E}} \Big [ \big ( {K}_p'(t) + Q_p- U_p(t)^\top S_p(t)^{-1} U_p(t)\big )(q_t- {\mathbb {E}}[q_t])^2 \\&\quad + \big ( {\Lambda }_p'(t) +Q_p+ {\widetilde{Q}}_p- V_p(t)^\top S_p(t)^{-1}V_p(t)\big ) {\mathbb {E}}[q_t]^2\\&\quad +2 \big [ {Y^p_t}' + M^{p}(c)_t- U_p(t)^\top S_p(t)^{-1}( \xi ^p_t-\bar{ \xi }^p_t)- V_p(t)^\top S_p(t)^{-1} O_p(t)\big ]q_t\\&\quad + {R}_p'(t) + T^{p}(c)_t - (\xi ^p_t - \bar{\xi }^p_t)^\top S_p(t)^{-1}(\xi ^p_t - \bar{\xi }^p_t) - O_p(t)^\top S_p(t)^{-1} O_p(t)\\&\quad + (\alpha ^p_t-\eta ^p_t)^\top S_p(t) (\alpha ^p_t- \eta ^p_t) \Big ], \end{aligned} \end{aligned}$$

where, for all \( t \in [0,T],\) we have defined

$$\begin{aligned} \eta ^p_t:=-S_p(t)^{-1}\left[ U_p(t)(q_t- {\mathbb {E}}[q_t])+V_p(t){\mathbb {E}}[q_t]+(\xi ^p_t-\bar{\xi }^p_t)+ O_p(t)\right] . \end{aligned}$$
(4.19)

Choosing processes \(K_p, \Lambda _p, Y^p\) and \(R_p\), whose existence is shown in the next sub-step, that solve the following system of BSDEs

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{l} \begin{array}{ll} {K}_p'(t) + Q_p- U_p(t)^\top S_p(t)^{-1}U_p(t)=0,&{} K_p(T)=0, \end{array}\\ \begin{array}{ll} {\Lambda }_p'(t) +Q_p+ {\widetilde{Q}}_p- V_p(t)^\top S_p(t)^{-1}V_p(t) =0, &{} \Lambda _p(T)=0, \end{array}\\ \begin{array}{ll} dY^p_t=&{} \left[ - M^{p}(c)_t+ U_p(t)^\top S_p(t)^{-1}(\xi ^p_t - \bar{\xi }^p_t)+ V_p(t)^\top S_p(t)^{-1} O_p(t)\right] dt \\ &{}+ Z^{p,B}_tdB_t + Z^{p,W}_t dW_t,\\ Y^p_T=L_p,&{} \end{array}\\ \begin{array}{l} {R}_p'(t) + {\mathbb {E}} [ T^{p}(c)_t- (\xi ^p_t - \bar{\xi }^p_t)^\top S_p(t)^{-1}(\xi ^p_t - \bar{\xi }^p_t) - O_p(t)^\top S_p(t)^{-1} O_p(t) ] =0, \\ R_p(T) = {\mathbb {E}}[{\widetilde{T}}^{p}(c)], \end{array} \end{array}\right. \end{aligned} \end{aligned}$$
(4.20)

we obtain

$$\begin{aligned} \begin{aligned} \frac{d}{dt}{\mathbb {E}}[{\mathcal {S}}^{p,\alpha ^p}_t]&={\mathbb {E}} \Big [ (\alpha ^p_t-\eta ^p_t)^\top S_p(t) (\alpha ^p_t-\eta ^p_t) \Big ], \end{aligned} \end{aligned}$$
(4.21)

which is clearly non-positive for all \(t \in [0,T]\), since \(S_p(t)\) (defined in Equation (4.18)) is negative definite for all \( t \in [0,T]\).
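The role of \(\eta ^p_t\) as pointwise maximizer can be checked numerically. The following Python sketch considers a quadratic form \(\alpha \mapsto \alpha ^\top S \alpha + 2 b^\top \alpha \) with a diagonal negative definite \(S\) of the same shape as \(S_p(t)\), and compares its value at \(\eta = -S^{-1}b\) with its value at randomly drawn controls. The vector \(b\), the numerical values and the function names are hypothetical; \(b\) stands for the bracket multiplying \(\alpha ^p_t\) in (4.18).

```python
import random


def chi(alpha, S_diag, b):
    """Quadratic form alpha^T S alpha + 2 b^T alpha for a diagonal S."""
    return sum(s * a * a + 2.0 * bi * a for s, a, bi in zip(S_diag, alpha, b))


def maximizer(S_diag, b):
    """eta = -S^{-1} b for a diagonal, negative definite S."""
    return [-bi / s for s, bi in zip(S_diag, b)]


# Hypothetical values: S = N_p + e_2 K_p e_2^T with K_p <= 0.
k_p, ell_p, K_p = 2.0, 1.5, -0.4
S_diag = [-k_p / 2.0, -ell_p / 2.0 + K_p]
b = [0.7, -0.3]

eta = maximizer(S_diag, b)
rng = random.Random(1)
trials = [[rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)] for _ in range(1000)]

# Smallest gap chi(eta) - chi(alpha) over the random trials.
smallest_gap = min(chi(eta, S_diag, b) - chi(a, S_diag, b) for a in trials)
# Value at the maximizer, to be compared with -b^T S^{-1} b.
value_at_eta = chi(eta, S_diag, b)
closed_form_value = -sum(bi * bi / s for s, bi in zip(S_diag, b))
```

The gap is nonnegative for every trial because \(S\) is negative definite, which is the mechanism behind the sign of (4.21).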

Remark 4.3

We stress the fact that the processes \(Y^p, Z^{p,W},Z^{p,B}\) and \(R_p\) depend on the strategy of the consumer only through the state process \(\{c_t\}_{t \in [0,T]}\), with \(c_t=c^{\beta ^c}_t, t \in [0,T]\), which is controlled only by \(\beta ^c.\) Thus, the feedback best response controls are functions of different state variables, namely the best response for the producer is feedback in q and its expectation, whereas the best response for the consumer is feedback in c and its expectation.

Sub-step 1.4

Now we combine the results in the previous steps in order to get the best response maps.

Proposition 4.4

The best response maps are given by

$$\begin{aligned} \mathbf {B}_p(\beta ^c)_t&= -(N_p+e_2 K_p(t)e_2^\top )^{-1}[e_1 K_p(t)(q_t-{\mathbb {E}} [q_t])\nonumber \\&\qquad +e_1\Lambda _p(t) {\mathbb {E}} [q_t]+e_1 Y^p_t+e_2 Z^{p,W}_t + H_p ],\nonumber \\ \mathbf {B}_c(\beta ^p)_t&= -(N_c+e_2 K_c(t)e_2^\top )^{-1}[e_1 K_c(t)(c_t-{\mathbb {E}} [c_t])\nonumber \\&\qquad +e_1\Lambda _c(t) {\mathbb {E}} [c_t]+e_1 Y^c_t+e_2 Z^{c,B}_t +H_c], \end{aligned}$$
(4.22)

where the processes \((K_p,\Lambda _p,Y^p,R_p) \) and \((K_c,\Lambda _c,Y^c,R_c)\) above solve the following systems of backward ODEs and SDEs, given \(c_t=c_t^{\beta ^c}\) (respectively, given \(q_t=q_t^{\beta ^p}\)), \(t \in [0,T]\):

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{l} \begin{array}{ll} K_p'(t) = -\frac{2}{k_p}K_p(t)^2+\rho _p + \eta _p \lambda ^2 \rho _p^2,&{} K_p(T)=0, \end{array} \\ \begin{array}{ll} \Lambda _p'(t) = -\frac{2}{k_p} \Lambda _p(t)^2 +\rho _p, &{} \Lambda _p(T)=0, \end{array}\\ \begin{array}{ll} dY^p_t&{}=-\left\{ \frac{s_0}{2}+\frac{\gamma \rho _c}{2} c_t +\rho _p \gamma \rho _c \eta _p \lambda ^2 (c_t-{\mathbb {E}} [c_t])+\frac{2}{k_p}\Big [K_p(t)\left( Y^p_t - {\mathbb {E}} [Y^p_t]\right) +\Lambda _p(t){\mathbb {E}}[Y^p_t]\Big ] \right\} dt\\ &{}\quad +Z^{p,B}_tdB_t+Z^{p,W}_tdW_t,\\ Y^p_T&{}=\frac{\lambda \rho _p}{2}, \end{array} \\ \begin{array}{l} R_p'(t) = \eta _p \lambda ^2 \gamma ^2\rho _c^2 {\mathbb {V}} [c_t] -\frac{2}{k_p}({\mathbb {V}} [Y^p_t]+{\mathbb {E}} [Y^p_t]^2)-\frac{2}{\ell _p-2K_p(t)}({\mathbb {V}}[Z^{p,W}_t]+({\mathbb {E}} [Z^{p,W}_t]+\frac{\ell _p \sigma _p}{2})^2), \\ R_p(T)= - \lambda \gamma \rho _c {\mathbb {E}} [c_T], \end{array} \end{array}\right. \end{aligned} \end{aligned}$$
(4.23)

and

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{l} \begin{array}{ll} K_c'(t) = -\frac{2}{k_c}K_c(t)^2+\gamma \rho _c(\gamma -p_1)+\eta _c \lambda ^2 \gamma ^2 \rho _c^2,&{} K_c(T)=0, \end{array} \\ \begin{array}{ll} \Lambda _c'(t) = -\frac{2}{k_c} \Lambda _c(t)^2 + \gamma \rho _c(\gamma -p_1), &{} \Lambda _c(T)=0, \end{array} \\ \begin{array}{ll} dY^c_t&{}=-\Big \{\frac{p_0+p_1 s_0-\gamma (s_0+\delta )}{2}+\frac{\rho _p(\gamma -p_1)}{2}q_t+\rho _p \gamma \rho _c \eta _c \lambda ^2 (q_t - {\mathbb {E}} [q_t])+\frac{2}{k_c}\Big [K_c(t)(Y^c_t-{\mathbb {E}} [Y^c_t])\\ &{}\quad + \Lambda _c(t){\mathbb {E}}[Y^c_t]\Big ]\Big \}dt + Z^{c,B}_tdB_t+Z^{c,W}_tdW_t,\\ Y^c_T&{}=\frac{\lambda \gamma \rho _c}{2}, \end{array} \\ \begin{array}{ll} R_c'(t) =&{} \eta _c \lambda ^2 \rho _p^2 {\mathbb {V}} [q_t] -\frac{2}{k_c}({\mathbb {V}} [Y^c_t]+{\mathbb {E}} [Y^c_t]^2)-\frac{2}{\ell _c-2K_c(t)}[{\mathbb {V}}[Z^{c,B}_t] +({\mathbb {E}} [Z^{c,B}_t]+\frac{\ell _c \sigma _c}{2})^2], \\ R_c(T)=&{}-\lambda \rho _p {\mathbb {E}} [q_T]. \end{array} \end{array}\right. \end{aligned} \end{aligned}$$
(4.24)

So, we have

$$\begin{aligned} \widetilde{J}_p^\lambda (\mathbf {B}_p(\beta ^c); \beta ^c)=V_p^\lambda (\beta ^c) \text { and } \widetilde{J}_c^\lambda (\mathbf {B}_c(\beta ^p); \beta ^p)=V_c^\lambda (\beta ^p). \end{aligned}$$

Moreover, we have an explicit expression for the Nash equilibrium values which are given by

$$\begin{aligned} \begin{aligned}&V_p^\lambda (\beta ^c)= \Lambda _p(0) q_0^2 + 2 {\mathbb {E}}[Y^p_0] q_0 +R_p(0) \quad \text { and }\\&V_c^\lambda (\beta ^p)= \Lambda _c(0) c_0^2 + 2 {\mathbb {E}}[Y^c_0] c_0 +R_c(0) . \end{aligned} \end{aligned}$$
(4.25)

Remark 4.5

Notice that the first two equations in the systems (4.23) and (4.24) are one-dimensional Riccati differential equations, for which it is known that there exists a unique global solution given by Equation (3.1). In the following we face more complicated Riccati equations (non-symmetric matrix Riccati equations) for which existence of solutions is not guaranteed. The fact that \(K_p(t)\) and \(K_c(t)\) are given by a hyperbolic tangent with a positive argument multiplied by a negative constant yields that \(K_p(t)\le 0\) and \(K_c(t)\le 0\), granting that \(S_p(t)=N_p+e_2 K_p(t)e_2^\top \) and \(S_c(t)=N_c+e_2 K_c(t)e_2^\top \) are negative definite for all \(t \in [0,T],\) hence matching the assumptions made at the beginning of Sub-step 1.3.
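To make the hyperbolic tangent structure concrete, the following Python sketch treats the scalar Riccati equation \(K'(t) = -\frac{2}{k}K(t)^2 + a\), \(K(T)=0\), with a constant \(a>0\) (for instance \(a = \rho _p + \eta _p \lambda ^2 \rho _p^2\) for \(K_p\)). The closed form used below, \(K(t) = -\sqrt{ak/2}\,\tanh \big (\sqrt{2a/k}\,(T-t)\big )\), is obtained by direct substitution into this ODE; we assume, without reproducing it here, that it agrees with Equation (3.1). The function names and the numerical values are hypothetical.

```python
import math


def riccati_closed_form(t, T, a, k):
    """K(t) = -sqrt(a k / 2) * tanh(sqrt(2 a / k) * (T - t)), with K(T) = 0."""
    return -math.sqrt(a * k / 2.0) * math.tanh(math.sqrt(2.0 * a / k) * (T - t))


def riccati_rk4(T, a, k, n_steps=1000):
    """Integrate K' = -(2/k) K^2 + a backward from K(T) = 0 down to t = 0."""
    h = T / n_steps

    def rhs(K):
        return -(2.0 / k) * K * K + a

    K = 0.0  # terminal condition K(T) = 0
    for _ in range(n_steps):
        # Classical RK4 with a negative time step -h.
        k1 = rhs(K)
        k2 = rhs(K - 0.5 * h * k1)
        k3 = rhs(K - 0.5 * h * k2)
        k4 = rhs(K - h * k3)
        K -= h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return K


# Hypothetical values: a = rho_p + eta_p * lambda^2 * rho_p^2.
rho_p, eta_p, lam, k_p, T = 0.5, 0.8, 0.5, 2.0, 1.0
a_p = rho_p + eta_p * lam ** 2 * rho_p ** 2

K0_exact = riccati_closed_form(0.0, T, a_p, k_p)
K0_numeric = riccati_rk4(T, a_p, k_p)
```

Both values are non-positive and agree up to the discretization error, in line with the sign property used for \(S_p(t)\) and \(S_c(t)\).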

Proof

To prove the proposition we apply Theorem 4.2, so let us check that its hypotheses are fulfilled. Fix a couple of strategies \(\beta ^p, \beta ^c \in {\mathcal {A}}\). First of all, condition i) is a consequence of the terminal conditions of systems (4.23) and (4.24). Furthermore, assumption ii) is verified for any \( \alpha ^p \in {\mathcal {A}}\) (resp. for any \( \alpha ^c \in {\mathcal {A}}\)): since the processes \((K_p,\Lambda _p,Y^p,R_p) \) and \((K_c,\Lambda _c,Y^c,R_c)\) solve the systems (4.23) and (4.24), the derivatives \(\frac{d}{dt}{\mathbb {E}} [{\mathcal {S}}^{p,\alpha ^p}_t]\) and \( \frac{d}{dt}{\mathbb {E}} [{\mathcal {S}}^{c,\alpha ^c}_t]\) are non-positive, which gives the monotonicity of the functions \([0,T] \ni t \mapsto {\mathbb {E}} [{\mathcal {S}}^{p,\alpha ^p}_t] (\text {resp. }{\mathbb {E}} [{\mathcal {S}}^{c,\alpha ^c}_t])\). Then, by (4.21), we notice that, given \(\beta ^c \in {\mathcal {A}}\), \(\frac{d}{dt}{\mathbb {E}} [{\mathcal {S}}^{p,\alpha ^p}_t]=0, \text { for all } t \in [0,T],\) if and only if, for all \(t \in [0,T]\), we have

$$\begin{aligned} \alpha ^p_t=\eta ^p_t = -S_p(t)^{-1}\Big [U_p(t)(q_t- {\mathbb {E}}[q_t])+V_p(t){\mathbb {E}}[q_t]+(\xi ^p_t-\bar{\xi }^p_t)+ O_p(t)\Big ], \quad {\mathbb {P}}\text {-a.s.}, \end{aligned}$$

and analogously, given \(\beta ^p \in {\mathcal {A}}\), \(0=\frac{d}{dt}{\mathbb {E}} [{\mathcal {S}}^{c,\alpha ^c}_t], \text { for all } t \in [0,T],\) if and only if, for all \(t \in [0,T]\), we have

$$\begin{aligned} \alpha ^c_t=\eta ^c_t =-S_c(t)^{-1}\left[ U_c(t)(c_t- {\mathbb {E}}[c_t])+V_c(t){\mathbb {E}}[c_t]+(\xi ^c_t-\bar{\xi }^c_t)+ O_c(t)\right] , \quad {\mathbb {P}}\text {-a.s.} \end{aligned}$$

Hence, the strategies in (4.22) satisfy iii) as well.

Finally, let us check the admissibility of the strategies \(\mathbf {B}_p(\beta ^c)\) and \(\mathbf {B}_c(\beta ^p)\), i.e. \( \mathbf {B}_p(\beta ^c)\in {\mathcal {A}}\) and \(\mathbf {B}_c(\beta ^p) \in {\mathcal {A}}\). We need to verify their square-integrability. Let us check it for \(\mathbf {B}_p(\beta ^c)\), the same can be done for \(\mathbf {B}_c(\beta ^p).\) The state variable \(q=\{q_t\}_{t \in [0,T]}=\{q^{\mathbf {B}_p(\beta ^c)}(t)\}_{t \in [0,T]}\) is the solution of a linear SDE and so it satisfies \({\mathbb {E}}[\sup _{t \in [0,T]}|q_t|^2]<\infty \). Furthermore, \(S_p, U_p, V_p\), defined in (4.18), are bounded, being continuous matrix-valued functions over a finite time-interval, and the process \((O_p,\xi ^p)\) belongs to \(L^2([0,T], {\mathbb {R}}^2)\times L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^2)\). This implies that the feedback control \(\mathbf {B}_p(\beta ^c) \in L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^2) .\) \(\square \)
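For illustration, the matrix expression (4.22) can be unpacked componentwise. Since \(N_p+e_2K_p(t)e_2^\top \) is diagonal with entries \(-k_p/2\) and \(-\ell _p/2+K_p(t)\), and \(H_p = (0, \sigma _p\ell _p/2)^\top \), the producer's best response reads \(u_t = \frac{2}{k_p}\big [K_p(t)(q_t-{\mathbb {E}}[q_t])+\Lambda _p(t){\mathbb {E}}[q_t]+Y^p_t\big ]\) and \(z_t = \frac{2Z^{p,W}_t+\ell _p\sigma _p}{\ell _p-2K_p(t)}\). The following Python sketch evaluates these two formulas; the function name and the numerical inputs are hypothetical, and the same function applies to the consumer after replacing the producer's quantities with \((c, K_c, \Lambda _c, Y^c, Z^{c,B}, k_c, \ell _c, \sigma _c)\).

```python
def best_response(x, x_bar, Y, Z, K, Lam, k, ell, sigma):
    """Componentwise form of (4.22) for one player.

    x, x_bar : current state and its expectation
    Y, Z     : current values of the backward components
    K, Lam   : Riccati solutions at the current time (both <= 0)
    k, ell   : cost parameters of the drift and volatility controls
    sigma    : nominal volatility
    Returns (drift control, volatility control).
    """
    drift = (2.0 / k) * (K * (x - x_bar) + Lam * x_bar + Y)
    volatility = (2.0 * Z + ell * sigma) / (ell - 2.0 * K)
    return drift, volatility


# Hypothetical inputs: with K = 0 and Z = 0 the volatility stays at sigma.
u_flat, z_flat = best_response(1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.5, 0.3)
# With K < 0 and Z = 0 the chosen volatility is strictly below sigma.
u_red, z_red = best_response(1.2, 1.0, 0.1, 0.0, -0.5, -0.4, 2.0, 1.5, 0.3)
```

With \(Z=0\) the volatility formula reduces to \(\sigma \ell /(\ell -2K)\), which is bounded above by \(\sigma \) whenever \(K\le 0\).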

4.4 Second Step: Well-Posedness of the Best Response Map System

This subsection provides the proof of existence and uniqueness of solutions to the systems in (4.23) and (4.24),

$$\begin{aligned}&K_p,K_c,\Lambda _p \text { and }\Lambda _c \in L^{\infty }([0,T], {\mathbb {R}}_-), \quad R_p \text { and } R_c \in L^{\infty }([0,T], {\mathbb {R}}),\\&(Y^p,Z^{p,W}, Z^{p,B})\text { and }(Y^c,Z^{c,W}, Z^{c,B}) \in S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})\times L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})^2, \end{aligned}$$

given the state controlled by the other player.

The fact that there exist unique \(K_p,K_c,\Lambda _p,\Lambda _c \in L^{\infty }([0,T], {\mathbb {R}}_-)\) is straightforward (see Remark 4.5). We also have explicit formulae for them (see Equation (3.1)). Moreover the non-positivity of \( K_p\) and \(K_c\) implies that the matrices \(S_p\) and \(S_c \), defined in (4.18), are negative definite.

Now, consider the mean-field BSDE associated to the processes \((Y^p,Z^{p,W},Z^{p,B})\), given \(K_p\) and \(\Lambda _p:\)

$$\begin{aligned} \left\{ \begin{array}{l} dY^p_t =-\left\{ \frac{s_0}{2}+\frac{\gamma \rho _c}{2} c_t +\rho _p \gamma \rho _c \eta _p \lambda ^2 (c_t-{\mathbb {E}} [c_t])+\frac{2}{k_p}\left( K_p(t)(Y^p_t-{\mathbb {E}} [Y^p_t])+\Lambda _p(t){\mathbb {E}}[Y^p_t]\right) \right\} dt\\ +Z^{p,B}_tdB_t+Z^{p,W}_tdW_t,\\ Y^p_T=\frac{\lambda \rho _p}{2}. \end{array}\right. \end{aligned}$$

Existence and uniqueness of the solution \((Y^p,Z^{p,W},Z^{p,B})\in S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})\times L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})^2\) is a consequence of Li et al. [29, Theorem 2.1] and the fact that \(c \in S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})\) by the admissibility of the associated control \(\beta ^c\).

Finally, given \((K_p, \Lambda _p, (Y^p,Z^{p,W},Z^{p,B}))\), the linear ODE associated to \(R_p\) in system (4.23) has a unique solution given by

$$\begin{aligned} \begin{aligned} R_p(t)=&- \lambda \gamma \rho _c {\mathbb {E}} [c_T]+\int _t^T \Bigg [-\eta _p \lambda ^2 \gamma ^2\rho _c^2 {\mathbb {V}} [c_u] +\frac{2}{k_p}\left( {\mathbb {V}} [Y^p_u]+{\mathbb {E}} [Y^p_u]^2\right) \\&+\frac{2}{\ell _p-2K_p(u)}\left( {\mathbb {V}}[Z^{p,W}_u]+\left( {\mathbb {E}} [Z^{p,W}_u]+\frac{\ell _p \sigma _p}{2}\right) ^2\right) \Bigg ]du. \end{aligned} \end{aligned}$$

The same arguments are used to prove existence and uniqueness for the processes \((Y^c,Z^{c,W}, Z^{c,B})\) in \(S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})\times L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}})^2\) and \( R_c \in L^{\infty }([0,T], {\mathbb {R}}) \). This ends the proof of existence and uniqueness for systems (4.23) and (4.24).
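As a worked illustration of the structure of the mean-field BSDE for \(Y^p\), one can take expectations on both sides: the stochastic integrals and the centered terms vanish, so that \(m(t):={\mathbb {E}}[Y^p_t]\) solves the linear backward ODE \(m'(t) = -\big \{\frac{s_0}{2}+\frac{\gamma \rho _c}{2}{\mathbb {E}}[c_t]+\frac{2}{k_p}\Lambda _p(t)m(t)\big \}\), \(m(T)=\frac{\lambda \rho _p}{2}\). The following Python sketch integrates this ODE backward with a Runge–Kutta scheme for a given mean consumption path. The function name, the mean path `c_bar` and all numerical values are hypothetical, and the expression used for \(\Lambda _p\) is the hyperbolic tangent solution of its Riccati equation in (4.23), which we assume agrees with Equation (3.1).

```python
import math


def mean_Y_p(T, s0, gamma, rho_c, rho_p, k_p, lam, c_bar, n_steps=2000):
    """Return m(0) = E[Y^p_0] for a given mean consumption path c_bar(t)."""
    h = T / n_steps

    def Lambda_p(t):
        # Solution of Lambda' = -(2/k_p) Lambda^2 + rho_p, Lambda(T) = 0.
        return -math.sqrt(rho_p * k_p / 2.0) * math.tanh(
            math.sqrt(2.0 * rho_p / k_p) * (T - t))

    def rhs(t, m):
        return -(s0 / 2.0 + gamma * rho_c / 2.0 * c_bar(t)
                 + (2.0 / k_p) * Lambda_p(t) * m)

    t, m = T, lam * rho_p / 2.0  # terminal condition m(T) = lambda rho_p / 2
    for _ in range(n_steps):
        # Classical RK4 with a negative time step -h.
        k1 = rhs(t, m)
        k2 = rhs(t - 0.5 * h, m - 0.5 * h * k1)
        k3 = rhs(t - 0.5 * h, m - 0.5 * h * k2)
        k4 = rhs(t - h, m - h * k3)
        m -= h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t -= h
    return m


# Hypothetical case without producer market power (rho_p = 0) and a constant
# mean consumption: then m(0) = (s0/2 + gamma*rho_c*c_bar/2) * T.
m0_no_power = mean_Y_p(T=2.0, s0=10.0, gamma=1.2, rho_c=0.5, rho_p=0.0,
                       k_p=2.0, lam=0.3, c_bar=lambda t: 3.0)
# Hypothetical case with market power, for comparison.
m0_power = mean_Y_p(T=2.0, s0=10.0, gamma=1.2, rho_c=0.5, rho_p=0.4,
                    k_p=2.0, lam=0.3, c_bar=lambda t: 3.0)
```

In the game itself the mean consumption path is not given but is determined jointly with the other unknowns at the fixed point.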

4.5 Third Step: Fixed Point of the Best Response Map

Here, we prove the existence of a fixed point of the best response maps in order to get a Nash equilibrium. First of all, for convenience of notation, we rewrite the two-dimensional state variable as \(X_t:=( q_t, c_t)^\top ,\) for all \(t \in [0,T]\), and so its linear dynamics is given by the following SDE

$$\begin{aligned}&dX_t =\begin{pmatrix} dq_t \\ dc_t \end{pmatrix} = \begin{pmatrix} u_t \\ v_t \end{pmatrix}dt + \begin{pmatrix} z_t \\ 0 \end{pmatrix} dW_t + \begin{pmatrix} 0 \\ y_t \end{pmatrix} dB_t, \end{aligned}$$
(4.26)

with a deterministic initial condition \(X_0 = (q_0, c_0)^\top \in {\mathbb {R}}_+^2.\) Then, we have

$$\begin{aligned} dX_t=b \alpha _t dt + \sigma ^W \alpha _t dW_t + \sigma ^B \alpha _t dB_t, \end{aligned}$$

with

$$\begin{aligned} \begin{aligned}&b = \begin{pmatrix} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 \end{pmatrix},\quad \sigma ^W = \begin{pmatrix} 0 &{} 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0 \end{pmatrix}, \quad \sigma ^B = \begin{pmatrix} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{pmatrix}. \end{aligned} \end{aligned}$$
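As a quick sanity check of this matrix notation, the following NumPy sketch verifies that the products \(b\alpha \), \(\sigma ^W\alpha \) and \(\sigma ^B\alpha \) recover the drift and diffusion columns of (4.26). It assumes the control vector is ordered as \(\alpha =(u,z,v,y)^\top \), which is the ordering suggested by (4.26) and by the form of the equilibrium strategies; the numerical values are arbitrary placeholders and are not taken from the paper.

```python
import numpy as np

# Selection matrices b, sigma^W, sigma^B as given in the text.
b = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0]])
sigma_W = np.array([[0, 1, 0, 0],
                    [0, 0, 0, 0]])
sigma_B = np.array([[0, 0, 0, 0],
                    [0, 0, 0, 1]])

# Arbitrary placeholder controls, assumed ordering alpha = (u, z, v, y)^T.
u, z, v, y = 1.5, 10.0, -0.7, 8.0
alpha = np.array([u, z, v, y])

drift = b @ alpha          # drift column of (4.26): (u, v)
diff_W = sigma_W @ alpha   # dW column of (4.26): (z, 0)
diff_B = sigma_B @ alpha   # dB column of (4.26): (0, y)

print(drift, diff_W, diff_B)
```

In this notation, \(b\) selects the drift controls while \(\sigma ^W\) and \(\sigma ^B\) select the volatility controls of the producer and of the consumer, respectively.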

Then, we rewrite explicitly the form that a candidate equilibrium feedback control \(\alpha ^*=((\alpha ^{*,P})^\top ,(\alpha ^{*,C})^\top )^\top \) should have, together with the backward dynamics of the corresponding process \(Y=((Y^p)^\top ,(Y^c)^\top )^\top \) (we write \(Z^W\) for \(((Z^{p,W})^\top ,(Z^{c,W})^\top )^\top \), respectively \(Z^B\) for \(((Z^{p,B})^\top ,(Z^{c,B})^\top )^\top \)),Footnote 1

$$\begin{aligned}&\alpha ^*_t- \bar{\alpha }^*_t=\Delta (t) \left( X_t-{\bar{X}}_t\right) +\Gamma \left( Y_t-{\bar{Y}}_t\right) +H^W(t)\left( Z_t^W-{\bar{Z}}_t^W\right) +H^B(t)\left( Z_t^B-{\bar{Z}}_t^B\right) ,\nonumber \\&\bar{\alpha }^*_t=\widehat{\Delta }(t){\bar{X}}_t+ \Gamma {\bar{Y}}_t+ H^W(t){\bar{Z}}_t^W+ H^B(t) {\bar{Z}}_t^B+ \Theta (t), \end{aligned}$$
(4.27)
$$\begin{aligned} dY_t=\left[ \Xi \left( X_t-{\bar{X}}_t\right) + \Phi (t) \left( Y_t-{\bar{Y}}_t \right) \right] dt + \left[ \widehat{\Xi }{\bar{X}}_t +\widehat{\Phi }(t){\bar{Y}}_t+ \Psi \right] dt+Z_t^BdB_t+Z^W_t dW_t, \end{aligned}$$
(4.28)

with

$$\begin{aligned}&\Delta (t)= \begin{pmatrix} \frac{2}{k_p}K_p(t) &{} 0 \\ 0 &{} 0\\ 0 &{}\frac{2}{k_c}K_c(t) \\ 0 &{} 0 \end{pmatrix}, \quad \widehat{\Delta }(t) = \begin{pmatrix} \frac{2}{k_p}\Lambda _p(t) &{} 0 \\ 0 &{} 0\\ 0 &{}\frac{2}{k_c}\Lambda _c(t) \\ 0 &{} 0 \end{pmatrix},\quad \Gamma = \begin{pmatrix} \frac{2}{k_p} &{} 0 \\ 0 &{} 0\\ 0 &{}\frac{2}{k_c} \\ 0 &{} 0 \end{pmatrix}, \end{aligned}$$
$$\begin{aligned}&\Theta (t)= \begin{pmatrix} 0 \\ \sigma _p(1-2\frac{K_p(t)}{\ell _p})^{-1}\\ 0 \\ \sigma _c(1-2\frac{K_c(t)}{\ell _c})^{-1} \end{pmatrix}, \quad H^W(t)= \begin{pmatrix} 0 &{} 0\\ \frac{2}{\ell _p -2 K_p(t)} &{}0\\ 0 &{} 0\\ 0 &{} 0 \end{pmatrix}, \quad H^B(t)= \begin{pmatrix} 0 &{} 0\\ 0 &{} 0\\ 0 &{} 0\\ 0 &{} \frac{2}{\ell _c -2 K_c(t)} \end{pmatrix}, \end{aligned}$$

and \(\Xi ,\) \(\widehat{\Xi },\) \(\Phi (t)\), \(\widehat{\Phi }(t)\) and \(\Psi \) as defined at the beginning of Sect. 3.1.

Now, as an ansatz for Y, we assume Y linear in the state:

$$\begin{aligned} Y_t=\pi (t)(X_t-{\bar{X}}_t)+\widehat{\pi }(t) {\bar{X}}_t+\zeta _t, \end{aligned}$$
(4.29)

with \(\pi ,\widehat{\pi }\) deterministic \({\mathbb {R}}^{2\times 2}\)-valued processes and \(\zeta \in S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^2)\) satisfying the SDE

$$\begin{aligned} d\zeta _t=\psi _t dt+\phi _t^W dW_t+\phi _t^B dB_t, \quad \zeta _T=\frac{1}{2}\lambda (\rho _p,\gamma \rho _c)^\top , \end{aligned}$$
(4.30)

for some \(\psi , \phi ^B, \phi ^W\) in suitable spaces. The affine term in the expression (4.29) allows Y to have some extra stochasticity apart from the linear dependency on the state. Furthermore, the terminal condition in (4.30), together with the terminal conditions \(\pi (T)=\widehat{\pi }(T)=0\) imposed in (4.34), guarantees that Y satisfies its terminal condition.

An application of Itô’s formula to the ansatz (4.29) yields

$$\begin{aligned} \begin{aligned} dY_t=&[\pi '(t)(X_t-{\bar{X}}_t)+\pi (t)b(\alpha _t^*-\bar{\alpha }_t^*)+\psi _t-\bar{\psi }_t]dt +(\widehat{\pi }'(t){\bar{X}}_t+\widehat{\pi }(t)b\bar{\alpha }_t^*+ \bar{\psi }_t)dt\\&+(\pi (t)\sigma ^W\alpha ^*_t+\phi ^W_t)dW_t+(\pi (t)\sigma ^B\alpha ^*_t+\phi ^B_t)dB_t. \end{aligned} \end{aligned}$$
(4.31)

If we match the two dynamics of Y in Eqs. (4.28) and (4.31), and then replace Y with its ansatz (4.29) and \(\alpha ^*\) with its feedback form (4.27), we get the following system of equations:

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{ll} \pi '(t)(X_t-{\bar{X}}_t)+\pi (t)b(\varvec{I} -H^W(t) \pi (t)\sigma ^W- H^B(t) \pi (t)\sigma ^B)^{-1}[(\Delta (t)\\ \quad +\Gamma \pi (t))(X_t-{\bar{X}}_t)+\Gamma (\zeta _t-\bar{\zeta }_t)+H^W(t)(\phi _t^W-\bar{\phi }_t^W)+H^B(t)(\phi _t^B-\bar{\phi }_t^B)]\\ \quad +\psi _t-\bar{\psi }_t = \Xi (X_t-{\bar{X}}_t)+ \Phi (t) (\pi (t)(X_t-{\bar{X}}_t)+\zeta _t-\bar{\zeta }_t) \\ \\ \widehat{\pi }'(t){\bar{X}}_t+\widehat{\pi }(t)b(\varvec{I} -H^W(t) \pi (t) \sigma ^W-H^B(t) \pi (t)\sigma ^B)^{-1}[(\widehat{\Delta }(t)+\Gamma \widehat{\pi }(t)){\bar{X}}_t +\Gamma \bar{\zeta }_t\\ \quad +\Theta (t) + H^W(t)\bar{\phi }_t^W+H^B(t)\bar{\phi }_t^B] + \bar{\psi }_t = \widehat{\Xi }{\bar{X}}_t +\widehat{\Phi }(t)(\widehat{\pi }(t) {\bar{X}}_t + \bar{\zeta }_t)+ \Psi \\ \\ Z^W_t=\pi (t)\sigma ^W\alpha ^*_t+\phi ^W_t \\ Z^B_t=\pi (t)\sigma ^B\alpha ^*_t+\phi ^B_t. \end{array}\right. \end{aligned} \end{aligned}$$
(4.32)

Finally, exploiting the fact that:

$$\begin{aligned} b\left( \varvec{I} -H^W(t) \pi (t)\sigma ^W- H^B(t) \pi (t)\sigma ^B\right) ^{-1}=b, \end{aligned}$$
(4.33)

we find the equations that the coefficients \((\pi ,\widehat{\pi }, \psi , \phi ^W,\phi ^B)\) in the ansatz for Y should solve in order to provide a fixed point of the best response map:

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{l} \begin{array}{ll} \pi '(t)=\Xi +\Phi (t) \pi (t)+\pi (t)\Phi (t)+\pi (t)R\pi (t),&{} \pi (T)=0, \end{array} \\ \begin{array}{ll} \widehat{\pi }'(t)=\widehat{\Xi }+\widehat{\Phi }(t) \widehat{\pi }(t)+ \widehat{\pi }(t)\widehat{\Phi }(t)+\widehat{\pi }(t)R\widehat{\pi }(t),&{} \widehat{\pi }(T)=0, \end{array} \\ \begin{array}{ll} d\zeta _t=\psi _t dt+\phi _t^W dW_t+\phi _t^B dB_t,&{} \zeta _T=\frac{1}{2}\lambda (\rho _p, \gamma \rho _c)^\top , \end{array} \\ \psi _t=\psi _t-\bar{\psi }_t+\bar{\psi }_t=\Big (\pi (t)R+\Phi (t)\Big )(\zeta _t-\bar{\zeta }_t)+\left( \widehat{\pi }(t)R +\widehat{\Phi }(t)\right) \bar{\zeta }_t+\Psi , \end{array}\right. \end{aligned} \end{aligned}$$
(4.34)

where \(R= \begin{pmatrix} - 2/k_p &{} 0 \\ 0 &{} -2/k_c \end{pmatrix}. \) In fact, inserting (Y, Z) from the ansatz (4.29) and from system (4.32) into the best response (4.27) provides an equilibrium strategy \(\alpha ^*\) in feedback form, which is computed in detail in the next step.

Remark 4.6

To obtain explicit expressions for \(\alpha ^*\) and Z, we have used Assumption (A2) in Theorem 3.1. Indeed, such a condition is needed for the invertibility of the matrices \(D(t):=(\varvec{I} -H^W(t) \pi (t)\sigma ^W-H^B(t) \pi (t)\sigma ^B)\), \(t \in [0,T]\), that appear in

$$\begin{aligned}&Z^W_t =\phi _t^W+\pi (t)\sigma ^W\alpha ^*_t, \qquad Z^B_t =\phi _t^B+\pi (t)\sigma ^B\alpha ^*_t, \end{aligned}$$
(4.35)

where

$$\begin{aligned}&\alpha ^*_t= D(t)^{-1}[(\Delta (t)+\Gamma \pi (t))(X_t-{\bar{X}}_t) + (\widehat{\Delta }(t)\\&\quad +\Gamma \widehat{\pi }(t)){\bar{X}}_t+\Gamma \zeta _t +H^W(t)\phi _t^W+H^B(t)\phi _t^B + \Theta (t)]. \end{aligned}$$
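The algebraic facts used above can be checked numerically at a fixed time \(t\). The NumPy sketch below builds \(D(t)\) from the matrices of this section and verifies that \(D(t)\) is diagonal, that identity (4.33) holds, that \(b\Gamma =-R\) and \(bH^W=bH^B=0\), \(b\Theta =0\), and that the volatility entries of \(D(t)^{-1}\Theta (t)\) take the form \(\sigma \ell /(\ell -2(K+\pi _{ii}))\). The values chosen for \(K_p(t)\), \(K_c(t)\) and \(\pi (t)\) are hypothetical placeholders, not values from the paper; they are only assumed to make \(D(t)\) invertible.

```python
import numpy as np

# Placeholder values at a fixed time t (assumptions, not taken from the paper).
k_p, k_c = 5.0, 5.0
l_p, l_c = 5.0, 5.0            # volatility control costs ell_p, ell_c
sigma_p, sigma_c = 10.0, 10.0
K_p, K_c = -0.8, -0.6          # hypothetical values of K_p(t), K_c(t)
pi = np.array([[-0.3, 0.1],    # hypothetical value of pi(t)
               [0.2, -0.4]])

b = np.array([[1.0, 0, 0, 0], [0, 0, 1, 0]])
sigma_W = np.array([[0.0, 1, 0, 0], [0, 0, 0, 0]])
sigma_B = np.array([[0.0, 0, 0, 0], [0, 0, 0, 1]])

Gamma = np.array([[2 / k_p, 0], [0, 0], [0, 2 / k_c], [0, 0]])
H_W = np.zeros((4, 2))
H_W[1, 0] = 2 / (l_p - 2 * K_p)
H_B = np.zeros((4, 2))
H_B[3, 1] = 2 / (l_c - 2 * K_c)
Theta = np.array([0.0, sigma_p / (1 - 2 * K_p / l_p),
                  0.0, sigma_c / (1 - 2 * K_c / l_c)])
R = np.diag([-2 / k_p, -2 / k_c])

# D(t) = I - H^W pi sigma^W - H^B pi sigma^B, as in Remark 4.6.
D = np.eye(4) - H_W @ pi @ sigma_W - H_B @ pi @ sigma_B
D_inv = np.linalg.inv(D)

# D differs from the identity only in its 2nd and 4th diagonal entries,
# so the drift rows selected by b are left untouched: b D^{-1} = b.
is_diagonal = np.allclose(D, np.diag(np.diag(D)))
identity_433 = np.allclose(b @ D_inv, b)

# Volatility components of D^{-1} Theta versus the closed forms.
vol = D_inv @ Theta
z_closed = sigma_p * l_p / (l_p - 2 * (K_p + pi[0, 0]))
y_closed = sigma_c * l_c / (l_c - 2 * (K_c + pi[1, 1]))

print(is_diagonal, identity_433)
print(vol[1], z_closed, vol[3], y_closed)
```

With these placeholder inputs, only the diagonal entries \(\pi _{11}\) and \(\pi _{22}\) enter \(D(t)\), which is why the volatility controls depend on \(\pi \) only through those two entries.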

4.6 Fourth Step: Nash Equilibrium Strategies

In order to complete the proof of the main theorem, we are left with showing that the system (4.34) has a unique solution over the finite time interval [0, T]. The equations associated to \(t \mapsto (\pi (t), \widehat{\pi }(t))\) are non-symmetric matrix Riccati equations for which there is no general condition ensuring the global existence of solutions. Nevertheless, the regularity of the coefficients and the Picard-Lindelöf Theorem ensure the local existence and uniqueness of solutions over a compact interval \([0,T_{max}]\)Footnote 2. Thus, we recover the existence and uniqueness condition in Assumption (A1) of Theorem 3.1 choosing a time horizon T small enough, namely \(T < T_{max}\). Then, for a given \((\pi ,\widehat{\pi })\), the process \((\zeta , \psi , \phi ^W,\phi ^B)\) evolves according to the following linear mean field BSDE:

$$\begin{aligned} \begin{aligned}&d\zeta _t=\psi _t dt+\phi ^W_t dW_t+\phi ^B_t dB_t,\quad \zeta _T=\frac{1}{2}\lambda (\rho _p, \gamma \rho _c)^\top ,\\&\psi _t=\psi _t-\bar{\psi }_t+\bar{\psi }_t=(\pi (t)R+\Phi (t))(\zeta _t-\bar{\zeta }_t)+(\widehat{\pi }(t)R+\widehat{\Phi }(t))\bar{\zeta }_t+\Psi . \end{aligned} \end{aligned}$$
(4.36)

Exploiting once more [29, Theorem 2.1], we have a unique solution \((\zeta ,\phi ^W, \phi ^B) \in S^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^2)\times L^2_{{\mathbb {F}}}(\Omega \times [0,T], {\mathbb {R}}^2)^2.\) Furthermore, we notice that the drift \(\psi \) in the system (4.36) does not depend on \(\phi ^W\) and \(\phi ^B\) and all the coefficients involved in the second line of (4.36) are deterministic. Moreover, the terminal condition is also deterministic. Thus, the unique solution \((\zeta , \phi ^W, \phi ^B)\) to this system is given by (h, 0, 0), where \(h:[0,T] \rightarrow {\mathbb {R}}^2\) is the unique (deterministic) solution to the following backward linear ODE:

$$\begin{aligned} \begin{aligned}&\left\{ \begin{array}{l} dh(t)=\left\{ \left[ \widehat{\pi }(t)R+\widehat{\Phi }(t)\right] h(t)+\Psi \right\} dt,\\ h(T)=\frac{1}{2}\lambda (\rho _p,\gamma \rho _c)^\top . \end{array}\right. \end{aligned} \end{aligned}$$
(4.37)

So, the system of ODEs and SDEs in (4.34) reduces to the one made up of Equations (3.5) and (3.6).
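For illustration, the Riccati equations in (4.34) and the linear ODE (4.37) can be integrated backward from \(t=T\) with a standard Runge-Kutta scheme. The sketch below is a minimal example under strong assumptions: \(\Xi \), \(\widehat{\Xi }\), \(\Phi \), \(\widehat{\Phi }\) and \(\Psi \) are defined in Sect. 3.1 and are replaced here by constant placeholder matrices, and the value of \(\lambda \) is hypothetical. It is not the numerical scheme used by the authors, only a way to see the structure of the system.

```python
import numpy as np

# Placeholder coefficients (assumptions): the true Xi, Xi_hat, Phi(t),
# Phi_hat(t) and Psi are defined in Sect. 3.1 of the paper.
T, n_steps = 1.0, 1000
k_p = k_c = 5.0
R = np.diag([-2 / k_p, -2 / k_c])
Xi = np.array([[-0.10, 0.05], [0.05, -0.10]])
Xi_hat = np.array([[-0.20, 0.00], [0.00, -0.20]])
Psi = np.array([-0.5, -0.3])
lam, rho_p, gamma_rho_c = 1.0, 0.5, 0.5     # lam is a hypothetical volume
h_T = 0.5 * lam * np.array([rho_p, gamma_rho_c])


def Phi(t):
    return np.diag([-0.2, -0.2])            # constant stand-in for Phi(t)


def Phi_hat(t):
    return np.diag([-0.1, -0.1])            # constant stand-in for Phi_hat(t)


def rhs(t, y):
    """Right-hand sides of (4.34) for pi, pi_hat and of (4.37) for h."""
    P, P_hat, h = y[:4].reshape(2, 2), y[4:8].reshape(2, 2), y[8:]
    dP = Xi + Phi(t) @ P + P @ Phi(t) + P @ R @ P
    dP_hat = Xi_hat + Phi_hat(t) @ P_hat + P_hat @ Phi_hat(t) + P_hat @ R @ P_hat
    dh = (P_hat @ R + Phi_hat(t)) @ h + Psi
    return np.concatenate([dP.ravel(), dP_hat.ravel(), dh])


def rk4_backward(f, y_T):
    """Classical RK4 from t = T down to t = 0 (negative step size)."""
    dt = -T / n_steps
    ts = np.linspace(T, 0.0, n_steps + 1)
    ys = [np.asarray(y_T, dtype=float)]
    for t in ts[:-1]:
        y = ys[-1]
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt / 2 * k1)
        k3 = f(t + dt / 2, y + dt / 2 * k2)
        k4 = f(t + dt, y + dt * k3)
        ys.append(y + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return ts, np.array(ys)


# Terminal conditions: pi(T) = pi_hat(T) = 0 and h(T) = lambda/2 (rho_p, gamma rho_c).
y_T = np.concatenate([np.zeros(8), h_T])
ts, ys = rk4_backward(rhs, y_T)

pi_0 = ys[-1][:4].reshape(2, 2)
pi_hat_0 = ys[-1][4:8].reshape(2, 2)
h_0 = ys[-1][8:]
print(pi_0, pi_hat_0, h_0, sep="\n")
```

Since the Riccati equations may explode in finite time, a practical implementation would also monitor the solution for non-finite or rapidly growing values, which signals that the chosen horizon exceeds \(T_{max}\).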

We write the Nash equilibrium strategies \(\alpha ^*=((\alpha ^{*,P}),(\alpha ^{*,C}))^\top = ((u^*,z^*)^{\top }, (v^*,y^*)^\top )^\top \) explicitly as

$$\begin{aligned} \alpha ^*_t&= D(t)^{-1}(\Delta (t)+\Gamma \pi (t))(X_t-\bar{X}_t)+D(t)^{-1}(\widehat{\Delta }(t)\nonumber \\&\quad +\Gamma \widehat{\pi }(t))\bar{X}_t + D(t)^{-1}(\Gamma h(t)+ \Theta (t)), \end{aligned}$$
(4.38)

that is

$$\begin{aligned} u^*_t&= \frac{2}{k_p}\left[ (K_p(t)+\pi _{11}(t))(q_t-{\bar{q}}_t)+\pi _{12}(t)(c_t -{\bar{c}}_t)+(\Lambda _p(t)\right. \\&\quad \left. +\widehat{\pi }_{11}(t)){\bar{q}}_t+\widehat{\pi }_{12}(t){\bar{c}}_t +h_1(t)\right] ,\\ z^*(t)&= \frac{\sigma _p \ell _p}{\ell _p-2(K_p(t)+\pi _{11}(t))},\\ v^*_t&= \frac{2}{k_c}\left[ (K_c(t)+\pi _{22}(t))(c_t-{\bar{c}}_t)+\pi _{21}(t)(q_t -{\bar{q}}_t)+(\Lambda _c(t)\right. \\&\quad \left. +\widehat{\pi }_{22}(t)){\bar{c}}_t+\widehat{\pi }_{21}(t){\bar{q}}_t +h_2(t)\right] ,\\ y^*(t)&= \frac{\sigma _c \ell _c}{\ell _c-2(K_c(t)+\pi _{22}(t))}, \end{aligned}$$

where \(K_p, K_c, \Lambda _p, \Lambda _c\) are defined in (3.1) and \(\pi , \widehat{\pi }\) and h are respectively the solutions to the systems (3.5), (3.6).

Finally, we derive the corresponding equilibrium dynamics for the state

$$\begin{aligned} \begin{aligned} dX_t=&\Bigg \{ \begin{pmatrix} \frac{2}{k_p}(K_p(t)+\pi _{11}(t)) &{} \frac{2}{k_p}\pi _{12}(t) \\ \frac{2}{k_c}\pi _{21}(t) &{} \frac{2}{k_c}(K_c(t)+\pi _{22}(t)) \end{pmatrix} (X_t - \bar{X}_t)\\&+\begin{pmatrix} \frac{2}{k_p}(\Lambda _p(t)+\widehat{\pi }_{11}(t)) &{} \frac{2}{k_p}\widehat{\pi }_{12}(t) \\ \frac{2}{k_c}\widehat{\pi }_{21}(t) &{} \frac{2}{k_c}(\Lambda _c(t)+\widehat{\pi }_{22}(t)) \end{pmatrix} \bar{X}_t +\begin{pmatrix} \frac{2}{k_p}h_1(t) \\ \frac{2}{k_c}h_2(t) \end{pmatrix} \Bigg \} dt\\&+\begin{pmatrix} \frac{\sigma _p \ell _p}{\ell _p-2(K_p(t)+\pi _{11}(t))} \\ 0 \end{pmatrix} dW_t + \begin{pmatrix} 0 \\ \frac{\sigma _c \ell _c}{\ell _c-2(K_c(t)+\pi _{22}(t))} \end{pmatrix} dB_t, \quad t \in [0,T], \end{aligned} \end{aligned}$$

which is a linear mean-field SDE, hence admitting a unique solution.
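One way to see this, sketched here under the assumption that the denominators in \(z^*\) and \(y^*\) do not vanish on \([0,T]\) (so that both volatilities are bounded deterministic functions), is to take expectations in the equilibrium dynamics. The stochastic integrals have zero mean and the term in \(X_t-\bar{X}_t\) vanishes, so that \(\bar{X}\) solves the linear ODE

$$\begin{aligned} \frac{d}{dt}\bar{X}_t = \begin{pmatrix} \frac{2}{k_p}(\Lambda _p(t)+\widehat{\pi }_{11}(t)) &{} \frac{2}{k_p}\widehat{\pi }_{12}(t) \\ \frac{2}{k_c}\widehat{\pi }_{21}(t) &{} \frac{2}{k_c}(\Lambda _c(t)+\widehat{\pi }_{22}(t)) \end{pmatrix} \bar{X}_t +\begin{pmatrix} \frac{2}{k_p}h_1(t) \\ \frac{2}{k_c}h_2(t) \end{pmatrix}, \quad \bar{X}_0=(q_0,c_0)^\top . \end{aligned}$$

Once \(\bar{X}\) is determined, the equation for X is a linear SDE with deterministic coefficients and additive noise, for which existence and uniqueness are standard.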

Fig. 1 (a) and (b): \(\ell _p=\ell _c=5\); (c) and (d): \(\ell _p=\ell _c=0.7\)

5 Numerics

We consider the following parameter setting: \(T=1\), \(k_p=k_c=5\), \(\sigma _p=\sigma _c=10\), \(q_0=c_0=100\), \(s_0=50\), \(\rho _p=\gamma \rho _c=0.5\), \(\gamma =1.2\), \(\delta =5\), \(p_0=2 s_0+\gamma \delta \), and \(p_1=\gamma -1\). With this parametrisation, the players are symmetric in the sense that they have the same absolute effect on the price and they share the same costs of average production rate or consumption rate. Moreover, if they shared the same risk aversion parameters (\(\eta _p=\eta _c\)) and the same costs of volatility control (\(\ell _p = \ell _c\)), then the strategies of the producer \((u^*,z^*)\) and of the consumer \((v^*,y^*)\) would be identical. The initial conditions have been chosen to be close to a long-run stationary equilibrium that we observe when we take large T, which avoids potential transitory effects.
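For convenience, the parameter values stated above and the quantities derived from them can be collected in a few lines of Python. The variable names are our own; with these values one gets \(\rho _c=0.5/1.2\approx 0.4167\), \(p_0=106\) and \(p_1=0.2\).

```python
# Parameter values stated in the text (variable names are illustrative).
T = 1.0
k_p = k_c = 5.0
sigma_p = sigma_c = 10.0
q_0 = c_0 = 100.0
s_0 = 50.0
gamma = 1.2
delta = 5.0
rho_p = 0.5
gamma_rho_c = 0.5                # the product gamma * rho_c

# Derived quantities.
rho_c = gamma_rho_c / gamma      # follows from gamma * rho_c = 0.5
p_0 = 2 * s_0 + gamma * delta    # 2 * 50 + 1.2 * 5
p_1 = gamma - 1

print(f"rho_c = {rho_c:.4f}, p_0 = {p_0:.1f}, p_1 = {p_1:.1f}")
```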

In the next sub-sections, we illustrate first the effect of the risk aversion parameters on the forward agreement indifference price when every other parameter is fixed. Second, we show how different combinations of risk aversions and volatility control costs can lead to the same forward agreement indifference price and volume.

5.1 The Effect of Risk Aversion

Figure 1 presents the unitary forward agreement indifference price \({f}^{\lambda ^*,*} := {F}^*_{{\lambda }^*}/\lambda ^*\) and the volume that the players agree upon when the costs of volatility control are high (Fig. 1a and b) and when they are low (Fig. 1c and d). We find that \({f}^{\lambda ^*,*}\) is higher (resp. lower) than the expected spot price when the producer is more (resp. less) risk-averse than the consumer, which is consistent with both the economic intuition and the hedging pressure theory, once we recall that in our model the players act as speculators on the forward market. In hedging pressure theory (see [14, 15]), the risk premium is determined by the relations between the risk aversions of producers, consumers, storers and speculators. It extends Keynes’s normal backwardation theory, which claims that in commodity markets the forward price should be lower than the expected spot price because the producer would be ready to pay a premium to avoid being exposed to price risk on his production. In our case, the most risk-averse speculator obtains the appropriate premium to enter into the agreement. This property holds whatever the level of volatility control costs. We see in Fig. 1 that the producer requires a positive premium to accept the risk coming from his financial position. Regarding the exchanged volume, we observe that it can be either nonincreasing or nondecreasing in the risk aversion parameters of the players, depending on the costs of volatility control. When the volatility manipulation costs are high for both players, the trading volume is low even when both players have a high risk aversion. On the other hand, when the volatility manipulation costs are low, the trading volume is low when only one of the players has a high risk aversion, but it is very large when both players have a high risk aversion. This could be explained by the fact that in the latter case the players can act on their volatilities (almost costlessly) to stabilize the spot price and hence they would be willing to trade more.

5.2 Joint Effect of Risk Aversion and Volatility Control Cost

We now fix the consumer’s risk aversion parameter and volatility control cost at \(\eta _c=0.01\) and \(\ell _c=5\), and observe the agreement price, the traded volume, the per unit agreement indifference price and the producer’s equilibrium payoff at the agreement. Results are provided in Fig. 2, as the producer’s risk aversion parameter \(\eta _p\) and his volatility manipulation cost \(\ell _p\) vary. The vertical and horizontal lines in each graph are set to the values of \(\ell _c\) and \(\eta _c\).

Fig. 2 Level lines of (a) the forward agreement price \({F}^{*}_{\lambda ^*}\), (b) the traded quantity \(\lambda ^*\), (c) the per unit agreement price \({f}^{\lambda ^*,*}={F}^*_{{\lambda }^*}/\lambda ^*\), (d) the value of the producer’s equilibrium payoff \( J_p^*(\lambda ^*,{F}^*_{{\lambda }^*})\)

We observe a sort of “substitution effect” between \(\eta _p\) and \(\ell _p\), in the sense that for a producer with a given combination of risk aversion and volatility control cost, we can find another producer trading at the same agreement price with a higher risk aversion and a lower volatility control cost (Fig. 2a). This phenomenon also occurs for the traded quantity (Fig. 2b). This substitution makes sense in our model, where volatility represents a cost for the producer that can be mitigated either by requiring a payment to bear this volatility or by paying the cost to reduce it. We note that for a fixed value of \(\eta _p\), the lower the value of \(\ell _p\), the larger the forward agreement price and the traded volume. Figure 2c gives the resulting unitary agreement forward price. The volatility control cost has little effect on the per unit forward price compared to the risk aversion parameter. This figure shows that when the volatility control costs are high, the producer has little alternative but to ask for a premium to enter into the forward agreement, and thus the price is essentially determined by his risk aversion.

To conclude, we note that the producer’s equilibrium payoff is independent of the value of \(\eta _p\) (Fig. 2d) because, by definition of the agreement forward price, it is always equal to \(J_p^*(0,0)\), which is independent of \(\eta _p\).