1 Introduction

Optimal stopping problems are widely used in economic and financial applications. One of the most notable applications is the real options approach to investment planning; for a textbook treatment of the topic, see, e.g., Dixit and Pindyck (1994). Since the late 1970s, this approach has been used to study a broad variety of economic planning problems, such as corporate strategy (Myers 1977; Ross 1978; Dixit and Pindyck 1995; Panayi and Trigeorgis 1998; Alvarez and Stenbacka 2001), land and real estate development (Grenadier 1996; Titman 1985; Williams 1993; Capozza and Li 1994), and natural resources investment (Brennan and Schwartz 1985; Paddock et al. 1988; Schwartz 1997). In many of these applications, the option mechanism in the real investment opportunity is the same as in financial options and can be studied by modeling it as an optimal stopping problem. The task is then to determine an investment rule which maximizes the expected present value of the total investment revenue. In the classical treatment of these problems, see, e.g., McDonald and Siegel (1986), Dixit and Pindyck (1994), it is assumed that once the investment decision is made, the project starts to deliver cash flows immediately. In practice, this is often not the case, as most capital projects take a significant time to complete, see Bar-Ilan and Strange (1996), Kydland and Prescott (1982). This completion period is usually called time-to-build, implementation delay or delivery lag.

The effect of an implementation delay on the optimal policy has been broadly studied in the literature. The most straightforward way to model the implementation delay is to assume that it is simply a constant, see Aïd et al. (2015), Bar-Ilan and Strange (1996), Bar-Ilan and Sulem (1995). The paper Bar-Ilan and Strange (1996) is concerned with a classical timing problem of irreversible investment in the style of Dixit (1989), whereas Aïd et al. (2015) analyses capacity expansion subject to time-to-build under the objective of social surplus maximisation. The paper Bar-Ilan and Sulem (1995) analyzes a continuous-time inventory system with a fixed delivery lag. In Alvarez and Keppo (2002) the delivery lag is modelled as a function of the state variable at the time of the investment: a lump sum is paid when the investment is made, and the investment revenue accrues once a delivery lag, whose length depends on the value of the state variable at the time of the investment, has elapsed. Investment timing subject to an inter-temporally controllable investment rate is studied in Majd and Pindyck (1987). More precisely, the investor appraises the real option with respect to the uncertain future revenue and chooses whether to engage in the investment. The investment rate is assumed to be uniformly bounded, which gives rise to the time-to-build. In this case, the time-to-build is random but endogenous in the sense that it results from the implemented investment policy. The effect of time-to-build for a levered firm is studied in Sarkar and Zhang (2015). It is assumed in this paper, in the style of Margsiri et al. (2008), that a fixed proportion of the investment cost is paid at the time when the investment is engaged and the rest is paid once the implementation period has elapsed. The length of the period is determined by the revenue process: it is the time that it takes the process to reach a level that is a fixed percentage higher than the state at which the investment is made. The implementation period is random and endogenous also in this case.

The implementation delay can also be modelled as an exogenous random variable. This approach is taken in Lempa (2012b), where the implementation delay is a random variable independent of the state process and exponentially distributed. This allows the use of a resolvent formalism which can be applied in a general Markov setting. The current study is an extension of Lempa (2012b). Indeed, we assume that the implementation delay is again independent of the state process but now has a general phase-type distribution. Phase-type distributions have been broadly applied in different fields such as survival analysis (Aalen 1995), healthcare systems modelling (Fackrell 2009), insurance applications (Bladt 2005), queuing theory (Breuer and Baum 2005), and population genetics (Hobolth et al. 2018). These distributions are a class of matrix-exponential distributions that have a Markovian realisation: they can be identified as the absorption times of certain continuous-time Markov chains. We use this connection in our analysis as follows: once the state process is stopped, the exogenous continuous-time Markov chain is initiated and the payoff, paid out at the time of absorption of this chain, depends on the value of the state process at that time. This analysis, which to the best of our knowledge is new, generalises significantly the results of Lempa (2012b).

Phase-type distributions offer a flexible and convenient model for random time lags. Indeed, it is well known, see, e.g., Breuer and Baum (2005), that phase-type distributions are dense in the class of probability distributions on \({\mathbb {R}}_+\). Furthermore, there exists a well-developed methodology for estimating the parameters of phase-type distributions, see, e.g., He (2014). As we will see, the Markovian realisation of the phase-type distribution allows us to use the Markov theory of diffusions to derive closed-form characterisations of the optimal solution. All of these aspects add to the applicability of the results of the paper at hand.

The remainder of the paper is organised as follows. In Sect. 2 we present the optimal stopping problem. In Sect. 3 our main result on the solvability of the problem is presented. We study the case of Coxian distribution in more detail in Sect. 4. The Coxian distribution case is solved in Sect. 5 with scalar diffusion dynamics. Section 6 wraps up the study with two explicit examples.

2 The optimal stopping problem

In this section we present the optimal stopping problem. Let \((\Omega ,{\mathcal {F}},{\mathbb {F}},{\mathbf {P}})\) be a complete filtered probability space satisfying the usual conditions, where \({\mathbb {F}}=\{{\mathcal {F}}_t\}_{t\ge 0}\), see Borodin and Salminen (2015), p. 2. We assume that the underlying X is a strong Markov process defined on \((\Omega ,{\mathcal {F}},{\mathbb {F}},{\mathbf {P}})\) and taking values in \(E\subseteq {\mathbf {R}}^d\) for some \(d\ge 1\) with the initial state \(x\in E\). We take \(E=(a_1,b_1)\times \dots \times (a_d,b_d)\), where \(-\infty \le a_i<b_i\le \infty \) for all \(i=1,\dots ,d\). As usual, we augment the state space E with a topologically isolated element \(\Delta \) if the process X is non-conservative. Then the process X can be made conservative on the augmented state space \(E^{\Delta }:=E\cup \{\Delta \}\), see Borodin and Salminen (2015), p. 4. In what follows, we drop the superscript \(\Delta \) from the notation. By convention, we augment the definition of functions g on E with \(g(\Delta )=0\).

Denote as \({\mathbf {P}}_x\) the probability measure \({\mathbf {P}}\) conditioned on the initial state x and as \({\mathbf {E}}_x\) the expectation with respect to \({\mathbf {P}}_x\). The process X is assumed to evolve under \({\mathbf {P}}_x\) and the sample paths are assumed to be right-continuous and left-continuous over stopping times meaning the following: if the sequence of stopping times \(\tau _n\uparrow \tau \), then \(X_{\tau _n}\rightarrow X_\tau \) \({\mathbf {P}}_x\)-almost surely as \(n\rightarrow \infty \). There is a well-established theory of standard optimal stopping for this class of processes, see Peskir and Shiryaev (2006).

For \(r>0\), we denote by \(L_1^r\) the class of real valued measurable functions f on E satisfying the integrability condition \({\mathbf {E}}_x\left[ \int _0^\infty e^{-rt} \left| f(X_t)\right| dt \right] <\infty \) for all \(x\in E\). For a function \(f\in L_1^r\), the resolvent \(R_rf:E\rightarrow {\mathbf {R}}\) is defined as \((R_rf)(x)={\mathbf {E}}_x \left[ \int _0^\infty e^{-rs} f(X_s) ds \right] \) for all \(x \in E\). Denote p repeated applications of \(R_r\) to the function g as \((R_r^{(p)}g)\).

Under this probabilistic setting, we study the following optimal stopping problem. We want to find a stopping time \(\tau ^*\) which maximizes the expected discounted value of the payoff \(g(X_{\tau +\zeta })\). Here, the function g is the payoff function and the variable \(\zeta \) is a phase-type distributed random time; specific assumptions on g are made later. We next define phase-type distributions. Phase-type distributions are particular cases of matrix-exponential distributions, see, e.g., He (2014), which admit a Markovian representation. To be more precise, let Y be a continuous time Markov chain defined on \((\Omega ,{\mathcal {F}},{\mathbb {F}},{\mathbf {P}})\) and taking values in the set \(\{0,1,2,\dots ,p\}\). The states \(1,\dots ,p\) are transient and the state 0 is absorbing. Then Y has an intensity matrix of the form

$$\begin{aligned} \varvec{\Lambda }= \left( \begin{matrix} 0 &{}\quad {\mathbf {0}} \\ {\mathbf {t}} &{}\quad {\mathbf {T}} \end{matrix}\right) , \end{aligned}$$

where \({\mathbf {T}}\) is a p-dimensional real square matrix (the subgenerator of Y), \({\mathbf {t}}\) is a p-dimensional column vector and \({\mathbf {0}}\) is a p-dimensional row vector of zeros. Since the intensities on each row must sum to zero, we find that \({\mathbf {t}}=-{\mathbf {T}}{\mathbf {e}}\), where \({\mathbf {e}}\) is a column vector of 1’s. Let \(\varvec{\pi }=(\pi _1,\dots ,\pi _p)\) denote the initial distribution of Y over the transient states only. Then we say that the time of absorption \(\zeta =\inf \left\{ t \ge 0 : Y_t = 0 \right\} \) has a phase-type distribution and write \(\zeta \sim {\text {PH}}(\varvec{\pi },{\mathbf {T}})\).

Denote as \({\mathbf {P}}_{x,\varvec{\pi }}\) the probability measure \({\mathbf {P}}\) conditioned on the initial state x of X and the initial distribution \(\varvec{\pi }\) of Y. The expectation with respect to \({\mathbf {P}}_{x,\varvec{\pi }}\) is denoted as \({\mathbf {E}}_{x,\varvec{\pi }}\). Now, the optimal stopping problem can be expressed as follows:

$$\begin{aligned} V(x)=\sup _{\tau } {\mathbf {E}}_{x,\varvec{\pi }}\left[ e^{-r(\tau +\zeta )}g(X_{\tau +\zeta }){\mathbf {1}}_{\{ \tau < \infty \}} \right] , \end{aligned}$$
(2.1)

where \(\tau \) varies over \({\mathbb {F}}\)-stopping times and r is the discount rate. We denote an optimal stopping time as \(\tau ^*\). Probabilistically, the problem (2.1) can be interpreted as follows. At the initial time \(t=0\), we choose a stopping rule described by the stopping time \(\tau \). When \(\tau \) is realized, the Markov chain Y is initiated from the distribution \(\varvec{\pi }\) and the payoff is realized when Y is absorbed. The payoff is thus uncertain, and we can regard Y as an additional source of noise driving the payoff.
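For illustration, the following Python sketch simulates the delay \(\zeta \sim {\text {PH}}(\varvec{\pi },{\mathbf {T}})\) by running the chain Y until absorption; the helper function and the two-phase example intensities are our own illustrative choices and not part of the formal development.

```python
import numpy as np

def sample_phase_type(pi, T, rng):
    """Simulate zeta ~ PH(pi, T) by running the chain Y until absorption.

    pi : initial distribution over the transient states 1,...,p
    T  : p x p subgenerator; the exit rates to the absorbing state are t = -T e.
    """
    p = len(pi)
    t_exit = -T @ np.ones(p)              # absorption intensities
    state = rng.choice(p, p=pi)           # start Y from the distribution pi
    time = 0.0
    while True:
        rate = -T[state, state]           # total jump intensity out of 'state'
        time += rng.exponential(1.0 / rate)
        # jump probabilities: to another transient state or to absorption
        probs = np.append(T[state, :] / rate, t_exit[state] / rate)
        probs[state] = 0.0
        nxt = rng.choice(p + 1, p=probs)
        if nxt == p:                      # absorbed: zeta is realised
            return time
        state = nxt

# hypothetical two-phase example with intensities 0.1, 0.2, 0.1
pi = np.array([1.0, 0.0])
T = np.array([[-0.3, 0.2],
              [0.0, -0.1]])
rng = np.random.default_rng(0)
samples = [sample_phase_type(pi, T, rng) for _ in range(20000)]
print(np.mean(samples))                   # compare with E[zeta] = -pi @ inv(T) @ 1 = 10
```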

3 The main result

In this section, we prove our main result on the solvability of the optimal stopping problem (2.1). We make the following assumption on the structure of the matrix \({\mathbf {T}}\).

Assumption 3.1

The eigenvalues of the subgenerator \({\mathbf {T}}\) are real and strictly negative.

Assumption 3.1 covers a variety of interesting cases of \({\mathbf {T}}\). For instance, the matrix \({\mathbf {T}}\) can be any triangular matrix with strictly negative diagonal entries. Thus the assumption covers, for example, the following distributions of \(\zeta \):

  1. Exponential and mixtures of exponentials with mutually distinct rates \(\lambda _i\),

  2. Hyperexponential and hypoexponential distributions,

  3. Coxian distribution,

  4. Erlang distribution,

see, e.g., Stewart (2009). We prove first an auxiliary result.

Lemma 3.2

Let \(g \in L_1^r\) and \(m=0,1,2,\dots \). Then

$$\begin{aligned} {\mathbf {E}}_x\left[ \int _0^\infty e^{-rs}s^m g(X_s) ds \right] = m! (R_r^{(m+1)}g)(x). \end{aligned}$$

Proof

We establish the case \(m=1\); the general claim then follows by induction. By changing the order of integration and invoking the Chapman–Kolmogorov equation, we obtain

$$\begin{aligned}&\int _0^\infty e^{-rs} s \int _E g(y) p(x,dy;s) ds \\&\quad = \int _0^\infty e^{-rs} \int _0^s du \int _E g(y) p(x,dy;s) ds \\&\quad = \int _0^\infty \int _u^\infty e^{-rs} \int _E g(y) p(x,dy;s) ds du \\&\quad = \int _0^\infty \int _0^\infty e^{-r(v+u)} \int _E g(y) p(x,dy;v+u) dv du \\&\quad = \int _0^\infty \int _0^\infty e^{-r(v+u)} \int _E g(y) \int _E p(x,dz;u)p(z,dy;v) dv du \\&\quad = \int _0^\infty e^{-ru}\int _E p(x,dz;u)\left( \int _0^\infty e^{-rv} \int _E g(y) p(z,dy;v) dv \right) du, \end{aligned}$$

where the last expression equals \((R_r(R_rg))(x)=(R_r^{(2)}g)(x)\); this concludes the proof. \(\square \)
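As a quick sanity check of Lemma 3.2, one can take X to be a finite-state Markov chain with generator \({\mathbf {Q}}\), for which \(R_r=(r{\mathbf {I}}-{\mathbf {Q}})^{-1}\); the sketch below compares both sides of the identity by numerical quadrature. The generator, payoff vector and parameter values are hypothetical choices of ours.

```python
import math
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# hypothetical test case: generator Q of a 3-state chain, payoff vector g
Q = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.8, 0.5],
              [0.2, 0.2, -0.4]])
g = np.array([1.0, 2.0, 0.5])
r, m = 0.5, 2
I = np.eye(3)

# right-hand side of Lemma 3.2: m! * (R_r^(m+1) g) with R_r = (rI - Q)^{-1}
rhs = math.factorial(m) * np.linalg.matrix_power(np.linalg.inv(r * I - Q), m + 1) @ g

# left-hand side: E_x[int_0^inf e^{-rs} s^m g(X_s) ds] = int_0^inf e^{-rs} s^m (e^{Qs} g)(x) ds
lhs = np.array([quad(lambda s, i=i: np.exp(-r * s) * s**m * (expm(Q * s) @ g)[i],
                     0.0, np.inf)[0] for i in range(3)])

print(np.allclose(lhs, rhs))   # True: both sides agree for every initial state x
```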

The next theorem is the main result of this section.

Theorem 3.3

Let Assumption 3.1 hold. In addition, assume that the payoff \(g:E\rightarrow {\mathbf {R}}\) is in \(L^1_r\), satisfies the condition \(S^+:=\{x \ : \ g(x)>0 \}\ne \emptyset \), and that the process X reaches a point \(y_x\in S^+\) with positive probability for all initial states x. Furthermore, assume that there exists an r-harmonic function \(h:E\rightarrow {\mathbf {R}}_+\) such that the function \(x\mapsto \frac{g(x)}{h(x)}\) is bounded. Then the value function V exists and can be identified as the least r-excessive majorant of the function \({\mathfrak {g}}_{\varvec{\pi }}:x\mapsto \sum _{i,j,k} \pi _i (R_{r+\mu _j}g)(x) \alpha ^{i}_{jk} t_k\) for some coefficients \(\alpha \), where \(\varvec{\pi }=(\pi _1,\dots ,\pi _p)\), \({\mathbf {t}}=(t_1,\dots ,t_p)\), and the elements \(-\mu _j<0\) are the eigenvalues of \({\mathbf {T}}\). Furthermore, an optimal stopping time \(\tau ^*\) exists and can be expressed as \(\tau ^*=\inf \{t\ge 0 \ : \ X_t\in \Gamma ^*\}\) where \(\Gamma ^*=\{x \ : \ V(x)= \sum _{i,j,k} \pi _i (R_{r+\mu _j}g)(x) \alpha ^{i}_{jk} t_k\}\).

Proof

It is known from Bladt (2005) that the density f of \(\zeta \) reads as \(f(s)=\varvec{\pi }e^{{\mathbf {T}}s}{\mathbf {t}}\), where \({\mathbf {t}}=-{\mathbf {T}}{\mathbf {e}}\). Denote the eigenvalues of \({\mathbf {T}}\) as \(-\mu _1,\dots ,-\mu _k\), \(k \le p\), with corresponding multiplicities \(a_1,\dots ,a_k\), where \(a_1+\cdots +a_k=p\). There exists a linear change of coordinates \({\mathbf {S}}\) such that

$$\begin{aligned} {\mathbf {T}} = {\mathbf {S}} \underbrace{\left( \begin{matrix} {\mathbf {A}}_1 &{}\quad {\mathbf {0}} &{}\quad \cdots &{}\quad {\mathbf {0}} \\ {\mathbf {0}} &{}\quad {\mathbf {A}}_2 &{}\quad \cdots &{}\quad {\mathbf {0}} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ {\mathbf {0}} &{}\quad \cdots &{}\quad \cdots &{}\quad {\mathbf {A}}_k \end{matrix} \right) }_{:={\mathbf {A}}} {\mathbf {S}}^{-1}, \end{aligned}$$

see Teschl (2012), Ch. 3. Here, the diagonal blocks

$$\begin{aligned} {\mathbf {A}}_i = \left( \begin{matrix} -\mu _i &{}\quad 1 &{}\quad 0 &{}\quad \cdots &{}\quad 0 \\ 0 &{}\quad -\mu _i &{}\quad 1 &{}\quad \cdots &{}\quad 0 \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \ddots &{}\quad -\mu _i &{}\quad 1 \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \cdots &{}\quad -\mu _i \\ \end{matrix} \right) = -\mu _i {\mathbf {I}} + {\mathbf {N}}_i \end{aligned}$$

are \(a_i\times a_i\)-matrices, and the matrix \({\mathbf {N}}_i\) satisfies \({\mathbf {N}}_i^{a_i}={\mathbf {0}}\), for \(i=1,\dots ,k\). Furthermore, the matrix exponent \(e^{{\mathbf {T}}s} = {\mathbf {S}} e^{{\mathbf {A}}s} {\mathbf {S}}^{-1}\). We readily verify that

$$\begin{aligned} e^{{\mathbf {A}}s} = \left( \begin{matrix} e^{{\mathbf {A}}_1 s} &{}\quad {\mathbf {0}} &{}\quad \cdots &{}\quad {\mathbf {0}} \\ {\mathbf {0}} &{}\quad e^{{\mathbf {A}}_2 s} &{}\quad \cdots &{}\quad {\mathbf {0}} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ {\mathbf {0}} &{}\quad \cdots &{}\quad \cdots &{}\quad e^{{\mathbf {A}}_k s} \end{matrix} \right) , \text { where } e^{{\mathbf {A}}_i s} = e^{-\mu _i s} \sum _{m=0}^{a_i-1} \frac{{\mathbf {N}}_i^m s^m}{m!}. \end{aligned}$$

Therefore the entries of the matrix \(e^{{\mathbf {T}}s}\) are linear combinations of the functions \(s\mapsto e^{-\mu _j s}s^m\), \(j=1,\dots ,k\) and \(m=0,\dots ,a_j-1\), that is, \(\left( e^{{\mathbf {T}}s} \right) _{in} = \sum _{j=1}^k \sum _{m=0}^{a_j-1} s^m e^{-\mu _j s} \alpha ^{i}_{jmn}\) for some coefficients \(\alpha ^{i}_{jmn}\). Finally, the density f can be expressed as

$$\begin{aligned} f(s) = \sum _{i,n=1}^p \sum _{j=1}^k \sum _{m=0}^{a_j-1} \pi _i e^{-\mu _j s} s^m \alpha ^{i}_{jmn} t_n. \end{aligned}$$
(3.1)

Since X is strong Markov, the identity

$$\begin{aligned} {\mathbf {E}}_x\left[ e^{-r(\tau +s)}g(X_{\tau +s})\right] = {\mathbf {E}}_x\left[ e^{-r\tau }{\mathbf {E}}_{X_\tau }\left[ e^{-rs}g(X_{s})\right] \right] , \end{aligned}$$

holds for all \(s\ge 0\). By integrating this expression over the positive reals with respect to s with weight f given by (3.1) we obtain

$$\begin{aligned}&{\mathbf {E}}_{x,\varvec{\pi }}\left[ e^{-r(\tau +\zeta )}g(X_{\tau +\zeta }) \right] \\&\quad ={\mathbf {E}}_{x}\left[ e^{-r\tau }{\mathbf {E}}_{X_\tau , \varvec{\pi }}\left[ \int _0^\infty e^{-rs}g(X_{s})f(s)ds\right] \right] \\&\quad ={\mathbf {E}}_{x}\left[ e^{-r\tau } \sum _{i,n=1}^p \pi _i \sum _{j=1}^k \sum _{m=0}^{a_j-1} {\mathbf {E}}_{X_\tau }\left[ \int _0^\infty s^m e^{-(r+\mu _j)s}g(X_{s})ds\right] \alpha ^{i}_{jmn} t_n \right] . \end{aligned}$$

By Lemma 3.2

$$\begin{aligned} {\mathbf {E}}_x\left[ \int _0^\infty s^m e^{-(r+\mu _j)s}g(X_s)ds \right] = m! (R_{r+\mu _j}^{(m+1)} g)(x). \end{aligned}$$

By lumping the factorial coefficients together with constants \(\alpha \) and still denoting the resulting constants by \(\alpha \), we obtain

$$\begin{aligned} {\mathbf {E}}_{x,\varvec{\pi }}\left[ e^{-r(\tau +\zeta )}g(X_{\tau +\zeta })\right] ={\mathbf {E}}_{x}\left[ e^{-r\tau } \sum _{i,n=1}^p \pi _i \sum _{j=1}^k \sum _{m=1}^{a_j} (R_{r+\mu _j}^{(m)}g)(X_\tau ) \alpha ^{i}_{jmn} t_n \right] . \end{aligned}$$

Since the numbers \(\mu _j\) are strictly positive, we can rewrite the resolvent \((R_{r+\mu _j}^{(m)}g)\) as

$$\begin{aligned} (R_{r+\mu _j}^{(m)}g)(x)&={\mathbf {E}}_x\left[ \int _0^\infty e^{-\mu _j t} \frac{(R_{r+\mu _j}^{(m-1)}g)(X_t)}{h(X_t)} e^{-rt}\frac{h(X_t)}{h(x)}dt \right] h(x) \\&=\left( R^h_{\mu _j}\frac{(R_{r+\mu _j}^{(m-1)}g)}{h}\right) (x)h(x), \end{aligned}$$

where \(R^h\) is the resolvent of the Doob’s h-transform \(X^h\), see Borodin and Salminen (2015), p. 34. Using this, we rewrite our optimal stopping problem as

$$\begin{aligned} V(x)&=\sup _{\tau }{\mathbf {E}}_x^h\left[ \sum _{i,n=1}^p \pi _i \sum _{j=1}^k \sum _{m=1}^{a_j} \frac{(R_{r+\mu _j}^{(m)}g)(X_\tau )}{h(X_\tau )} \alpha ^{i}_{jmn} t_n {\mathbf {1}}_{\{\tau<\infty \}} \right] h(x)\\ {}&=\sup _{\tau }{\mathbf {E}}_x^h\left[ \sum _{i,n=1}^p \pi _i \sum _{j=1}^k \sum _{m=1}^{a_j} \left( R^h_{\mu _j}\frac{(R_{r+\mu _j}^{(m-1)}g)}{h}\right) (X_\tau ) \alpha ^{i}_{jmn} t_n {\mathbf {1}}_{\{\tau <\infty \}} \right] h(x). \end{aligned}$$

Since \(-\mu _j<0\) for all j and the family \((R^h_\lambda )_{\lambda >0}\) is a contraction resolvent, we find that

$$\begin{aligned} \left\| \left( R^h_{\mu _j}\frac{(R_{r+\mu _j}^{(m-1)}g)}{h}\right) \right\| _u&\le \frac{1}{\mu _j}\left\| \frac{(R_{r+\mu _j}^{(m-1)}g)}{h} \right\| _u \\&= \frac{1}{\mu _j}\left\| \left( R^h_{\mu _j}\frac{(R_{r+\mu _j}^{(m-2)}g)}{h}\right) \right\| _u \le \cdots \le \frac{1}{\mu _j^m}\left\| \frac{g}{h} \right\| _u <\infty . \end{aligned}$$

where \(\Vert \cdot \Vert _u\) denotes the sup-norm, i.e., \(\Vert f\Vert _u=\sup _{x\in E}|f(x)|\). Thus the payoff \(x \mapsto \sum _{i,n=1}^p \pi _i \sum _{j=1}^k \sum _{m=1}^{a_j} \left( R^h_{\mu _j}\frac{(R_{r+\mu _j}^{(m-1)}g)}{h}\right) (x) \alpha ^{i}_{jmn} t_n\) is uniformly bounded and, in particular, continuous. The claim follows now from Peskir and Shiryaev (2006), Thrm. I.2.7. \(\square \)

Theorem 3.3 gives a weak set of conditions under which the optimal stopping problem (2.1) has a well-defined solution and an optimal stopping time exists. These conditions essentially mean that the optimal stopping region is not empty and that the payoff function g cannot grow too fast, even though it can be unbounded. Note that by Theorem 3.3, we can rewrite the optimal stopping problem (2.1) as

$$\begin{aligned} V_{\varvec{\pi }}(x)=\sup _{\tau }{\mathbf {E}}_x\left[ e^{-r\tau } {\mathfrak {g}}_{\varvec{\pi }}(X_{\tau }){\mathbf {1}}_{\{ \tau <\infty \}}\right] . \end{aligned}$$
(3.2)

This is the form of the problem we will study in what follows. We point out that the payoff function \({\mathfrak {g}}_{\varvec{\pi }}\) can be expressed as

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}(x) = {\mathbf {E}}_x\left[ e^{-r\zeta } g(X_\zeta ) \right] , \end{aligned}$$
(3.3)

where \(\zeta \sim {\text {PH}}(\varvec{\pi },{\mathbf {T}})\).

4 Case study I: Coxian distribution

The purpose of this section is to analyze the optimal stopping problem (3.2) in more detail when the absorption time has a Coxian distribution. More precisely, we compute a representation for the payoff function (3.3) that lends itself to explicit computations in the next section. To this end, assume that the number of transient states of Y is p. The process Y is started from state 1, so that the initial distribution is \(\varvec{\pi }=(1,0,\dots ,0)\). The subgenerator of Y is then written as

$$\begin{aligned} {\mathbf {T}}= \left( \begin{matrix} -(\lambda _1+\lambda _{12}) &{}\quad \lambda _{12} &{}\quad 0 &{}\quad \cdots &{}\quad 0 \\ 0 &{}\quad -(\lambda _{2}+\lambda _{23}) &{}\quad \lambda _{23} &{}\quad \cdots &{}\quad 0 \\ \vdots &{}\quad \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad \cdots &{}\quad - \lambda _p \end{matrix} \right) . \end{aligned}$$
(4.1)

The quantities \(\lambda _i\) (resp. \(\lambda _{i,i+1}\)) are the absorption intensities (resp. the transition intensities from state i to state \(i+1\)). The time of absorption \(\zeta \) now has a Coxian distribution, see, e.g., Bladt (2005). We remark that \({\mathbf {t}}=(\lambda _1,\dots ,\lambda _p)^\intercal \). Furthermore, the eigenvalues are \(-\mu _i=-(\lambda _i+\lambda _{i,i+1})\) for \(i=1,\dots ,p-1\), and \(-\mu _p=-\lambda _p\); in what follows we assume that these eigenvalues are mutually distinct.
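For illustration, the following sketch assembles the subgenerator (4.1) from given intensities and verifies that its eigenvalues are the values \(-\mu _i\) listed above, so that Assumption 3.1 is satisfied; the helper name and the numerical intensities are ours and purely illustrative.

```python
import numpy as np

def coxian_subgenerator(lam_abs, lam_next):
    """Build the Coxian subgenerator (4.1).

    lam_abs  : absorption intensities (lambda_1, ..., lambda_p)
    lam_next : transition intensities (lambda_{12}, ..., lambda_{p-1,p})
    """
    p = len(lam_abs)
    T = np.zeros((p, p))
    for i in range(p - 1):
        T[i, i] = -(lam_abs[i] + lam_next[i])
        T[i, i + 1] = lam_next[i]
    T[p - 1, p - 1] = -lam_abs[p - 1]
    return T

# illustrative intensities with mutually distinct mu_i
lam_abs = [0.1, 0.4, 0.25]
lam_next = [0.2, 0.5]
T = coxian_subgenerator(lam_abs, lam_next)
t = -T @ np.ones(3)                      # exit rate vector; equals lam_abs for a Coxian
mu = [lam_abs[0] + lam_next[0], lam_abs[1] + lam_next[1], lam_abs[2]]
print(np.allclose(sorted(np.linalg.eigvals(T).real), sorted(-m for m in mu)))  # True
```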

We study the function

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}: x \mapsto \sum _{j,k} (R_{r+\mu _j}g)(x)\alpha _{jk}\lambda _k, \end{aligned}$$

where the coefficients \(\alpha _{jk}\) are implicitly defined in the proof of Theorem 3.3. To determine these constants, we first find that the matrix \({\mathbf {S}}\) of the eigenvectors of \({\mathbf {T}}\) reads as

$$\begin{aligned} {\mathbf {S}} = \left( \begin{matrix} 1 &{}\quad \frac{\lambda _{12}}{\mu _{1}-\mu _{2}} &{}\quad \cdots &{}\quad {\prod }_{i=1}^{p-2}\frac{\lambda _{i,i+1}}{\mu _{i}-\mu _{p-1}} &{}\quad {\prod }_{i=1}^{p-1} \frac{\lambda _{i,i+1}}{\mu _{i}-\mu _p} \\ 0 &{}\quad 1 &{}\quad \cdots &{}\quad {\prod }_{i=2}^{p-2}\frac{\lambda _{i,i+1}}{\mu _{i}-\mu _{p-1}} &{}\quad {\prod }_{i=2}^{p-1}\frac{\lambda _{i,i+1}}{\mu _i-\mu _p} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad 1 &{}\quad \frac{\lambda _{p-1,p}}{\mu _{p-1}-\mu _{p}} \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad 0 &{}\quad 1 \end{matrix} \right) , \end{aligned}$$

where the ith column is the eigenvector corresponding to the eigenvalue \(-\mu _i\). Thus

$$\begin{aligned} {\mathbf {S}}e^{{\mathbf {D}}s}=\left( \begin{matrix} e^{-\mu _1s} &{}\quad \frac{\lambda _{12}}{\mu _{1}-\mu _{2}}e^{-\mu _{2}s} &{}\quad \cdots &{}\quad {\prod }_{i=1}^{p-2}\frac{\lambda _{i,i+1}}{\mu _{i}-\mu _{p-1}}e^{-\mu _{p-1}s} &{}\quad {\prod }_{i=1}^{p-1} \frac{\lambda _{i,i+1}}{\mu _{i}-\mu _p}e^{-\mu _{p}s} \\ 0 &{}\quad e^{-\mu _{2}s} &{}\quad \cdots &{}\quad {\prod }_{i=2}^{p-2}\frac{\lambda _{i,i+1}}{\mu _{i}-\mu _{p-1}} e^{-\mu _{p-1}s} &{}\quad {\prod }_{i=2}^{p-1}\frac{\lambda _{i,i+1}}{\mu _i-\mu _p} e^{-\mu _ps} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad e^{-\mu _{p-1}s} &{}\quad \frac{\lambda _{p-1,p}}{\mu _{p-1}-\mu _{p}} e^{-\mu _ps} \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad 0 &{}\quad e^{-\mu _ps} \end{matrix} \right) , \end{aligned}$$

where \({\mathbf {D}}={\text {diag}}(-\mu _1,\dots ,-\mu _p)\). Since \({\mathbf {S}}\) is upper unitriangular, the inverse \({\mathbf {S}}^{-1}\) is also upper unitriangular. More specifically, we readily verify that

$$\begin{aligned} {\mathbf {S}}^{-1} = \left( \begin{matrix} 1 &{}\quad \frac{-\lambda _{12}}{\mu _{1}-\mu _{2}} &{}\quad \cdots &{}\quad {\prod }_{i=1}^{p-2}\frac{-\lambda _{i,i+1}}{\mu _{1}-\mu _{i+1}} &{}\quad {\prod }_{i=1}^{p-1} \frac{-\lambda _{i,i+1}}{\mu _{1}-\mu _{i+1}} \\ 0 &{}\quad 1 &{}\quad \cdots &{}\quad {\prod }_{i=2}^{p-2}\frac{-\lambda _{i,i+1}}{\mu _{2}-\mu _{i+1}} &{}\quad {\prod }_{i=2}^{p-1}\frac{-\lambda _{i,i+1}}{\mu _{2}-\mu _{i+1}} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad 1 &{}\quad \frac{-\lambda _{p-1,p}}{\mu _{p-1}-\mu _{p}} \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad 0 &{}\quad 1 \end{matrix} \right) . \end{aligned}$$

Therefore the matrix exponential

$$\begin{aligned} e^{{\mathbf {T}}s}={\mathbf {S}}e^{{\mathbf {D}}s}{\mathbf {S}}^{-1}= \left( \begin{matrix} e^{-\mu _1s} &{}\quad b_{12}(s) &{}\quad \cdots &{}\quad b_{1,p-1}(s) &{}\quad b_{1p}(s) \\ 0 &{}\quad e^{-\mu _2s} &{}\quad \cdots &{}\quad b_{2,p-1}(s) &{}\quad b_{2p}(s) \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad e^{-\mu _{p-1}s} &{}\quad b_{p-1,p}(s) \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad 0 &{}\quad e^{-\mu _ps} \end{matrix} \right) , \end{aligned}$$

where the element

$$\begin{aligned} b_{mn}(s)=\prod _{i=m}^{n-1} \lambda _{i,i+1} \sum _{j=m}^{n} \frac{e^{-\mu _j s}}{\prod _{\begin{array}{c} k=m \\ k \ne j \end{array}}^{n}(\mu _k-\mu _j)}, \ m<n, \end{aligned}$$

for \(1 \le m < n \le p\). By substituting this into the definition of the density f, we find

$$\begin{aligned} f(s)&=\varvec{\pi }e^{{\mathbf {T}}s}{\mathbf {t}} = e^{-\mu _1 s} \lambda _1 + \sum _{j=2}^p b_{1j}(s)\lambda _{j} \\&=e^{-\mu _1 s}\lambda _1 + \sum _{k=2}^p\left( \prod _{i=1}^{k-1} \lambda _{i,i+1} \sum _{j=1}^k \frac{e^{-\mu _j s}}{\prod _{\begin{array}{c} l=1 \\ l \ne j \end{array}}^k(\mu _l-\mu _j)} \right) \lambda _{k}. \end{aligned}$$

From this expression, we obtain the coefficients \(\alpha \):

$$\begin{aligned} \alpha _{jk}&= \prod _{i=1}^{k-1} \lambda _{i,i+1}\left( \prod _{\begin{array}{c} l=1 \\ l \ne j \end{array}}^k(\mu _l-\mu _{j}) \right) ^{-1}, \ j= 1,\dots ,p, \ k \ge \max (j,2), \nonumber \\ \alpha _{11}&= 1. \end{aligned}$$
(4.2)
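The coefficients (4.2) can be checked numerically by comparing the resulting expansion of the density f with a direct evaluation of \(\varvec{\pi }e^{{\mathbf {T}}s}{\mathbf {t}}\) via the matrix exponential, as in the following sketch for an illustrative three-phase Coxian with mutually distinct \(\mu _j\).

```python
import numpy as np
from scipy.linalg import expm

# illustrative three-phase Coxian intensities (mutually distinct mu_j)
lam_abs = np.array([0.1, 0.4, 0.25])     # lambda_1, lambda_2, lambda_3
lam_next = np.array([0.2, 0.5])          # lambda_12, lambda_23
mu = np.array([0.3, 0.9, 0.25])          # mu_i = lambda_i + lambda_{i,i+1}, mu_p = lambda_p
T = np.array([[-0.3, 0.2, 0.0],
              [0.0, -0.9, 0.5],
              [0.0, 0.0, -0.25]])
t = lam_abs                              # exit rate vector t = -T e
pi = np.array([1.0, 0.0, 0.0])

def alpha(j, k):
    """Coefficients (4.2); indices are 1-based as in the text."""
    if j == 1 and k == 1:
        return 1.0
    num = np.prod(lam_next[:k - 1])
    den = np.prod([mu[l - 1] - mu[j - 1] for l in range(1, k + 1) if l != j])
    return num / den

def f_expansion(s):
    """Density of zeta from the expansion preceding (4.2)."""
    total = np.exp(-mu[0] * s) * lam_abs[0]
    for k in range(2, 4):
        total += sum(alpha(j, k) * np.exp(-mu[j - 1] * s) for j in range(1, k + 1)) * lam_abs[k - 1]
    return total

def f_matrix(s):
    """Density f(s) = pi exp(Ts) t."""
    return pi @ expm(T * s) @ t

print(all(np.isclose(f_expansion(s), f_matrix(s)) for s in [0.5, 1.0, 2.0, 5.0]))  # True
```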

Having the coefficients \(\alpha \) at our disposal, we proceed with the derivation of the payoff function \({\mathfrak {g}}_{\varvec{\pi }}\). Let \({\mathfrak {A}}=(\alpha _{jk})\) and denote the vector of resolvents \((R_{r+\mu _j}g)\), \(j=1,\dots ,p\), as \({\mathbf {r}}\). Then the analysis above implies that

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}(x)&={\mathbf {r}}(x)^\intercal {\mathfrak {A}}{\mathbf {t}} \nonumber \\&=\lambda _1(R_{r+\mu _1}g)(x) + \sum _{k=2}^p\left( \prod _{i=1}^{k-1} \lambda _{i,i+1} \sum _{j=1}^k \frac{(R_{r+\mu _j}g)(x)}{\prod _{\begin{array}{c} l=1 \\ l \ne j \end{array}}^k(\mu _l-\mu _j)} \right) \lambda _{k}, \end{aligned}$$
(4.3)

for all \(x \in E\). We use the following lemma; this result can be regarded as a generalization of the resolvent equation.

Lemma 4.1

For each \(k=2,\dots ,p\), the following holds:

$$\begin{aligned} \sum _{j=1}^k \frac{(R_{r+\mu _j}g)(x)}{\prod _{\begin{array}{c} l=1 \\ l \ne j \end{array}}^k(\mu _l-\mu _j)} = (R_{r+\mu _1} R_{r+\mu _2} \cdots R_{r+\mu _k} g)(x), \end{aligned}$$

for all \(x \in E\).

Proof

For \(j=1,\dots ,k\), let \(U_j \sim {\text {Exp}}(\mu _j)\) be mutually independent random variables, independent also of X, and let \(S_k = \sum _{j=1}^k U_j\). Since the parameters \(\mu _j\) are distinct, we find, by using the formula for the density of \(S_k\) from Ross (2010), p. 309, that

$$\begin{aligned} {\mathbf {E}}_x\left[ e^{-r S_k} g(X_{S_k})\right]&= {\mathbf {E}}_x\left[ \int _0^\infty e^{-rt} g(X_t) \prod _{i=1}^k \mu _i \sum _{j=1}^k \frac{e^{-\mu _j t}}{\prod _{\begin{array}{c} l=1 \\ l \ne j \end{array}}^k(\mu _l-\mu _j)} dt \right] \\&= \prod _{i=1}^k \mu _i \sum _{j=1}^k \frac{(R_{r+\mu _j}g)(x)}{\prod _{\begin{array}{c} l=1 \\ l \ne j \end{array}}^k(\mu _l-\mu _j)}. \end{aligned}$$

On the other hand, since X is strong Markov, we find that

$$\begin{aligned} {\mathbf {E}}_x\left[ e^{-r S_k} g(X_{S_k})\right]&= {\mathbf {E}}_x\left[ e^{-r S_{k-1}} {\mathbf {E}}_{X_{S_{k-1}}} \left[ e^{-r U_k} g(X_{U_k}) \right] \right] \\&= {\mathbf {E}}_x \left[ e^{-r S_{k-1}} \mu _k (R_{r+\mu _k} g) (X_{S_{k-1}}) \right] \\&= {\mathbf {E}}_x \left[ e^{-r S_{k-2}} \mu _k \mu _{k-1} (R_{r+\mu _{k-1}} R_{r+\mu _k} g) (X_{S_{k-2}}) \right] \\&= \dots \\&= \prod _{i=1}^k \mu _i (R_{r+\mu _1} R_{r+\mu _2} \cdots R_{r+\mu _k} g)(x), \end{aligned}$$

proving the claim. \(\square \)

We can now rewrite the expression (4.3) using Lemma 4.1 to obtain the following result.

Proposition 4.2

Assume that the subgenerator \({\mathbf {T}}\) of Y is given by (4.1). Then

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}(x)&= \lambda _1(R_{r+\mu _1}g)(x) + \sum _{k=2}^p \prod _{i=1}^{k-1} \lambda _{i,i+1} \lambda _{k} (R_{r+\mu _1} R_{r+\mu _2} \cdots R_{r+\mu _k} g)(x), \end{aligned}$$
(4.4)

for all \(x \in E\).

The expression (4.4) has a natural interpretation. Denote the (potential) time to absorption from state i as \(A_i \sim {\text {Exp}}(\lambda _i)\) and the (potential) time of transition from state i to state \(i+1\) as \(C_i \sim {\text {Exp}}(\lambda _{i,i+1})\); these times are mutually independent. For each k, define the random time \(T_k = \sum _{i=1}^{k-1} C_i + A_k\) on the set where the process Y is absorbed from state k; for \(k=1\), obviously \(T_1 = A_1\). Then

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}(x)&= \frac{\lambda _1}{\mu _1} \mu _1(R_{r+\mu _1}g)(x) \\&\quad + \sum _{k=2}^p \prod _{i=1}^{k-1} \frac{\lambda _{i,i+1}}{\mu _i} \frac{\lambda _{k}}{\mu _k} (\mu _1R_{r+\mu _1} \mu _2R_{r+\mu _2} \cdots \mu _k R_{r+\mu _k} g)(x) \\&= \sum _{k=1}^p {\mathbf {E}}_x\left[ e^{-rT_k} g(X_{T_k}) \right] \prod _{i=1}^{k-1} {\mathbf {P}}(C_i < A_i) {\mathbf {P}}(C_k > A_k). \end{aligned}$$

This representation of the value is in line with the law of total probability. Indeed, the payoff is obtained as a sum over the possible absorption paths of the process Y, where the term corresponding to each path is the expected present value of the variable g(X) sampled at the absorption time \(T_k\) determined by that particular path, weighted by the probability of the path. Conditional on a particular path, the time \(T_k\) is a sum of mutually independent exponentially distributed random times.
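As a sanity check of this interpretation, the path weights \(\prod _{i=1}^{k-1}\frac{\lambda _{i,i+1}}{\mu _i}\frac{\lambda _k}{\mu _k}\) should sum to one over \(k=1,\dots ,p\) and should match the empirical frequencies of the absorbing phase of Y. The short sketch below verifies this for an illustrative three-phase Coxian; the simulation routine is a hypothetical helper of ours.

```python
import numpy as np

rng = np.random.default_rng(1)

# illustrative three-phase Coxian intensities
lam_abs = np.array([0.1, 0.4, 0.25])     # absorption intensities lambda_k
lam_next = np.array([0.2, 0.5, 0.0])     # transition intensities lambda_{k,k+1} (none from the last phase)
mu = lam_abs + lam_next                  # mu_k

# analytic path weights: prod_{i<k} lambda_{i,i+1}/mu_i * lambda_k/mu_k
weights = np.array([np.prod(lam_next[:k] / mu[:k]) * lam_abs[k] / mu[k] for k in range(3)])
print(weights.sum())                     # 1.0 up to rounding

# empirical frequencies of the absorbing phase of Y
def absorbing_phase(rng):
    k = 0
    while True:
        if rng.random() < lam_abs[k] / mu[k]:   # absorption wins against the next transition
            return k
        k += 1

counts = np.bincount([absorbing_phase(rng) for _ in range(100000)], minlength=3) / 100000
print(weights, counts)                   # the two vectors should be close
```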

5 Case study II: Coxian distribution with scalar diffusion dynamics

In this section, we elaborate on the results of the previous section in the case where the process X follows a scalar diffusion. More precisely, we assume that the state process X evolves on \({\mathbf {R}}_+\) and follows the regular linear diffusion given as the weakly unique solution of the Itô equation \(dX_t = \mu (X_t)dt+\sigma (X_t)dW_t\), \(X_0=x\). Here, W is a Wiener process on \((\Omega ,{\mathcal {F}},{\mathbb {F}},{\mathbf {P}})\) and the real valued functions \(\mu \) and \(\sigma >0\) are assumed to be continuous. Using the terminology of Borodin and Salminen (2015), the boundaries 0 and \(\infty \) are either natural, entrance-not-exit, exit-not-entrance or non-singular. In case a boundary is non-singular, it is assumed to be either killing or instantaneously reflecting, see Borodin and Salminen (2015), pp. 18–20. As usual, we denote as \({\mathcal {A}}= \frac{1}{2}\sigma ^2(x)\frac{d^2}{dx^2}+\mu (x)\frac{d}{dx}\) the second order linear differential operator associated to X. Furthermore, we denote as, respectively, \(\psi _r>0\) and \(\varphi _r>0\) the increasing and the decreasing solution of the ODE \({\mathcal {A}}u=ru\), where \(r>0\), defined on the domain of the characteristic operator of X. By posing appropriate boundary conditions depending on the boundary classification of the diffusion X, the functions \(\psi _r\) and \(\varphi _r\) are defined uniquely up to a multiplicative constant and can be identified as the minimal r-excessive functions, see Borodin and Salminen (2015), pp. 18–20. Finally, we define the speed measure m and the scale function S of X via the formulæ \(m'(x)=\frac{2}{\sigma ^2(x)}e^{B(x)}\) and \(S'(x)= e^{-B(x)}\) for all \(x \in {\mathbf {R}}_+\), where \(B(x):=\int ^x \frac{2\mu (y)}{\sigma ^2(y)}dy\), see Borodin and Salminen (2015), p. 17.

We know from the literature that for a given \(f\in L_1^r\) the resolvent \(R_rf\) can be expressed as

$$\begin{aligned} (R_rf)(x)&= B_r^{-1}\varphi _r(x)\int _0^x \psi _r(y)f(y)m'(y)dy \nonumber \\&\quad +B_r^{-1}\psi _r(x)\int _x^\infty \varphi _r(y)f(y)m'(y)dy, \end{aligned}$$
(5.1)

for all \(x \in {\mathbf {R}}_+\), where \(B_r=\frac{\psi _r'(x)}{S'(x)}\varphi _r(x)-\frac{\varphi _r'(x)}{S'(x)}\psi _r(x)\) denotes the Wronskian determinant, see Borodin and Salminen (2015), p. 19.
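We note in passing that (5.1) can be evaluated by straightforward numerical quadrature once \(\psi _r\), \(\varphi _r\), the speed density \(m'\) and the Wronskian \(B_r\) are available; the following generic helper is a minimal sketch of this (the interface is our own choice). A concrete instance for the Cox–Ingersoll–Ross example appears in Sect. 6.2.

```python
from scipy.integrate import quad

def resolvent(f, x, psi, phi, m_prime, B):
    """Evaluate (R_r f)(x) via formula (5.1) by numerical quadrature.

    psi, phi : the increasing and decreasing r-harmonic functions psi_r, varphi_r
    m_prime  : the speed density m'
    B        : the Wronskian B_r
    """
    lower = quad(lambda y: psi(y) * f(y) * m_prime(y), 0.0, x)[0]
    upper = quad(lambda y: phi(y) * f(y) * m_prime(y), x, float("inf"))[0]
    return (phi(x) * lower + psi(x) * upper) / B
```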

We consider the optimal stopping problem

$$\begin{aligned} V_{\varvec{\pi }}(x)=\sup _{\tau }{\mathbf {E}}_x\left[ e^{-r\tau } {\mathfrak {g}}_{\varvec{\pi }}(X_{\tau }){\mathbf {1}}_{\{ \tau <\infty \}}\right] . \end{aligned}$$
(5.2)

where the payoff

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}(x)&= \frac{\lambda _1}{\mu _1} \mu _1(R_{r+\mu _1}g)(x) \nonumber \\&\quad + \sum _{k=2}^p \prod _{i=1}^{k-1} \frac{\lambda _{i,i+1}}{\mu _i} \frac{\lambda _{k}}{\mu _k} (\mu _1R_{r+\mu _1} \mu _2R_{r+\mu _2} \cdots \mu _k R_{r+\mu _k} g)(x). \end{aligned}$$
(5.3)

The following proposition is the main result of this section.

Proposition 5.1

Let the assumptions of Theorem 3.3 be met. In addition, assume that

  (A) the function g is stochastically \(C^2\), that is, continuous and twice continuously differentiable outside of a countable set D that has no accumulation points,

  (B) the function \(\frac{g(x)}{\psi _r(x)}\) attains a maximum at an interior point \({\hat{x}}\),

  (C) the function \(({\mathcal {A}}-r)g\) is non-increasing,

  (D) the limiting condition \(({\mathcal {A}}-r)(R_{r+\mu }g)(0+):=\lim _{x\rightarrow 0+}({\mathcal {A}}-r)(R_{r+\mu }g)(x)>0\) holds for all \(\mu >\min \{\mu _i\}\), where \(-\mu _i\) are the eigenvalues of the subgenerator \({\mathbf {T}}\).

Then there is a unique \(x^*\) which maximizes the function \(x \mapsto \frac{{\mathfrak {g}}_{\varvec{\pi }}(x)}{\psi _r(x)}\). The state \(x^*\) is the optimal stopping threshold for the optimal stopping problem (5.2) and the value function \(V_{\varvec{\pi }}\) can be expressed as

$$\begin{aligned} V_{\varvec{\pi }}(x) = {\left\{ \begin{array}{ll} {\mathfrak {g}}_{\varvec{\pi }}(x), &{} x \ge x^*, \\ \frac{{\mathfrak {g}}_{\varvec{\pi }}(x^*)}{\psi _r(x^*)}\psi _r(x), &{} x \le x^*. \end{array}\right. } \end{aligned}$$
(5.4)

Remark 5.2

By assumption (A), we have

$$\begin{aligned} \frac{d}{dx}\left( \frac{g(x)}{\psi _r(x)}\right) = \frac{S'(x)}{\psi _r^2(x)} \int _0^x \psi _r(z)({\mathcal {A}}-r)g(z)m'(z)dz. \end{aligned}$$

Thus, by assumption (B), there exists a threshold \(x_0<{\hat{x}}\) such that \(({\mathcal {A}}-r)g(x) \gtreqqless 0\) when \(x \lesseqqgtr x_0\).

Example 5.3

We present a set of sufficient conditions for the assumptions (A)–(D) of Proposition 5.1. Assume that the function g is

  (1) non-negative and non-decreasing with \(g(0+)=0\),

  (2) piecewise linear with a finite number of corner points.

These assumptions cover various option-like payoffs such as \(g(x)=(x-K)^+\). Now the assumption (A) is clearly satisfied. The function \(\psi _r\) is known to be increasing. If it is furthermore assumed to be convex, as it is for instance for a GBM (see Alvarez (2003) for general conditions for the convexity of \(\psi _r\)), then the assumption (B) is also satisfied. With regard to assumption (C), we find, by assuming sufficient regularity, that

$$\begin{aligned} l(x) = ({\mathcal {A}}-r)g(x) = \mu (x)g'(x) - r g(x) = \mu (x) c_k - r g(x), \end{aligned}$$

for some constant \(c_k>0\). Thus \(l'(x) = c_k(\mu '(x)-r)<0\) whenever \(\mu '(x)<r\) for all x, that is, the growth rate of the drift function must be uniformly bounded by the rate of discounting. This condition is not very severe and is satisfied for reasonable parameter configurations by GBM and various mean-reverting diffusions such as Cox–Ingersoll–Ross and Verhulst–Pearl processes. Finally, since \(({\mathcal {A}}-r)(R_{r+\mu }g)(x) = \mu (R_{r+\mu }g)(x) - g(x)\), we find that the condition of assumption (D) is satisfied in this case.

An even simpler payoff structure \(g(x)=x-K\) is also covered by (A)–(D). In comparison to (1) and (2), \(g(0+)\) is now negative and we cannot argue as above for (D) to hold. Assume now that the function \(\mu \) is non-negative at the origin. Then \(({\mathcal {A}}-(r+\mu ))g(0+)= \mu (0+) - (r+\mu )g(0+) >0\) for all \(\mu >0\) and, consequently, the assumption (D) holds.

Remark 5.4

Proposition 5.1 gives sufficient conditions for the existence of a one-sided optimal stopping rule. Analogous conditions that would result in two-sided rules could most likely be provided. Indeed, as Remark 5.2 points out, the function g is r-subharmonic on \((0,x_0)\). On the other hand, if we assume that g is r-subharmonic on an interval (a, b), where \(0<a<b<\infty \), then we would likely be able to work out a set of assumptions such that the resulting optimal continuation region is \((z^*,y^*)\), where \(0<z^*<y^*<\infty \). These assumptions would most likely include boundedness and monotonicity assumptions on the functions \(\frac{g}{\psi _r}\) and \(\frac{g}{\varphi _r}\), see Lempa (2010). However, this generalization is beyond the scope of this paper.

Lemma 5.5

Let the assumptions of Proposition 5.1 hold. Then, for all \(\mu > 0\), the function \(({\mathcal {A}}-r)(R_{r+\mu }g)\) is decreasing.

Proof

Since \((R_{r+\mu }({\mathcal {A}}-(r+\mu ))g) = - g = ({\mathcal {A}}-(r+\mu ))(R_{r+\mu }g)\), we find that \(({\mathcal {A}}-r)(R_{r+\mu }g) = (R_{r+\mu }({\mathcal {A}}-r)g)\). Now, ordinary differentiation yields

$$\begin{aligned} B_{r+\mu }(R_{r+\mu }({\mathcal {A}}-r)g)'(x)&= \varphi _{r+\mu }'(x)\int _0^x \psi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz \nonumber \\&\quad + \psi _{r+\mu }'(x)\int _x^\infty \varphi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz. \end{aligned}$$
(5.5)

We observe from (5.5) that this derivative is negative at \(x_0\). Let \(x < x_0\). Then we can rewrite (5.5) as

$$\begin{aligned}&B_{r+\mu }(R_{r+\mu }({\mathcal {A}}-r)g)'(x) \\&\quad = \varphi _{r+\mu }'(x)\int _0^{x} \psi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz + \psi _{r+\mu }'(x)\\&\qquad \left( \int _{x}^{x_0} \varphi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz + \int _{x_0}^\infty \varphi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz \right) . \end{aligned}$$

The assumed boundary classification of 0 implies that

$$\begin{aligned}&\varphi _{r+\mu }'(x)\int _0^{x} \psi _{r+\mu }(z) ({\mathcal {A}}-r)g(z)m'(z)dz \\&\qquad + \psi _{r+\mu }'(x)\int _{x}^{x_0} \varphi _{r+\mu }(z) ({\mathcal {A}}-r)g(z)m'(z)dz \\&\quad< ({\mathcal {A}}-r)g(x)\left( \varphi _{r+\mu }'(x)\int _0^{x} \psi _{r+\mu }(z)m'(z)dz + \psi _{r+\mu }'(x)\int _{x}^{x_0} \varphi _{r+\mu }(z)m'(z)dz \right) \\&\quad = \frac{({\mathcal {A}}-r)g(x)}{r+\mu }\left( \psi _{r+\mu }'(x) \frac{\varphi _{r+\mu }'(x_0)}{S'(x_0)} - \varphi _{r+\mu }'(x) \lim _{z \rightarrow 0+} \frac{\psi _{r+\mu }'(z)}{S'(z)} \right) < 0, \end{aligned}$$

proving that the derivative (5.5) is negative on \((0,x_0)\). The interval \((x_0,\infty )\) is treated similarly. \(\square \)

Lemma 5.6

Let the assumptions of Proposition 5.1 hold. Then, for all \(\mu >0\), the function \(x \mapsto \frac{(R_{r+\mu }g)(x)}{\psi _r(x)}\) is decreasing for all \(x \ge {\hat{x}}\).

Proof

Following Remark 5.2, we find by changing the order of integration that

$$\begin{aligned}&\frac{\psi _r^2(x)B_{r+\mu }}{S'(x)}\frac{d}{dx} \left( \frac{(R_{r+\mu }g)(x)}{\psi _r(x)} \right) \nonumber \\&\quad = B_{r+\mu }\int _0^x \psi _r(z)({\mathcal {A}}-r)(R_{r+\mu }g)(z)m'(z)dz \nonumber \\&\quad = \int _0^x \psi _r(y) \varphi _{r+\mu }(y) \int _0^y \psi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz\, m'(y)dy \nonumber \\&\qquad + \int _0^x \psi _r(y) \psi _{r+\mu }(y) \int _y^x \varphi _{r+\mu }(z)({\mathcal {A}}-r)g(z)m'(z)dz\, m'(y)dy \nonumber \\&\quad = \int _0^x\left( \psi _{r+\mu }(z) \int _z^x \varphi _{r+\mu }(y)\psi _r(y)m'(y)dy \right) ({\mathcal {A}}-r)g(z)m'(z)dz \nonumber \\&\qquad + \int _x^\infty \varphi _{r+\mu }(z)\int _0^x \psi _{r+\mu }(y)\psi _r(y)m'(y)dy ({\mathcal {A}}-r)g(z)m'(z)dz. \end{aligned}$$
(5.6)

By virtue of Lemma 5.5, it is enough to show that the expression (5.6) is negative at \({\hat{x}}\). To this end, define the function \({\check{\psi }}_r^{x}(y) = \psi _r(y){\mathbf {1}}_{\{ y \le x \}}\). Since the upper boundary \(\infty \) is natural for the diffusion X, it follows from Lempa (2012a), Lemma 2.1 and the fact that the function \(\psi _r\) is non-negative that \(\mu (R_{r+\mu }{\check{\psi }}_r^{x})(y) \le \mu (R_{r+\mu }\psi _r)(y) = \psi _r(y)\), for all \(y\in (0,\infty )\). By evaluating the right hand side of (5.6) at the point \({\hat{x}}\), we then find that

$$\begin{aligned}&\int _0^{{\hat{x}}}\left( \psi _{r+\mu }(z) \int _z^{{\hat{x}}} \varphi _{r+\mu }(y)\psi _r(y)m'(y)dy \right) ({\mathcal {A}}-r)g(z)m'(z)dz \\&\qquad + \int _{{\hat{x}}}^\infty \varphi _{r+\mu }(z)\int _0^{{\hat{x}}} \psi _{r+\mu }(y)\psi _r(y)m'(y)dy ({\mathcal {A}}-r)g(z)m'(z)dz \\&\quad< B_{r+\mu } \int _0^{{\hat{x}}} \psi _{r}(z)({\mathcal {A}}-r)g(z)m'(z)dz \\&\qquad +\int _{{\hat{x}}}^\infty \varphi _{r+\mu }(z)\int _0^{{\hat{x}} } \psi _{r+\mu }(y)\psi _r(y)m'(y)dy ({\mathcal {A}}-r)g(z)m'(z)dz < 0. \end{aligned}$$

This proves the claim. \(\square \)

Proof of Proposition 5.1

Let \(k \in \{2,\dots ,p \}\) and consider the function

$$\begin{aligned} x \mapsto \prod _{i=1}^{k-1} \frac{\lambda _{i,i+1}}{\mu _i} \frac{\lambda _{k}}{\mu _k} (\mu _1R_{r+\mu _1} \mu _2R_{r+\mu _2} \cdots \mu _k R_{r+\mu _k} g)(x). \end{aligned}$$

We start by analyzing the properties of \(\mu _k (R_{r+\mu _k} g)\) and show that it satisfies the same assumptions as g. Under the assumption (A), the function \(\mu _k (R_{r+\mu _k} g)\) is obviously stochastically \(C^2\). Lemmas 5.5 and 5.6 coupled with the assumption (D) imply that \(\mu _k (R_{r+\mu _k} g)\) satisfies also the assumptions (B) and (C). To see that the condition (D) is also satisfied, assume that the parameters \(\eta _1,\eta _2>\min \{\mu _i\}\). Without loss of generality, we can assume that \(\eta _1>\eta _2\). Then the resolvent equation implies that

$$\begin{aligned}&({\mathcal {A}}-r)(R_{r+\eta _1}R_{r+\eta _2}g)(0+) \\&\quad = \frac{1}{\eta _1-\eta _2}\left( (R_{r+\eta _2}({\mathcal {A}}-r)g)(0+) -(R_{r+\eta _1}({\mathcal {A}}-r)g)(0+) \right) >0. \end{aligned}$$

We can now use the same procedure iteratively through the entire sum in (5.3) to conclude that the function \({\mathfrak {g}}_{\varvec{\pi }}\) satisfies also the same assumptions as g. The claim follows now from Alvarez (2001), Thrm. 3. \(\square \)

Remark 5.7

We observe from the proof of Proposition 5.1, in particular from Lemma 5.6, that the optimal stopping threshold \(x^*\) is dominated by the state \({\hat{x}}\). This state is the optimal stopping threshold for the same problem in the absence of the implementation delay. In other words, we observe that the introduction of the exercise lag accelerates the optimal exercise of the option to stop.

Remark 5.8

We also observe from the proof of Proposition 5.1 that we can modify the function (5.3) and allow for different payoffs for absorption from different phases. To elaborate, say that we have a collection of functions \((g_k)_{k=1}^p\) which all satisfy the assumptions of Proposition 5.1. Then we modify (5.3) such that if the absorption of Y occurs from the state i, the resulting payoff is given by the function \(g_i\), that is,

$$\begin{aligned} {\mathfrak {g}}_{\varvec{\pi }}(x)&= \frac{\lambda _1}{\mu _1} \mu _1(R_{r+\mu _1}g_1)(x) \\&\quad + \sum _{k=2}^p \prod _{i=1}^{k-1} \frac{\lambda _{i,i+1}}{\mu _i} \frac{\lambda _{k}}{\mu _k} (\mu _1R_{r+\mu _1} \mu _2R_{r+\mu _2} \cdots \mu _k R_{r+\mu _k} g_k)(x). \end{aligned}$$

Then we can carry out an analysis similar to the proof of Proposition 5.1 to conclude that the optimal stopping threshold is given by the global maximum of the function \(x\mapsto \frac{{\mathfrak {g}}_{\varvec{\pi }}(x)}{\psi _r(x)}\).

To see why this is interesting, consider the following example. In real options applications, the investor often pays a lump sum cost K either when the investment option is exercised or when the project is completed. The basic form of the exercise payoff is \(x \mapsto {\mathbf {E}}_{x}\left[ e^{-r\zeta } g(X_\zeta ) \right] \), where the function g can, for instance, be \(x \mapsto x - K\). In this form the lump sum is paid at the completion time. To illustrate how to shift this payment to the start of the project, consider the following. We assume for brevity that the lag variable \(\zeta \) has a two-phase Coxian distribution. Furthermore, denote the constant function \(x \mapsto 1\) as \({\mathbb {1}}\). Then

$$\begin{aligned}&{\mathbf {E}}_x\left[ e^{-r\zeta } g(X_\zeta ) \right] - K \nonumber \\&\quad =\frac{\lambda _1}{\mu _1}\mu _1(R_{r+\mu _1}g)(x) + \frac{\lambda _{12}}{\mu _1}\mu _1(R_{r+\mu _1}\lambda _2R_{r+\lambda _2}g)(x) - K \nonumber \\&\quad = \frac{\lambda _1}{\mu _1}\left( \mu _1(R_{r+\mu _1}g)(x) - K \right) +\frac{\lambda _{12}}{\mu _1}\left( \mu _1(R_{r+\mu _1}\lambda _2R_{r+\lambda _2}g)(x) - K \right) \nonumber \\&\quad = \frac{\lambda _1}{\mu _1}\left( \mu _1(R_{r+\mu _1}g)(x) - \mu _1\left( R_{r+\mu _1}((r+\mu _1)-{\mathcal {A}})\frac{K}{\mu _1}{\mathbb {1}} \right) (x) \right) \nonumber \\&\qquad + \frac{\lambda _{12}}{\mu _1}\left( \mu _1(R_{r+\mu _1} \lambda _2R_{r+\lambda _2}g)(x)-\mu _1\left( R_{r+\mu _1}((r+\mu _1) -{\mathcal {A}})\frac{K}{\mu _1}{\mathbb {1}} \right) (x)\right) \nonumber \\&\quad = \frac{\lambda _1}{\mu _1}\left( \mu _1\left( R_{r+\mu _1}\left( g - \frac{K(r+\mu _1)}{\mu _1} {\mathbb {1}} \right) \right) (x) \right) \nonumber \\&\qquad + \frac{\lambda _{12}}{\mu _1}\left( \mu _1\left( R_{r+\mu _1}\left( \lambda _2(R_{r+\lambda _2}g) - \frac{K(r+\mu _1)}{\mu _1} {\mathbb {1}} \right) \right) (x) \right) \nonumber \\&\quad = \frac{\lambda _1}{\mu _1}\left( \mu _1\left( R_{r+\mu _1}\left( g - \frac{K(r+\mu _1)}{\mu _1} {\mathbb {1}} \right) \right) (x) \right) \nonumber \\&\qquad + \frac{\lambda _{12}}{\mu _1}\left( \mu _1 \left( R_{r+\mu _1} \lambda _2\left( R_{r+\lambda _2}\left( g - \frac{K(r+\mu _1)(r+\lambda _2)}{\mu _1\lambda _2} {\mathbb {1}} \right) \right) \right) (x) \right) , \end{aligned}$$
(5.7)

for a sufficiently nice function g. Define the functions

$$\begin{aligned} g_1(x)&= g(x) - \frac{K(r+\mu _1)}{\mu _1}, \\ g_2(x)&= g(x) - \frac{K(r+\mu _1)(r+\lambda _2)}{\mu _1\lambda _2}. \end{aligned}$$

If the functions \(g_i\) satisfy the assumptions of Proposition 5.1, the conclusion of Proposition 5.1 holds for the payoff (5.7). This payoff corresponds to the case where the lump sum is paid at the time when the project is initiated.

6 Two examples

The purpose of this section is to illustrate the previous results with explicit examples. We assume throughout the section that the process Y has the subgenerator

$$\begin{aligned} {\mathbf {T}}= \left( \begin{matrix} -(\lambda _1+\lambda _{12}) &{} \lambda _{12} \\ 0 &{} -\lambda _{2} \end{matrix} \right) . \end{aligned}$$

This implies that the time of absorption \(\zeta \) has a two-phase Coxian distribution.

6.1 Geometric Brownian motion

Assume that the process X is given by a geometric Brownian motion, that is, the solution of the stochastic differential equation \(dX_t=\mu X_t dt + \sigma X_t dW_t\). Here, W is a Wiener process and the constants \(\mu \) and \(\sigma >0\) satisfy the conditions \(\mu <r\) and \(\mu -\frac{1}{2}\sigma ^2>0\). Then the optimal stopping time is almost surely finite. The scale density \(S'\) reads as \(S'(x)=x^{-\frac{2\mu }{\sigma ^2}}\) and the speed density \(m'\) as \(m'(x )=\frac{2}{\sigma ^2x^2}x^{\frac{2\mu }{\sigma ^2}}\). The differential operator \({\mathcal {A}}=\frac{1}{2}\sigma ^2x^2\frac{d^2}{dx^2}+\mu x \frac{d}{dx}\) and the functions \(\psi _\cdot \) and \(\varphi _\cdot \) can be written as \(\psi _{r+\lambda }(x)=x^{\beta _{r+\lambda }}, \ \varphi _{r+\lambda }(x)=x^{\alpha _{r+\lambda }}\), where the constants

$$\begin{aligned} {\left\{ \begin{array}{ll} \beta _{r+\lambda }=\left( \frac{1}{2}-\frac{\mu }{\sigma ^2} \right) +\sqrt{\left( \frac{1}{2}-\frac{\mu }{\sigma ^2} \right) ^2+\frac{2(r+\lambda )}{\sigma ^2}}>1, \\ \alpha _{r+\lambda }=\left( \frac{1}{2}-\frac{\mu }{\sigma ^2} \right) -\sqrt{\left( \frac{1}{2}-\frac{\mu }{\sigma ^2} \right) ^2+\frac{2(r+\lambda )}{\sigma ^2}}<0. \\ \end{array}\right. } \end{aligned}$$

It is a simple computation to show that the Wronskian \(B_{r+\lambda }=2\sqrt{\left( \frac{1}{2}-\frac{\mu }{\sigma ^2} \right) ^2+\frac{2(r+\lambda )}{\sigma ^2}}\).

6.1.1 A problem with an explicit solution

We follow Remark 5.8 and consider the exercise payoff \({\mathfrak {g}}_{\varvec{\pi }}: x \mapsto {\mathbf {E}}_x\left[ e^{-r\zeta } X_\zeta \right] - K\), where \(K>0\). Using the identity (5.7), we obtain

$$\begin{aligned}&{\mathbf {E}}_x\left[ e^{-r\zeta } X_\zeta \right] - K \\&\quad = \frac{\lambda _1}{\mu _1}\left( \mu _1\left( R_{r+\mu _1}\left( {\text {id}}-\frac{K(r+\mu _1)}{\mu _1} {\mathbb {1}} \right) \right) (x) \right) \\&\qquad + \frac{\lambda _{12}}{\mu _1}\left( \mu _1 \left( R_{r+\mu _1} \lambda _2\left( R_{r+\lambda _2}\left( {\text {id}}- \frac{K(r+\mu _1)(r+\lambda _2)}{\mu _1\lambda _2} {\mathbb {1}} \right) \right) \right) (x) \right) , \end{aligned}$$

where \({\text {id}}\) is the identity function. Let

$$\begin{aligned} g_1(x)&= x - \frac{K(r+\mu _1)}{\mu _1}, \\ g_2(x)&= x - \frac{K(r+\mu _1)(r+\lambda _2)}{\mu _1\lambda _2}. \end{aligned}$$

We readily verify that the functions \(g_i\) satisfy the assumptions of Proposition 5.1. Next, we study the function \(x \mapsto \frac{{\mathfrak {g}}_{\varvec{\pi }}(x)}{\psi _r(x)}\). Since

$$\begin{aligned} {\mathbf {E}}_x\left[ e^{-r\zeta } X_\zeta \right] = \frac{\lambda _1}{\mu _1}\mu _1(R_{r+\mu _1}{\text {id}})(x) + \frac{\lambda _{12}}{\mu _1}\mu _1(R_{r+\mu _1}\lambda _2R_{r+\lambda _2}{\text {id}})(x), \end{aligned}$$

we start by computing the required resolvents. First, we find that

$$\begin{aligned} \mu _1(R_{r+\mu _1}{\text {id}})(x)&= \frac{2\mu _1}{B_{r+\mu _1}\sigma ^2}\left( x^{\alpha _{r+\mu _1}} \int _0^x y^{-{\alpha _{r+\mu _1}}}dy + x^{\beta _{r+\mu _1}} \int _x^\infty y^{-\beta _{r+\mu _1}}dy \right) \\&= \frac{2\mu _1(\beta _{r+\mu _1}-\alpha _{r+\mu _1})x}{B_{r+\mu _1}\sigma ^2(1-\alpha _{r+\mu _1})(\beta _{r+\mu _1}-1)} = \frac{\mu _1 x}{r+\mu _1-\mu } \end{aligned}$$

This yields

$$\begin{aligned} \mu _1(R_{r+\mu _1}\lambda _2R_{r+\lambda _2}{\text {id}})(x) = \frac{\mu _1\lambda _2 x}{(r+\mu _1-\mu )(r+\lambda _2-\mu )}. \end{aligned}$$

We can then write the expectation

$$\begin{aligned} {\mathbf {E}}_x\left[ e^{-r\zeta } X_\zeta \right]&=\underbrace{\frac{1}{r-\mu +\mu _1}\left( \lambda _1 + \frac{\lambda _{12}\lambda _2}{r-\mu +\lambda _2} \right) }_{:= C} x, \end{aligned}$$
(6.1)

and, consequently, express the exercise payoff as \({\mathfrak {g}}_{\varvec{\pi }}(x) = Cx - K\). The solution to this optimal stopping problem is well known, see, e.g., McKean (1965). The optimal stopping threshold, which can be identified as the global maximum of the function \(x \mapsto \frac{{\mathfrak {g}}_{\varvec{\pi }}(x)}{\psi _r(x)}\), is the level \(x^* = \frac{K\beta _r}{C(\beta _r-1)}\). The value function is

$$\begin{aligned} V_{\varvec{\pi }}(x) = {\left\{ \begin{array}{ll} Cx - K, &{} x \ge x^*, \\ \left( \frac{C}{\beta _r}\right) ^{\beta _r}\left( \frac{\beta _r-1}{K}\right) ^{\beta _r-1}x^{\beta _r}, &{} x \le x^*. \end{array}\right. } \end{aligned}$$
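The closed-form solution above is straightforward to evaluate numerically; the following sketch computes \(\beta _r\), the constant C of (6.1), the threshold \(x^*\) and the value function for an illustrative parameter configuration chosen by us (satisfying \(\mu <r\) and \(\mu -\frac{1}{2}\sigma ^2>0\)).

```python
import numpy as np

# illustrative parameters: GBM with mu < r and mu - sigma^2/2 > 0, two-phase Coxian delay
mu, sigma, r, K = 0.03, 0.2, 0.06, 1.0
lam1, lam12, lam2 = 0.1, 0.2, 0.1
mu1 = lam1 + lam12                       # mu_1 = lambda_1 + lambda_12

# beta_r: the positive root larger than one, as in Sect. 6.1
q = 0.5 - mu / sigma**2
beta_r = q + np.sqrt(q**2 + 2.0 * r / sigma**2)

# constant C of (6.1), threshold x*, and the value function
C = (lam1 + lam12 * lam2 / (r - mu + lam2)) / (r - mu + mu1)
x_star = K * beta_r / (C * (beta_r - 1.0))

def value(x):
    if x >= x_star:
        return C * x - K
    return (C / beta_r)**beta_r * ((beta_r - 1.0) / K)**(beta_r - 1.0) * x**beta_r

print(beta_r, C, x_star, value(0.5 * x_star), value(2.0 * x_star))
```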

6.1.2 Sensitivity analysis

We study the sensitivities of the trigger threshold \(x^*\) with respect to the jump intensities of the process Y. We first observe that

$$\begin{aligned} \frac{dx^*}{d\lambda } = -\frac{K\beta _r}{(\beta _r-1)C^2} \frac{dC}{d\lambda }, \end{aligned}$$

where \(\lambda \) denotes any of the rate variables in the coefficient C. For brevity, denote \(\delta =r-\mu \). Then an elementary computation yields

$$\begin{aligned} \frac{dC}{d\lambda _1}&= \frac{\delta (\delta +\lambda _2) (\delta +\lambda _{12}+ \lambda _2)}{(\delta + \lambda _1+\lambda _{12})^2(\delta +\lambda _2)^2}>0, \\ \frac{dC}{d\lambda _{12}}&= \frac{\delta (\delta +\lambda _2) (\lambda _2-\lambda _1)}{(\delta +\lambda _1+\lambda _{12})^2 (\delta +\lambda _2)^2} \lesseqqgtr 0, \\ \frac{dC}{d\lambda _2}&= \frac{\delta \lambda _{12}(\delta + \lambda _{1}+ \lambda _{12})}{(\delta +\lambda _1+ \lambda _{12})^2(\delta +\lambda _2)^2}>0. \end{aligned}$$

These results are natural. Indeed, if we increase either the rate \(\lambda _1\) or \(\lambda _2\), we increase the net absorption rate in the process Y. This decreases the mean absorption time which, in turn, accelerates the optimal exercise. The sensitivity with respect to \(\lambda _{12}\) depends on the relation between \(\lambda _1\) and \(\lambda _2\). Say that \(\lambda _2>\lambda _1\). Then, when we increase the rate \(\lambda _{12}\), we are again increasing the net absorption rate of Y. An increased \(\lambda _{12}\) means that it becomes more likely that Y jumps from state 1 to state 2 and, since \(\lambda _2>\lambda _1\), Y is then more likely to be absorbed from state 2 than from state 1. In the complementary case \(\lambda _1>\lambda _2\), we reason similarly that an increased \(\lambda _{12}\) decreases the net absorption rate.
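These signs can also be checked directly by finite differences on the coefficient C of (6.1), as in the following short sketch; the step size and parameter values are illustrative choices of ours.

```python
def C(lam1, lam12, lam2, delta=0.03):
    """Coefficient C of (6.1) with delta = r - mu."""
    return (lam1 + lam12 * lam2 / (delta + lam2)) / (delta + lam1 + lam12)

def diff(f, x, h=1e-6):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

lam1, lam12, lam2 = 0.1, 0.2, 0.3        # note lam2 > lam1 here
print(diff(lambda v: C(v, lam12, lam2), lam1) > 0)    # dC/dlam1 > 0
print(diff(lambda v: C(lam1, v, lam2), lam12) > 0)    # dC/dlam12 > 0 since lam2 > lam1
print(diff(lambda v: C(lam1, lam12, v), lam2) > 0)    # dC/dlam2 > 0
```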

6.1.3 On the effect of increased uncertainty

Conventional wisdom in options theory says that increased risk (measured by the volatility \(\sigma \)) postpones the optimal exercise of the option. Next we study whether a similar conclusion can be drawn with respect to the risk emerging from the random exercise lag. We do this as follows: by moving along the level curve

$$\begin{aligned} {\bar{\zeta }} = {\mathbf {E}}[\zeta ] = \frac{\lambda _{12}+\lambda _2}{(\lambda _1+\lambda _{12})\lambda _2} \end{aligned}$$
(6.2)

of the expected absorption time, we study whether an increase in variance of the absorption time, that is, in

$$\begin{aligned} {\text {Var}}(\zeta ) = \frac{\lambda _2^2 + 2 \lambda _{12}\lambda _1 + \lambda _{12}^2}{(\lambda _1+\lambda _{12})^2\lambda _2^2} \end{aligned}$$
(6.3)

leads to an increase in the value of C defined in (6.1). First, we solve for \(\lambda _1\) in (6.2) and obtain

$$\begin{aligned} \lambda _1 = \frac{\lambda _{12}+\lambda _2-\lambda _{12}\lambda _2 {\bar{\zeta }}}{\lambda _{2}{\bar{\zeta }}}. \end{aligned}$$
(6.4)

Then substitution to the expression (6.3) and simplification yields

$$\begin{aligned} {\text {Var}}(\zeta ) = {\bar{\zeta }} \frac{{\bar{\zeta }}\lambda _2(\lambda _2- \lambda _{12})+2\lambda _{12}}{\lambda _2(\lambda _2+\lambda _{12})}. \end{aligned}$$

It is a matter of elementary differentiation to see that the gradient

$$\begin{aligned} \nabla {\text {Var}}(\zeta ) = \left( \frac{2{\bar{\zeta }}(1-{\bar{\zeta }} \lambda _2)}{(\lambda _{12}+\lambda _2)^2}, - \frac{2{\bar{\zeta }}\lambda _{12}(\lambda _{12}+ \lambda _2(2-{\bar{\zeta }}\lambda _2))}{\lambda _{2}^2(\lambda _{12}+\lambda _2)^2} \right) := (a_1,a_2). \end{aligned}$$
(6.5)

Next we do the same analysis for the coefficient C. After substituting (6.4) to the definition of C, a round of simplification and differentiation yields

$$\begin{aligned} \begin{aligned} \nabla C&=\left( \frac{\delta ^2{\bar{\zeta }}\lambda _2(1- {\bar{\zeta }}\lambda _2)}{(\delta +\lambda _2)(\lambda _{12}+ \lambda _2(1+\delta {\bar{\zeta }}))^2},-\frac{\delta ^2{\bar{\zeta }} \lambda _{12}(\delta +\lambda _{12}+\lambda _2(2-{\bar{\zeta }} \lambda _2))}{(\delta +\lambda _2)^2(\lambda _{12}+\lambda _2 (1+\delta {\bar{\zeta }}))^2} \right) \\&:= (b_1,b_2), \end{aligned} \end{aligned}$$
(6.6)

where \(\delta = r - \mu >0\). We now identify the directions in which the variance is increasing. Assume that \(\lambda _2 > {\bar{\zeta }}^{-1}\); the complementary case is studied similarly. Then the coefficient \(a_1<0\) and, consequently, the directional derivative of the variance in the direction \({\mathbf {u}}=(u_1,u_2)\) satisfies

$$\begin{aligned} \nabla {\text {Var}}(\zeta ) \cdot {\mathbf {u}} > 0 \text { when } u_1 < -\frac{a_2}{a_1} u_2 = \frac{\lambda _{12}(\lambda _{12}+ \lambda _2(2-{\bar{\zeta }}\lambda _2))}{\lambda _2^2(1-{\bar{\zeta }}\lambda _2)}u_2. \end{aligned}$$
(6.7)

Assume now that \(u_1<\frac{\lambda _{12}(\lambda _{12}+ \lambda _2(2-{\bar{\zeta }} \lambda _2))}{\lambda _2^2(1-{\bar{\zeta }}\lambda _2)}u_2\) and that the coefficient

$$\begin{aligned} \frac{\lambda _{12}(\lambda _{12}+\lambda _2(2-{\bar{\zeta }} \lambda _2))}{\lambda _2^2(1-{\bar{\zeta }}\lambda _2)}<0, \end{aligned}$$
(6.8)

the case complementary to (6.8) is studied similarly. By studying the sign of the numerator and combining the result with the condition \(\lambda _2>{\bar{\zeta }}^{-1}\), we find that if

$$\begin{aligned} \lambda _2 \in \left( \frac{1}{{\bar{\zeta }}}, \ \frac{1+\sqrt{1+\lambda _{12}{\bar{\zeta }}}}{{\bar{\zeta }}} \right) , \end{aligned}$$
(6.9)

then (6.8) holds.

The next task is to look at the directional derivatives of the coefficient C. We notice that under the assumption (6.9), the coefficient \(b_1\) in (6.6) is negative. Thus

$$\begin{aligned} \nabla C \cdot {\mathbf {u}} > 0 \text { when } u_1 < -\frac{b_2}{b_1} u_2 = \frac{\lambda _{12}(\delta + \lambda _{12}+ \lambda _2(2- {\bar{\zeta }}\lambda _2))}{\lambda _2(\delta +\lambda _2)(1-{\bar{\zeta }}\lambda _2)} u_2. \end{aligned}$$
(6.10)

Define the function f as

$$\begin{aligned} f(\delta )= \frac{\lambda _{12}(\delta +\lambda _{12}+ \lambda _2(2- {\bar{\zeta }}\lambda _2))}{\lambda _2(\delta +\lambda _2)(1-{\bar{\zeta }}\lambda _2)} \end{aligned}$$

on the positive real numbers. Elementary differentiation yields

$$\begin{aligned} f'(\delta ) = - \frac{\lambda _{12}(\lambda _{12}+\lambda _2 (1-{\bar{\zeta }}\lambda _2))}{\lambda _2(\delta +\lambda _2)^2(1-{\bar{\zeta }}\lambda _2)}. \end{aligned}$$

We obtain by a simple computation

$$\begin{aligned} f'(\delta ) \gtreqqless 0 \text { when } \lambda _2 \lesseqqgtr \frac{\frac{1}{2}+\sqrt{\frac{1}{4}+\lambda _{12}{\bar{\zeta }}}}{{\bar{\zeta }}} < \frac{1+\sqrt{1+\lambda _{12}{\bar{\zeta }}}}{{\bar{\zeta }}}. \end{aligned}$$
(6.11)

Using the function f, we observe that the condition (6.8) can be written as \(f(0)<0\) and that the sufficient condition in (6.7) can be written as \(u_1 < f(0) u_2\). Furthermore, the sufficient condition in (6.10) can be written as \(u_1 < f(\delta ) u_2\). Assume that \(u_2<0\); then \(f(0)u_2>0\). Using the condition (6.11), we find that if

$$\begin{aligned} \lambda _2 \in \left( \frac{\frac{1}{2}+ \sqrt{\frac{1}{4}+ \lambda _{12}{\bar{\zeta }}}}{{\bar{\zeta }}}, \ \frac{1+\sqrt{1+\lambda _{12}{\bar{\zeta }}}}{{\bar{\zeta }}} \right) , \end{aligned}$$
(6.12)

then \(0<f(0)u_2 < f(\delta )u_2\). In other words, if the condition (6.12) is satisfied and the variance is increasing in a direction where \(u_2\) is negative, then the coefficient C and, consequently, the optimal stopping threshold \(x^*\) are also increasing in that same direction. By reasoning similarly, we obtain the same conclusion in the case \(u_2>0\) when

$$\begin{aligned} \lambda _2 \in \left( \frac{1}{{\bar{\zeta }}}, \ \frac{\frac{1}{2}+\sqrt{\frac{1}{4}+\lambda _{12}{\bar{\zeta }}}}{{\bar{\zeta }}} \right) . \end{aligned}$$
(6.13)

Summarizing, we have the following partial result: the conditions (6.12) and (6.13) identify, in their corresponding cases, parts of the level curve (6.2) where increased exercise lag risk (measured in terms of the variance of the exercise lag) postpones the optimal exercise.
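The directional-derivative computations above are easy to verify numerically. The following sketch evaluates the gradients of \({\text {Var}}(\zeta )\) and C along the level curve (6.2), with \(\lambda _1\) eliminated via (6.4), by central finite differences and compares them with the closed forms (6.5) and (6.6); the parameter values are illustrative choices of ours.

```python
import numpy as np

zbar, delta = 10.0, 0.03                 # fixed E[zeta] and delta = r - mu (illustrative)

def lam1(l12, l2):
    """Eliminate lambda_1 along the level curve, formula (6.4)."""
    return (l12 + l2 - l12 * l2 * zbar) / (l2 * zbar)

def var_zeta(l12, l2):
    """Var(zeta) from (6.3) with lambda_1 given by (6.4)."""
    l1 = lam1(l12, l2)
    return (l2**2 + 2 * l12 * l1 + l12**2) / ((l1 + l12)**2 * l2**2)

def C(l12, l2):
    """Coefficient C from (6.1) with lambda_1 given by (6.4)."""
    l1 = lam1(l12, l2)
    return (l1 + l12 * l2 / (delta + l2)) / (delta + l1 + l12)

def grad(f, l12, l2, h=1e-6):
    return np.array([(f(l12 + h, l2) - f(l12 - h, l2)) / (2 * h),
                     (f(l12, l2 + h) - f(l12, l2 - h)) / (2 * h)])

l12, l2 = 0.2, 0.15                      # a point with lambda_2 > 1/zbar and lambda_1 > 0

# closed forms (6.5) and (6.6)
a = np.array([2 * zbar * (1 - zbar * l2) / (l12 + l2)**2,
              -2 * zbar * l12 * (l12 + l2 * (2 - zbar * l2)) / (l2**2 * (l12 + l2)**2)])
b = np.array([delta**2 * zbar * l2 * (1 - zbar * l2)
              / ((delta + l2) * (l12 + l2 * (1 + delta * zbar))**2),
              -delta**2 * zbar * l12 * (delta + l12 + l2 * (2 - zbar * l2))
              / ((delta + l2)**2 * (l12 + l2 * (1 + delta * zbar))**2)])

print(np.allclose(grad(var_zeta, l12, l2), a), np.allclose(grad(C, l12, l2), b))
```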

6.2 Square-root diffusion

Let X be a square-root diffusion known as the Cox–Ingersoll–Ross process; for the following properties of X, we refer to Campolieti and Makarov (2012), Sect. 5.2. The infinitesimal generator of X reads as

$$\begin{aligned} {\mathcal {A}} = (a-bx)\frac{d}{dx} + \frac{1}{2} \sigma ^2 x \frac{d^2}{dx^2}. \end{aligned}$$

Introduce the parameters

$$\begin{aligned} \mu = \frac{2a}{\sigma ^2} - 1 , \ \kappa = \frac{2b}{\sigma ^2}. \end{aligned}$$

The scale density and the speed density are now given by \(S'(x)=x^{-\mu -1}e^{\kappa x}\) and \(m'(x)=\frac{2}{\sigma ^2}x^{\mu }e^{-\kappa x}\) for all \(x \in (0,\infty )\). The upper boundary \(\infty \) is natural. We assume that \(\mu >0\); then the lower boundary 0 is entrance-not-exit.

For an arbitrary \(\rho >0\), the functions \(\psi _\rho \) and \(\varphi _\rho \) can be represented as

$$\begin{aligned} \psi _\rho (x) = M\left( \frac{\rho }{b}, \frac{2a}{\sigma ^2}, \frac{2b}{\sigma ^2}x \right) , \ \varphi _\rho (x) = U\left( \frac{\rho }{b}, \frac{2a}{\sigma ^2}, \frac{2b}{\sigma ^2}x \right) \end{aligned}$$

where M and U are, respectively, the confluent hypergeometric functions of the first and second kind; for the definitions and properties of these functions, see Borodin and Salminen (2015), p. 647. Finally, the Wronskian determinant can then be expressed as \(B_\rho = \kappa ^{-\mu } \frac{\Gamma (\mu +1)}{\Gamma \left( \frac{\rho }{b} \right) }\).

We consider the exercise payoff \({\mathfrak {g}}_{\varvec{\pi }}: x \mapsto {\mathbf {E}}_x\left[ e^{-r\zeta } \sqrt{X_\zeta } \right] - K\), where \(K>0\). Denote the square root function as h. Then, using the resolvent equation, we write

$$\begin{aligned} {\mathbf {E}}_x\left[ e^{-r\zeta } \sqrt{X_\zeta } \right]&= \frac{\lambda _1}{\mu _1}\mu _1(R_{r+\mu _1}h)(x) + \frac{\lambda _{12}}{\mu _1}\mu _1(R_{r+\mu _1}\lambda _2R_{r+\lambda _2}h)(x) \\&= \left( \lambda _1 + \frac{\lambda _{12}\lambda _2}{\lambda _2-\mu _1} \right) (R_{r+\mu _1}h)(x) - \frac{\lambda _{12}\lambda _2}{\lambda _2-\mu _1}(R_{r+\lambda _2}h)(x). \end{aligned}$$

We cannot find an explicit expression for these resolvents; therefore we resort to a numerical solution. To this end, the R packages fAsianOptions and numDeriv were employed, respectively, to handle the Kummer functions numerically and to determine numerically the zero (that is, \(x^*\)) of the derivative of \(\frac{{\mathfrak {g}}_{\varvec{\pi }}(x)}{\psi _r(x)}\). Using these, the value function was obtained numerically from Proposition 5.1.
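A similar computation can be sketched in Python as follows, using quadrature for the resolvents (5.1) and a crude grid search for the maximiser of \({\mathfrak {g}}_{\varvec{\pi }}/\psi _r\); this is only a rough outline under the stated parameter values (the helper names are ours), and its accuracy is not claimed to reproduce the reported value of \(x^*\) exactly.

```python
import numpy as np
from math import gamma
from scipy.special import hyp1f1, hyperu
from scipy.integrate import quad

# parameters of Sect. 6.2
a, b, sigma, r, K = 0.03, 0.05, 0.2, 0.06, 1.0
lam1, lam12, lam2 = 0.1, 0.2, 0.1
mu1 = lam1 + lam12
mu, kappa = 2 * a / sigma**2 - 1, 2 * b / sigma**2

def psi(rho, x):
    return hyp1f1(rho / b, 2 * a / sigma**2, kappa * x)

def phi(rho, x):
    return hyperu(rho / b, 2 * a / sigma**2, kappa * x)

def m_prime(x):
    return 2.0 / sigma**2 * x**mu * np.exp(-kappa * x)

def B(rho):
    return kappa**(-mu) * gamma(mu + 1) / gamma(rho / b)

def resolvent(rho, f, x):
    """(R_rho f)(x) via formula (5.1), by numerical quadrature."""
    lower = quad(lambda y: psi(rho, y) * f(y) * m_prime(y), 0.0, x)[0]
    upper = quad(lambda y: phi(rho, y) * f(y) * m_prime(y), x, np.inf)[0]
    return (phi(rho, x) * lower + psi(rho, x) * upper) / B(rho)

def g_pi(x):
    """Exercise payoff E_x[e^{-r zeta} sqrt(X_zeta)] - K via the resolvent equation."""
    h = np.sqrt
    c = lam12 * lam2 / (lam2 - mu1)
    return (lam1 + c) * resolvent(r + mu1, h, x) - c * resolvent(r + lam2, h, x) - K

# crude grid search for the maximiser of g_pi / psi_r
grid = np.linspace(0.5, 4.0, 141)
ratios = [g_pi(x) / psi(r, x) for x in grid]
x_star = grid[int(np.argmax(ratios))]
print(x_star)
```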

Fig. 1 Square root diffusion: the solid black curve represents the value function and the grey dashed line represents the exercise payoff \(x \mapsto {\mathbf {E}}_x\left[ e^{-r\zeta } \sqrt{X_\zeta } \right] - K\). The black dashed line indicates the position of the optimal exercise threshold \(x^* = 1.701597\)

In Fig. 1 we illustrate the solution of the problem. The solid black curve represents the value function and the grey dashed line represents the exercise payoff \(x \mapsto {\mathbf {E}}_x\left[ e^{-r\zeta } \sqrt{X_\zeta } \right] - K\) under the parameter configuration \(a=0.03\), \(b=0.05\), \(\sigma =0.2\), \(r=0.06\), \(K=1\), \(\lambda _1=0.1\), \(\lambda _{12}=0.2\), and \(\lambda _2=0.1\). The black dashed line indicates the position of the optimal exercise threshold \(x^* = 1.701597\). As the general theory suggests, the figure indicates that the value function is smooth over the optimal exercise boundary \(x^*\).