Some results on optimal stopping under phase-type distributed implementation delay

We study optimal stopping of strong Markov processes under random implementation delay. By random implementation delay we mean the following: the payoff is not realised immediately when the process is stopped but rather after a random waiting period. The distribution of the random waiting period is assumed to be phase-type. We first prove a general result on the solvability of the problem. Then we study the case of the Coxian distribution in more detail, both in general and with scalar diffusion dynamics. The study is concluded with two explicit examples.


Introduction
Optimal stopping problems are widely used in economic and financial applications. One of the most notable applications is the real options approach to investment planning; for a textbook treatment of the topic, see, e.g., Dixit and Pindyck (1994). Since the late 1970s, this approach has been used to study a broad variety of economic planning problems, such as corporate strategy (Myers 1977; Ross 1978; Dixit and Pindyck 1995; Panayi and Trigeorgis 1998; Alvarez and Stenbacka 2001), land and real estate development (Grenadier 1996; Titman 1985; Williams 1993; Capozza and Li 1994), and natural resources investment (Brennan and Schwartz 1985; Paddock et al. 1988; Schwartz 1997). In many of these applications, the option mechanism in the real investment opportunity is the same as in financial options and can be studied by modeling it as an optimal stopping problem. The task is then to determine an investment rule which maximizes the expected present value of the total investment revenue. In the classical treatment of these problems, see, e.g., McDonald and Siegel (1986), Dixit and Pindyck (1994), it is assumed that once the investment decision is made, the project starts to deliver cash flows immediately. In practice, this is often not the case, as most capital projects take a significant time to complete, see Bar-Ilan and Strange (1996), Kydland and Prescott (1982). This completion period is usually called time-to-build, implementation delay or delivery lag.
The effect of an implementation delay on the optimal policy has been studied broadly in the literature. The most straightforward way to model the implementation delay is to assume that it is simply a constant, see Aïd et al. (2015), Bar-Ilan and Strange (1996), Bar-Ilan and Sulem (1995). The paper (Bar-Ilan and Strange 1996) is concerned with a classical timing problem of irreversible investment in the style of Dixit (1989), whereas Aïd et al. (2015) analyses capacity expansion subject to time to build under the objective of social surplus maximisation. The paper (Bar-Ilan and Sulem 1995) analyzes a continuous-time inventory system with a fixed delivery lag. In Alvarez and Keppo (2002) the delivery lag is modelled as a function of the state variable at the time of the investment. Here, a lump sum is paid when the investment is made and the investment revenue accrues once the delivery lag, which depends on the value of the state variable at the time of the investment, has elapsed. Investment timing subject to an inter-temporally controllable investment rate is studied in Majd and Pindyck (1987). More precisely, the investor appraises the real option with respect to the uncertain future revenue and chooses whether to engage in the investment. It is assumed that the investment rate is uniformly bounded; this causes the time to build. Here, the time to build is random but endogenous in the sense that it is a result of the implemented investment policy. The effect of time to build for a levered firm is studied in Sarkar and Zhang (2015). It is assumed in this paper, in the style of Margsiri et al. (2008), that a fixed proportion of the investment cost is paid at the time when the investment is engaged and the rest is paid once the implementation period has elapsed. The length of the period is determined by the revenue process: it is the time that it takes the process to reach the level that is a fixed percentage higher than the state at which the investment is made.
The implementation period is random and endogenous also in this case.
The implementation delay can also be modelled as an exogenous random variable. This approach is taken in Lempa (2012b), where the implementation delay is a random variable independent of the state process and exponentially distributed. This allows the use of the resolvent formalism, which can be applied in a general Markov setting. The current study is an extension of Lempa (2012b). Indeed, we assume that the implementation delay is again independent but now has a general phase-type distribution. Phase-type distributions have been broadly applied in different fields such as survival analysis (Aalen 1995), healthcare systems modelling (Fackrell 2009), insurance applications (Bladt 2005), queuing theory (Breuer and Baum 2005), and population genetics (Hobolth et al. 2018). These distributions are a class of matrix-exponential distributions that have a Markovian realisation: they can be identified as the absorption times of certain continuous time Markov chains. We use this connection in our analysis as follows: once the state process is stopped, the exogenous continuous time Markov chain is initiated and the payoff paid out at the time of absorption of this chain depends on the value of the state process at that time. This analysis, which is new to the best of our knowledge, significantly generalises the results of Lempa (2012b).
Phase-type distributions offer a flexible and convenient model for random time lags. Indeed, it is well known, see, e.g., Breuer and Baum (2005), that phase-type distributions are dense in the class of probability distributions on R_+. Furthermore, there exists a well-developed methodology for estimating the parameters of phase-type distributions, see, e.g., He (2014). As we will see, the Markovian realisation of the phase-type distribution allows us to use the Markov theory of diffusions to derive closed form characterisations of the optimal solution. All of these aspects of the paper at hand add to the applicability of the results.
The remainder of the paper is organised as follows. In Sect. 2 we present the optimal stopping problem. In Sect. 3 our main result on the solvability of the problem is presented. We study the case of Coxian distribution in more detail in Sect. 4. The Coxian distribution case is solved in Sect. 5 with scalar diffusion dynamics. Section 6 wraps up the study with two explicit examples.

The optimal stopping problem
In this section we present the optimal stopping problem. Let (Ω, F, F, P) be a complete filtered probability space satisfying the usual conditions, where F = {F_t}_{t≥0}, see Borodin and Salminen (2015), p. 2. We assume that the underlying X is a strong Markov process defined on (Ω, F, F, P) and taking values in the state space E. As usual, we augment the state space E with a topologically isolated element Δ if the process X is non-conservative. Then the process X can be made conservative on the augmented state space E^Δ := E ∪ {Δ}, see Borodin and Salminen (2015), p. 4. In what follows, we drop the superscript Δ from the notation. By convention, we augment the definition of functions g on E with g(Δ) = 0.
Denote as P_x the probability measure P conditioned on the initial state x and as E_x the expectation with respect to P_x. The process X is assumed to evolve under P_x, and the sample paths are assumed to be right-continuous and left-continuous over stopping times, meaning the following: if a sequence of stopping times τ_n ↑ τ, then X_{τ_n} → X_τ P_x-almost surely as n → ∞. There is a well-established theory of standard optimal stopping for this class of processes, see Peskir and Shiryaev (2006).
For r > 0, we denote by L^1_r the class of real-valued measurable functions f on E satisfying the integrability condition E_x[∫_0^∞ e^{-rs}|f(X_s)| ds] < ∞ for all x ∈ E, and by R_r the resolvent of X, that is, (R_r f)(x) = E_x[∫_0^∞ e^{-rs} f(X_s) ds]. Denote p repeated applications of R_r to the function g as (R^{(p)}_r g). Under this probabilistic setting, we study the following optimal stopping problem. We want to find a stopping time τ* which maximizes the expected discounted value of the payoff τ ↦ g(X_{τ+ζ}). Here, the function g is the payoff function and the variable ζ is a phase-type distributed random time -- specific assumptions on g are made later. We define next phase-type distributions. Phase-type distributions are particular cases of matrix-exponential distributions, see, e.g., He (2014), which admit a Markovian representation. To be more precise, let Y be a continuous time Markov chain defined on (Ω, F, P; F) and taking values in the set {0, 1, 2, . . . , p}. The states 1, . . . , p are transient and the state 0 is absorbing. Then Y has an intensity matrix of the form

( T  t )
( 0  0 ),

where T is a p-dimensional real square matrix (the subgenerator of Y), t is a p-dimensional column vector and 0 is a p-dimensional row vector of zeros. Since the intensities on each row must sum to zero, we find that t = -Te, where e is a column vector of 1's. Let π = (π_1, . . . , π_p) denote the initial distribution of Y over the transient states only. Then we say that the time of absorption ζ = inf{t ≥ 0 : Y_t = 0} has a phase-type distribution and write ζ ∼ PH(π, T). Denote as P_{x,π} the probability measure P conditioned on the initial state x and initial distribution π. The expectation with respect to P_{x,π} is denoted as E_{x,π}. Now, the optimal stopping problem can be expressed as follows:

V(x) = sup_τ E_{x,π}[e^{-r(τ+ζ)} g(X_{τ+ζ})],   (2.1)

where τ varies over F-stopping times and r is the discount rate. We denote an optimal stopping time as τ*. Probabilistically, the problem (2.1) can be interpreted as follows. At the initial time t = 0, we choose a stopping rule described by the stopping time τ.
When τ is realized, the Markov chain Y is initiated from the distribution π and the payoff is realized when Y is absorbed. The payoff is thus uncertain and we can regard Y as an additional source of noise driving the payoff.
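The absorption mechanism described above is easy to simulate. The following minimal Python sketch (the two-phase subgenerator, exit rates and initial distribution are illustrative choices, not taken from the text, and `sample_absorption_time` is a hypothetical helper) samples ζ by running the chain Y to absorption and checks the empirical mean against the standard phase-type moment formula E[ζ] = -πT^{-1}e.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-phase subgenerator T and initial distribution pi.
T = np.array([[-3.0, 1.0],
              [0.0, -2.0]])
t = -T @ np.ones(2)            # exit-rate vector t = -T e
pi = np.array([1.0, 0.0])

def sample_absorption_time(pi, T, t, rng):
    """Run the chain Y over the transient states until absorption; return the time."""
    p = len(pi)
    rates = -np.diag(T)                        # total exit rate of each state
    state = rng.choice(p, p=pi)
    time = 0.0
    while True:
        time += rng.exponential(1.0 / rates[state])
        # jump distribution: other transient states vs. the absorbing state
        probs = np.append(T[state], t[state]) / rates[state]
        probs[state] = 0.0
        nxt = rng.choice(p + 1, p=probs)
        if nxt == p:                           # absorbed
            return time
        state = nxt

samples = np.array([sample_absorption_time(pi, T, t, rng) for _ in range(20000)])
mean_exact = -pi @ np.linalg.solve(T, np.ones(2))   # E[zeta] = -pi T^{-1} e
print(mean_exact, samples.mean())
```

Any valid pair (π, T) can be substituted; the empirical mean should match -πT^{-1}e up to Monte Carlo error.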

The main result
In this section, we prove our main result on the solvability of the optimal stopping problem (2.1). We make the following assumption on the structure of the matrix T.
Assumption 3.1 The eigenvalues of the subgenerator T are real and strictly negative.

Assumption 3.1 covers a variety of interesting cases of T. For instance, the matrix T can be a triangular matrix with strictly negative diagonal entries. Thus it covers, for example, the following distributions of ζ: 1. exponential distributions and mixtures of exponentials with mutually distinct rates λ_i, 2. hyperexponential and hypoexponential distributions, 3. the Coxian distribution, 4. the Erlang distribution, see, e.g., Stewart (2009). We first prove an auxiliary result.
which concludes the proof.
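Assumption 3.1 can be verified mechanically for the distribution classes listed above, since a triangular subgenerator has its eigenvalues on the diagonal. A small sketch with illustrative rates (the Coxian and Erlang subgenerators below are generic examples, not taken from the text):

```python
import numpy as np

# Coxian subgenerator (upper bidiagonal, hence triangular); rates are illustrative.
lam = [0.5, 1.0, 2.0]          # absorption intensities lambda_1, lambda_2, lambda_3
lam_next = [0.3, 0.7]          # transition intensities lambda_12, lambda_23
T_cox = np.array([[-(lam[0] + lam_next[0]), lam_next[0], 0.0],
                  [0.0, -(lam[1] + lam_next[1]), lam_next[1]],
                  [0.0, 0.0, -lam[2]]])

# Erlang(3) subgenerator with rate mu: also triangular.
mu = 1.5
T_erl = mu * (np.eye(3, k=1) - np.eye(3))

# For triangular matrices the eigenvalues are the diagonal entries,
# so they are real and strictly negative, as Assumption 3.1 requires.
eig_cox = np.linalg.eigvals(T_cox)
eig_erl = np.linalg.eigvals(T_erl)
print(np.sort(eig_cox.real), np.sort(eig_erl.real))
```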
The next theorem is the main result of this section.
Theorem 3.3 Let Assumption 3.1 hold. In addition, assume that the payoff g : E → R is in L^1_r, satisfies the condition S_+ := {x : g(x) > 0} ≠ ∅, and that the process X reaches a point y_x ∈ S_+ with positive probability for all initial states x. Furthermore, assume that there exists an r-harmonic function h : E → R_+ such that the function g(x)/h(x) is bounded. Then the value function V exists and can be identified as the least r-excessive majorant of the function g_π : x ↦ Σ_{i,j,k} π_i (R_{r+μ_j} g)(x) α_{ijk} t_k for some coefficients α, where π = (π_1, . . . , π_p), t = (t_1, . . . , t_p), and the elements -μ_j < 0 are the eigenvalues of T. Furthermore, an optimal stopping time τ* exists and can be expressed as the first entry time of X into the set {x : V(x) = g_π(x)}.

Proof It is known from Bladt (2005) that the density f of ζ reads as f(s) = π e^{Ts} t, where t = -Te. Denote the eigenvalues of T as -μ_1, . . . , -μ_k, k ≤ p, with corresponding multiplicities a_1, . . . , a_k, where a_1 + · · · + a_k = p. There exists a linear change of coordinates S such that A := S^{-1}TS is block diagonal with diagonal blocks A_i = -μ_i I + N_i. Here, the diagonal blocks are a_i × a_i matrices, and the nilpotent matrix N_i satisfies N_i^{a_i} = 0 for i = 1, . . . , k. Furthermore, the matrix exponential e^{Ts} = S e^{As} S^{-1}. We readily verify that the entries of the matrix e^{Ts} are linear combinations of the functions s ↦ e^{-μ_j s} s^m, j = 1, . . . , k and m = 0, . . . , a_j - 1. Finally, the density f can be expressed as such a linear combination; this expression, which we label (3.1), holds for all s ≥ 0. By integrating over the positive reals with respect to s with weight f given by (3.1), we obtain the representation of g_π. By lumping the factorial coefficients together with the constants α and still denoting the resulting constants by α, we obtain the expression in the statement. Since the numbers μ_j are strictly positive, we can rewrite the resolvent (R^{(m)}_{r+μ_j} g) in terms of the resolvent R^h of the Doob h-transform X^h, see Borodin and Salminen (2015), p. 34.
Using this, we rewrite our optimal stopping problem in terms of the transformed process X^h. Since -μ_j < 0 for all j and the family (R^h_λ)_{λ>0} is a contraction resolvent, we find that the function g_π/h is bounded in the sup-norm ‖·‖_u and continuous. The claim now follows from Peskir and Shiryaev (2006), Thm. I.2.7.
Theorem 3.3 gives a weak set of conditions under which the optimal stopping problem (2.1) has a well-defined solution and an optimal stopping time exists. These conditions essentially mean that the optimal stopping region is not empty and that the payoff function g cannot grow too fast, even though it can be unbounded. Note that by Theorem 3.3, we can rewrite the optimal stopping problem (2.1) as

V(x) = sup_τ E_x[e^{-rτ} g_π(X_τ)].   (3.2)

This is the form of the problem we will study in what follows. We point out that the payoff function g_π can be expressed as

g_π(x) = E_{x,π}[e^{-rζ} g(X_ζ)],   (3.3)

where ζ ∼ PH(π, T).

Case study I: Coxian distribution
The purpose of this section is to analyze the optimal stopping problem (3.2) in more detail when the absorption time has a Coxian distribution. More precisely, we compute a representation for the payoff function (3.3) that lends itself to explicit computations in the next section. To this end, assume that the number of transient states of Y is p. The process Y is started from the state 1; therefore, the initial distribution is π = (1, 0, . . . , 0). The subgenerator of Y is then written as

T = ( -(λ_1 + λ_{12})   λ_{12}            0        · · ·   0
      0                 -(λ_2 + λ_{23})   λ_{23}   · · ·   0
      ·                 ·                 ·        · · ·   ·
      0                 0                 0        · · ·   -λ_p ).   (4.1)

The quantities λ_i (resp. λ_{i,i+1}) are the absorption intensities (resp. the transition intensities from state i to state i + 1). The time of absorption ζ now has a Coxian distribution, see, e.g., Bladt (2005). We remark that t = (λ_1, . . . , λ_p)'. Furthermore, the eigenvalues of the triangular matrix T are its diagonal entries -μ_i = -(λ_i + λ_{i,i+1}) for i = 1, . . . , p - 1 and -μ_p = -λ_p, which we assume to be mutually distinct. We study the function

x ↦ Σ_{j,k} (R_{r+μ_j} g)(x) α_{jk} λ_k,

where the coefficients α_{jk} are implicitly defined in the proof of Theorem 3.3. To determine these constants, we first find the matrix S of the eigenvectors of T, in which the ith column is the eigenvector corresponding to the eigenvalue -μ_i. Since S is upper unitriangular, the inverse S^{-1} is also upper unitriangular and can be computed explicitly. Therefore the matrix exponential e^{Ts} can be written out entrywise. By substituting this into the definition of the density f, we obtain the coefficients α. Having the coefficients α at our disposal, we proceed with the derivation of the payoff function g_π. Let A = (α_{jk}) and denote the vector of resolvents (R_{r+μ_j} g), j = 1, . . . , p, as r. Then the analysis above yields the representation (4.3) of g_π in terms of A and r, valid for all x ∈ E. We use the following lemma; this result can be regarded as a generalization of the resolvent equation.
Lemma 4.1 For each k = 2, . . . , p, the following holds:

Proof For all j = 1, . . . , k, let U_j ∼ Exp(μ_j) and set S_k = Σ_{j=1}^k U_j. Since the elements μ_j are distinct, we find by using the formula for the distribution of S_k from Ross (2010), p. 309, an expression for E_x[e^{-r S_k} g(X_{S_k})]. On the other hand, since X is strong Markov, we find a second expression for the same quantity, proving the claim.
We can now rewrite the expression (4.3) using Lemma 4.1 to obtain the following result.
Proposition 4.2 Assume that the subgenerator T of Y is given by (4.1). Then the representation (4.4) holds for all x ∈ E.
The expression (4.4) has a natural interpretation. Denote the time of absorption from state i as A_i and the time of transition from state i to state i + 1 as C_i. For each k, define the random time T_k = Σ_{i=1}^{k-1} C_i + A_k on the set where the process Y is absorbed from state k -- for k = 1, obviously T_1 = A_1. This representation of the value is in line with the law of total probability. Indeed, the payoff is obtained as a sum over all paths of the process Y, where the payoff corresponding to each realization is the expected present value of the variable g(X) sampled at the time of absorption T_k determined by that particular realization. Conditional on a particular realization, the times T_k are sums of mutually independent exponentially distributed random times.
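The path-by-path interpretation above can be checked directly for a two-phase Coxian: the matrix-exponential density π e^{Ts} t must coincide with the mixture obtained by conditioning on the absorbing state, exactly as in the law-of-total-probability reading of (4.4). A sketch with illustrative rates:

```python
import numpy as np
from scipy.linalg import expm

# Two-phase Coxian with illustrative rates (mutually distinct eigenvalues).
l1, l12, l2 = 0.4, 0.6, 1.5
T = np.array([[-(l1 + l12), l12],
              [0.0, -l2]])
t = np.array([l1, l2])
pi = np.array([1.0, 0.0])
mu1, mu2 = l1 + l12, l2          # -mu1, -mu2 are the eigenvalues of T

def f_matrix(s):
    return pi @ expm(T * s) @ t

def f_paths(s):
    # Condition on the absorbing path: absorb from state 1 (prob l1/mu1) after an
    # Exp(mu1) time, or move to state 2 (prob l12/mu1) and absorb after
    # Exp(mu1) + Exp(mu2); the second summand uses the hypoexponential density.
    conv = mu1 * mu2 / (mu1 - mu2) * (np.exp(-mu2 * s) - np.exp(-mu1 * s))
    return (l1 / mu1) * mu1 * np.exp(-mu1 * s) + (l12 / mu1) * conv

grid = np.linspace(0.01, 8.0, 50)
err = max(abs(f_matrix(s) - f_paths(s)) for s in grid)
print(err)
```

The two densities agree to machine precision on the grid.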

Case study II: Coxian distribution with scalar diffusion dynamics
In this section, we elaborate the results of the previous section in the case where the process X follows a scalar diffusion. More precisely, we assume that the state process X evolves on R_+ and follows the regular linear diffusion given as the weakly unique solution of the Itô equation dX_t = μ(X_t)dt + σ(X_t)dW_t, X_0 = x. Here, W is a Wiener process on (Ω, F, F, P) and the real-valued functions μ and σ > 0 are assumed to be continuous. Using the terminology of Borodin and Salminen (2015), the boundaries 0 and ∞ are either natural, entrance-not-exit, exit-not-entrance or non-singular. In the case a boundary is non-singular, it is assumed to be either killing or instantaneously reflecting, see Borodin and Salminen (2015), pp. 18-20. As usual, we denote as A = (1/2)σ²(x) d²/dx² + μ(x) d/dx the second order linear differential operator associated to X. Furthermore, we denote as, respectively, ψ_r > 0 and ϕ_r > 0 the increasing and the decreasing solution of the ODE Au = ru, where r > 0, defined on the domain of the characteristic operator of X. By posing appropriate boundary conditions depending on the boundary classification of the diffusion X, the functions ψ_r and ϕ_r are defined uniquely up to a multiplicative constant and can be identified as the minimal r-excessive functions, see Borodin and Salminen (2015), pp. 18-20. Finally, we define the speed measure m and the scale function S of X via the formulae m'(x) = (2/σ²(x)) e^{B(x)} and S'(x) = e^{-B(x)} for all x ∈ R_+, where B(x) := ∫^x (2μ(y)/σ²(y)) dy, see Borodin and Salminen (2015), p. 17. We know from the literature that for a given f ∈ L^1_r the resolvent R_r f can be expressed as

(R_r f)(x) = B_r^{-1} ( ϕ_r(x) ∫_0^x ψ_r(y) f(y) m'(y) dy + ψ_r(x) ∫_x^∞ ϕ_r(y) f(y) m'(y) dy ),

where B_r = (ψ'_r(x)ϕ_r(x) - ψ_r(x)ϕ'_r(x))/S'(x) denotes the Wronskian determinant, see Borodin and Salminen (2015), p. 19.
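The resolvent representation in terms of ψ_r, ϕ_r and the speed density can be sanity-checked numerically. For GBM dX = μX dt + σX dW one has ψ_r(x) = x^{β_+} and ϕ_r(x) = x^{β_-}, with β_± the roots of (σ²/2)β(β - 1) + μβ - r = 0, and for f = id the closed form (R_r id)(x) = x/(r - μ) is available when r > μ. A minimal sketch with illustrative parameters:

```python
import numpy as np
from scipy.integrate import quad

# GBM dX = mu X dt + sigma X dW; illustrative parameters with r > mu.
mu, sigma, r = 0.02, 0.3, 0.06
b = 0.5 - mu / sigma**2
beta_p = b + np.sqrt(b**2 + 2 * r / sigma**2)
beta_m = b - np.sqrt(b**2 + 2 * r / sigma**2)

psi = lambda x: x**beta_p                     # increasing fundamental solution
phi = lambda x: x**beta_m                     # decreasing fundamental solution
m_dens = lambda x: (2.0 / (sigma**2 * x**2)) * x**(2 * mu / sigma**2)
B = beta_p - beta_m                           # Wronskian (constant for GBM)
f = lambda x: x                               # test function f = id

def resolvent(x):
    lower, _ = quad(lambda y: psi(y) * f(y) * m_dens(y), 0.0, x)
    upper, _ = quad(lambda y: phi(y) * f(y) * m_dens(y), x, np.inf)
    return (phi(x) * lower + psi(x) * upper) / B

x = 1.3
print(resolvent(x), x / (r - mu))             # closed form (R_r id)(x) = x/(r - mu)
```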
We consider the optimal stopping problem (3.2), where the payoff g_π is given by the Coxian representation (4.4); in this section we refer to the problem and the payoff as (5.2) and (5.3), respectively. The following proposition is the main result of this section.
Proposition 5.1 Let the assumptions of Theorem 3.3 be met. In addition, assume that (A) the function g is stochastically C², that is, continuous and twice continuously differentiable outside of a countable set D that has no accumulation points, (B) the function g(x)/ψ_r(x) attains a maximum at an interior point x̄, (C) the function (A - r)g is decreasing, and (D) the condition (A - (r + μ))g(0+) ≥ 0 holds for all μ > min{μ_i}, where -μ_i are the eigenvalues of the subgenerator T.
Then there is a unique x* which maximizes the function x ↦ g_π(x)/ψ_r(x). The state x* is the optimal stopping threshold for the optimal stopping problem (5.2), and the value function V_π can be expressed as in (5.4).

Remark 5.2 By assumption (A), we have
Thus, by assumption (B), there exists a threshold x_0 < x̄ such that (A - r)g(x) > 0 when x < x_0 and (A - r)g(x) < 0 when x > x_0.

Example 5.3 We present a set of sufficient conditions for the assumptions (A)-(D) of Proposition 5.1. Assume that the function g is
(1) non-negative and non-decreasing with g(0+) = 0, (2) piecewise linear with a finite number of corner points.
These assumptions cover various option-like payoffs such as g(x) = (x - K)^+. Now the assumption (A) is clearly satisfied. The function ψ_r is known to be increasing. If it is furthermore assumed to be convex, as it is for instance for GBM (see Alvarez (2003) for general conditions for the convexity of ψ_r), then the assumption (B) is also satisfied. With regard to assumption (C), we find by assuming sufficient regularity that the function l := (A - r)g satisfies, on each linear piece of g with slope c_k > 0, l'(x) = c_k(μ'(x) - r). Thus l'(x) = c_k(μ'(x) - r) < 0 whenever μ'(x) < r for all x; that is, the growth rate of the drift function must be uniformly bounded by the rate of discounting. This condition is not very severe and is satisfied for reasonable parameter configurations by GBM and by various mean-reverting diffusions such as the Cox-Ingersoll-Ross and Verhulst-Pearl processes. Finally, since (A - r)(R_{r+μ} g)(x) = μ(R_{r+μ} g)(x) - g(x), we find that the condition of assumption (D) is satisfied in this case. An even simpler payoff structure g(x) = x - K is also covered by (A)-(D). In comparison to (1) and (2), g(0+) is now negative and we cannot argue as above for (D) to hold. Assume now that the function μ is non-negative at the origin. Then (A - (r + μ))g(0+) = μ(0+) - (r + μ)g(0+) > 0 for all μ > 0 and, consequently, the assumption (D) holds.
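The computation behind assumption (C) in this example can be replicated numerically. The sketch below applies a finite-difference version of (A - r) to the linear payoff g(x) = x - K under the GBM drift μ(x) = μx (all parameter values illustrative) and confirms that l = (A - r)g changes sign once and is decreasing when μ < r:

```python
import numpy as np

# Finite-difference version of l = (A - r)g for GBM drift mu(x) = mu*x and the
# piecewise-linear payoff g(x) = x - K; all parameter values are illustrative.
mu, sigma, r, K = 0.03, 0.2, 0.06, 1.0
g = lambda x: x - K

def l(x, h=1e-5):
    d1 = (g(x + h) - g(x - h)) / (2 * h)
    d2 = (g(x + h) - 2 * g(x) + g(x - h)) / h**2
    return 0.5 * (sigma * x)**2 * d2 + mu * x * d1 - r * g(x)

# Analytically l(x) = (mu - r)x + rK, so l is decreasing exactly when mu < r.
xs = np.linspace(0.5, 5.0, 20)
vals = np.array([l(x) for x in xs])
slope = vals[1:] - vals[:-1]
print(vals[0], vals[-1])
```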
Remark 5.4 Proposition 5.1 gives sufficient conditions for the existence of a one-sided optimal stopping rule. Analogous conditions that would result in two-sided rules could most likely be provided. Indeed, as Remark 5.2 points out, the function g is r-subharmonic on (0, x_0). On the other hand, if we assume that g is r-subharmonic on an interval (a, b), where 0 < a < b < ∞, then we would likely be able to work out a set of assumptions such that the resulting optimal continuation region is (z*, y*), where 0 < z* < y* < ∞. These assumptions would most likely include boundedness and monotonicity assumptions on the functions g/ψ_r and g/ϕ_r, see Lempa (2010). However, this generalization is beyond the scope of this paper.
Lemma 5.5 Let the assumptions of Proposition 5.1 hold. Then, for all μ > 0, the function (A - r)(R_{r+μ} g) is decreasing.
Lemma 5.6 Let the assumptions of Proposition 5.1 hold. Then, for all μ > 0, the function x ↦ μ(R_{r+μ} g)(x)/ψ_r(x) is decreasing for all x ≥ x̄.
Proof Following Remark 5.2, we find by changing the order of integration the representation (5.6). By virtue of Lemma 5.5, it is enough to show that the differential (5.6) is negative at x̄. To this end, define the function ψ̃^x̄_r(y) = ψ_r(y)1_{y≤x̄}. Since the upper boundary ∞ is natural for the diffusion X, it follows from Lempa (2012a), Lemma 2.1, and the fact that the function ψ_r is non-negative that μ(R_{r+μ} ψ̃^x̄_r)(y) ≤ μ(R_{r+μ} ψ_r)(y) = ψ_r(y) for all y ∈ (0, ∞). By evaluating the right hand side at the point x̄, we then find the required bound on

∫_0^x̄ ψ_{r+μ}(z) ∫_z^x̄ ϕ_{r+μ}(y) ψ_r(y) m'(y) dy (A - r)g(z) m'(z) dz.

This proves the claim.
Proof of Proposition 5.1 Let k ∈ {2, . . . , p} and consider the function μ_k(R_{r+μ_k} g). We start by analyzing the properties of μ_k(R_{r+μ_k} g) and show that it satisfies the same assumptions as g. Under the assumption (A), the function μ_k(R_{r+μ_k} g) is obviously stochastically C². Lemmas 5.5 and 5.6, coupled with the assumption (D), imply that μ_k(R_{r+μ_k} g) also satisfies the assumptions (B) and (C). To see that the condition (D) is also satisfied, assume that the parameters η_1, η_2 > min{μ_i}. Without loss of generality, we can assume that η_1 > η_2. Then the resolvent equation yields the required inequality. We can now use the same procedure iteratively through the entire sum in (5.3) to conclude that the function g_π also satisfies the same assumptions as g. The claim now follows from Alvarez (2001), Thm. 3.

Remark 5.7
We observe from the proof of Proposition 5.1, in particular from Lemma 5.6, that the optimal stopping threshold x* is dominated by the state x̄. This state is the optimal stopping threshold for the same problem in the absence of the implementation delay. In other words, we observe that the introduction of the exercise lag accelerates the optimal exercise of the option to stop.

Remark 5.8
We also observe from the proof of Proposition 5.1 that we can modify the function (5.3) to allow for different payoffs for absorption from different phases.
To elaborate, say that we have a collection of functions (g_k)_{k=1}^p which all satisfy the assumptions of Proposition 5.1. Then we modify (5.3) such that if the absorption of Y occurs from the state i, the resulting payoff is given by the function g_i. We can then do a similar analysis as in the proof of Proposition 5.1 to conclude that the optimal stopping threshold is given by the global maximum of the function x ↦ g_π(x)/ψ_r(x). To see why this is interesting, consider the following example. In real option applications, the investor often pays a lump sum cost K either when the investment option is exercised or when the project is completed. The basic form of the exercise payoff is x ↦ E_x[e^{-rζ} g(X_ζ)], where the function g can, for instance, be x ↦ x - K. Now the lump sum is paid at the completion time. To illustrate how to shift this payment to the start of the project, consider the following. We assume for brevity that the lag variable ζ has a two-phase Coxian distribution. Furthermore, denote the constant function x ↦ 1 as 1. Then the payoff can be decomposed accordingly for a sufficiently nice function g, and we define the corresponding functions g_1 and g_2. If the functions x ↦ g_i(x)/ψ_r(x) satisfy the assumptions of Proposition 5.1, the conclusion of Proposition 5.1 holds for the payoff (5.7). This payoff corresponds to the case where the lump sum is paid at the time when the project is initiated.
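The factor E_x[e^{-rζ} 1] = E[e^{-rζ}] that appears when the lump sum is shifted to the initiation time is simply the Laplace transform of ζ, which for a phase-type variable equals π(rI - T)^{-1}t. The sketch below (illustrative rates) checks this matrix formula against the explicit two-phase path decomposition:

```python
import numpy as np

# Laplace transform of a phase-type zeta: E[e^{-r zeta}] = pi (rI - T)^{-1} t.
# Two-phase Coxian with illustrative rates; r is the discount rate.
l1, l12, l2, r = 0.1, 0.2, 0.1, 0.06
T = np.array([[-(l1 + l12), l12],
              [0.0, -l2]])
t = np.array([l1, l2])
pi = np.array([1.0, 0.0])

lt_matrix = pi @ np.linalg.solve(r * np.eye(2) - T, t)

# Path decomposition: absorb directly from state 1 (prob l1/(l1+l12)) after an
# Exp(l1+l12) time, or pass through state 2 and add an independent Exp(l2) time.
mu1 = l1 + l12
lt_paths = l1 / (r + mu1) + (l12 / (r + mu1)) * (l2 / (r + l2))
print(lt_matrix, lt_paths)
```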

Two examples
The purpose of this section is to illustrate the previous results with explicit examples. We assume throughout the section that the process Y has the subgenerator

T = ( -(λ_1 + λ_{12})   λ_{12}
       0                -λ_2 ).

This implies that the time of absorption ζ has a two-phase Coxian distribution.

Geometric Brownian motion
Assume that the process X is given by a geometric Brownian motion, that is, the solution of the stochastic differential equation dX_t = μX_t dt + σX_t dW_t. Here, W is a Wiener process and the constants μ and σ > 0 satisfy the conditions μ < r and μ - σ²/2 > 0. Then the optimal stopping time is almost surely finite. The scale density reads as S'(x) = x^{-2μ/σ²} and the speed density as m'(x) = (2/(σ²x²)) x^{2μ/σ²}.
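For GBM, the minimal r-excessive functions are the power functions ψ_r(x) = x^{β_+} and ϕ_r(x) = x^{β_-}, where β_± are the roots of (σ²/2)β(β - 1) + μβ - r = 0. A short sketch with illustrative parameters satisfying μ < r and μ - σ²/2 > 0:

```python
import numpy as np

# Fundamental solutions for GBM: psi_r(x) = x^{beta_+}, phi_r(x) = x^{beta_-},
# where beta_+- are the roots of (1/2) sigma^2 b(b - 1) + mu b - r = 0.
mu, sigma, r = 0.03, 0.2, 0.06        # illustrative; mu < r and mu - sigma^2/2 > 0
coeffs = [0.5 * sigma**2, mu - 0.5 * sigma**2, -r]
beta_m, beta_p = sorted(np.roots(coeffs).real)
print(beta_m, beta_p)                  # beta_- = -2.0, beta_+ = 1.5 here
```

Note that β_+ > 1 whenever μ < r, which is what makes the value of waiting finite.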

A problem with an explicit solution
We follow Remark 5.8 and consider the exercise payoff g_π : x ↦ E_x[e^{-rζ} X_ζ] - K, where K > 0. Using the identity (5.7), we can express the payoff in terms of the identity function id and the functions g_1 and g_2 of Remark 5.8. We readily verify that the functions g_i satisfy the assumptions of Proposition 5.1. Next, we study the function x ↦ g_π(x)/ψ_r(x). We start by computing the required resolvents. Having these, we can write the expectation and, consequently, express the exercise payoff as g_π(x) = Cx - K. The solution to this optimal stopping problem is well known, see, e.g., McKean (1965). The optimal stopping threshold, which can be identified as the global maximum of the function x ↦ g_π(x)/ψ_r(x), is the level x* = β_r K/(C(β_r - 1)), where β_r > 1 denotes the positive root of the quadratic (σ²/2)β(β - 1) + μβ - r = 0, so that ψ_r(x) = x^{β_r}. The value function is given by the corresponding representation (5.4).
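Since the expression (6.1) for C is not reproduced above, the sketch below assumes C = E[e^{-(r-μ)ζ}] = π((r - μ)I - T)^{-1}t, which is what E_x[e^{-rζ}X_ζ] = Cx yields for GBM with ζ independent of X; with this assumption the threshold x* = β_r K/(C(β_r - 1)) can be computed directly (all parameters illustrative):

```python
import numpy as np

# Threshold x* = beta_r K / (C (beta_r - 1)) for the payoff g_pi(x) = Cx - K.
# Assumption: C = E[e^{-(r - mu) zeta}] = pi ((r - mu)I - T)^{-1} t, which is what
# E_x[e^{-r zeta} X_zeta] = Cx gives for GBM with zeta independent of X.
mu, sigma, r, K = 0.03, 0.2, 0.06, 1.0          # illustrative parameters
l1, l12, l2 = 0.1, 0.2, 0.1                     # two-phase Coxian rates
delta = r - mu
T = np.array([[-(l1 + l12), l12],
              [0.0, -l2]])
t = np.array([l1, l2])
C = np.array([1.0, 0.0]) @ np.linalg.solve(delta * np.eye(2) - T, t)

beta = sorted(np.roots([0.5 * sigma**2, mu - 0.5 * sigma**2, -r]).real)[1]
x_star = beta * K / (C * (beta - 1))
print(C, x_star)
```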

Sensitivity analysis
We study the sensitivities of the trigger threshold x* with respect to the jump intensities of the process Y; here, λ denotes any of the rate variables in the coefficient C. For brevity, denote δ = r - μ. Then elementary computation yields the signs of the sensitivities. These results are natural. Indeed, if we increase either the rate λ_1 or λ_2, we increase the net absorption rate of the process Y. This decreases the mean absorption time which, in turn, accelerates the optimal exercise. The sensitivity with respect to λ_{12} depends on the relation between λ_1 and λ_2. Say that λ_2 > λ_1. Then, when we increase the rate λ_{12}, we are again increasing the net absorption rate of Y. Increased λ_{12} means that it becomes more likely that Y jumps from state 1 to 2 and, since λ_2 > λ_1, Y is then more likely to be absorbed from state 2 than from state 1. In the complementary case λ_1 > λ_2, we reason similarly that increased λ_{12} decreases the net absorption rate.
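The stated signs can be confirmed by finite differences on C (again assuming C = E[e^{-δζ}], δ = r - μ; the helper `C` and all rate values are illustrative):

```python
import numpy as np

# Finite-difference check of the sensitivity signs, assuming C = E[e^{-delta zeta}]
# for the two-phase Coxian lag; `C` is a hypothetical helper, rates illustrative.
def C(l1, l12, l2, delta=0.03):
    T = np.array([[-(l1 + l12), l12], [0.0, -l2]])
    t = np.array([l1, l2])
    return np.array([1.0, 0.0]) @ np.linalg.solve(delta * np.eye(2) - T, t)

h = 1e-6
base = (0.1, 0.2, 0.1)
# dC/dl1 > 0 and dC/dl2 > 0: a larger C means a lower threshold x*.
dC_l1 = (C(base[0] + h, base[1], base[2]) - C(*base)) / h
dC_l2 = (C(base[0], base[1], base[2] + h) - C(*base)) / h
# The lambda_12 sensitivity flips sign with the order of lambda_1 and lambda_2:
dC_l12_fast2 = (C(0.1, 0.2 + h, 0.5) - C(0.1, 0.2, 0.5)) / h   # lambda_2 > lambda_1
dC_l12_slow2 = (C(0.5, 0.2 + h, 0.1) - C(0.5, 0.2, 0.1)) / h   # lambda_1 > lambda_2
print(dC_l1, dC_l2, dC_l12_fast2, dC_l12_slow2)
```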

On the effect of increased uncertainty
Conventional wisdom in options theory says that increased risk (measured by the volatility σ) postpones the optimal exercise of the option. Next we study whether we can draw a similar conclusion with respect to the risk emerging from the random exercise lag. We do this as follows: by moving along the level curve

ζ̄ = E[ζ] = (λ_{12} + λ_2)/((λ_1 + λ_{12})λ_2)   (6.2)

of the expected absorption time, we study whether an increase in the variance of the absorption time, that is, in

Var(ζ) = (λ_2² + 2λ_{12}λ_1 + λ_{12}²)/((λ_1 + λ_{12})²λ_2²),   (6.3)

leads to an increase in the value of C defined in (6.1). First, we solve for λ_1 in (6.2) and obtain the expression (6.4). Substitution into the expression (6.3) and simplification then yields the variance as a function of λ_{12} and λ_2. It is a matter of elementary differentiation to see that the gradient of the variance reads

2ζ̄λ_{12}(λ_{12} + λ_2(2 - ζ̄λ_2)) / (λ_2²(λ_{12} + λ_2)²) =: (a_1, a_2).   (6.5)

Next we do the same analysis for the coefficient C. After substituting (6.4) into the definition of C, a round of simplification and differentiation yields the gradient (b_1, b_2) of C, see (6.6), where δ = r - μ > 0. We now identify the directions in which the variance is increasing. Assume that λ_2 > ζ̄^{-1}; the complementary case is studied similarly. Then the coefficient a_1 < 0 and, consequently, the directional derivative of the variance in the direction ū = (u_1, u_2) is given by (6.7). Assume now that u_1 < [λ_{12}(λ_{12} + λ_2(2 - ζ̄λ_2))/(λ_2²(1 - ζ̄λ_2))] u_2 and that the coefficient in (6.8) is positive; the case complementary to (6.8) is studied similarly. By studying the sign of the numerator and combining the result with the condition λ_2 > ζ̄^{-1}, we find that if

λ_2 ∈ (1/ζ̄, (1 + √(1 + λ_{12}ζ̄))/ζ̄),   (6.9)

then (6.8) holds.
The next task is to look at the directional derivatives of the coefficient C. We notice that under the assumption (6.9), the coefficient b_1 in (6.6) is negative. Thus we obtain the sufficient condition (6.10). Define the function f on the positive real numbers accordingly; elementary differentiation and a simple computation yield f(δ) ≶ 0 when

λ_2 ≶ (1/2 + √(1/4 + λ_{12}ζ̄))/ζ̄ < (1 + √(1 + λ_{12}ζ̄))/ζ̄.   (6.11)

Using the function f, we observe that the condition (6.8) can be written as f(0) < 0 and that the sufficient condition in (6.7) can be written as u_1 < f(0)u_2. Furthermore, the sufficient condition in (6.10) can be written as u_1 < f(δ)u_2. Assume that u_2 < 0; thus f(0)u_2 > 0. Using the condition (6.11), we find that if

λ_2 ∈ (1/ζ̄, (1/2 + √(1/4 + λ_{12}ζ̄))/ζ̄),   (6.12)

then 0 < f(0)u_2 < f(δ)u_2. In other words, if the condition (6.12) is satisfied and the variance is increasing in a direction where u_2 is negative, then the coefficient C and, consequently, the optimal stopping threshold x* is also increasing in that same direction. By reasoning similarly, we obtain the same conclusion in the case u_2 > 0 when

λ_2 ∈ ((1/2 + √(1/4 + λ_{12}ζ̄))/ζ̄, (1 + √(1 + λ_{12}ζ̄))/ζ̄).   (6.13)

Summarizing, we have the following partial result: the conditions (6.12) and (6.13) identify, in their corresponding cases, parts of the level curve (6.2) where increased exercise lag risk (measured in terms of the variance of the exercise lag) postpones the optimal exercise.
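The closed forms (6.2) and (6.3) themselves are easy to cross-check against the generic phase-type moment formulas E[ζ] = -πT^{-1}e and E[ζ²] = 2πT^{-2}e (illustrative rates):

```python
import numpy as np

# Cross-check (6.2) and (6.3) against the phase-type moment formulas
# E[zeta] = -pi T^{-1} e and E[zeta^2] = 2 pi T^{-2} e; rates illustrative.
l1, l12, l2 = 0.1, 0.2, 0.1
T = np.array([[-(l1 + l12), l12],
              [0.0, -l2]])
pi = np.array([1.0, 0.0])
e = np.ones(2)

Tinv_e = np.linalg.solve(T, e)                 # T^{-1} e
m1 = -pi @ Tinv_e                              # E[zeta]
m2 = 2 * pi @ np.linalg.solve(T, Tinv_e)       # E[zeta^2] = 2 pi T^{-2} e
var = m2 - m1**2

mean_formula = (l12 + l2) / ((l1 + l12) * l2)                            # (6.2)
var_formula = (l2**2 + 2 * l12 * l1 + l12**2) / ((l1 + l12)**2 * l2**2)  # (6.3)
print(m1, mean_formula, var, var_formula)
```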

Square-root diffusion
Let X be a square-root diffusion known as the Cox-Ingersoll-Ross process; for the following properties of X, we refer to Campolieti and Makarov (2012), Sect. 5.2. The infinitesimal generator of X reads as A = (a - bx) d/dx + (σ²x/2) d²/dx². Introduce the parameters κ = 2b/σ² and μ = 2a/σ² - 1. The scale and speed densities are now given by S'(x) = x^{-μ-1} e^{κx} and m'(x) = (2/σ²) x^μ e^{-κx} for all x ∈ (0, ∞). The upper boundary ∞ is natural. We assume that μ > 0; then the lower boundary 0 is entrance-not-exit.
For an arbitrary ρ > 0, the functions ψ_ρ and ϕ_ρ can be represented as ψ_ρ(x) = M(ρ/b, μ + 1, κx) and ϕ_ρ(x) = U(ρ/b, μ + 1, κx), where M and U are, respectively, the confluent hypergeometric functions of the first and second kind; for the definitions and properties of these functions, see Borodin and Salminen (2015), p. 647. Finally, the Wronskian determinant can then be expressed as B_ρ = κ^{-μ} Γ(μ + 1)/Γ(ρ/b).
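The representation through Kummer functions can be verified numerically with scipy.special (hyp1f1 for M, hyperu for U): both functions should satisfy Au = ρu for the CIR generator. A sketch with the parameter values of Fig. 1 and the illustrative choice ρ = r:

```python
import numpy as np
from scipy.special import hyp1f1, hyperu

# psi_rho(x) = M(rho/b, mu+1, kappa x) and phi_rho(x) = U(rho/b, mu+1, kappa x)
# with kappa = 2b/sigma^2, mu = 2a/sigma^2 - 1; check A u = rho u numerically.
a, b, sigma, rho = 0.03, 0.05, 0.2, 0.06       # CIR parameters of Fig. 1; rho = r
kappa = 2 * b / sigma**2
mu = 2 * a / sigma**2 - 1

psi = lambda x: hyp1f1(rho / b, mu + 1, kappa * x)
phi = lambda x: hyperu(rho / b, mu + 1, kappa * x)

def generator_residual(u, x, h=1e-4):
    """(A - rho)u at x by central differences; should vanish for psi and phi."""
    d1 = (u(x + h) - u(x - h)) / (2 * h)
    d2 = (u(x + h) - 2 * u(x) + u(x - h)) / h**2
    return 0.5 * sigma**2 * x * d2 + (a - b * x) * d1 - rho * u(x)

x = 1.2
print(generator_residual(psi, x), generator_residual(phi, x))
```

Both residuals vanish up to discretization error, confirming that ψ_ρ and ϕ_ρ solve Au = ρu.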
We cannot find explicit expressions for the required resolvents and therefore resort to numerical solution. To this end, the R packages fAsianOptions and numDeriv were employed to evaluate the Kummer functions numerically and to determine numerically the zero of the derivative of g_π(x)/ψ_r(x) (that is, x*), respectively. Using these, the value function was obtained numerically from Proposition 5.1.
In Fig. 1 we illustrate the solution of the problem. The solid black curve represents the value function and the grey dashed line represents the exercise payoff x ↦ E_x[e^{-rζ} X_ζ] - K under the parameter configuration a = 0.03, b = 0.05, σ = 0.2, r = 0.06, K = 1, λ_1 = 0.1, λ_{12} = 0.2, and λ_2 = 0.1. The black dashed line indicates the position of the optimal exercise threshold x* = 1.701597. As the general theory suggests, the figure indicates that the value function is smooth over the optimal exercise boundary x*.