Optimal dividend distribution under Markov regime switching

We investigate the problem of optimal dividend distribution for a company in the presence of regime shifts. We consider a company whose cumulative net revenues evolve as a Brownian motion with positive drift that is modulated by a finite state Markov chain, and model the discount rate as a deterministic function of the current state of the chain. In this setting, the objective of the company is to maximize the expected cumulative discounted dividend payments until the moment of bankruptcy, which is taken to be the first time that the cash reserves (the cumulative net revenues minus cumulative dividend payments) are zero. We show that if the drift is positive in each state, it is optimal to adopt a barrier strategy at certain positive regime-dependent levels, and provide an explicit characterization of the value function as the fixed point of a contraction. In the case that the drift is small and negative in one state, the optimal strategy takes a different form, which we explicitly identify if there are two regimes. We also provide a numerical illustration of the sensitivities of the optimal barriers and the influence of regime switching.


Introduction
A classical topic in finance and actuarial science is that of optimal dividend distribution for a company, which can be phrased as the problem of determining the optimal timing and sizes of dividend payments in the presence of bankruptcy risk, where the usual objective is to maximize the expected value of the cumulative discounted dividend payments until bankruptcy. The earliest work in this setting can be traced back to De Finetti [8] who studied the dividend problem for an insurance company under the binomial model. In continuous time, the problem was posed and solved in a Brownian motion model for the cash reserves by Jeanblanc-Picqué and Shiryaev [20] and Asmussen and Taksar [2], using optimal control theory. Since then an extensive literature has appeared on the dividend problem and its extensions, including reinsurance (e.g. [26]), optimal investment of the reserves (e.g. [18]), tax and proportional cost (e.g. [6,23]), and growth options (e.g. [7]).
In general, the form of the optimal dividend policy has been found to depend on the expected growth rate and variability of future revenues, and on the discount rate. These quantities will evolve in time, reflecting changing market and economic conditions, and those changes may happen gradually or occur abruptly and be more substantial. Here we focus on the changes of the latter type (also called regime shifts or switches) and model the cumulative net revenues of the company as a Brownian motion with the drift and volatility modulated by a finite state Markov chain, and the discount rate as a deterministic function of the chain. Since Hamilton [16,17], a substantial econometric literature has appeared that supports the use of Markov regime-switching models to describe business cycles, term structure of interest rates and other macroeconomic quantities. Such models have been shown to be capable of capturing occasional simultaneous and substantial changes of the parameters. Regime-switching models also have the advantage of retaining a degree of analytical tractability, and models from this class can in principle approximate a given diffusion arbitrarily closely by taking the state space large enough and specifying the generator matrix appropriately. In the mathematical finance literature, regime-switching models have become more popular, and have found their applications in stock price models, interest rate models and the real option literature. See e.g. Boyarchenko and Levendorskiǐ [4], Buffington and Elliott [5], Driffill et al. [9], Duan et al. [10], Elliott et al. [12], Guo and Zhang [14], Jiang and Pistorius [21], Naik [24] for derivative pricing, Elliott and van der Hoek [11] and Guidolin and Timmermann [13] for asset allocation, Bäuerle [3], Li and Lu [22], Zhu and Yang [28] and Asmussen [1] for ruin and risk theory, and Guo et al. [15] for irreversible investment.
In this regime-switching setting, we consider the problem of the management of the company to find a dividend distribution policy that maximizes expected discounted dividend payments until bankruptcy, which is defined to occur at the first moment when the level of the cash reserves hits zero. We restrict ourselves to the case that the management can only control the timing and size of the dividend payments. In the case that the drift is positive in every regime, we show that it is optimal to adopt a barrier-type strategy at certain positive levels that depend on the current regime, that is, it is optimal to make the minimal payments needed to keep the cash reserves below these barrier levels. When a regime switch occurs, dividend payments are to be postponed or brought forward in time, according to whether the barrier jumps up or down, and in the latter case a lump sum should be paid if the reserves were above the new barrier at the moment of the switch. In the case of a single regime, this strategy reduces to the classical constant barrier strategy that was found before by Asmussen and Taksar [2].
After an adverse economic regime switch, it could happen that the expected net revenue of the company becomes negative, in which case the optimal strategy takes a different form. Intuitively, it is clear that if the drift is negative and the reserves are sufficiently small, it will be optimal to liquidate the company by paying out the reserves as a lump sum. In the absence of regime switching, this optimality actually holds irrespective of the size of the reserves. In the presence of regime switching, however, we find that it is optimal to continue the business if the drift is small and negative and the reserves are not too small: the prospect of switching to a better regime with suitable positive drift outweighs the risk of ruin. In this case, the value function is not concave, which differs from what is usually found in singular control problems. An explicit solution is derived in Sect. 5 in the case of two regimes.
The dividend optimization problem gives rise to a singular control problem, whose HJB equation takes the form of a coupled system of variational inequalities, due to the fact that the problem is driven by a two-dimensional Markov process. A commonly used direct approach for explicitly solving optimal control problems proceeds by guessing a candidate optimal solution, constructing a corresponding value function, assuming smoothness if necessary, and subsequently verifying its optimality by employing a verification result. Here we follow a different approach to construct the candidate value function, by directly employing a dynamic programming equation. We prove that the value function is the fixed point of a certain contraction operator, which is given explicitly in terms of the initial data, and derive an explicit iterative algorithm to calculate the value function, which 'decouples' the different regimes such that, at any stage, one-dimensional control problems are solved. This construction yields in particular that the value function is $C^2$, which implies that the value function is a classical solution of the HJB equation. At this point, it is worth mentioning that although it is possible to follow the direct approach, it seems to become intractable if the number of states is large, as it leads to a large collection of systems of coupled nonlinear equations (corresponding to different orderings of the dividend levels).
After the first version of this paper was written, we discovered a related work on optimal dividend problems by Sotomayor and Cadenillas [27]. In a setting that is a particular case of ours, with two regimes and constant rate of discounting, they solve three dividend distribution problems with bounded and unbounded dividend rates, and in the presence of fixed cost, respectively, under the assumption of existence of a solution to the smooth fit equation.
The remainder of the paper is organized as follows. In Sect. 2, we state the problem and present a dynamic programming equation and a related comparison theorem. In Sects. 3 and 4, we present the optimal solution and give a proof by constructing an iterative algorithm to calculate the value function V. Section 5 is devoted to a case study of the setting of two regimes, with a numerical illustration of the sensitivities of the optimal barrier levels to the different parameters. Section 6 concludes. Some proofs are presented in the Appendix.

Problem formulation
Let $\{W_t : t \ge 0\}$ be a Wiener process and $\{Z_t : t \ge 0\}$ a continuous-time Markov chain with finite state space $E$ and generator matrix $Q = (q_{ij})_{i,j \in E}$, independent of $W$. Assume that the cash reserves $X = \{X_t, t \ge 0\}$ evolve, in the absence of dividend payments, as a regime-switching linear Brownian motion, that is, $X$ satisfies the SDE
$$dX_t = \mu(Z_t)\,dt + \sigma(Z_t)\,dW_t,$$
where $Z$ represents the state of the economy. For every state $i$ in $E$, both the drift parameter $\mu(i)$ and the volatility parameter $\sigma(i) > 0$ are assumed to be known constants. Where no notational confusion is possible, we write $\mu_i$ and $\sigma_i$ for $\mu(i)$ and $\sigma(i)$, respectively. The processes $X$ and $Z$ are defined on some filtered probability space $(\Omega, \mathcal F, \mathbf F, P)$, where $\mathbf F = \{\mathcal F_t, t \ge 0\}$ denotes the right-continuous completed filtration jointly generated by $X$ and $Z$. We denote by $P_{x,i}$ and $P_x$ the measure $P$ conditioned on $\{X_0 = x, Z_0 = i\}$ and $\{X_0 = x\}$, respectively, and write $E_{x,i}$ and $E_x$ for the corresponding expectations. We assume that the processes $X$ and $Z$ are both fully observable to the shareholders, and that these decide on the dividend strategies on the basis of the available information.
A dividend strategy $D$ is a nondecreasing and right-continuous stochastic process $D = \{D_t : t \ge 0\}$ with $D_{0-} = 0$. Here $D_t$ represents the cumulative amount of dividends that has been paid out until time $t$. We assume that, apart from reducing the reserves, dividend payments have no effect on the business and that there are no transaction costs associated with the payment or receipt of dividends. The dynamics of the risk reserve process $U = \{U_t : t \ge 0\}$ in the presence of dividend payments are then given by
$$dU_t = \mu(Z_t)\,dt + \sigma(Z_t)\,dW_t - dD_t \qquad (2.1)$$
for all $t$ until the time $\tau$ of bankruptcy and $dU_t = 0$ for $t$ after $\tau$, where
$$\tau = \inf\{t \ge 0 : U_t = 0\}$$
is the first time that $U$ hits zero. To avoid degeneracies, only those dividend strategies will be considered that have no lump sum dividend payments larger than the current level of the reserves. A dividend strategy $D$ is called admissible if $D$ is $\mathbf F$-adapted, $dD_t = 0$ for $t \ge \tau$ and
$$D_t - D_{t-} \le U_{t-} \quad \text{for all } t \le \tau.$$
Denoting by $\mathcal D$ the set of admissible dividend strategies, the objective function of the shareholders is given by
$$V(x,i) = \sup_{D \in \mathcal D} V_D(x,i), \qquad V_D(x,i) = E_{x,i}\Big[\int_0^{\tau} e^{-\int_0^t r(Z_s)\,ds}\,dD_t\Big], \qquad (2.2)$$
where $V_D$ denotes the expected value of the discounted dividends until the time of ruin $\tau$ under the dividend strategy $D$, with $r : E \to (0, \infty)$ the Markov-modulated rate of discounting. The problem for the shareholders is to identify a dividend strategy $D^* \in \mathcal D$ that attains the supremum in (2.2), that is, $V \equiv V_{D^*}$.
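To make the formulation concrete, the following is a minimal simulation sketch, not part of the original analysis. It assumes two regimes labeled 0 and 1, an Euler time discretization (which only approximates the continuous-time dynamics and the reflection at the barrier), and a regime-dependent barrier rule for the dividends. The function name `simulate_discounted_dividends` and all parameter names are hypothetical.

```python
import math
import random


def simulate_discounted_dividends(x0, z0, mu, sigma, r, switch_rate, barrier,
                                  dt=1e-3, horizon=50.0, rng=None):
    """Simulate one Euler path of the reserves for a two-state chain and
    return the discounted dividends paid before ruin (or before `horizon`).

    mu, sigma, r, switch_rate, barrier are sequences indexed by regime 0/1;
    switch_rate[z] is the exit rate -q_zz of regime z (an assumption of
    this sketch: with two states, leaving z means jumping to 1 - z).
    """
    rng = rng or random.Random()
    u, z = x0, z0
    discount = 1.0  # running value of exp(-integral of r(Z_s) ds)
    total = 0.0
    t = 0.0
    while t < horizon:
        # Pay out any overflow above the barrier of the current regime.
        # This covers both reflection and lump sums after a switch.
        if u > barrier[z]:
            total += discount * (u - barrier[z])
            u = barrier[z]
        # Euler step for the uncontrolled dynamics.
        u += mu[z] * dt + sigma[z] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        if u <= 0.0:
            break  # ruin: no dividends are paid afterwards
        discount *= math.exp(-r[z] * dt)
        # Regime switch with probability approximately switch_rate * dt.
        if rng.random() < switch_rate[z] * dt:
            z = 1 - z
        t += dt
    return total
```

Averaging the returned value over many independent paths gives a rough Monte Carlo estimate of the value of the chosen barrier rule at (x0, z0); the estimate carries both sampling error and discretization bias.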

A priori bounds
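Assume for the moment that there is only a single regime, $E = \{i\}$. Then we are back in the classical linear Brownian motion setting that was investigated in Asmussen and Taksar [2]. They showed that if $\mu_i > 0$, the optimal strategy is a constant barrier strategy at the level
$$a_i^* = \frac{1}{\lambda_i^+ - \lambda_i^-}\,\log\frac{(\lambda_i^-)^2}{(\lambda_i^+)^2}, \qquad (2.3)$$
with $\lambda_i^\pm = \lambda_i^\pm(r_i)$ as defined below. According to this strategy, the overflow of the reserves above the level $a_i^*$ is immediately paid out as dividends. The corresponding value function is given by
$$V(x,i) = \begin{cases} W_i^{(r_i)}(x)\,/\,W_i^{(r_i)\prime}(a_i^*), & 0 \le x \le a_i^*, \\ x - a_i^* + W_i^{(r_i)}(a_i^*)\,/\,W_i^{(r_i)\prime}(a_i^*), & x > a_i^*, \end{cases} \qquad (2.4)$$
where, for $q > 0$,
$$W_i^{(q)}(x) = \frac{2}{\sigma_i^2\,(\lambda_i^+ - \lambda_i^-)}\big(e^{\lambda_i^+ x} - e^{\lambda_i^- x}\big), \qquad (2.5)$$
$$\lambda_i^\pm = \lambda_i^\pm(q) = \frac{-\mu_i \pm \sqrt{\mu_i^2 + 2 q \sigma_i^2}}{\sigma_i^2}. \qquad (2.6)$$
Equations (2.3)-(2.6) show that the value function and optimal level are both functions of the drift and of the rate of discounting per unit of squared volatility. This observation leads one to expect that $V(x,i)$ is bounded above and below by the values $V^+(x)$ and $V^-(x)$ of firms operating in a more or less favorable environment, with volatility constant equal to one and with drift and discounting equal to $(\max_{j \in E} \mu_j/\sigma_j^2,\ \min_{j \in E} r_j/\sigma_j^2)$ and $(\min_{j \in E} \mu_j/\sigma_j^2,\ \max_{j \in E} r_j/\sigma_j^2)$, respectively. The following result confirms that these explicit bounds indeed hold true.

Proposition 2.1 We have
$$V^-(x) \le V(x,i) \le V^+(x) \qquad (2.7)$$
for all $x \ge 0$ and $i \in E$.

The bounds in (2.7) will be employed in the construction of the optimal value function in Sect. 4.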
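The single-regime case recalled above can be evaluated numerically. The sketch below uses the standard closed form for a Brownian motion with drift μ > 0, volatility σ and discount rate r, written in terms of the two roots of (σ²/2)λ² + μλ − r = 0. The function names are illustrative and not taken from the paper, and the normalizing constant of the scale function is dropped because it cancels in the ratio.

```python
import math


def char_roots(mu, sigma, q):
    """Roots lambda_minus < 0 < lambda_plus of (sigma^2/2) l^2 + mu l - q = 0."""
    s = math.sqrt(mu * mu + 2.0 * q * sigma * sigma)
    return (-mu - s) / sigma ** 2, (-mu + s) / sigma ** 2


def classical_barrier(mu, sigma, r):
    """Optimal constant barrier in the single-regime problem (requires mu > 0)."""
    lam_m, lam_p = char_roots(mu, sigma, r)
    return math.log(lam_m ** 2 / lam_p ** 2) / (lam_p - lam_m)


def classical_value(x, mu, sigma, r):
    """Value of the barrier strategy at the optimal level, started from x >= 0."""
    lam_m, lam_p = char_roots(mu, sigma, r)
    a = classical_barrier(mu, sigma, r)

    def w(y):
        # Scale function up to a positive multiplicative constant.
        return math.exp(lam_p * y) - math.exp(lam_m * y)

    dw_a = lam_p * math.exp(lam_p * a) - lam_m * math.exp(lam_m * a)
    if x <= a:
        return w(x) / dw_a
    # Above the barrier the excess x - a is paid out immediately.
    return x - a + w(a) / dw_a
```

Two quick consistency checks follow from the ODE solved by the value function. At the barrier the value equals μ/r, and the barrier is unchanged when (μ, σ, r) is replaced by (μ/σ², 1, r/σ²), in line with the remark on drift and discounting per unit of squared volatility.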

Dynamic programming equation and comparison result
The following dynamic programming equation for the value function of the singular control problem (2.2) will form the basis for its solution.

Proposition 2.2 We have
where $\zeta$ denotes the epoch of the first regime switch and $\Lambda_t = \int_0^t r(Z_s)\,ds$.
The proof of Proposition 2.2 is given in the Appendix. This dynamic programming equation is associated with the Hamilton-Jacobi-Bellman equation for the value function given by where denotes the partial derivative with respect to x and G denotes the infinitesimal generator of (X, Z), which acts on functions w : The next result shows that any sufficiently regular supersolution of the HJB equation (2.9) dominates the value function.

Theorem 2.3 Assume that there exists a function
Then: (ii) If in addition w = V D for some D ∈ D, then D is an optimal strategy and V ≡ w.
Proof (i) Fix an arbitrary D ∈ D and let U be the corresponding risk process. The statement will follow once we have shown that w(x, i) ≥ V D (x, i). Applying a generalized form of Itô's lemma to the process {e −Λ T ∧τ w(U T ∧τ , Z T ∧τ ), T ≥ 0}, we find that Here the last integration is over the denoting the Dirac measure at the point (s, z), and the compensator ν is given by where δ is the counting measure on E. Notice from (2.10) that as M is bounded below and M 0 = 0, M is a supermartingale with E[M T ∧τ ] ≤ 0. In view of the HJB equation (2.9), the first three terms of right-hand side of (2.10) are nonpositive, so that taking expectations yields that By letting T → ∞ and invoking the monotone convergence theorem and the fact that (ii) The equality follows since V D ≤ V by definition of V and V D ≥ V by part (i).

The optimal dividend strategy
Following the classical approach to solving optimal control problems, we next construct a candidate optimal solution. In view of the fact that (U, Z) is a Markov process, we consider strategies that pay out the overflow of the cash reserves above a regime-dependent level.

Fig. 1 Illustrated is the cash reserves process corresponding to a modulated barrier strategy. The barrier levels are represented by horizontal lines. In this case, the barrier jumps down at the moment of the regime switch and a lump sum payment is made.

Definition 3.1 A modulated barrier strategy at level
where U b is the risk process (2.1) corresponding to D b .
According to this strategy, dividends are only paid out when U b is at the barrier b, which implies that the process D b is a local time (see Fig 1). It is straightforward to verify that D b can be explicitly expressed in terms of a running supremum as Employing the heuristic 'principle of C 2 fit' of singular control allows us to define candidate optimal levels as the solution of the system of equations if such a solution exists. In fact, (3.1) follows from Lemma 4.5 and Proposition 4.1 as we shall see later. If the drift is positive in all regimes, this candidate solution is indeed optimal: , and the following holds true: (i) The optimal value function V is a classical solution of the HJB equation (2.9).
In particular, V is equal to the unique solution The modulated barrier strategy at b * is an optimal policy in (2.2).
If the positive drift condition is not satisfied, it is not necessarily optimal to adopt a modulated barrier strategy. Indeed, in Sect. 5 we show that in the case of two regimes with a small and negative drift in one state and a positive in the other, the optimal dividend barrier depends on the regime as well as on the level of the reserves. In the following section, we give a proof of Theorem 3.2 by presenting an iterative construction of the optimal value function.
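The running-supremum representation of a barrier strategy mentioned above is easiest to see in the special case of a single constant barrier b. There the cumulative dividend is the largest overshoot of the uncontrolled reserves above b so far. The sketch below implements this special case on a sampled path and ignores ruin; it is an illustration under these assumptions and does not reproduce the regime-dependent formula of this section. The name `reflect_at_barrier` is hypothetical.

```python
def reflect_at_barrier(path, b):
    """Apply a constant barrier strategy at level b to a sampled path.

    `path` holds samples X_0, X_1, ... of the uncontrolled reserves.
    Returns (reserves, dividends) with
        D_k = max(0, max_{j <= k} (X_j - b))  and  U_k = X_k - D_k.
    """
    reserves, dividends = [], []
    running_sup = 0.0  # starts at 0 so that D stays 0 while X is below b
    for x in path:
        running_sup = max(running_sup, x - b)
        dividends.append(running_sup)
        reserves.append(x - running_sup)
    return reserves, dividends
```

By construction the dividend sequence is nondecreasing and increases only at times when the controlled reserves sit at the barrier, which is the discrete analogue of the local-time property.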

Algorithm to compute the value function V
Throughout this section we assume that $\mu_i > 0$ for all $i \in E$. We start by observing that the value function $V_b$ of a modulated barrier strategy at level $b = (b_i, i \in E)$ solves a fixed point equation in terms of the function $W_i^{(q)}$.
The previous result can be utilized to calculate the value function $V_b$ of the barrier strategy at b by iterating the map where the convergence is in ‖·‖-norm and

Proof of Proposition 4.1 Denote by U i = X i − D i the risk process corresponding to dividends D i being paid according to a constant barrier strategy at b i , with be the ruin times of U b and U i , and denote by ζ the epoch of the first regime switch and by η(a) an independent exponential random time with mean 1/a. Then we find that the ensemble . Thus, the value z 1 (x, i) of the discounted dividends received before ζ is given by where θ i = r i − q ii and in the last line we used (2.4). Similarly, the value z 2 (x, i) of the discounted dividends received after ζ satisfies, in view of the Markov property, Employing the identity (see e.g. Pistorius [25], , we find the result as stated.

Proof of Corollary 4.2 Note that B endowed with the norm ‖·‖ is a complete metric space and that T maps B to itself, by definition of T and the fact that $W_i^{(\theta_i)}$ is $C^1$. Subsequently we see that Thus it follows that T is a contraction on B, which implies the convergence in (4.3).

Iteration
In a next step, we consider the auxiliary control problem with a prescribed payoff function v to be received at the epoch of the first regime switch ζ , i.e., This singular control problem can be solved explicitly if v lies in the set of smooth concave payoff functions C = {v ∈ B : v i is increasing and concave, i ∈ E}.

Proposition 4.3
Let v ∈ C. Then Uv(·, i) ∈ C 2 [0, ∞) for i ∈ E, and the optimal strategy in (4.5) is given by a regime-switching barrier strategy at the levels with A v given in (4.2).
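Supposing that the map $U : v \mapsto Uv$ preserves concavity and smoothness, this proposition can be applied iteratively, as follows. Initialize by setting $n = 0$ and $v = v_0$ for some $v_0 \in B$, and then
(1) compute the barrier levels $b^v$ associated with the current payoff function $v$ as in Proposition 4.3;
(2) set $v \leftarrow Uv$, $n \leftarrow n + 1$ and $v_n \leftarrow v$, and return to step (1).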
The following result shows that the sequence (v n ) generated in this way converges to the value function V as n → ∞.
where the convergence is with respect to the norm · . In particular, V is concave.
In fact, we shall show below that U is a contraction on C. Notice that Theorem 3.2(i) is now a direct consequence of these results. Indeed, by combining Proposition 4.3 and the dynamic programming equation (2.8), we see that the optimal strategy in (2.2) is given by a modulated barrier strategy at some positive finite levels. Explicit examples of initial functions v ± 0 are the V ± given in Proposition 2.1.

Proofs
This subsection is devoted to the proofs of Propositions 4.3 and 4.4, which we split into a number of steps. The first step is to verify that the b v i as defined above are positive and finite, which is a matter of straightforward calculations using the explicit expression (2.5).
In particular, The proof of Lemma 4.5 is given in the Appendix. The key step is to verify next that the value function of a barrier strategy at level b v with a concave payoff function v(·, i) is itself concave.

Lemma 4.6 (Preservation of concavity)
Proof We first assume that $v \in C \cap C^2[0, \infty)$, and write $b$ instead of $b^v$ to simplify the notation. In view of the smoothness of $v$ and the definition of $w_i(x) := (T_{b^v} v)(x, i)$, we can obtain from (2.5) and (4.1) that for $x \in (0, b_i)$, From these expressions, (2.5) and In addition, we have $w_i'(b_i) = 1$ from the above expressions and (4.1), and $w_i''(b_i) = 0$ by Lemma 4.5. As a result, $w_i$ is in $C^2[0, \infty)$.
An application of Itô's lemma shows that $w_i$ satisfies the ODE, for $x \in (0, b_i)$, with boundary conditions $w_i(0) = 0$, $w_i'(b_i) = 1$. Since $w_i(x) \ge 0$ for $x > 0$ and $w_i(0) = 0$, we deduce that $w_i'(0+) \ge 0$. Furthermore, the continuity of $w_i$ and the fact that $w_i(0) = 0$ and $v_i(0) = 0$ imply that so that $w_i''(0+) < 0$, as $\mu_i > 0$ by assumption. Write now $\xi_i(x) = w_i''(x)$ for $x > 0$, and denote $\xi_i(0) = w_i''(0+)$. By twice differentiating the first equation of the original system (3.2), which is justified since $w_i \in C^4(0, b_i)$ as a consequence of the assumptions, we find that Another application of Itô's lemma then yields for $\xi$ the representation In particular, we deduce that $x \mapsto w_i(x)$ is concave and increasing on $[0, \infty)$.
Suppose now that v ∈ C and let v n ∈ C ∩ C 2 [0, ∞) be a sequence that pointwise increases to v.

i), and the concavity of $T_{b^v}(v)$ directly follows from the fact that the pointwise limit of concave functions is concave.
We next verify that the modulated barrier strategy at b v is optimal for the problem (4.5).

Lemma 4.7 (Optimality of barrier strategies)
For v ∈ C, we have for , by continuity). Since the w i are C 2 and concave and satisfy (4.7), the assertion of the lemma follows by an argument similar to the one used in the proof of Theorem 2.3. Fix an arbitrary D ∈ D and let U be the corresponding risk process. Applying a generalized form of Itô's lemma to the process {e −Λ T ∧τ w(U T ∧τ , Z T ∧τ ), T ≥ 0}, taking expectations and using that f v i (x) ≤ 0 as in the proof of Theorem 2.3, we find that By letting T → ∞ and invoking the monotone convergence theorem and the fact that w and v are nonnegative and f v i (x) ≤ 0, we obtain w(x, i) ≥ Uv(x, i). Since the barrier strategy at level b v is an element of D, it also holds that Uv( The convergence of the iteration procedure is an immediate consequence of the following contraction property of Uv.
and Uv ∈ C for v ∈ C. Hence it follows that for v, w ∈ C, where C < 1 and the second inequality follows as in the proof of Corollary 4.2. Thus U is a contraction on C.

Proof of Propositions 4.3 and 4.4 Proposition 4.3 directly follows by combining Lemma 4.5 with Lemma 4.7.
From the definition of U and the dynamic programming equation, we directly see that Uv ≤ V ≤ Uw if v ≤ V ≤ w. In particular, taking v = v − 0 and w = v + 0 and repeatedly applying the former inequality yields that v − n ≤ V ≤ v + n . It follows from Lemma 4.8 that (v + n ) and (v − n ) converge to the unique fixed point of U , which is therefore equal to V . Next note that in view of Lemma 4.6, v ± n are concave (as we took v ± 0 ∈ C), so that V , a pointwise limit of concave functions, is also concave. This completes the proof of Proposition 4.4.

Positive drifts
From now on, we restrict ourselves to the case of two regimes, E = {0, 1}. For the setting of positive drifts, μ 0 , μ 1 > 0, we derive a system of two nonlinear equations for the optimal dividend barriers. We denote by F 0 and F 1 the quadratic polynomials given by with two different real roots λ k 1 and λ k 2 . Consider the fourth order polynomial The equation F k (λ) = 0 has two different roots λ k − < λ k + given in (2.6), and the equation F 0,1 (λ) = 0 has four real roots satisfying λ 1 < λ 2 < 0 < λ 3 < λ 4 .
Solving the systems of differential equations in Theorem 3.2 leads to the following result: where d = (d 1 , . . . , d 4 ) solves the linear system Ad = h, where h = (0, 0, 1, 0) and The proof of Proposition 5.1 is given in the Appendix.
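The four real roots of the fourth order polynomial can be located numerically by bracketing. The sketch below assumes that the quadratics have the form $F_k(\lambda) = \tfrac12 \sigma_k^2 \lambda^2 + \mu_k \lambda - (r_k - q_{kk})$, which is consistent with the characteristic equation $F_0(\lambda)F_1(\lambda) - q_{00}q_{11} = 0$ used in the Appendix but is an assumption here, since the display defining $F_k$ is not reproduced. Under this assumption the quartic is negative at every root of $F_0$ or $F_1$ and positive at 0 and at infinity, which yields four sign-change brackets. All function names and parameter values are hypothetical.

```python
import math


def F(k, lam, mu, sigma, r, q):
    """Assumed quadratic F_k; q = (q00, q11) holds the negative diagonal rates."""
    return 0.5 * sigma[k] ** 2 * lam * lam + mu[k] * lam - (r[k] - q[k])


def quartic(lam, mu, sigma, r, q):
    """F_0(lam) * F_1(lam) - q00 * q11."""
    return F(0, lam, mu, sigma, r, q) * F(1, lam, mu, sigma, r, q) - q[0] * q[1]


def quartic_roots(mu, sigma, r, q):
    """Return the four real roots in increasing order, found by bisection."""
    def P(lam):
        return quartic(lam, mu, sigma, r, q)

    def quad_roots(k):
        theta = r[k] - q[k]
        s = math.sqrt(mu[k] ** 2 + 2.0 * theta * sigma[k] ** 2)
        return (-mu[k] - s) / sigma[k] ** 2, (-mu[k] + s) / sigma[k] ** 2

    neg = sorted(quad_roots(k)[0] for k in range(2))
    pos = sorted(quad_roots(k)[1] for k in range(2))

    def far(start, step):
        # Move away from `start` until the quartic becomes positive.
        x = start + step
        while P(x) <= 0.0:
            step *= 2.0
            x = start + step
        return x

    def bisect(lo, hi):
        sign_lo = P(lo) > 0.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (P(mid) > 0.0) == sign_lo:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    brackets = [(far(neg[0], -1.0), neg[0]), (neg[1], 0.0),
                (0.0, pos[0]), (pos[1], far(pos[1], 1.0))]
    return [bisect(lo, hi) for lo, hi in brackets]
```

For example, `quartic_roots((1.0, 0.5), (1.0, 1.0), (0.1, 0.1), (-1.0, -3.0))` returns two negative and two positive roots, matching the ordering stated above. Bisection is used only because it needs no external library; any standard polynomial root finder would serve equally well.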

Sensitivities of the optimal barriers
To illustrate the effects of regime switching and the sensitivities of the optimal barrier levels, we numerically solved the system of nonlinear equations in Proposition 5.1 for different parameter values, and compared the results with the explicit solutions (2.3) and (2.4) corresponding to the absence of regime switching. The nonlinear equations were solved using a Maple routine based on the standard quasi-Newton method. We chose the parameters as in Table 1 and varied $\mu_0$, $\sigma_0$, $q_{00}$ and $r_0$ individually while keeping the other parameters fixed; the results are given in Table 2. We see that when the drift parameter $\mu_0$ is increased, $b_0^*$ and $b_1^*$ initially increase, while they decrease when $\mu_0$ becomes very large. Apparently, for relatively low drift it is optimal to reduce the probability of ruin, while for large drift the effect of discounting takes priority. Table 2 also shows that the two barriers $b_0^*$ and $b_1^*$ increase monotonically when $\sigma_0$ increases. A larger volatility leads to a higher probability of ruin, requiring the company to raise the level of the barrier in order to protect its future operations. We can also observe the effect of the transition rates of the underlying Markov chain. For example, if the rate is $-q_{00} = 0.01$, the chain spends a large part of the time in state 0 (in equilibrium, $3/3.01 \approx 99.7\%$ of the time), which is reflected in the fact that $b_0^* = 1.014$ is very close to $a_0^* = 1.013$, whereas if $-q_{00}$ and $-q_{11}$ are of similar size, the chain spends on average similar amounts of time in both states and the level $b_0^*$ differs substantially from $a_0^*$. Finally, we note that both $b_0^*$ and $b_1^*$ decrease when the rate of discounting $r_0$ is increased; if the rate of discounting is higher, it is optimal to increase the dividend payments by lowering the dividend barriers.
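The equilibrium fraction quoted above follows from the stationary distribution of a two-state chain. The short sketch below reproduces the arithmetic; the value 3 for the exit rate of state 1 is inferred from the quoted ratio 3/3.01 rather than read from Table 1, and the function name is hypothetical.

```python
def stationary_two_state(exit_rate_0, exit_rate_1):
    """Stationary distribution (pi_0, pi_1) of a two-state continuous-time
    Markov chain with exit rates exit_rate_0 = -q00 and exit_rate_1 = -q11.

    The balance equation pi_0 * exit_rate_0 = pi_1 * exit_rate_1 together
    with pi_0 + pi_1 = 1 gives the expressions below.
    """
    total = exit_rate_0 + exit_rate_1
    return exit_rate_1 / total, exit_rate_0 / total


# Exit rate 0.01 from state 0 and (inferred) exit rate 3 from state 1.
pi_0, pi_1 = stationary_two_state(0.01, 3.0)
```

With these rates the chain spends the fraction 3/3.01 of the time in state 0, so the regime-switching barrier in state 0 is expected to be close to the single-regime barrier.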

Adverse regime shifts: negative drift
We next consider the case that the drift is positive in one state and negative in the other. Intuitively, it is clear that for sufficiently small reserves a quick bankruptcy of the company is quite likely if the drift is negative, so that it is optimal to liquidate the company by paying out the entire reserves as a lump sum. If, however, the negative drift is moderate and the reserves are not too small, the expected future gains from a regime switch to a 'good' state may outweigh the effect of the negative drift, and it may be optimal to continue the business. In that case, a sensible strategy could be to liquidate the company for small initial reserves, but to pay out dividends according to a modulated barrier strategy for larger levels of reserves, which we formalize as follows.

Definition 5.2 A modulated liquidation and dividend barrier strategy at levels
where U d,b is the insurance risk process (2.1) corresponding to D d,b .
Condition (iii) states that all the reserves are paid out as dividends once the risk reserves fall below the level $d(Z_t)$. Define next the critical levels for all x small enough, which implies that $\Delta_i \in (0, \infty]$. If $\Delta_i = +\infty$, which is the case if $\mu_i < 0$ and $|\mu_i|$ is sufficiently large, it is optimal in state $i$ to liquidate the company for any level of the reserves, by immediately paying out all the reserves as dividends; this can be directly checked from Theorem 2.3. In the case that $\mu_0 < 0 < \mu_1$ and $\Delta_0 < \infty$ (the case $\mu_1 < 0 < \mu_0$ follows by relabeling the states), it turns out that it is optimal to continue paying dividends if the reserves are large enough, where the 'liquidation' level $d_0^* > 0$ solves the smooth fit equation $V_0'(d_0^*) = 1$. The solution is explicitly given as follows.

columns given by
The value functions are given by The proof is given in the Appendix. Observe that the value function V 0 is not concave, as there are two disjoint intervals where it has unit slope.
As illustration, we provide next a numerical example of a case where a modulated liquidation-dividend strategy is optimal.

Conclusion
In this paper, we have shown that in the presence of regime shifts, the optimal dividend policy is given by a threshold strategy set at a level that is a function of the current regime. That is to say, the policy that maximizes the expectation of the net present value of the paid dividends until the moment of default consists of paying out as dividends the overflow of the cash reserves above a certain optimal threshold, where this threshold jumps up or down exactly at the moment when the regime shifts. Hence, at the moment of a regime shift when the key parameters such as drift, volatility and discounting may change, it may be optimal to make a lump sum dividend payment, namely when the threshold level jumps below the current level of the cash reserves. We presented a contraction algorithm for the computation of the optimal threshold levels. As a case study, we numerically investigated the parameter sensitivities of the levels in the case of two regimes. It would be desirable to systematically explore the dependence of the optimal threshold levels on key parameters and its financial significance, which could be achieved by an analytical investigation of its form in specific parametric models; this is a topic left for future research.

Appendix: Proofs
A.1 Proof of the bounds (Proposition 2.1)
To prove the upper and lower bounds in (2.7), we consider two auxiliary optimal switching problems where not only the dividend payout, but also the regime is a control variable. An admissible switching strategy $\sigma = \{Z^\sigma_t, t \ge 0\}$ is an $\mathbf F$-adapted $E$-valued process that indicates the current regime. The two control problems are then given by where $D^-$ denotes the constant barrier strategy at level $b^-$ (where $b^-$ denotes the optimal barrier corresponding to $V^-$), $\mathcal S$ and $\mathcal D$ are the sets of all admissible switching and dividend strategies, $\Lambda^\sigma_s = \int_0^s r(Z^\sigma_u)\,du$, and $\tau^\sigma$ is the corresponding ruin time. As the regime-switching process $Z$ is one particular admissible switching strategy, the upper and lower bounds in (2.7) will follow once we have shown that $v^+ \le V^+$ and $v^- \ge V^-$.
In the proof, we use the following sub-and super-harmonicity properties: Proof Since V + is the value function corresponding to the optimal dividend problem without regime switching and with volatility, drift and discounting given by 1, max i∈E where M σ is some local martingale which is a supermartingale as it is bounded below. Taking note of Lemma A.1 and the facts that V + ≥ 1 and V + (0) = 0, it follows by rearranging and taking expectations that Subsequently taking the supremum in (A.3) over all σ ∈ S and D ∈ D shows that . By a similar line of reasoning, it can be verified that is valid for all x ≥ 0, and since V χ ≤ V , the proof of (2.7) is complete.

A.2 The dynamic programming equation (Proposition 2.2)
The proof is an adaptation of a classical line of reasoning to a regime-switching setting. We start with the following two lemmas.
where $\theta_i = r_i - q_{ii}$. In particular, it follows that $V(\cdot, i)$ is Lipschitz-continuous.
Proof Let $\varepsilon > 0$ and let $D(u, i)$ be an $\varepsilon$-optimal strategy for $U_0 = u$, $Z_0 = i$, and consider the strategies $D_t(u, y) = (u - y)\mathbf 1_{\{t=0\}} + D_t(y, i)\mathbf 1_{\{t>0\}}$ ("pay a lump sum $u - y$ and then follow the strategy $D(y, i)$") and $D_t(u, x) = \mathbf 1_{\{t > \tau(x),\, Z_{\tau(x)} = i\}} D(x, i)$ for $x \ge u \ge y$ ("wait until the first time $\tau(x)$ that the reserves reach the level $x$; if no regime switch has occurred by then, follow the strategy $D(x, i)$, otherwise do not pay any dividends"). Then it follows that . Letting $\varepsilon \to 0$, the bounds follow. Let $D^{i,j}$ be $\varepsilon$-optimal strategies corresponding to $U_0 = x^{(j)}$ and $Z_0 = i$, that is, $V(x^{(j)}, i) - V_{D^{i,j}}(x^{(j)}, i) < \varepsilon$, and define the strategy $D$ depending on $U_0 = x$ and $Z_0 = i$ as "pay a lump sum $(x - x^{(j^*)})$ and then follow the strategy $D^{i,j^*}$", where $j^* = \max\{j : x^{(j)} \le x\}$, i.e., Then it follows that As this estimate holds for arbitrary $x \ge 0$ and $i \in E$, the proof is complete.
Proof of Proposition 2.2 Denote by $w$ the right-hand side of (2.8) and by $D \in \mathcal D$ and $U$ an arbitrary admissible strategy and the corresponding cash reserves. To show that $V \le w$, we verify that $V_D \le w$; indeed, To prove the opposite bound $w \le V$, we show that for given $\varepsilon > 0$ and $D \in \mathcal D$, there exists a strategy $D(\varepsilon) \in \mathcal D$ such that $w_D \le V_{D(\varepsilon)} + \mathrm{const}\cdot\varepsilon$, where $w_D$ denotes the expectation in (2.8) , which is finite in view of Proposition 2.1.

A.3 The optimal levels (Lemma 4.5)
Proof of Lemma 4.5 By straightforward calculus, it can be verified that its derivative is given by Since it also holds that

Proof When $0 < b_\imath < x < b_j$, it follows immediately from (3.2) that whose general solution is of the form for any $k_1, k_2 \in \mathbb R$, since the quadratic characteristic equation $F_j(\lambda) = 0$ of its corresponding homogeneous equation has two roots $\lambda^j_1$ and $\lambda^j_2$, and a particular solution is $k_3 x + k_4$, where
$$k_3 = \frac{q_{jj}}{q_{jj} - r_j}, \qquad k_4 = \frac{q_{jj}(q_{jj} - r_j)\big(V(b_\imath, \imath) - b_\imath\big) - \mu_j q_{jj}}{(q_{jj} - r_j)^2}.$$
Using the boundary conditions $\frac{\partial V(x,j)}{\partial x}\big|_{x = b_j} = 1$ and $\frac{\partial^2 V(x,j)}{\partial x^2}\big|_{x = b_j} = 0$ then yields The proof is complete.
Proof of Proposition 5.1 In view of Lemma A.4, to complete the proof, it remains to derive the system. For $0 < x < b_\imath < b_j$, it follows from (3.2) that $V(x, \imath)$ satisfies a fourth order linear homogeneous ordinary differential equation with the characteristic equation $F_0(\lambda)F_1(\lambda) - q_{00}q_{11} = 0$ having four real roots $\lambda_1 < \lambda_2 < \lambda_3 < \lambda_4$. Thus, for $0 < x < b_\imath$, $V(x, \imath)$ and $V(x, j)$ can be, respectively, expressed as V (x, ı) = in terms of these coefficients, we arrive at the matrix equation $Ad = h$. The equations for the optimal levels follow since $V(x, j)$ is $C^2$ at $b_\imath$ (noting that any two of the three equations imply the third one).
Proof From the first two equations of the system $Ad = h$ and in view of $\mu_\imath > 0$ and $\lambda_4 > \lambda_3 > 0$, we can express the coefficients $d_3$ and $d_4$ in terms of $d_1$ and $d_2$ by Substituting this into (A.5) and (A.6) yields The last two equations of $Ad = h$ can then be rewritten as according to Cramer's rule (as $G \ne 0$).
Reasoning as in the proof of Proposition 5.1, we can subsequently derive the expressions in Proposition 5.3.