An optimal reinsurance problem in the Cramér–Lundberg model

In this article, we consider the surplus process of an insurance company within the Cramér–Lundberg framework with the intention of controlling its performance by means of dynamic reinsurance. Our aim is to find a general dynamic reinsurance strategy that maximizes the expected discounted surplus level integrated over time. Using analytical methods, we identify the value function as a particular solution to the associated Hamilton–Jacobi–Bellman equation. This approach leads to an implementable numerical method for approximating the value function and the optimal reinsurance strategy. Furthermore, we give examples illustrating the applicability of this method for proportional and XL-reinsurance treaties.


Introduction
The determination of optimal insurance contracts is a classical topic in insurance mathematics. The first results were stated in a static utility-theoretic framework and concern the relation between an individual facing a risk and the insurer. The goal is the construction of an optimal insurance arrangement for the first party, subject to a certain constraint stemming from the second party. Classical contributions in this context are Kenneth (1973), Raviv (1979) and Borch (1974), the last of which is a collection of pioneering articles. A more recent paper by Guerra and Centeno (2008) studies this problem for exponential utility. It provides the link to the maximization of the so-called adjustment coefficient, which is the decay rate of the ruin probability as the initial capital increases. The idea of using reinsurance for maximizing the adjustment coefficient was introduced by Waters (1983) and further studied by Centeno (1986, 2002) and Schmidli and Hald (2004). It can be considered the motivation for studying optimal reinsurance.
The first paper to study dynamic optimal reinsurance in the classical risk model for the minimization of the ruin probability is Schmidli (2001), who dealt with the case of proportional reinsurance treaties. This approach was extended to excess of loss contracts by Hipp and Vogt (2003). A general presentation on ruin probability minimization by means of reinsurance in the classical and diffusion risk model can be found in Schmidli (2008). Furthermore, this reference provides some asymptotic studies of the behaviour of optimal strategies, which in certain situations coincide with the ones maximizing the adjustment coefficient. Some additional results with a focus on non-proportional reinsurance contracts are given in Hipp and Taksar (2010).
Using a different criterion to assess the performance of an insurance portfolio, Eisenberg (2010) thoroughly covers a variety of capital injection minimization problems. These are treated under both the classical risk model and its diffusion approximation, where the insurer can dynamically reinsure its risk. The incorporation of dynamic reinsurance into the classical problem of maximizing the dividend payouts of an insurance company prior to ruin, in a compound Poisson framework, was treated by Azcue and Muler (2005) for general reinsurance schemes and by Mnif and Sulem (2005) for excess of loss reinsurance. In a diffusion setting, the corresponding problem was studied by Højgaard and Taksar (1999) in the case of proportional reinsurance. Combining dividend payout maximization with a proportional reduction of the risk exposure, Schäl (1998) formulated a piecewise deterministic Markov model in which only the jumps, but not the deterministic flow, can be controlled. In contrast to the aforementioned references, which deal with optimal reinsurance for continuous-time risk processes, Schäl (2004) investigates a discrete-time insurance model controlled by reinsurance and investment in a financial market. The aim there is either to maximize the expected exponential utility or to minimize the ruin probability. An analogous problem was treated by Irgens and Paulsen (2004), where the authors maximize the expected utility of terminal wealth by means of optimal investment and reinsurance.
Finally, we would like to mention a new approach linking ruin theoretical concepts with the framework of worst-case optimization theory explored by Korn et al. (2012). Embedded in a differential game setup, the authors applied a worst-case scenario approach to maximize the expected utility of the surplus of an insurance company at some given deterministic terminal time by dynamic proportional reinsurance.
In this contribution, we will study the use of dynamic reinsurance for maximizing a particular economic performance measure which for a diffusion risk model was introduced by Højgaard and Taksar (1998a, b).
For its definition, let $X^u = (X^u_t)_{t\ge0}$ be a surplus process controlled by a reinsurance strategy $u$. The performance measure of this particular strategy is defined by
$$V^u(x) = \mathbb{E}_x\Big[\int_0^{\tau^u} e^{-\delta t}\, X^u_t\,dt\Big], \tag{1}$$
where δ > 0 denotes a discount or preference rate and $\tau^u$ is the time of ruin of $X^u$. In Taksar (2000) this measure is motivated by the following argument. The surplus of the insurance company is kept in a bank account and interest gains are immediately distributed as dividends, so maximizing the expected discounted dividend payments is equivalent to maximizing (1). Another way to motivate this criterion in a Markovian environment is to introduce a random lifetime S ∼ Exp(δ) which is independent of all other model ingredients. Then one observes
$$V^u(x) = \frac{1}{\delta}\,\mathbb{E}_x\big[X^u_S\,\mathbf{1}_{\{S<\tau^u\}}\big],$$
which shows that the performance measure is proportional to the expected surplus at a random exponential time S, counted only if ruin has not occurred before. This means that a dynamic reinsurance strategy is used for maximizing the surplus at some exogenous point in time. Cost functions of the form (1), or more generally involving a running cost function $l(X_t)$, are also studied by Cai et al. (2009) in an uncontrolled piecewise-deterministic compound Poisson environment.

The structure of the manuscript is as follows. In Sect. 2, we give a precise mathematical formulation of the problem, introducing the controlled surplus process and the value function. The analytical characterization of the value function is presented in Sect. 3. It starts with a collection of basic properties and employs the dynamic programming approach to arrive at a final statement. Section 4 contains some comments on the numerical procedure obtained from the analytical results, together with two illustrative examples. Finally, a conclusion is given in Sect. 5.
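The exponential-time representation of the performance measure needs only a short argument. Before ruin, $X^u_t \ge 0$, so Tonelli's theorem allows expectation and time integration to be interchanged. Conditioning on the surplus path and using the independence of S ∼ Exp(δ), with density $\delta e^{-\delta s}$, gives:

$$
\frac{1}{\delta}\,\mathbb{E}_x\big[X^u_S\,\mathbf{1}_{\{S<\tau^u\}}\big]
= \frac{1}{\delta}\,\mathbb{E}_x\Big[\int_0^\infty \delta e^{-\delta s}\,X^u_s\,\mathbf{1}_{\{s<\tau^u\}}\,ds\Big]
= \mathbb{E}_x\Big[\int_0^{\tau^u} e^{-\delta s}\,X^u_s\,ds\Big]
= V^u(x).
$$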

Problem statement
In the sequel, we always work on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ which carries all stochastic quantities defined below. In the Cramér–Lundberg model (also known as the compound Poisson model or classical risk model), the surplus process $X = (X_t)_{t\ge0}$ of a homogeneous insurance portfolio is modeled as
$$X_t = x + ct - \sum_{i=1}^{N_t} Y_i.$$
The process starts from a deterministic initial surplus $X_0 = x \ge 0$. It increases linearly due to premiums that are collected continuously over time at a constant rate c > 0. On the other hand, it decreases due to claims occurring at the arrival times of a homogeneous Poisson process $N = (N_t)_{t\ge0}$ with intensity λ > 0. The claims $\{Y_i\}_{i\in\mathbb{N}}$ form a sequence of positive, independent and identically distributed random variables with density function $f_Y(\cdot)$ and finite mean μ. Later on, we use Y as a representative random variable with this distribution. In addition, the sequence $\{Y_i\}_{i\in\mathbb{N}}$ and N are assumed to be independent. The flow of information is given by the filtration $\{\mathcal{F}_t\}_{t\ge0}$ generated by the surplus process X. In the remainder of the manuscript, $\mathbb{E}$ denotes the expectation with respect to $\mathbb{P}$, and $\mathbb{E}_x$ denotes the conditional expectation $\mathbb{E}(\cdot \mid X_0 = x)$.

Fundamental quantities in this framework are the time of ruin and the probability of ruin for initial capital x ≥ 0,
$$\tau = \inf\{t \ge 0 \mid X_t < 0\}, \qquad \psi(x) = \mathbb{P}_x(\tau < \infty).$$
In some of the proofs below, we compare pathwise (i.e., for a fixed ω ∈ Ω) processes starting at different initial values x and y. It is therefore necessary to include the initial value in the notation for the time of ruin. For example, $\mathbb{E}_x(X_{\tau_y})$ denotes the expected value of the surplus started at x and stopped at the time at which, along the same path, the surplus started at y (x > y) would be ruined. Certainly, with $\theta = \inf\{t \ge 0 \mid X_t < x - y\}$, we have $\mathbb{E}_x(X_{\tau_y}) = \mathbb{E}_x(X_\theta)$, but we believe that in context our notation is more intuitive.
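The uncontrolled model can be simulated directly. The following sketch estimates ψ(0) by Monte Carlo and compares it with the classical closed form ψ(x) = e^{−ηx/((1+η)μ)}/(1 + η). It relies on several assumptions introduced here for illustration only:

- exponentially distributed claims with mean μ;
- hypothetical parameter values;
- a truncation level `x_cap`, above which a path is treated as surviving.

Under c = (1 + η)λμ, the value ψ(0) = 1/(1 + η) holds for any claim distribution.

```python
import numpy as np

def is_ruined(x, c, lam, mu, x_cap, rng):
    """Simulate X_t = x + c t - sum_{i <= N_t} Y_i with Exp(mean mu) claims.

    Ruin can only happen at claim times, so it suffices to inspect the surplus
    right after each claim. Paths that climb above x_cap are counted as
    surviving; this truncation ignores the small ruin probability from x_cap.
    """
    X = x
    while X <= x_cap:
        X += c * rng.exponential(1.0 / lam)  # premiums earned until the next claim
        X -= rng.exponential(mu)              # claim size Y_i with mean mu
        if X < 0:
            return True
    return False

def ruin_prob_exponential(x, eta, mu):
    """Closed-form psi(x) for Exp(mean mu) claims and c = (1 + eta) lam mu."""
    return np.exp(-eta * x / ((1.0 + eta) * mu)) / (1.0 + eta)

# Hypothetical parameters (not taken from the paper)
lam, mu, eta = 1.0, 1.0, 0.2
c = (1.0 + eta) * lam * mu            # expected value premium principle
rng = np.random.default_rng(seed=1)
n_paths = 4000
estimate = np.mean([is_ruined(0.0, c, lam, mu, 40.0, rng) for _ in range(n_paths)])
exact = ruin_prob_exponential(0.0, eta, mu)
print(f"Monte Carlo psi(0) = {estimate:.3f}, closed form = {exact:.3f}")
```

The design choice here is simple. Since ruin occurs only at claim times, only the post-claim surplus needs to be tracked, so no time discretization is involved.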
It is well known that, to avoid almost sure ruin, the premium intensity has to fulfill the net profit condition c > λμ. Therefore, based on the expected value premium principle, we set c = (1 + η)λμ with a safety loading η > 0. For further details on classical problems in risk theory and related topics, we refer to Asmussen and Albrecher (2010).
Assume now that in order to reduce the risk exposure of the portfolio, the insurer (cedent) has the possibility to take reinsurance in a dynamic way. Namely, at each time t, the insurer transfers a portion of the premium income to a reinsurer, who in turn commits to cover a part of the occurred claims. The dynamic reinsurance setup we are going to use follows the presentation from Schmidli (2008).
Formally, a reinsurance scheme is given by a monotone increasing function r : [0, ∞) → [0, ∞) with 0 ≤ r(y) ≤ y. Here r is the retention function: for a claim of size Y, the amount r(Y) is paid by the insurer and Y − r(Y) is taken by the reinsurer. To introduce a control possibility, a family of available schemes $\mathcal{R}$ is parameterized by a control parameter u from a compact set U. This means that, for u ∈ U, the chosen reinsurance contract is given by $r(\cdot, u) \in \mathcal{R}$, where $r : [0, \infty) \times U \to \mathbb{R}_+$ with 0 ≤ r(y, u) ≤ y. In addition, we assume that r(y, u) is continuous in both arguments. After fixing the family, the set of available reinsurance schemes is given by $\mathcal{R} = \{r(\cdot, u) \mid u \in U,\ 0 \le r(y, u) \le y,\ r \text{ continuous and increasing in } y\}$.
For later use, we denote by ρ(y, u) the generalized inverse of r(y, u) in the y-component, which exists by monotonicity. Naturally, reinsurance has to be paid for. We assume that the reinsurer uses a deterministic premium function $\pi : L^1(\Omega, \mathbb{P}) \to [0, \infty)$, such that for fixed u ∈ U the premium is based on π(Y − r(Y, u)). From an aggregate risk perspective, if the insurer chooses reinsurance u ∈ U at time t, premium is paid to the reinsurer at rate λπ(Y − r(Y, u)). Consequently, the premium income of the insurer reduces to c(u) = c − λπ(Y − r(Y, u)). In the sequel, we always assume that c(u) is continuous and that full reinsurance leads to a negative premium income, i.e., c < λπ(Y).
The premium function π may be based on the expected value principle, π(Z) = (1 + θ)𝔼[Z], where θ > η denotes the safety loading of the reinsurer. Alternatively, it may be based on the variance principle, π(Z) = 𝔼[Z] + β Var[Z] with some β > 0. Possible concrete choices for $\mathcal{R}$ and U are the classical situations of proportional reinsurance and excess-of-loss reinsurance. In the first case, r(y, u) = uy and u ∈ U = [0, 1]; in the second case, r(y, u) = min(y, u) and u ∈ U = [0, ∞]. Notice that in the latter case an infinite retention level is equivalent to no reinsurance. In the following, to avoid a negative premium rate, we restrict the control parameters to the set {u ∈ U | c(u) ≥ 0}, which we again denote by U. Since the original set U is compact and c(·) is continuous, this restricted set is a closed subset of a compact set and hence compact.
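For the two classical treaties, c(u) can be written down explicitly when both parties use the expected value principle. The following sketch assumes exponentially distributed claims with mean μ for the XL case, so that 𝔼[(Y − u)⁺] = μe^{−u/μ}, and uses hypothetical parameter values. It also returns the smallest retention with c(u) ≥ 0, i.e., the lower end of the restricted control set.

```python
import math

def c_proportional(u, lam, mu, eta, theta):
    """c(u) = c - lam (1 + theta) E[Y - uY] for r(y, u) = u y (expected value principle)."""
    return lam * mu * ((1.0 + eta) - (1.0 + theta) * (1.0 - u))

def c_xl_exponential(u, lam, mu, eta, theta):
    """c(u) for r(y, u) = min(y, u) and Exp(mean mu) claims, using E[(Y - u)^+] = mu exp(-u / mu)."""
    return lam * mu * ((1.0 + eta) - (1.0 + theta) * math.exp(-u / mu))

def smallest_retention(eta, theta, mu):
    """Roots of c(u) = 0 in both examples; smaller retentions give negative premium income."""
    u_prop = (theta - eta) / (1.0 + theta)
    u_xl = mu * math.log((1.0 + theta) / (1.0 + eta))
    return u_prop, u_xl

# Hypothetical parameters with theta > eta, as required in the text
lam, mu, eta, theta = 1.0, 1.0, 0.2, 0.5
u_prop, u_xl = smallest_retention(eta, theta, mu)
print(u_prop, u_xl)
```

Restricting U to {c(u) ≥ 0} thus amounts to u ∈ [u_prop, 1] for proportional reinsurance and u ∈ [u_xl, ∞] for XL reinsurance. An infinite retention (`float("inf")`) recovers the premium rate c without reinsurance.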

Remark 1
The idea of a dynamic reinsurance strategy can be explained as follows. At each time instant t, the insurer chooses a control parameter $u = u_t \in U$, which specifies a reinsurance scheme r(·, u) from an available set of schemes. The choice of u simultaneously determines two things: the extent to which the insurer reduces its risk exposure, and the additional cost of this protection in the form of a reinsurance premium. Namely, if a claim occurs at time t, the insurer pays $r(Y, u_t)$ and the reinsurer pays the rest, i.e., $Y - r(Y, u_t)$. In exchange for this risk transfer, the insurer pays the reinsurer a reinsurance premium at rate $\lambda\pi(Y - r(Y, u_t))$.
Let $u = (u_t)_{t\ge0}$ be a U-valued stochastic process which is $\{\mathcal{F}_t\}_{t\ge0}$-previsible; we call it a reinsurance strategy. Then the dynamics of the controlled surplus process $X^u = (X^u_t)_{t\ge0}$ are described by
$$X^u_t = x + \int_0^t c(u_s)\,ds - \sum_{i=1}^{N_t} r(Y_i, u_{T_i}), \tag{4}$$
where $T_i$ denotes the arrival time of the i-th claim.

Remark 2 From Rogers and Williams (1994, p. 182) we can deduce that the previsibility of u implies that it is progressively measurable, and thus also measurable as a function of time. Since the premium rate c(·) is assumed to be continuous and bounded by c, the integral $\int_0^t c(u_s)\,ds$ exists at least in the Lebesgue sense. The jumps of $X^u$ occur at the arrival times of the underlying Poisson process, and $X^u$ behaves continuously between jump times. Hence the process $X^u$ is right continuous with left limits, i.e., càdlàg. Consequently, $X^u$ is progressively measurable as well, and for fixed ω, $X^u(\omega)$ is measurable in t. Again, integrals of the form $\int_0^t X_s\,ds$ certainly exist in the Lebesgue sense.
The time of ruin $\tau^u_x$ denotes the time at which the controlled surplus process $X^u$ first becomes negative,
$$\tau^u_x = \inf\{t \ge 0 \mid X^u_t < 0\}.$$
From now on, we call a stochastic process $u = \{u_t\}_{t\ge0}$ an admissible reinsurance strategy if it fulfills all the previously made assumptions. In this context, previsibility is crucial: at a claim time $T_i$, the reinsurance parameter is chosen based on the information up to time $T_i-$. Previsibility is a natural assumption in this setting. Otherwise, the insurer could switch to full reinsurance at the very moment a claim occurs, so that the reinsurer would pay all claims while the insurer would collect all premiums. Let $\mathcal{U}$ denote the set of admissible reinsurance strategies. For an admissible reinsurance strategy u and an initial reserve x ≥ 0, we define its performance criterion as the expected cumulative discounted surplus until ruin,
$$V^u(x) = \mathbb{E}_x\Big[\int_0^{\tau^u_x} e^{-\delta t}\, X^u_t\,dt\Big],$$
with δ > 0 a discount or preference rate. In the sequel, we refer to $V^u(x)$ as the return function. The optimization problem then consists of finding the optimal return function, or value function,
$$V(x) = \sup_{u \in \mathcal{U}} V^u(x), \tag{5}$$
and an optimal admissible reinsurance strategy $u^*$, i.e., a strategy whose return function attains the value function (5).

Main results
In this section, we first derive some elementary bounds, which allow for a rough characterization of the value function. In a next step, we are able to prove the existence of a solution to an integro-differential equation which is closely related to the problem's Hamilton-Jacobi-Bellman equation. Finally, a verification argument provides the bridge between these analytical results and the stochastic optimization problem of interest.

Some elementary bounds
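Proposition 1 For x ≥ 0, the value function V(x) admits the following bounds:
(a) $V(x) \le \dfrac{x}{\delta} + \dfrac{c}{\delta^2}$,
(b) $V(x) \ge \dfrac{x}{\delta} - \dfrac{\lambda\pi(Y) - c}{\delta^2}\Big(1 - e^{-\delta x/(\lambda\pi(Y) - c)}\Big)$.
Proof Let u be an admissible strategy. Since $c(u_s) \le c$ and retained claims are nonnegative, $X^u_t \le x + ct$ for $t < \tau^u_x$. Hence $V^u(x) \le \int_0^\infty e^{-\delta t}(x + ct)\,dt = x/\delta + c/\delta^2$. Taking the supremum over all admissible strategies u shows that the value function V(x) satisfies inequality (a).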
It remains to verify inequality (b). Consider the admissible strategy $u_0$, which corresponds to continuously buying full reinsurance until the time of ruin. It leads to the deterministic reserve $X^{u_0}_t = x - (\lambda\pi(Y) - c)\,t$ with negative drift. As a consequence, the time of ruin can be computed explicitly, that is, $\tau^{u_0}_x = x/(\lambda\pi(Y) - c)$. Evaluating $\int_0^{\tau^{u_0}_x} e^{-\delta t} X^{u_0}_t\,dt$ then gives the right-hand side of (b).
The following result presents bounds on increments of the value function and also establishes its continuity.
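Both bounds of Proposition 1 are explicit. The following sketch evaluates the upper bound and the return of the full-reinsurance strategy $u_0$, whose reserve decreases deterministically at rate λπ(Y) − c. This rate is called `net_outflow` here, a hypothetical name. The closed form
$$V^{u_0}(x) = \frac{x}{\delta} - \frac{\lambda\pi(Y)-c}{\delta^2}\Big(1 - e^{-\delta x/(\lambda\pi(Y)-c)}\Big)$$
is cross-checked against a direct trapezoidal integration. The parameter values are illustrative assumptions.

```python
import math

def upper_bound(x, c, delta):
    """Bound (a): int_0^inf e^{-delta t} (x + c t) dt, since X^u_t <= x + c t before ruin."""
    return x / delta + c / delta**2

def full_reinsurance_value(x, net_outflow, delta):
    """Return of u0: reserve x - net_outflow * t, ruined at t = x / net_outflow."""
    return x / delta - net_outflow / delta**2 * (1.0 - math.exp(-delta * x / net_outflow))

def trapezoid_check(x, net_outflow, delta, n=100_001):
    """Direct trapezoidal integration of e^{-delta t} (x - net_outflow t) over [0, ruin time]."""
    T = x / net_outflow
    h = T / (n - 1)
    vals = [math.exp(-delta * i * h) * (x - net_outflow * i * h) for i in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# Hypothetical numbers: c = 1.2, lam * pi(Y) = 1.7, so net_outflow = 0.5
x0, c, net_outflow, delta = 3.0, 1.2, 0.5, 0.1
print(full_reinsurance_value(x0, net_outflow, delta), trapezoid_check(x0, net_outflow, delta),
      upper_bound(x0, c, delta))
```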
Proposition 2 For x > y ≥ 0, the value function satisfies:
Proof For given x > 0 and given ε > 0, consider an admissible ε-optimal strategy u such that $V^u(x) + \varepsilon \ge V(x)$. Since u is also admissible for initial capital y with x > y ≥ 0 (up to time $\tau^u_y$), we have
$$V(x) - V(y) \le \varepsilon + \mathbb{E}_x\Big[\int_0^{\tau^u_x} e^{-\delta t} X^u_t\,dt\Big] - \mathbb{E}_y\Big[\int_0^{\tau^u_y} e^{-\delta t} X^u_t\,dt\Big], \tag{6}$$
where $\mathbb{E}_x$, $\mathbb{E}_y$ indicate the starting value of the corresponding process. Now we use a pathwise argument: let $E = \{\tau^u_x = \tau^u_y\}$. Notice that on E the paths (for fixed ω) of the reserves started in x and y move in parallel at distance x − y > 0 and are ruined at the same time. Therefore, we can rewrite the above inequality in the following way: The first inequality is just a restatement of (6). It incorporates the fact that the two values, namely the values of the strategy u for the surplus processes started in x and y, differ only on $E^c$. This difference is given by the third expectation, in which $\mathbb{E}_x$ indicates that the surplus within the integral is started at x. The second inequality follows from the observation that The last inequality uses $X^u_{\tau^u_y} \le x - y$ for the reserve started in x, so that the corresponding expectation is smaller than
Let us now prove inequality (b). Let y ≥ 0 and ε > 0 be given, and consider an admissible strategy ū such that $V^{\bar u}(y) + \varepsilon \ge V(y)$. For x > y, we have $V(x) - V(y) \ge V^{\bar u}(x) - V^{\bar u}(y) - \varepsilon$. Again, let $E = \{\tau^{\bar u}_x = \tau^{\bar u}_y\}$ and let $T_1$ be the time of the first claim occurrence. We can write From the arbitrariness of ε > 0, we get the result.
Additionally, we can derive the following.

Lemma 1 The value function V is locally Lipschitz continuous.
Proof For given x > 0 and ε > 0, consider an admissible strategy $u = (u^x_t)_{t\ge0}$ such that $V^u(x) + \varepsilon \ge V(x)$. Finally, after explicitly evaluating the last estimate, we derive for x > y ≥ 0 that This implies that V is locally Lipschitz continuous.
Finally, we summarize the following elementary properties of the value function V(x). Notice that absolute continuity follows from the local Lipschitz continuity established in the previous lemma.

Corollary 1 The value function V is strictly positive, linearly bounded, monotone increasing and absolutely continuous.
Remark 3 Suppose that, in the proof of part (a) of Proposition 2, we additionally assume that for all u ∈ U the random variable r(Y, u) admits a bounded density $f^u_r$. Then a formal derivation, together with (7), shows that the value function is globally Lipschitz continuous. This case appears, for example, when dealing with proportional reinsurance.
For further investigations, we need to improve on the lower bound from Proposition 1. When dealing with a contraction operator later on, the refined bound will allow us to describe the growth behaviour of the value function in a more precise way.
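We start by showing that, for the function g(x) = x/δ for x ≥ 0 and g(x) = 0 for x < 0,
$$\mathcal{L}g(x) - \delta g(x) \ge -x + \frac{c - \lambda\mu}{\delta}, \qquad x \ge 0,$$
where
$$\mathcal{L}g(x) = c\,g'(x) + \lambda\int_0^\infty \big(g(x - y) - g(x)\big) f_Y(y)\,dy$$
is the infinitesimal generator of the uncontrolled process X. For that purpose, note that g(x − y) = 0 for y > x, so that
$$\mathcal{L}g(x) - \delta g(x) = \frac{c}{\delta} - \frac{\lambda}{\delta}\Big(x - \int_0^x (x - y) f_Y(y)\,dy\Big) - x = \frac{c - \lambda\,\mathbb{E}[\min(Y, x)]}{\delta} - x,$$
and the claim follows from $\mathbb{E}[\min(Y, x)] \le \mu$.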
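The generator inequality can be checked numerically. The following sketch makes three assumptions for illustration:

- claims are Exp(mean μ);
- the test function is g(x) = x/δ on [0, ∞) and zero below zero;
- the parameter values are hypothetical.

It evaluates (𝓛g − δg)(x) by trapezoidal quadrature and compares it with the closed form (c − λ𝔼[min(Y, x)])/δ − x, using 𝔼[min(Y, x)] = μ(1 − e^{−x/μ}) for exponential claims.

```python
import math

def generator_gap(x, c, lam, mu, delta, n=20_001):
    """(L g - delta g)(x) for g(z) = z / delta (z >= 0), g = 0 for z < 0, Exp(mean mu) claims.

    L g(x) = c g'(x) + lam * int_0^inf (g(x - y) - g(x)) f_Y(y) dy; jumps below zero
    contribute g = 0, so only y in [0, x] enters the integral of g(x - y).
    """
    h = x / (n - 1)
    vals = [(x - i * h) / delta * math.exp(-i * h / mu) / mu for i in range(n)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    g_x = x / delta
    return c / delta + lam * integral - lam * g_x - delta * g_x

def generator_gap_closed(x, c, lam, mu, delta):
    """Closed form (c - lam E[min(Y, x)]) / delta - x for Exp(mean mu) claims."""
    return (c - lam * mu * (1.0 - math.exp(-x / mu))) / delta - x

lam, mu, delta = 1.0, 1.0, 0.1      # hypothetical parameters
c = (1.0 + 0.2) * lam * mu           # safety loading eta = 0.2
for x in (0.5, 2.0, 10.0):
    print(x, generator_gap(x, c, lam, mu, delta), (c - lam * mu) / delta - x)
```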

Lemma 2 The value function V is bounded from below by
$$V(x) \ge \frac{x}{\delta} + \frac{c - \lambda\mu}{\delta(\delta + \lambda)}. \tag{8}$$
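Proof Since g(x) is differentiable, we can apply Dynkin's formula and get
$$\mathbb{E}_x\big[e^{-\delta(t\wedge\tau)} g(X_{t\wedge\tau})\big] = g(x) + \mathbb{E}_x\Big[\int_0^{t\wedge\tau} e^{-\delta s}\big(\mathcal{L}g(X_s) - \delta g(X_s)\big)\,ds\Big].$$
From above, we already know that $\mathcal{L}g(X_s) - \delta g(X_s) \ge -X_s + (c - \lambda\mu)/\delta$. Using this estimate, we arrive at where $T_1$ denotes the time of the first claim occurrence. Using the linear boundedness of $g(X_{t\wedge\tau})$ in t and monotone convergence, we arrive at From its definition, we get

Remark 4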

Characterization of the value function
Based on the elementary properties of the value function collected in Corollary 1, we can work out the dynamic programming approach for solving the optimization problem. We start by observing that V fulfills the dynamic programming principle, that is, for every $\{\mathcal{F}_t\}$-stopping time S ≥ 0 the following relation holds:
$$V(x) = \sup_{u \in \mathcal{U}} \mathbb{E}_x\Big[\int_0^{S\wedge\tau^u} e^{-\delta t} X^u_t\,dt + e^{-\delta(S\wedge\tau^u)}\,V\big(X^u_{S\wedge\tau^u}\big)\Big]. \tag{9}$$
The proof of this fact is mainly based on the continuity of V and follows standard arguments from the literature; see, for instance, the proof of Azcue and Muler (2014, Prop. 2.3).
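The following lemma shows that, at least in a weak sense, V fulfills the associated Hamilton–Jacobi–Bellman equation.

Lemma 3 The value function V defined in (5) is a.e. a solution to
$$\sup_{u \in U}\Big\{c(u)V'(x) + \lambda\int_0^\infty V\big(x - r(y, u)\big) f_Y(y)\,dy - (\lambda + \delta)V(x) + x\Big\} = 0, \tag{10}$$
where V(x) = 0 for x < 0.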
Proof In a first step, we show that the left-hand side of (10) is less than or equal to zero. Fix x > 0, h > 0 and a control parameter u ∈ U. Define the strategy $\hat u = (\hat u_t)_{t\ge0}$ by $\hat u_t = u$ for t ∈ [0, h] and $\hat u_t = \tilde u_{t-h}$ for t > h, for some $\tilde u \in \mathcal{U}$. If necessary, we choose h small enough such that x + c(u)h > 0. Let $T_1$ denote the time of the first claim occurrence and set S = min{T_1, h}. Then (9) yields Since u is a constant control applied on the time horizon [0, S], we can apply Rolski et al. (1999, Th. 11.2.2) and get that $V \in D(\mathcal{A}^u)$, i.e., V lies in the domain of the generator. In the present situation, the generator $\mathcal{A}^u$ of the process $X^u$ controlled by the constant parameter u is given by
$$\mathcal{A}^u V(x) = c(u)V'(x) + \lambda\int_0^\infty \big(V(x - r(y, u)) - V(x)\big) f_Y(y)\,dy.$$
The particular result from Rolski et al. (1999, Th. 11.2.2) applies for three reasons: the map t ↦ V(x + c(u)t) is absolutely continuous, the so-called active boundary is empty, and the bounds from Proposition 1 and Proposition 2 guarantee the required integrability condition. Therefore, we can apply Dynkin's formula, identifying V′ with the measurable density of V, and rewrite (11) as After regrouping and dividing by h, we have The integral in the first expectation can be interpreted in the Riemann sense since V is continuous, so that sending h → 0 leads to The second limit needs more care, since the integrands, as functions of t, are only measurable and the respective integral is interpreted in the Lebesgue sense. For this purpose, consider where in the second equality we used Lebesgue's differentiation theorem from Wheeden and Zygmund (1977, Th. 7.16). It applies since the measurable density V′ is locally integrable in the Lebesgue sense, because of the bounds on V and its increments. One may notice that since the ds-integrand equals zero for s = 0. The choice of the control parameter u ∈ U was arbitrary, so that we have We now turn to the second step, showing that the left-hand side of (10) is also greater than or equal to zero.
Set again S = min{T_1, h} for some h > 0 and let the strategy $u^1 = (u^1_t)_{t\ge0}$ be h²-optimal for the right-hand side of (9), that is, where we added the term εh with some arbitrary ε > 0 to achieve strict positivity.
In the above equation, we can use $T_1 \sim \mathrm{Exp}(\lambda)$ and regroup slightly to arrive at We keep $\mathbb{E}_x$ since $u^1$ is still stochastic on the time interval under consideration. In the following, we divide A, B, C and D by h and study the limits as h tends to zero; for interchanging limits and expectations, we repeatedly use the dominated convergence theorem. We start with B: which follows from the continuity of V. Next we deal with C: which is derived by an application of Wheeden and Zygmund (1977, Th. 7.16). For part D, we follow a similar procedure together with the absolute continuity of V: Part A is resolved in the same way and delivers Finally, we arrive at which concludes the proof since ε was arbitrary.
At this point, we know that the value function is, in some sense, a solution to the associated HJB equation. What remains for a complete analytical characterization is a uniqueness result. To obtain it, we rewrite (10) in a way similar to how Schmidli (2008, p. 47) transformed equation (2.14) into (2.15). Suppose x is such that V′(x) exists. Since the set U is compact and all corresponding terms are continuous in u, there exists a maximizer u(x) at which the supremum, equal to zero, is attained. Replacing the supremum by its value at u(x) in (10), we have
$$c(u(x))\,V'(x) = (\lambda + \delta)V(x) - x - \lambda\int_0^\infty V\big(x - r(y, u(x))\big) f_Y(y)\,dy. \tag{12}$$
By monotonicity, V(x − r(y, u)) ≤ V(x), so the right-hand side of (12) is at least δV(x) − x. Using the lower bound (8) on V(x), this quantity is positive, hence c(u(x))V′(x) > 0 and therefore c(u(x)) > 0. Thus, in the supremum we can replace the set U by {u ∈ U | c(u) > 0}. Since V(x) is monotone, we can rewrite (10) in the equivalent form
$$V'(x) = \inf_{u \in U,\ c(u) > 0} \frac{(\lambda + \delta)V(x) - x - \lambda\int_0^\infty V\big(x - r(y, u)\big) f_Y(y)\,dy}{c(u)}. \tag{13}$$
Formally, we know that V is a.e. a solution to (13). In addition, for x such that V′(x) exists, we have the following estimate, which can be used in (14), leading to Reinspecting (12) gives a positive lower bound on c(u(x)), where the last inequality is due to Lemma 2. Together with (15), we have As a consequence, we can redefine the crucial set for taking the supremum (resp. infimum) as {u ∈ U | c(u) ≥ L}, which we again denote by U. One may notice that in (13) the infimum is then again taken over a compact set and that the denominator is uniformly bounded away from zero.
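The maximizer u(x) in (12) is also the object computed in a policy-improvement step. The following sketch computes it on a grid for a given candidate function g. The u-independent terms −(λ + δ)g(x) + x of the HJB equation are dropped. The sketch makes three assumptions for illustration:

- proportional reinsurance with Exp(mean μ) claims;
- the expected value principle for both parties;
- a finite control grid, standing in for the compact set U.

The function name `improved_retention` is hypothetical.

```python
import numpy as np

def improved_retention(x_grid, g, u_grid, lam, mu, eta, theta, y_max_factor=40.0, n_y=4001):
    """Pointwise maximiser over u_grid of c(u) g'(x) + lam * int_0^inf g(x - u y) f_Y(y) dy.

    Sketch assumptions: r(y, u) = u y, Exp(mean mu) claims, expected value principle,
    g sampled on x_grid with g = 0 below zero.
    """
    g_prime = np.gradient(g, x_grid)                    # finite-difference g'(x)
    y = np.linspace(0.0, y_max_factor * mu, n_y)        # truncated claim-size grid
    w = np.full(n_y, y[1] - y[0])
    w[[0, -1]] *= 0.5                                   # trapezoid weights
    dens_w = w * np.exp(-y / mu) / mu                   # weights times f_Y(y)
    best_val = np.full(x_grid.shape, -np.inf)
    best_u = np.zeros(x_grid.shape)
    for u in u_grid:
        c_u = lam * mu * (u * (1.0 + theta) - (theta - eta))
        shifted = x_grid[:, None] - u * y[None, :]      # x - r(y, u)
        vals = np.where(shifted < 0, 0.0, np.interp(shifted, x_grid, g))
        objective = c_u * g_prime + lam * (vals @ dens_w)
        better = objective > best_val
        best_val[better] = objective[better]
        best_u[better] = u
    return best_u

# Usage with a linear candidate g(x) = x / delta (illustration only)
delta = 0.1
x_grid = np.linspace(0.0, 20.0, 201)
u_grid = np.linspace(0.25, 1.0, 16)                     # retentions with c(u) > 0
u_star = improved_retention(x_grid, x_grid / delta, u_grid, lam=1.0, mu=1.0, eta=0.2, theta=0.5)
```

For the linear candidate, every grid point picks u = 1. The reason is that c(u)/δ increases in u at rate λμ(1 + θ)/δ, while the expected retained-loss term decreases at rate at most λμ/δ. Passing an approximation of V_sr, or an iterate of T, instead of the linear candidate yields a policy-improvement step.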
The first step towards a unique characterization of the value function is given in the following theorem. Its proof relies on the fixed point property of a certain operator and is inspired by a similar approach used in Azcue and Muler (2005, 2014).
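Theorem 1 Let f(0) > 0 be some given initial value. Then there exists a unique a.e. differentiable solution to (13) with initial value f(0).
Proof Let $x_0 \ge 0$ and a continuous function $f : [0, x_0] \to \mathbb{R}$ be given. Fix h > 0 and set
$$\mathcal{C} = \{g : [0, x_0 + h] \to \mathbb{R} \text{ continuous} \mid g = f \text{ on } [0, x_0]\},$$
where g(x) = 0 for x < 0. The operator
$$(Tg)(x) = f(x_0) + \int_{x_0}^{x} \inf_{u \in U} \frac{(\lambda + \delta)g(s) - s - \lambda\int_0^\infty g\big(s - r(y, u)\big) f_Y(y)\,dy}{c(u)}\,ds, \qquad x \in [x_0, x_0 + h],$$
with (Tg)(x) = f(x) for x ∈ [0, x_0], is defined on $\mathcal{C}$, and clearly $Tg \in \mathcal{C}$. For all s ∈ [x_0, x_0 + h], all terms involving u are continuous in u and the infimum is taken over a compact set; hence a minimizer u(s) exists. Now let $g_1, g_2 \in \mathcal{C}$, and let $u_1(s)$, $u_2(s)$ be the corresponding minimizers. Then, for x ∈ [x_0, x_0 + h],
$$(Tg_1)(x) - (Tg_2)(x) \le \int_{x_0}^{x} \frac{(\lambda + \delta)(g_1 - g_2)(s) - \lambda\int_0^\infty (g_1 - g_2)\big(s - r(y, u_2(s))\big) f_Y(y)\,dy}{c(u_2(s))}\,ds \le \frac{h(\delta + 2\lambda)}{L}\,\|g_1 - g_2\|_\infty.$$
Interchanging the roles of $g_1$ and $g_2$ and choosing $h = \frac{L}{2(\delta + 2\lambda)}$, we get $\|Tg_1 - Tg_2\|_\infty \le \tfrac12\|g_1 - g_2\|_\infty$. Thus T is a contraction on $\mathcal{C}$ and consequently has a unique fixed point. Since h and the contraction factor do not depend on $x_0$, we can iterate this procedure on the intervals [0, h], [h, 2h], …. Finally, pasting these fixed points together continuously at the end points of the intervals [kh, (k + 1)h] yields a unique solution to (13) with given initial value f(0). By construction, this solution is absolutely continuous on $\mathbb{R}_+$, since one may alter the grid for the construction procedure.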
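The construction in the proof can be mimicked on a grid. It iterates a discretized version of T to a fixed point on one short interval and then moves on to the next. This is a sketch under stated assumptions, not the authors' implementation. The following choices are hypothetical:

- Exp(mean μ) claims with proportional reinsurance and the expected value principle;
- the parameter values;
- the grids;
- the control range [0.5, 1], which plays the role of {u : c(u) ≥ L};
- the guess g0 = 30 for V(0).

The ds-integral is approximated with the trapezoidal rule.

```python
import numpy as np

# Hypothetical setting (not the paper's Table 1)
lam, mu, eta, theta, delta = 1.0, 1.0, 0.2, 0.5, 0.1
x = np.linspace(0.0, 10.0, 101)                  # surplus grid, step 0.1
y = np.linspace(0.0, 30.0 * mu, 601)             # truncated claim-size grid
w = np.full(y.size, y[1] - y[0])
w[[0, -1]] *= 0.5                                # trapezoid weights
dens_w = w * np.exp(-y / mu) / mu                # weights times f_Y(y)
u_grid = np.linspace(0.5, 1.0, 11)               # controls with c(u) >= c(0.5) > 0
c_u = lam * mu * (u_grid * (1 + theta) - (theta - eta))

def min_ratio(idx, g):
    """Integrand of T at s = x[idx]: min over u of [(lam+delta) g(s) - s - lam E g(s - uY)] / c(u)."""
    s = x[idx]
    best = np.full(s.shape, np.inf)
    for u, cu in zip(u_grid, c_u):
        shifted = s[:, None] - u * y[None, :]
        vals = np.where(shifted < 0, 0.0, np.interp(shifted, x, g))   # g = 0 below zero
        best = np.minimum(best, ((lam + delta) * g[idx] - s - lam * (vals @ dens_w)) / cu)
    return best

def cumulative_trapezoid(f, s):
    """Running trapezoidal integral of f over the points s, starting at 0."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(s))))

def solve_from(g0, block=1, tol=1e-10, max_iter=500):
    """Fixed-point iteration of T on consecutive short intervals, as in the proof of Theorem 1."""
    g = x / delta + g0                            # starting function x/delta + g0
    for k in range(0, x.size - 1, block):
        idx = np.arange(k, min(k + block, x.size - 1) + 1)
        for _ in range(max_iter):
            new = g[k] + cumulative_trapezoid(min_ratio(idx, g), x[idx])
            change = np.max(np.abs(new - g[idx]))
            g[idx] = new
            if change < tol:
                break
    return g

def apply_T(g, g0):
    """One global application of the discretised operator T on the whole grid."""
    idx = np.arange(x.size)
    return g0 + cumulative_trapezoid(min_ratio(idx, g), x)

g_hat = solve_from(g0=30.0)
print(g_hat[::20])
```

Keeping each interval short, here a single grid step, is what makes each local iteration a contraction. In the spirit of Corollary 2, one would rerun `solve_from` for several values of g0 and retain the one whose solution keeps the correct linear growth.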
We are now able to finalize the analytical characterization of V .
Theorem 2 Suppose g : ℝ → ℝ with g(x) = 0 for x < 0 is bounded by $x/\delta + c/\delta^2$ and is an absolutely continuous solution to (13). Then g(x) = V(x). The optimal strategy $u^* = (u^*_t)_{t\ge0}$ is induced by the pointwise minimizer u(x) of (13) via $u^*_t = u(X^{u^*}_{t-})$.

Remark 5 The proof of Schmidli (2008, Lem. 2.12) can be used verbatim to show that the function u defining the optimal strategy is measurable. Consequently, the process $(u^*_t)_{t\ge0}$ is previsible and constitutes an admissible strategy.

Proof Let t > 0 and $u = (u_t)_{t\ge0} \in \mathcal{U}$. Since the paths of $(X^u_t)_{t\ge0}$ are of bounded variation, we can use the Stieltjes integral to obtain The process $M = (M_t)_{t\ge0}$ defined by is a zero-mean martingale, due to compensation. Therefore, taking expectations in (16) leads to Remember that for $g'(X^u_s)$ we have (at least a.e.) which yields, for the particular control parameter $u_s$, From Schmidli (2008, Lem. 2.9), we know that either ruin occurs or the controlled surplus tends to infinity, with a linear growth bound. Therefore, using bounded convergence in (17) results in $g(x) \ge V^u(x)$ for every $u \in \mathcal{U}$, that is, $g(x) \ge V(x)$. One observes that in (17) equality holds for the strategy $u^*$ defined in the statement of the theorem, so that finally V(x) = g(x).
The combination of the statement of the last theorem with the uniqueness result and the properties of the value function enables us to state a complete characterization.

Corollary 2
The value function V is the unique solution to (10) in the set of absolutely continuous functions g : ℝ → ℝ with g(x) = 0 for x < 0 which are bounded by $x/\delta + c/\delta^2$. In particular, only the initial value V(0) for equation (13) allows for a solution g(x) with the property $\lim_{x\to\infty}$

Numerical examples
In this section, we illustrate the theoretical results and sketch a numerical solution method by means of two examples. Furthermore, for the particular case of proportional reinsurance and a reinsurer using the expected value premium principle, we refine the analytical results and state the asymptotic behaviour of the optimal strategy as the initial capital tends to infinity. Since an explicit solution to (10) is out of reach, one has to rely on a numerical method. Fortunately, the theoretical characterization stated in Theorem 2 and Corollary 2 yields an implementable procedure. An iterated application of the operator T, defined in the proof of Theorem 1, to a linear function g(x) = x/δ + g_0 leads to an approximation of the value function if and only if g_0 = V(0) is chosen correctly, cf. Corollary 2. Consequently, the first step of the procedure requires a good guess of g_0, which can (and needs to) be improved in later steps. To determine a meaningful approximation of g_0, we exploit the idea of policy improvement; see, for instance, Bäuerle and Rieder (2011).
The starting point is the value $V_{sr}(x)$ corresponding to the situation without reinsurance, which in our parameter setting can be determined explicitly. Based on $V_{sr}$, we compute a strategy $u^1 = \{u^1_t\}$ with $u^1_t = u^1(X^{u^1}_{t-})$ from the HJB equation (10) via
$$u^1(x) = \arg\max_{u \in U}\Big\{c(u)V_{sr}'(x) + \lambda\int_0^\infty V_{sr}\big(x - r(y, u)\big) f_Y(y)\,dy\Big\}.$$
Next, we determine a good approximation of $V^{u^1}(0)$. This can be done by the Monte Carlo method, with direct simulations of the controlled surplus process from (4). The value $V^{u^1}(0)$ corresponds to an admissible strategy but does not necessarily equal V(0). With $V^{u^1}(0)$ at hand, we can determine $V^{u^1}(x)$ for x ≥ 0, either by iterating an operator similar to T but without the infimum in its definition, or by a finite-difference method. We use $V^{u^1}$ as the starting point of iterations of T. After a number of iterations, the initial value can be improved again by the same method as above, now with the function obtained from the iterations as the basis for the policy improvement step. This newly obtained value $V^{u^2}$ then serves as the basis for new iterations of T.
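The Monte Carlo step can be sketched as follows. The sketch makes two simplifying assumptions. First, the retention is chosen from the current surplus at time 0 and right after each claim, then frozen until the next claim. Such a strategy is previsible, so its return is a valid lower bound for V(0), but it only approximates a feedback strategy u(X_{t−}). Second, the setting is proportional reinsurance with Exp(mean μ) claims. Freezing the retention keeps the drift constant between claims, so the discounted integral over each inter-claim segment is computed exactly. The threshold policy, horizon `t_max`, parameters and function names are hypothetical.

```python
import math
import numpy as np

def disc_linear_integral(t0, t1, a, b, delta):
    """Exact value of int_{t0}^{t1} exp(-delta t) (a + b (t - t0)) dt."""
    L = t1 - t0
    e = math.exp(-delta * L)
    return math.exp(-delta * t0) * (a * (1.0 - e) / delta + b * (1.0 - e * (1.0 + delta * L)) / delta**2)

def simulate_return(x, policy, c_of_u, lam, mu, delta, t_max, rng):
    """One path of int_0^{min(tau, t_max)} e^{-delta t} X^u_t dt under proportional reinsurance."""
    t, X, total = 0.0, x, 0.0
    while True:
        u = policy(X)                        # retention frozen until the next claim
        drift = c_of_u(u)
        t_next = min(t + rng.exponential(1.0 / lam), t_max)
        total += disc_linear_integral(t, t_next, X, drift, delta)
        X += drift * (t_next - t)
        t = t_next
        if t >= t_max:
            return total
        X -= u * rng.exponential(mu)         # retained claim r(Y, u) = u Y
        if X < 0:                            # ruin can only occur at claim times
            return total

# Hypothetical parameters and a hypothetical threshold strategy (illustration only)
lam, mu, eta, theta, delta = 1.0, 1.0, 0.2, 0.5, 0.1

def c_of_u(u):
    return lam * mu * (u * (1 + theta) - (theta - eta))

def policy(X):
    return 0.6 if X < 5.0 else 1.0

rng = np.random.default_rng(seed=7)
paths = [simulate_return(0.0, policy, c_of_u, lam, mu, delta, 150.0, rng) for _ in range(500)]
estimate, std_err = np.mean(paths), np.std(paths) / math.sqrt(len(paths))
print(f"V^u(0) ~ {estimate:.2f} +/- {std_err:.2f}")
```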
Remark 6 Alternatively, one can execute a policy iteration procedure based on the original HJB equation (10). In our experience, the strategies obtained in this way are very close to those determined via the first method. Unfortunately, the return functions generated simultaneously are not always trustworthy. This stems from the presence of the control parameter both in front of the sensitive derivative term and inside the integral. Nevertheless, using these strategies considerably accelerates the whole procedure.
In this way, by using policy iterations at intermediate steps, we create an increasing sequence of initial values and also determine candidates for a fixed point of T. To decide whether an initial value is significantly too small, one can check the behaviour of the function obtained from the corresponding iterations of T. If an initial value is far away from V(0), we observe a violation of the lower bound from Lemma 2 for relatively small values of x. We accept an initial value V*(0) as a good guess for V(0) if the function V* obtained from the iterations stays within the theoretical bounds. If, in addition, V* matches the return function of the implicitly given strategy, we accept it as a valid approximation of the value function.
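The acceptance test described above only needs the two linear envelopes. The following sketch assumes that the lower bound of Lemma 2 has the form x/δ + (c − λμ)/(δ(δ + λ)). This is the bound obtained from Dynkin's formula with g(x) = x/δ, localized at the first claim. The upper bound x/δ + c/δ² is the one used in Proposition 1 and Theorem 2. The function name and parameter values are hypothetical.

```python
import numpy as np

def within_bounds(x, v, c, lam, mu, delta, atol=1e-8):
    """True if x/delta + (c - lam mu)/(delta (delta + lam)) <= v <= x/delta + c/delta^2 on the grid."""
    lower = x / delta + (c - lam * mu) / (delta * (delta + lam))
    upper = x / delta + c / delta**2
    return bool(np.all(v >= lower - atol) and np.all(v <= upper + atol))

x = np.linspace(0.0, 20.0, 201)
lam, mu, delta = 1.0, 1.0, 0.1
c = 1.2                                   # eta = 0.2
print(within_bounds(x, x / delta + 50.0, c, lam, mu, delta))   # True
```

A candidate produced from a g0 that is much too small drops below the lower envelope. This is the symptom that the text uses to reject an initial value.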
Remark 7 Instead of always starting the iteration procedure at predetermined values $V^u$, we can also start with $g(x) = x/\delta + V^u(0)$, and all previously stated arguments still apply.
In our experience, this procedure leads to trustworthy results and representative illustrations of our theoretical findings. A rigorous numerical analysis would certainly be necessary and highly interesting, but it is beyond the scope of this publication.

Example: proportional reinsurance
In the following, we use Gamma(2, γ)-distributed claim amounts. The insurer's premium rate is determined via the expected value principle and reads c = (1 + η)λμ with μ = 2/γ and η > 0. For the reinsurer, we assume the same premium principle but with a safety loading θ > η. The concrete numbers are given in Table 1.
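The reinsurance scheme considered is r(y, u) = uy for a control parameter $u \in (\underline{u}, 1]$ with $\underline{u} = \inf\{u \in [0, 1] \mid c(u) > 0\}$, as discussed before the statement of Theorem 1. Under the expected value principle for both parties, this gives $\underline{u} = (\theta - \eta)/(1 + \theta)$.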
To derive numerical approximations of the value function and the optimal strategy, we implemented the procedure described at the beginning of this section. In contrast to the case of excess of loss reinsurance, the proportional case turned out to be numerically demanding and required considerable computational effort to arrive at reasonably satisfactory results.
The strategy obtained from 20 policy iteration steps, starting from $V_{sr}$, is depicted in Fig. 1. In the remark below, the shape of this strategy is discussed in some detail. Figure 2 contains the graphs of $V_{sr}$ (dotted line), $V_1$ (dashed line) and $V_{20}$ (solid line). $V_1$ is computed from 30 iterations of T starting with g and initial value $g_0 = 212$, which corresponds to the strategy obtained from one policy improvement step based on $V_{sr}$. The function $V_{20}$ is also derived from 30 operator iterations, but with the initial value $g_0 = 226.436$ associated with the strategy from Fig. 1.
Table 2 lists selected function values from the iterations of T leading to $V_{20}$.
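Remark 8 We would like to discuss $\lim_{x\to\infty} u^*(x)$, which the numerical computations suggest to be one. Here, we deal exclusively with proportional reinsurance and the expected value premium principle for both insurer and reinsurer, $c(u) = \lambda\mu(u(1+\theta) - (\theta-\eta))$ with safety loadings $\theta > \eta$. From the definition of the value function, we have $V(x) \ge f(x) = E_x\big[\int_0^\tau e^{-\delta t} X_t\,dt\big]$, where X is the surplus without reinsurance. Here we use the martingale $M = (M_t)_{t\ge 0}$, the compensated compound Poisson process $M_t = \lambda\mu t - \sum_{k=1}^{N_t} Y_k$, so that $X_t = x + \eta\lambda\mu t + M_t$. Regarding $\int_0^\tau e^{-\delta t} M_t\,dt$ pathwise as a Stieltjes integral and applying integration by parts, Wheeden and Zygmund (1977, Th. 2.21), we arrive at
$$E_x\Big[\int_0^\tau e^{-\delta t} M_t\,dt\Big] = \frac{1}{\delta}\,E_x\Big[\int_0^\tau e^{-\delta t}\,dM_t - e^{-\delta\tau} M_\tau\Big] = -\frac{1}{\delta}\,E_x\big[e^{-\delta\tau} M_\tau\big]. \qquad (18)$$
In (18), the integral with respect to the martingale is itself a martingale, which leads to the second equality. At the same time, using an ε-optimal strategy $u^*$ for initial capital $x > 0$, we have $V(x) - \varepsilon \le E_x\big[\int_0^{\tau^{u^*}} e^{-\delta t} X^{u^*}_t\,dt\big]$. If we suppose that $u^*$ is a Markov control, then $M^*_t = \int_0^t \lambda\mu u^*_s\,ds - \sum_{k=1}^{N_t} u^*_{T_k} Y_k$ is a zero-mean $\mathcal{F}^X_t$-martingale, and the same integration by parts procedure applies. Consequently, since $\tau$ and $\tau^{u^*}$ tend to infinity almost surely as $x \to \infty$ ($M_t$ is linearly bounded), we obtain for large x that $V(x) \approx f(x) \approx \frac{x}{\delta} + \frac{\eta\lambda\mu}{\delta^2}$. Now we proceed with determining $\lim_{x\to\infty} u^*(x)$. Here, $u^*(x)$ denotes the pointwise maximizer in u of the HJB equation (10), which exists by continuity. Plugging in $c(u)$ together with $V(x) \approx \frac{x}{\delta} + \frac{\eta\lambda\mu}{\delta^2}$ and $V'(x) \approx \frac{1}{\delta}$, and regrouping, we see that
$$u^*(x) \approx \frac{\frac{\lambda\mu(\theta-\eta)}{\delta} + \frac{\lambda x}{\delta}\big(1 - F_Y(x/u^*(x))\big) + \frac{\lambda u^*(x)}{\delta}\int_0^{x/u^*(x)} y\,dF_Y(y) + (\delta+\lambda)\frac{\eta\lambda\mu}{\delta^2} - \frac{\lambda^2\mu\eta}{\delta^2}\,F_Y(x/u^*(x))}{\frac{\lambda\mu}{\delta}(1+\theta)}.$$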
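If we now assume that $\lim_{x\to\infty} u^*(x) = u^*$ exists, then letting $x \to \infty$ and using $\int_0^{x/u^*(x)} y\,dF_Y(y) \to \mu$, $F_Y(x/u^*(x)) \to 1$ and $x\,(1 - F_Y(x/u^*(x))) \to 0$ (the claims have finite mean), the right-hand side of the preceding expression for $u^*(x)$ tends to $\frac{\theta + u^*}{1+\theta}$. Hence the limit must satisfy $u^* = \frac{\theta + u^*}{1+\theta}$, that is $\theta u^* = \theta$, which holds only for $u^* = 1$. Figures 3 and 4 show the sharp linear upper bound together with $V(x)$ and $f(x) = E_x\big[\int_0^\tau e^{-\delta t} X_t\,dt\big]$ for Exp(ν)-distributed claims, with the parameters given in Table 3.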
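The function $f(x) = E_x\big[\int_0^\tau e^{-\delta t} X_t\,dt\big]$ can be checked by simulation. The sketch below is a plain Monte Carlo estimate for Exp(ν) claims and a constant proportional retention u (u = 1 gives f itself). Between claims the surplus grows linearly, so each piece of the discounted integral is computed exactly, and ruin can only occur at claim times when $c(u) > 0$. The function names, the truncation horizon and the parameter values are illustrative assumptions, not the settings behind Figs. 3 and 4. For large x ruin is negligible and $f(x) \approx \int_0^\infty e^{-\delta t}(x + \eta\lambda\mu t)\,dt = \frac{x}{\delta} + \frac{\eta\lambda\mu}{\delta^2}$, which serves as a sanity check.

```python
import math
import random


def discounted_linear_integral(level, c, delta, a, b):
    """Exact value of int_a^b e^{-delta s} (level + c (s - a)) ds."""
    end_level = level + c * (b - a)
    return (math.exp(-delta * a) * (level / delta + c / delta ** 2)
            - math.exp(-delta * b) * (end_level / delta + c / delta ** 2))


def f_monte_carlo(x, lam, nu, eta, theta, delta, u=1.0,
                  n_paths=2000, horizon=200.0, seed=1):
    """Monte Carlo estimate of E_x[int_0^tau e^{-delta t} X_t dt] for a
    constant proportional retention u and Exp(nu) claims."""
    rng = random.Random(seed)
    mu = 1.0 / nu
    c = lam * mu * (u * (1.0 + theta) - (theta - eta))  # net premium rate c(u)
    if c <= 0.0:
        raise ValueError("need c(u) > 0, i.e. u above the lower retention")
    total = 0.0
    for _ in range(n_paths):
        t, level, acc = 0.0, x, 0.0
        while True:
            t_next = t + rng.expovariate(lam)  # next claim arrival
            if t_next >= horizon:  # truncation: e^{-delta*horizon} is negligible
                acc += discounted_linear_integral(level, c, delta, t, horizon)
                break
            acc += discounted_linear_integral(level, c, delta, t, t_next)
            level += c * (t_next - t) - u * rng.expovariate(nu)  # retained claim
            t = t_next
            if level < 0.0:  # ruin at t_next: no further contribution
                break
        total += acc
    return total / n_paths


# Illustrative (hypothetical) parameters: lam = 1, nu = 1, eta = 0.2,
# theta = 0.3, delta = 0.1 and a large initial capital.
x = 1000.0
est = f_monte_carlo(x, lam=1.0, nu=1.0, eta=0.2, theta=0.3, delta=0.1)
asymptote = x / 0.1 + 0.2 * 1.0 * 1.0 / 0.1 ** 2  # x/delta + eta*lam*mu/delta^2
print(est, asymptote)
```

The sampling error of a single path is of order $\sqrt{\lambda E[Y^2]/\delta^3}$, so the number of paths has to grow quickly as δ decreases.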

Example: XL-reinsurance
As a second example, we consider dynamic XL-reinsurance with Exp(ν)-distributed claim amounts. The chosen parameters are close to those used by Hipp and Vogt (2003) and are given in Table 4.
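For an XL treaty with retention level u, the insurer keeps $r(y, u) = \min(y, u)$ of each claim. The premium principle for the XL case is not restated in this section; the sketch below assumes, as in the proportional example, the expected value principle with loading η for the insurer and θ for the reinsurer, so that $c(u) = (1+\eta)\lambda\mu - (1+\theta)\lambda E[(Y-u)^+]$. For Exp(ν) claims, $E[(Y-u)^+] = e^{-\nu u}/\nu$, and $c(u) > 0$ exactly for $u > \ln\big((1+\theta)/(1+\eta)\big)/\nu$. The function names and the parameter values are hypothetical.

```python
import math


def xl_ceded_mean(u: float, nu: float) -> float:
    """E[(Y - u)^+] for Y ~ Exp(nu): expected claim part ceded to the reinsurer."""
    return math.exp(-nu * u) / nu


def xl_retained_mean(u: float, nu: float) -> float:
    """E[min(Y, u)] for Y ~ Exp(nu): expected retained claim part."""
    return (1.0 - math.exp(-nu * u)) / nu


def xl_premium_rate(u: float, lam: float, nu: float,
                    eta: float, theta: float) -> float:
    """Net premium rate c(u) under XL reinsurance (assumed expected value
    principle for both insurer and reinsurer); u = math.inf means no reinsurance."""
    mu = 1.0 / nu
    return (1.0 + eta) * lam * mu - (1.0 + theta) * lam * xl_ceded_mean(u, nu)


def xl_lower_retention(nu: float, eta: float, theta: float) -> float:
    """Smallest retention with c(u) >= 0: ln((1+theta)/(1+eta)) / nu."""
    return max(0.0, math.log((1.0 + theta) / (1.0 + eta)) / nu)


# Illustrative (hypothetical) parameters: lam = 1, nu = 1, eta = 0.2, theta = 0.3.
u_low = xl_lower_retention(1.0, 0.2, 0.3)
print(u_low, xl_premium_rate(math.inf, 1.0, 1.0, 0.2, 0.3))
```

Retained and ceded expected claim parts add up to the mean claim size 1/ν, which is a quick consistency check for other claim distributions as well.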
The numerically determined approximate optimal strategy is displayed in Fig. 5. The corresponding numerical approximation of the value function (full line) is shown in a further figure.
Remark 9 (Comparison with ruin probability minimization) When numerically determining the approximate optimal strategies, one observes some similarities to, but also differences from, optimal dynamic reinsurance strategies for minimizing ruin probabilities; see Schmidli (2008, Ch. 2.3.1) and Hipp and Vogt (2003). In both cases, proportional and XL, the behaviour for small initial capital is similar: there is some $x_0 > 0$ such that it is optimal to take no reinsurance on $[0, x_0]$. From that point on, a certain amount of reinsurance is bought. For larger x, the reinsurance choice either returns to no reinsurance (proportional) or converges towards a constant level (XL).
In the proportional case, this contrasts with ruin probability minimization, where, for small claims, the optimal reinsurance choice converges to a constant level as x tends to infinity. This different behaviour may be explained by the underlying performance measure, which in the present framework is profit-oriented. Because of discounting, a ruin event late in time has little effect on the insurer's objective; hence, above a certain surplus level (large enough for early ruin to occur only with low probability), the insurer focuses on the maximal drift and does not buy reinsurance. Why the numerically optimal XL strategy behaves differently is an interesting question for future research; an answer may come from comparing solutions to the associated integro-differential equations.

Conclusion
In this paper, we studied a dynamic optimal reinsurance problem derived from an economic valuation criterion in risk theory. An interplay between analytical and probabilistic arguments allowed us to characterize the associated value function, and the theoretical results were complemented by numerical examples. Based on the alternative interpretation of the value function given in (2), our results suggest that reinsurance can accelerate the build-up of a free reserve and that the use of reinsurance is beneficial in this economic context.