Conditional tail moment and reinsurance premium estimation under random right censoring

We propose an estimator of the conditional tail moment (CTM) when the data are subject to random censorship. The variable of main interest and the censoring variable both follow a Pareto-type distribution. We establish the asymptotic properties of our estimator and discuss bias-reduction. Then, the CTM is used to estimate, in case of censorship, the premium principle for excess-of-loss reinsurance. The finite sample properties of the proposed estimators are investigated with a simulation study and we illustrate their practical applicability on a dataset of motor third party liability insurance.


Introduction
The estimation of tail parameters is crucial in many fields where extreme events with possible catastrophic impacts can happen occasionally. In these contexts, Extreme Value Theory (EVT) appears to be the natural tool for modelling this type of event. However, the sample on which the estimation procedure has to be based can be censored. This problem of censoring, where only partial information on a random variable is available, typically that it exceeds a given threshold, is common in many scientific disciplines. For instance, when studying advanced age mortality, one often has that some individuals of a birth cohort are still alive at the time of follow-up, meaning that only a lower bound for their actual lifetime is available. This motivates the study of EVT in the censoring framework, a topic originally considered in the literature by Beirlant et al. (2007) and Einmahl et al. (2008) in all the domains of attraction. However, most recent publications on EVT in the censoring context are concerned with heavy-tailed distributions, motivated by applications in insurance where accurately modelling the upper tail of the claim size distribution is crucial for risk management but can be difficult since long developments of claims are encountered. This implies that at the evaluation of a portfolio some claims might not be completely settled, and hence they are only known to exceed what has already been paid at the evaluation time. Concretely, this means that the real payments are right censored. We refer for instance to Beirlant et al. (2016, 2018, 2019), Bladt et al. (2021) or Worms and Worms (2014, 2016, 2018), among others.
In this paper, we will also focus on heavy-tailed distributions, i.e., we will assume that our variable of interest X has a distribution function F_X which satisfies
\[ 1 - F_X(x) = x^{-1/\gamma_X}\,\ell_X(x), \tag{1} \]
where γ_X > 0 is called the extreme value index of X, and ℓ_X(·) is a slowly varying function at infinity, i.e., a function which satisfies lim_{x→∞} ℓ_X(λx)/ℓ_X(x) = 1, for all λ > 0.
This variable X is censored by another random variable, say Y, independent of X, which also has a heavy-tailed distribution F_Y satisfying
\[ 1 - F_Y(x) = x^{-1/\gamma_Y}\,\ell_Y(x), \tag{2} \]
with γ_Y > 0 and ℓ_Y(·) again a slowly varying function at infinity. In the censoring framework, only Z := min(X, Y) is observed, together with the indicator δ := 1_{\{X ≤ Y\}} which specifies whether or not X has been observed. Given a sample (Z_i, δ_i), 1 ≤ i ≤ n, of independent copies of (Z, δ), our aim is to make inference on the right tail of the, in practice unknown, distribution function F_X. This setup, where both the variable of main interest, X, and the censoring variable, Y, are heavy-tailed and assumed independent, is common in the extreme value literature, and was also assumed in, among others, Beirlant et al. (2007), Einmahl et al. (2008) and Bladt et al. (2021). The situation of dependent right censoring in an extreme value context was, to the best of our knowledge, only studied by Stupfler (2019), and this approach requires an assumption on the dependence between X and Y.
In particular, first interest is in the estimation of γ_X, despite the fact that the X-sample was not observed. To understand how to proceed, first remark that, if F_Z denotes the distribution function of Z, then (1) and (2) imply that
\[ 1 - F_Z(x) = x^{-1/\gamma_Z}\,\ell_Z(x), \tag{3} \]
where γ_Z := γ_X γ_Y/(γ_X + γ_Y) and ℓ_Z := ℓ_X ℓ_Y. That means that the distribution function of Z is also heavy-tailed. Thus, if we turn a blind eye by ignoring the censorship and decide simply to use a classical estimator of the extreme value index such as the Hill estimator (Hill, 1975), based on the Z-sample and defined as
\[ \hat\gamma^{(H)}_k := \frac{1}{k}\sum_{j=1}^{k}\log Z_{n-j+1,n} - \log Z_{n-k,n}, \tag{4} \]
where Z_{i,n}, 1 ≤ i ≤ n, denote the order statistics associated to the Z-sample and k the number of random variables taken into account in the estimation procedure, then \(\hat\gamma^{(H)}_k\) estimates the extreme value index γ_Z of Z instead of the one of interest, γ_X. Since γ_Z ≤ γ_X, by using \(\hat\gamma^{(H)}_k\) one underestimates the risk if one does not take censoring into consideration. To resolve this problem, Beirlant et al. (2007) proposed to divide the estimator (4) by the proportion of non-censored observations in the k largest Z's, i.e., they introduced the adjusted estimator
\[ \hat\gamma^{(H,c)}_k := \frac{\hat\gamma^{(H)}_k}{\hat p_k}, \qquad \hat p_k := \frac{1}{k}\sum_{j=1}^{k}\delta_{[n-j+1,n]}, \tag{5} \]
where δ_{[i,n]} denotes the censoring indicator attached to Z_{i,n}. Since the estimation of the extreme value index in the censoring framework is nowadays well known in the literature, we will assume in the sequel that we have at our disposal an extreme value index estimator adjusted to censoring, denoted \(\hat\gamma^{(c)}_k\), for which the following convergence in distribution holds under suitable conditions:
\[ \sqrt{k}\,\bigl(\hat\gamma^{(c)}_k - \gamma_X\bigr) \stackrel{d}{\longrightarrow} \Gamma. \tag{6} \]
Here Γ follows a normal distribution, with a mean and variance depending on the estimator of the extreme value index used. We refer to Section 2 for specific examples.
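The adjustment in (5) can be illustrated numerically. The following is a minimal sketch (the helper names are ours, and we assume strict Pareto tails so that γ_Z and the limiting uncensored proportion are exact): the plain Hill estimator on the Z-sample targets γ_Z, while dividing by the uncensored proportion in the top k recovers γ_X.

```python
import numpy as np

def hill(z, k):
    """Classical Hill estimator (4) based on the k largest observations."""
    z = np.sort(np.asarray(z, dtype=float))
    return np.mean(np.log(z[-k:])) - np.log(z[-k - 1])

def hill_censored(z, delta, k):
    """Censoring-adjusted Hill estimator (5): divide by the proportion
    of non-censored observations among the k largest Z's."""
    order = np.argsort(z)
    p_hat = np.mean(np.asarray(delta, dtype=float)[order][-k:])
    return hill(z, k) / p_hat

# Illustration with strict Pareto tails: gamma_X = 0.5, gamma_Y = 1, so
# gamma_Z = (0.5 * 1)/(0.5 + 1) = 1/3; the plain Hill estimator targets
# 1/3 while the adjusted one targets gamma_X = 0.5.
rng = np.random.default_rng(1)
n, k = 100_000, 1_000
x = rng.uniform(size=n) ** -0.5   # P(X > t) = t^{-2}
y = rng.uniform(size=n) ** -1.0   # P(Y > t) = t^{-1}
z = np.minimum(x, y)
delta = (x <= y).astype(int)
```

Here the proportion of uncensored observations among large Z's is exactly γ_Z/γ_X = 2/3, which is what the division in (5) corrects for.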
Our first aim will then be the estimation of the conditional tail moment (CTM) for a positive random variable X, defined for some ζ > 0 as
\[ \theta_{p,\zeta} := \mathbb{E}\bigl[X^{\zeta} \,\big|\, X > U_X(1/p)\bigr], \]
where U_X(x) := \inf\{y : F_X(y) \ge 1 - 1/x\} is the tail quantile function of X, when X is censored. This corresponds, for an insurance company, to the case where some claims can be considered as open in the sense that the company does not yet know the full extent of the loss. Then, using the CTM estimator adapted to censoring, we will look at some well-known risk measures, such as the premium for excess-of-loss reinsurance. By using different values of ζ, we will be able in particular to recover the mean and variance of the payment by the reinsurer. Note that this topic has recently been considered in Goegebeur et al. (2022) in the case of no censoring.
The remainder of the paper is organized as follows. In Section 2, we introduce an estimator for the conditional tail moment adapted to censoring and we establish its main asymptotic properties. Then, in Section 3, we look at the premium principle for excess-of-loss reinsurance by providing estimates for the mean and variance of the payment by the reinsurer in case of censoring and we establish their asymptotic behaviors. Finally, Section 4 is devoted to a simulation study, whereas Section 5 illustrates the performance of the estimators on a real dataset from motor third party liability insurance. All the proofs are postponed to the Supplementary Information, which also contains additional simulation results.

Estimation of the conditional tail moment in case of censoring
We assume that X and Y follow a second order Pareto-type model. Let RV_ψ denote the class of regularly varying functions at infinity with index ψ ∈ ℝ, i.e., positive measurable functions f satisfying f(tx)/f(t) → x^ψ, as t → ∞, for all x > 0. In what follows, • denotes either X or Y.

Assumption (D)
The survival function of X and Y satisfies
\[ 1 - F_\bullet(x) = A_\bullet\, x^{-1/\gamma_\bullet}\bigl(1 + \delta_\bullet(x)\bigr), \]
where A_• > 0 and |δ_•(·)| is regularly varying at infinity with index −β_•, for some β_• > 0. Clearly, the associated tail quantile function U_•(·) satisfies
\[ U_\bullet(x) = A_\bullet^{\gamma_\bullet}\, x^{\gamma_\bullet}\bigl(1 + a_\bullet(x)\bigr), \tag{7} \]
where a_•(x) = γ_• δ_•(U_•(x))(1 + o(1)), and thus |a_•(·)| is regularly varying with index −β_• γ_•. Note that the survival function of Z also satisfies the form of Assumption (D) with A_Z = A_X A_Y, γ_Z = γ_X γ_Y/(γ_X + γ_Y) and |δ_Z| being regularly varying with index −β_Z, where β_Z := min(β_X, β_Y), and hence U_Z will satisfy (7).
We can now expand the CTM under our Pareto-type model on X.
Proposition 1 If X satisfies Assumption (D) with γ_X < 1/ζ, then for p ↓ 0, we have
\[ \theta_{p,\zeta} = \frac{\bigl(U_X(1/p)\bigr)^{\zeta}}{1 - \zeta\gamma_X}\,\bigl(1 + o(1)\bigr). \]
Note that Proposition 1 is a reformulation of Theorem 2.1 in Goegebeur et al. (2022) under our new Assumption (D) instead of their second order condition.
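The expansion in Proposition 1 can be checked by simulation. A minimal sketch, assuming the strict Pareto case ℓ_X ≡ 1 (for which the approximation is in fact exact): a Monte Carlo estimate of the conditional tail moment should agree closely with (U_X(1/p))^ζ/(1 − ζγ_X).

```python
import numpy as np

rng = np.random.default_rng(0)
gamma_X, zeta, p = 0.25, 1.0, 0.01
u = p ** -gamma_X                            # U_X(1/p) for the strict Pareto tail
x = rng.uniform(size=2_000_000) ** -gamma_X  # P(X > t) = t^{-4}
ctm_mc = np.mean(x[x > u] ** zeta)           # empirical E[X^zeta | X > U_X(1/p)]
ctm_prop1 = u ** zeta / (1.0 - zeta * gamma_X)  # Proposition 1 approximation
```

For γ_X = 0.25 and ζ = 1 the factor 1/(1 − ζγ_X) = 4/3 quantifies how much heavier the conditional tail mean is than the threshold itself.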
As a result, a natural estimator for θ_{p,ζ} in the case of censoring is given by
\[ \hat\theta^{(c)}_{p,\zeta} := \frac{\bigl(\widehat U^{(c)}_X(1/p)\bigr)^{\zeta}}{1 - \zeta\,\hat\gamma^{(c)}_k}, \]
where \(\widehat U^{(c)}_X(1/p)\) is an extreme quantile estimator for U_X(1/p) adapted to censoring. For the latter, we can use a Weissman-type estimator (see Weissman, 1978) defined as
\[ \widehat U^{(c)}_X(1/p) := Z_{n-k,n}\left(\frac{1 - \widehat F_n(Z_{n-k,n})}{p}\right)^{\hat\gamma^{(c)}_k}, \]
where \(\widehat F_n(·)\) denotes the Kaplan-Meier product-limit estimator (Kaplan and Meier, 1958), defined as
\[ 1 - \widehat F_n(z) := \prod_{Z_{i,n}\le z}\left(\frac{n-i}{n-i+1}\right)^{\delta_{[i,n]}}. \]
As a preliminary result, we will prove the convergence in distribution of \(\widehat U^{(c)}_X(1/p)\).

Theorem 1 Assume (D) with F_X continuous and that the convergence (6) holds. Then for k, n → ∞, with k/n → 0 and √k δ_X(U_Z(n/k)) → λ ∈ ℝ, and p satisfying np/k → 0, we have
\[ \frac{\sqrt{k}}{\log\bigl(k/(np)\bigr)}\left(\frac{\widehat U^{(c)}_X(1/p)}{U_X(1/p)} - 1\right) \stackrel{d}{\longrightarrow} \Gamma. \]
Note that the role of the assumption on log(k/(np)) is discussed in the proof of Theorem 1 in the Supplementary Information.
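The Kaplan-Meier tail probability, the Weissman-type extrapolation and the resulting CTM estimator translate directly into code. The sketch below uses our own function names and takes the censoring-adjusted index estimate \(\hat\gamma^{(c)}_k\) as a given input:

```python
import numpy as np

def km_tail_prob(z, delta, t):
    """Kaplan-Meier estimate of 1 - F_n(t) under right censoring:
    product over Z_{i,n} <= t of ((n-i)/(n-i+1))^{delta_[i,n]}."""
    order = np.argsort(z)
    zs = np.asarray(z, dtype=float)[order]
    ds = np.asarray(delta, dtype=float)[order]
    n = len(zs)
    i = np.arange(1, n + 1)
    factors = ((n - i) / (n - i + 1.0)) ** ds
    return np.prod(factors[zs <= t])

def weissman_censored(z, delta, k, p, gamma_c):
    """Weissman-type estimator of U_X(1/p): anchor at Z_{n-k,n} and
    extrapolate with the censoring-adjusted index gamma_c."""
    zs = np.sort(np.asarray(z, dtype=float))
    anchor = zs[-k - 1]                          # Z_{n-k,n}
    return anchor * (km_tail_prob(z, delta, anchor) / p) ** gamma_c

def ctm_censored(z, delta, k, p, zeta, gamma_c):
    """CTM estimator: (U_hat(1/p))^zeta / (1 - zeta * gamma_c)."""
    u_hat = weissman_censored(z, delta, k, p, gamma_c)
    return u_hat ** zeta / (1.0 - zeta * gamma_c)

# No-censoring sanity data: strict Pareto sample with gamma_X = 0.25.
rng = np.random.default_rng(2)
z_demo = rng.uniform(size=100_000) ** -0.25
d_demo = np.ones_like(z_demo)
u_hat_demo = weissman_censored(z_demo, d_demo, k=1_000, p=1e-3, gamma_c=0.25)
```

When there is no censoring (all δ_i = 1), the Kaplan-Meier product at Z_{n−k,n} telescopes to exactly k/n, so the estimator reduces to the classical Weissman estimator.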
We now have all the ingredients to prove the convergence in distribution of \(\hat\theta^{(c)}_{p,\zeta}\).

Theorem 2 Under the same assumptions as in Theorem 1, for γ_X < 1/ζ, we have
\[ \frac{\sqrt{k}}{\log\bigl(k/(np)\bigr)}\left(\frac{\hat\theta^{(c)}_{p,\zeta}}{\theta_{p,\zeta}} - 1\right) \stackrel{d}{\longrightarrow} \zeta\,\Gamma. \]
Note that, in the case of no censoring, this estimator is different from the one proposed in Goegebeur et al. (2022) but has the same limiting distribution under a weaker condition on the tail index, namely γ_X < 1/ζ. If we use the Hill-type estimator \(\hat\gamma^{(H,c)}_k\) from (5) together with conditions (9) and (10), then we can make explicit the asymptotic bias and variance of the CTM estimator, denoted in that case by \(\hat\theta^{(H,c)}_{p,\zeta}\). Moreover, the function x → δ_•(x) is assumed monotone for x large enough.
Note that assumption (11) corresponds to the well-known Hall and Welsh (1985) model often used in the literature, for instance in Beirlant et al. (2016). Since the estimator for θ_{p,ζ}, after normalisation, inherits the limiting distribution of the estimator that was used for γ_X, the use of \(\hat\gamma^{(H,c)}_k\) will lead to a potentially biased estimator for θ_{p,ζ}. As an alternative, one could use a bias-corrected estimator for γ_X, like the one proposed in Beirlant et al. (2016) when β_Z is known, denoted \(\hat\gamma^{(BC,c)}_k\). The limiting distribution of \(\hat\gamma^{(BC,c)}_k\), after proper normalisation, is given in Theorem 1 of Beirlant et al. (2016), and can be used to obtain the following corollary about the CTM estimator. This bias-corrected estimator for the CTM will be denoted as \(\hat\theta^{(BC,c)}_{p,\zeta}\). Note that the asymptotic variance of the bias-corrected estimator \(\hat\theta^{(BC,c)}_{p,\zeta}\) is larger than the one of the uncorrected estimator \(\hat\theta^{(H,c)}_{p,\zeta}\), but this is expected since bias-reduction often implies an increase in the variance. Additionally, remark that the limiting distribution for \(\hat\gamma^{(BC,c)}_k\) in Beirlant et al. (2016) is derived for the case where β_Z is known. In practice the true β_Z is typically unknown, and for this case Beirlant et al. (2016) propose to replace β_Z by \(\hat\beta_Z := -\rho_Z/\hat\gamma^{(H)}_k\), where ρ_Z < 0 is the usual second order parameter in extreme value statistics (in our context ρ_Z = −β_Z γ_Z), and where ρ_Z is fixed by the user, either at the correct value or mis-specified. In the framework of bias-corrected estimation it is quite common to fix the second order rate parameter at some value, see, e.g., Feuerverger and Hall (1999), Gomes and Martins (2004), Dutang et al. (2014, 2016). Note that in case ρ_Z is mis-specified one typically loses the bias-correction theoretically (in the sense that the mean of the limiting normal distribution is no longer zero), but the bias-corrected estimators continue to perform well with respect to bias, and usually outperform estimators that are not corrected for bias; we refer to the simulation results.

Premium calculation in case of censoring
Our aim in this section is to look at the premium principle for excess-of-loss reinsurance. In particular, we want to estimate, in case of censorship and when ζ = 1 or 2, the quantity
\[ \Pi_\zeta(p) := \mathbb{E}\Bigl[\bigl((X - U_X(1/p))_+\bigr)^{\zeta}\Bigr], \]
where A_+ := max(A, 0). This study is motivated by applications in reinsurance, where X denotes the claim size and (X − U_X(1/p))_+ the payment by the reinsurer, which arises when the amount of the claim exceeds the retention level U_X(1/p). Thus Π_ζ(p) represents the mean and the second moment of the payment when ζ = 1 or 2, respectively. Since some of the claims are still open at the time of the study, i.e., their specific amounts are not yet known, it is crucial to construct estimators which take this censoring into account.
According to Theorem 3.1 in Goegebeur et al. (2022), if X is of Pareto-type with a continuous distribution function F_X, we have the following links between the CTMs and Π_ζ(p) for ζ = 1 or 2:
\[ \Pi_1(p) = p\bigl(\theta_{p,1} - U_X(1/p)\bigr), \qquad \Pi_2(p) = p\bigl(\theta_{p,2} - 2\,U_X(1/p)\,\theta_{p,1} + (U_X(1/p))^2\bigr), \]
from which the natural estimators adapted for censoring follow:
\[ \widehat\Pi^{(c)}_1(p) := p\bigl(\hat\theta^{(c)}_{p,1} - \widehat U^{(c)}_X(1/p)\bigr), \qquad \widehat\Pi^{(c)}_2(p) := p\bigl(\hat\theta^{(c)}_{p,2} - 2\,\widehat U^{(c)}_X(1/p)\,\hat\theta^{(c)}_{p,1} + (\widehat U^{(c)}_X(1/p))^2\bigr). \]
From these, we can also introduce an estimator for the variance of the payment by the reinsurer, namely
\[ \widehat V^{(c)}(p) := \widehat\Pi^{(c)}_2(p) - \bigl(\widehat\Pi^{(c)}_1(p)\bigr)^2. \]

Theorem 3 Under the same assumptions as in Theorem 1, we have, for ζ = 1, 2,
\[ \frac{\sqrt{k}}{\log\bigl(k/(np)\bigr)}\left(\frac{\widehat\Pi^{(c)}_\zeta(p)}{\Pi_\zeta(p)} - 1\right) \stackrel{d}{\longrightarrow} \zeta\,\Gamma. \]
Again, these estimators of Π_ζ(p), for ζ = 1, 2, are different from those proposed in Goegebeur et al. (2022) but have similar limiting distributions and weaker restrictions on γ_X. This is a nice feature since it allows one to construct confidence intervals for Π_ζ(p) for a wide range of values of γ_X.
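The two links and the variance estimator translate directly into code. A minimal sketch (function names are ours) that takes the estimated retention level u and CTM values as inputs; in the strict Pareto case with γ_X = 0.25, the identities can be verified against the closed forms E[(X − u)_+] = u^{-3}/3 and E[(X − u)_+²] = u^{-2}/3:

```python
def pi_1(p, u, theta1):
    """Mean payment: Pi_1(p) = p * (theta_{p,1} - u)."""
    return p * (theta1 - u)

def pi_2(p, u, theta1, theta2):
    """Second moment: Pi_2(p) = p * (theta_{p,2} - 2*u*theta_{p,1} + u**2)."""
    return p * (theta2 - 2.0 * u * theta1 + u ** 2)

def payment_variance(p, u, theta1, theta2):
    """Variance of the payment: V(p) = Pi_2(p) - Pi_1(p)**2."""
    return pi_2(p, u, theta1, theta2) - pi_1(p, u, theta1) ** 2

# Exact strict Pareto inputs (bar{F}(x) = x^{-4}, gamma_X = 0.25):
p = 0.01
u = p ** -0.25                 # U_X(1/p)
theta1 = u / (1.0 - 0.25)      # exact theta_{p,1}
theta2 = u ** 2 / (1.0 - 0.5)  # exact theta_{p,2}
```

The closed-form check works because, for a strict Pareto tail, both links in Theorem 3.1 of Goegebeur et al. (2022) hold without approximation error.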
As before, if we set \(\hat\gamma^{(c)}_k = \hat\gamma^{(H,c)}_k\) under conditions (9) and (10), the asymptotic bias and variance of these premium estimators can be made explicit. A similar result can also be obtained for the bias-corrected version, denoted by \(\widehat\Pi^{(BC,c)}_\zeta(p)\).

Corollary 4 Under the same assumptions as in Corollary 2, an analogous limiting result holds for \(\widehat\Pi^{(BC,c)}_\zeta(p)\).

Simulation study
In this section we evaluate the finite sample performance of the proposed estimators with a simulation experiment. We simulate from the Burr(η, λ, τ) distribution with distribution function
\[ F(x) = 1 - \left(\frac{\eta}{\eta + x^{\tau}}\right)^{\lambda}, \qquad x > 0, \]
where η, λ, τ > 0. This distribution function satisfies Assumption (D_H) with γ = 1/(λτ) and β = τ. We consider X ∼ Burr(1, 1, 4), giving γ_X = 0.25 and β_X = 4. We choose the distribution of Y to obtain a range of asymptotic censoring proportions. The simulation results for some other distributions and parameter settings can be found in the Supplementary Information.
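Burr(η, λ, τ) variates can be generated by inverting the survival function. A short sketch (helper names are ours), checked against the stated tail index γ = 1/(λτ) via the Hill estimator:

```python
import numpy as np

def rburr(rng, n, eta, lam, tau):
    """Sample Burr(eta, lambda, tau) by inversion of the survival function:
    (eta/(eta + x^tau))^lambda = q  =>  x = (eta*(q^(-1/lambda) - 1))^(1/tau)."""
    q = rng.uniform(size=n)
    return (eta * (q ** (-1.0 / lam) - 1.0)) ** (1.0 / tau)

def hill(z, k):
    """Classical Hill estimator, used here to check gamma = 1/(lam*tau)."""
    z = np.sort(np.asarray(z, dtype=float))
    return np.mean(np.log(z[-k:])) - np.log(z[-k - 1])

rng = np.random.default_rng(3)
x = rburr(rng, 200_000, eta=1.0, lam=1.0, tau=4.0)  # gamma_X = 0.25, beta_X = 4
gamma_hat = hill(x, k=2_000)
```

For Burr(1, 1, 4) the second order parameter is ρ_X = −τγ_X = −1, so the Hill estimate at a moderate k should sit close to 0.25 with only a small bias.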
Next, we illustrate the selection of k for \(\hat\theta^{(H,c)}_{p,\zeta}\). The resulting value for k will be denoted as \(\hat k\). Note that \(\hat\mu_k\) is based on the first line of μ in (12). This is motivated as follows. The unknown parameters in the above AMSE expression clearly need to be estimated in practice, see below.
By doing so, it is unlikely that the estimates for β_X and β_Y will be equal, so we ignore the last line for μ in (12). The second line in (12) is not useful for determining k, as the AMSE would then only depend on the variance component, leading to a selection of the largest possible value of k. Whence our proposal to use the first line in (12). Note that this might imply that we incorrectly use the first line of (12) while we should have used the second line.
In such situations the value of \(\hat k\) will be chosen too small, though still with an acceptable performance. As an alternative, we could also determine k by minimising the AMSE of \(\hat\gamma^{(H,c)}_k\), where \(\hat\mu_k\) is as above. The resulting value for k is denoted by \(\tilde k\). As mentioned, the above AMSE expressions depend on unknown parameters which need to be estimated from the data. For these we proceed as follows:
• γ_X is estimated by \(\hat\gamma^{(BC,c)}_k\).
• The usual second order parameter ρ_X in extreme value statistics is given by ρ_X = −β_X γ_X. We propose to use the canonical choice ρ_X = −1, and to estimate β_X as \(1/\hat\gamma^{(BC,c)}_k\).
In Figure 6 we show the boxplots of \(\hat\theta^{(H,c)}_{1/n,1}(\hat k)\) and \(\hat\theta^{(H,c)}_{1/n,1}(\tilde k)\) for the four distributions mentioned above. Overall, and as expected, estimation of the second conditional tail moment is more difficult than the estimation of the first moment, with more bias and variability. Also, the estimator based on \(\hat k\) performs better than the one based on \(\tilde k\). Finally, as was indicated by Figure 1, estimation of θ_{1/n,1} is very difficult for the dataset with 51% censoring. However, as can be seen from the boxplot in Figure 6, bottom right, combining \(\hat\theta^{(H,c)}_{1/n,1}(k)\) with the AMSE based procedure for determining k still leads to acceptable results, though the variability is large.

Application to insurance data
In this section we analyse a dataset from Motor Third Party Liability Insurance (MTPL) provided by a direct insurance company operating in the EU. The dataset contains the yearly payments of the insurance company, corrected for inflation, for claims by its policyholders over the period 1995-2010. The dataset consists of 837 claims, with about 60% of them being right censored (open) at 2010. We refer to Albrecher et al. (2017), Section 1.3.1, for more details about this dataset. Both the Hill and bias-corrected estimates indicate a value for γ_X above 0.5 but below 1, which in view of the assumption γ_X < 1/ζ indicates that expected values can be estimated but not second moments (and thus variances). According to these values for γ one can expect estimation results with large variability. The estimation of Π_1(p), with p = 1/n as in the simulations, is illustrated in Figure 9; the estimates based on the bias-corrected index show less variability and are more or less stable for k in the range 25 to 150. Finally, we illustrate the construction of confidence intervals for Π_1(p). These will be based on a log-scale version of Theorem 3, as suggested by Drees (2003) in the context of extreme quantile estimation, namely for
\[ \frac{\sqrt{k}}{\log\bigl(k/(np)\bigr)}\,\log\frac{\widehat\Pi^{(c)}_1(p)}{\Pi_1(p)}, \]
which has the same limiting distribution as in Theorem 3. These approximate 100(1 − α)% confidence intervals are then given by
\[ \widehat\Pi^{(c)}_1(p)\,\exp\left(\pm\,\Phi^{-1}(1-\alpha/2)\,\hat\sigma_\Gamma\,\frac{\log\bigl(k/(np)\bigr)}{\sqrt{k}}\right), \]
where Φ^{-1} denotes the quantile function of the standard normal distribution and \(\hat\sigma_\Gamma\) is an estimate of the standard deviation of Γ. In Figure 9 (right panel) we show the approximate 95% confidence intervals for Π_1(p) with p = 1/n as a function of k in case the estimate is based on \(\hat\gamma^{(BC,c)}_k\). The width of these confidence intervals fluctuates with k, which is due to the variability of the random quantities appearing in the expression for the confidence interval, e.g., \(\hat\sigma_\Gamma\). Note that for this specific dataset the estimation is challenging due to the large values of γ, indicating that the second moment does not exist, as well as the large proportion of censoring.
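The log-scale confidence interval is straightforward to compute once a point estimate is available. A minimal sketch (function name is ours; the point estimate, \(\hat\sigma_\Gamma\), k, n and p are inputs, and the illustrative numbers below are hypothetical, not the paper's results):

```python
import math
from statistics import NormalDist

def premium_ci(pi_hat, k, n, p, sigma_gamma, alpha=0.05):
    """Approximate 100(1-alpha)% CI for Pi_1(p) on the log scale:
    multiply/divide the point estimate by
    exp(z_{1-alpha/2} * sigma_gamma * log(k/(n*p)) / sqrt(k))."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma_gamma * math.log(k / (n * p)) / math.sqrt(k)
    return pi_hat * math.exp(-half_width), pi_hat * math.exp(half_width)

# Hypothetical illustration with n = 837 claims and p = 1/n:
lo, hi = premium_ci(pi_hat=1.0e5, k=100, n=837, p=1 / 837, sigma_gamma=0.8)
```

Working on the log scale guarantees a positive lower bound, which is natural for a premium, and the interval is symmetric around the point estimate in the multiplicative sense.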

Appendix B Proof of Theorem 1
We use the decomposition of \(\widehat U^{(c)}_X(1/p)/U_X(1/p)\) into the two terms Q_{1,k} and Q_{2,k}. Concerning Q_{1,k}, according to Theorem 2 in Csörgő (1996), if F_X is continuous, we have a uniform almost sure bound for the Kaplan-Meier estimator, denoted (B1). Remark also that, according to Assumption (D) and (7), the bound (B2) holds as soon as k/n → 0 and k/(np) → ∞. Combining (B1) with (B2), we deduce that Q_{1,k} = o_P(1). Now, concerning Q_{2,k}, remark that, again according to Assumption (D) and (7), a similar bound holds. Consequently, using again (B2), we deduce that Q_{2,k} = o_P(1) as soon as √k δ_X(U_Z(n/k)) → λ ∈ ℝ. This achieves the proof of Theorem 1.
we have
\[ \sqrt{k}\,\bigl(\hat\gamma^{(H)}_k - \gamma_Z\bigr) \stackrel{d}{\longrightarrow} N(b, \sigma^2), \]
where b := −(β_Z γ_Z/(1 + β_Z γ_Z))λ_Z (respectively, σ² := γ_Z²) denotes the bias (respectively, the variance) of the classical Hill estimator \(\hat\gamma^{(H)}_k\) defined in (4). Now, using (7), we obtain direct bounds for • being either X or Y and n large enough, and, using again Proposition B.1.10 in de Haan and Ferreira (2006), corresponding bounds for 0 < δ < β_• and n large enough. Consequently (D6) is satisfied according to (D4). Concerning (D7), assume first that min(β_X, β_Y)γ_Z > 1. Then, using the mean value theorem, for s ≤ t < 1 with s large enough, we obtain bounds with constants C_X, C_Y > 0 (possibly different from line to line); thus (D7) is clearly satisfied. Assume now that max(β_X, β_Y)γ_Z < 1. Then, for s ≤ t < 1 with s large enough, the claim follows from the monotonicity of δ_•, which holds by assumption. In case min(β_X, β_Y)γ_Z < 1 < max(β_X, β_Y)γ_Z, we can combine the two above ideas of proof to conclude that the convergence (D7) also holds. Combining (D8) with (D9), Corollary 1 holds.
Overall, the same conclusions can be drawn as those mentioned in the simulation section of the main paper. Note that Setting 1 is very difficult as the GPD has second order parameter ρ_X = −γ_X, so in our case ρ_X = −0.3. It is well known in extreme value statistics that ρ_X close to zero is very challenging, as the convergence rate in the second order condition is then slow, which typically leads to more biased estimators. Secondly, γ_X = 0.3 is the largest value considered in our simulation setup, and clearly makes the estimation of second moments challenging for the given size of the dataset.
under the following sub-model of (D):

Assumption (D_H) The survival function of X and Y satisfies Assumption (D) with δ_•(x) = D_• x^{−β_•}(1 + o(1)) for some constant D_• ≠ 0.

\(\hat\gamma^{(H,c)}_k\) will lead to a potentially biased estimator for θ_{p,ζ}. As an alternative, one could use a bias-corrected estimator for γ_X, like the one proposed in Beirlant et al. (2016) when β_Z is known, denoted \(\hat\gamma^{(BC,c)}_k\).
we show the sample mean (left) and sample mean squared error (MSE) (right) of \(\hat\theta^{(H,c)}_{1/n,1}\) (black solid line) and \(\hat\theta^{(BC,c)}_{1/n,1}\) (blue dashed line), computed over the 200 simulation runs as a function of k for 5% (first row), 10% (second row), 20% (third row) and 51% (fourth row) of censoring. Figures 2 till 5 are similar, but concern estimation of θ_{1/n,2}, Π_1(1/n), V(1/n) and θ_{1/(1.5n),1}, respectively. As for the estimators based on \(\hat\gamma^{(BC,c)}_k\), we follow Beirlant et al. (2016), where it was suggested to replace β_Z by \(\hat\beta_Z := -\rho_Z/\hat\gamma^{(H)}_k\), whereafter the estimators for θ_{p,ζ}, Π_1(p) and V(p) are computed for several values of ρ_Z and plotted as a function of k. The ρ_Z-value that leads to the most stable sample paths is then the one to be used for estimation. In our settings ρ_Z = −1.5 gave the most stable results for all the distributions and estimators under consideration. From the simulations we can draw the following conclusions:
• The estimator for the CTM with ζ = 1 based on the Hill estimator for γ only works reasonably for small values of k compared to n, which is in line with the theoretical results. For increasing values of k the estimators diverge from the true value and hence show a considerable bias. For the CTM with ζ = 2 the Hill based estimator performs poorly, with a large bias even at small values of k.
• On the other hand, the estimator for the CTM based on the bias-corrected estimator of Beirlant et al.
(2016) for γ shows an average which is quite stable and close to the true value θ_{p,ζ} for a wide range of values for k. This confirms again our theoretical results, which give that \(\hat\theta^{(c)}_{p,\zeta}\), after normalisation, inherits the limiting distribution of the γ-estimator, so if one uses a bias-corrected estimator for γ then this bias-correction is carried over to \(\hat\theta^{(c)}_{p,\zeta}\). Also, \(\hat\theta^{(BC,c)}_{p,\zeta}\) has a minimum value of the MSE which is typically lower than the minimum value of the MSE for \(\hat\theta^{(H,c)}_{p,\zeta}\). Moreover, the bias-corrected estimator has the advantage that the MSE values stay close to the minimum MSE for a wide range of values for k. For practical use of the estimators, the stable sample paths of \(\hat\theta^{(BC,c)}_{p,\zeta}\) make the typical issue of choosing k properly less critical, unlike the situation of \(\hat\theta^{(H,c)}_{p,\zeta}\) where a proper choice of k is quite crucial.
• The conclusions above for the CTM also hold for \(\widehat\Pi^{(c)}_1(1/n)\) and \(\widehat V^{(c)}(1/n)\). The estimators for Π_1(1/n) and V(1/n) based on \(\hat\gamma^{(BC,c)}_k\) perform better than those based on \(\hat\gamma^{(H,c)}_k\).
In order to make the dependence of \(\hat\theta^{(H,c)}_{p,\zeta}\) on k explicit, we introduce the notation \(\hat\theta^{(H,c)}_{p,\zeta}(k)\). Obviously, small values of k lead to a large variance of the estimator, while a too large k leads to a bias issue. Hence, it is natural to determine k by minimising an approximation to the asymptotic mean squared error (AMSE) of \(\hat\theta^{(H,c)}_{p,\zeta}(k)\), which, based on Corollary 1, is given by the expression in (12).
estimated by the fraction of non-censored observations in the top k order statistics, i.e., by \(\hat p_k\);
• γ_Z and δ_Z(U_Z(n/k)) are estimated by fitting the extended Pareto distribution to the relative excesses of the Z-data, see Beirlant et al. (2009).
\(\hat\theta^{(H,c)}_{1/n,\zeta}(\hat k)\) performs better than \(\hat\theta^{(H,c)}_{1/n,\zeta}(\tilde k)\), which is also expected as \(\hat k\) minimises (an estimate of) the AMSE of \(\hat\theta^{(H,c)}_{p,\zeta}(k)\), while \(\tilde k\) minimises the AMSE of \(\hat\gamma^{(H,c)}_k\). We refer to Section 1.3.1 in Albrecher et al. (2017) for more details about this dataset. In Figure 7, left panel, we show the log-claim sizes in chronological order, where the closed claims are depicted in blue and the open claims in red. As expected, the more recently arrived claims show more censoring than older claims. In order to evaluate the assumption that the underlying distribution of the claim sizes is of Pareto-type, we make an adapted Pareto QQ plot for right censored data, based on the Kaplan-Meier estimator, see Figure 7, right panel. This Pareto QQ plot shows a clear linear pattern in the largest observations, which confirms an underlying Pareto-type distribution. We refer to Beirlant et al. (2007) for more details on the construction and interpretation of such QQ plots. In Bladt et al. (2020) it was argued that the assumption of random right censoring is adequate for these data. Figure 8 shows the Hill estimate \(\hat\gamma^{(H,c)}_k\) (black line) and the bias-corrected estimate \(\hat\gamma^{(BC,c)}_k\) (blue line) for γ_X as a function of k. The Hill estimate is only stable for the smaller values of k, say for k between 50 and 100, while the bias-corrected estimate is stable for k up to 400. For the values of k where \(\hat\gamma^{(H,c)}_k\) is stable it agrees quite well with \(\hat\gamma^{(BC,c)}_k\). In Figure 9 (left panel), the estimates based on \(\hat\gamma^{(H,c)}_k\) are shown as a black solid line and those based on \(\hat\gamma^{(BC,c)}_k\) as a blue dashed line. The estimates based on \(\hat\gamma^{(H,c)}_k\) are very variable and show hardly any stable part, while the premium estimates based on \(\hat\gamma^{(BC,c)}_k\)

Fig. 8 Insurance data. \(\hat\gamma^{(H,c)}_k\) (black line) and \(\hat\gamma^{(BC,c)}_k\) (blue line) as a function of k.

where \(q^Z_{n-k,n}\) is the order statistic n − k in a random sample of size n from the unit Pareto distribution. Since \((k/n)\,q^Z_{n-k,n} \stackrel{P}{\to} 1\) (see, e.g., Corollary 2.2.2 in de Haan and Ferreira, 2006) and by using the uniform convergence property of regularly varying functions (Theorem B.1.4 in de Haan and Ferreira, 2006), we have that