Abstract
Expectiles induce a law-invariant risk measure that has recently gained popularity in actuarial and financial risk management applications. Unlike quantiles or the quantile-based Expected Shortfall, the expectile risk measure is both coherent and elicitable. The estimation of extreme expectiles in the heavy-tailed framework, which is reasonable for extreme financial or actuarial risk management, is not without difficulties; currently available estimators of extreme expectiles are typically biased and hence may show poor finite-sample performance even in fairly large samples. We focus here on the construction of bias-reduced extreme expectile estimators for heavy-tailed distributions. The rationale for our construction hinges on a careful investigation of the asymptotic proportionality relationship between extreme expectiles and their quantile counterparts, as well as of the extrapolation formula motivated by the heavy-tailed context. We accurately quantify and estimate the bias incurred by the use of these relationships when constructing extreme expectile estimators. This motivates the introduction of classes of bias-reduced estimators whose asymptotic properties are rigorously shown, and whose finite-sample properties are assessed in a simulation study and on three samples of real data from economics, insurance and finance.
Introduction
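The \(\alpha \)th expectile \(\xi _{\alpha }\) of an integrable random variable Y is defined as
$$\begin{aligned} \xi _{\alpha }=\mathop {\mathrm {arg\,min}}\limits _{\theta \in {\mathbb {R}}} {\mathbb {E}}\left( \eta _{\alpha }(Y-\theta )-\eta _{\alpha }(Y) \right) , \qquad (1) \end{aligned}$$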
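where \(\eta _{\alpha }(u)=|\alpha -\mathbb {1}\{ u\le 0 \}| u^2\) is the so-called expectile check function and \(\mathbb {1}\{ \cdot \}\) the indicator function. Expectiles are \(L^2\)-analogues of quantiles, which are obtained by minimising asymmetrically weighted mean absolute deviations (Koenker and Bassett 1978):
$$\begin{aligned} q_{\alpha }\in \mathop {\mathrm {arg\,min}}\limits _{\theta \in {\mathbb {R}}} {\mathbb {E}}\left( \rho _{\alpha }(Y-\theta )-\rho _{\alpha }(Y) \right) , \end{aligned}$$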
where \(\rho _{\alpha }(u)=|\alpha -\mathbb {1}\{ u\le 0 \}| |u|\) is the quantile check function. Expectiles, originally introduced by Newey and Powell (1987) in the context of testing for homoscedasticity and conditional symmetry of the error distribution in linear regression, are always uniquely defined by their convex optimisation problem, unlike quantiles, for which uniqueness is only guaranteed if the underlying distribution function is strictly increasing. Expectiles satisfy
$$\begin{aligned} \alpha =\frac{{\mathbb {E}}\left( |Y-\xi _{\alpha }| \mathbb {1}\{ Y\le \xi _{\alpha } \} \right) }{{\mathbb {E}}|Y-\xi _{\alpha }|}. \qquad (2) \end{aligned}$$
In particular, contrary to quantiles, expectiles are determined by tail expectations rather than tail probabilities.
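As an illustration of how an expectile is determined by tail expectations, the following minimal sketch computes the expectile of a finite sample by solving the first-order condition of the asymmetric least squares problem, namely \(\alpha \sum _i (Y_i-\theta )_+=(1-\alpha )\sum _i (\theta -Y_i)_+\). The function name `sample_expectile` and the fixed-point iteration (a weighted mean with weights \(\alpha \) above the current value and \(1-\alpha \) below) are choices made for this example only; they are not taken from the Expectrem package.

```python
def sample_expectile(ys, alpha, tol=1e-12, max_iter=1000):
    """Sample expectile of level alpha in (0, 1).

    Iterates theta <- weighted mean of the data, with weight alpha on
    observations above theta and 1 - alpha on the others. A fixed point of
    this map satisfies the first-order condition of the asymmetric least
    squares problem.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    theta = sum(ys) / len(ys)  # the expectile of level 1/2 is the mean
    for _ in range(max_iter):
        weights = [alpha if y > theta else 1.0 - alpha for y in ys]
        new_theta = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        if abs(new_theta - theta) <= tol * max(1.0, abs(theta)):
            return new_theta
        theta = new_theta
    return theta


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0]
    for level in (0.5, 0.9, 0.99):
        print(level, sample_expectile(data, level))
```

Unlike a sample quantile, the value returned moves when the largest observation is made larger, even though the number of observations above it does not change.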
Expectiles induce risk measures which have recently gained traction in the risk management context, for several axiomatic and practical reasons, including the fact that they induce the only risk measure, apart from the simple expectation, which is law-invariant, coherent (Artzner et al. 1999) and elicitable (Gneiting 2011), see Bellini et al. (2014) and Ziegel (2016). As such, a natural backtesting methodology exists for expectiles, in the sense that a so-called strictly consistent functional exists for expectiles and provides a properly justified decision-theoretic scoring function which allows one to rank expectile forecasts by their accuracy (see Theorem 10 in Gneiting 2011). Quantiles are indeed elicitable, but not coherent in general, and are often criticised for missing out on relevant information about distribution tails because their calculation only depends on the frequency of tail events and not on their sizes. The Expected Shortfall, meanwhile, takes into account the actual values of the risk variable on the tail event and is a coherent risk measure, but is not elicitable. In financial applications specifically, expectiles are linked through Formula (2) to the notion of gain-loss ratio, which is well known in the literature on no-good-deal valuation in incomplete markets and is a popular performance measure in portfolio management (see Bellini and Di Bernardino 2017, and references therein). Further axiomatic, theoretical and practical justification for the use of expectiles alongside or instead of the quantile and Expected Shortfall can be found in, among others, Ehm et al. (2016) and Bellini and Di Bernardino (2017).
Expectile estimation was first considered in Newey and Powell (1987) in the context of linear regression, and has been developing since then; recent contributions include Sobotka and Kneib (2012) as well as Holzmann and Klar (2016) and Krätschmer and Zähle (2017) for the estimation of central, non-tail expectiles of fixed order \(\alpha \). By contrast, probabilistic aspects of extreme expectiles, with \(\alpha \uparrow 1\), were first considered by Bellini et al. (2014) and later Bellini and Di Bernardino (2017). The estimation of extreme expectiles has been considered even more recently in Daouia et al. (2018, 2019, 2020), where it is shown that extreme expectiles can be estimated in several ways. The construction of each of the estimators uses a combination of the heavy-tailed distributional assumption (representing the tail structure of many financial and actuarial data examples fairly well, see e.g. p. 9 of Embrechts et al. 1997 and p. 1 of Resnick 2007) and a remarkable asymptotic proportionality relationship linking extreme expectiles to their quantile counterparts.
An inspection of the finite-sample results of Daouia et al. (2018, 2020) reveals that these estimators suffer from substantial finite-sample bias, even though this is not clear from the asymptotic normality results presented therein. This is of course an issue if expectiles are to be used widely in the management of extreme risk. A partial answer to this bias problem is presented in Girard et al. (2022) in a regression setup; however, the method of Girard et al. (2022) is designed to eliminate the source of bias due to the amount of tail heaviness (which has an important influence in the asymptotic proportionality relationship), but cannot handle the bias purely due to the second-order framework. It will therefore perform poorly when this particular source of bias dominates, that is, when the underlying heavy-tailed distribution is far from the standard Pareto distribution on which the extrapolation procedure is based. The problem of having to account for this discrepancy between the actual distribution and the ideal Pareto distribution in the right tail is well known and important in extreme value analysis, although historically, attention seems to have been mostly restricted to tail index estimation: a reasonably recent account of bias reduction methods in this context is given in Sect. 5.3 of Gomes and Guillou (2015).
The contribution of this paper is to provide a wide class of automatic, data-driven, second-order fully bias-reduced versions of the extreme expectile estimators currently available in the literature. This is done in three steps. First, we briefly recall the construction of extreme expectile estimators at a level \(\beta =\beta _n\uparrow 1\) as \(n\rightarrow \infty \), where n denotes sample size. This construction is based on the extrapolation of purely empirical expectile estimators at a much lower, intermediate level \(\alpha _n\uparrow 1\), with the help of an appropriate tail index estimator, and we highlight how bias may appear from tail index estimation and from the extrapolation procedure itself through a specific bias term. Second, in the Hall–Welsh subclass of heavy-tailed models (Hall and Welsh 1985) which contains most of the heavy-tailed distributions typically encountered in extreme value analysis, we provide estimators of this bias term that we then use to define versions of extrapolated extreme expectile estimators that are fully corrected for extrapolation bias. Third and last, we discuss the use of bias-reduced estimators of the tail index as a way to complete the elimination of the bias of our extrapolated estimators.
As we shall see in our simulation study, the expectile-based tail index estimator allows us to gain accuracy mostly in those difficult situations where the so-called second-order parameter is close to 0, that is, when the underlying heavy-tailed distribution is far from the standard Pareto distribution on which the extrapolation procedure is based. In this sense, we make substantial further gains compared to the partial bias-correction procedure discussed in Girard et al. (2022) that cannot handle this case. The combination of these second and third steps results in a fully bias-reduced class of extrapolated extreme expectile estimators in our heavy-tailed setting. To make these estimators completely automatic, we introduce a selection rule for the Asymptotic Mean Squared Error-optimal value of the tuning parameter \(\alpha _n\) representing the upper sample fraction used in our tail index and expectile estimators as well as in the extrapolation bias correction term. This results in estimators whose finite-sample performance is superior to that of previously considered estimators in the extreme expectile estimation literature, as we shall illustrate in our simulation study.
The paper is organised in the following way. Section 2 gives details on our estimation framework and reviews currently available extreme expectile estimators. Section 3 contains the main contributions of the paper on the bias-reduced estimation of extreme expectiles; all our methods and samples of real data are incorporated into the R package Expectrem, currently available at https://github.com/AntoineUC/Expectrem. Section 4 examines the performance of our estimators with a simulation study. We finally illustrate the practical applicability of our procedures on real samples of economic, actuarial and financial data in Sect. 5. More details about the implementation of our methods, mathematical proofs and a complete set of numerical results from our simulation study are relegated to the Supplementary Material document.
State of the art on extreme expectile estimation
We start by describing the existing techniques in extreme expectile estimation. Suppose throughout that the available data \((Y_1,\ldots ,Y_n)\) is made of independent realisations of the random variable Y with cumulative distribution function F (resp. survival function \({\overline{F}}=1-F\)). It is assumed that \({\mathbb {E}}|Y|<\infty \), so that expectiles of Y of any order exist indeed. Our goal is to estimate an extreme expectile of Y, i.e. whose order tends to 1 as \(n\rightarrow \infty \).
Intermediate level. We start with the case of a so-called intermediate level \(\alpha _n\uparrow 1\), namely such that \(n(1-\alpha _n)\rightarrow \infty \) as \(n\rightarrow \infty \). Intermediate levels tend to 1 slowly enough that expectiles are well within the sample and can thus be estimated by purely empirical methods. It was observed by Jones (1994) that the \(\alpha _n\)th expectile is actually the quantile of level \(\alpha _n\) associated with the distribution function E defined by
$$\begin{aligned} E(y)=\frac{{\mathbb {E}}\left( |Y-y| \mathbb {1}\{ Y\le y \} \right) }{{\mathbb {E}}|Y-y|}. \end{aligned}$$
Recall that the quantile at level \(\alpha _n\) of the distribution function F is defined as \(q_{\alpha _n} = \inf \{ y\in {\mathbb {R}} \, | \, F(y)\ge \alpha _n \} = \inf \{ y\in {\mathbb {R}} \, | \, {\overline{F}}(y)\le 1-\alpha _n \}\). Intermediate quantiles of F may then be estimated by inverting the empirical survival function:
$$\begin{aligned} {\widehat{q}}_{\alpha _n}=Y_{n-\lfloor n(1-\alpha _n) \rfloor ,n}. \end{aligned}$$
Here \(Y_{1,n}\le Y_{2,n}\le \cdots \le Y_{n,n}\) are the order statistics associated with \((Y_1,\ldots ,Y_n)\) and \(\lfloor \cdot \rfloor \) denotes the floor function. We apply the same principle to the estimation of the intermediate expectile: replacing population averaging with sample averaging in the definition of the distribution function E results in the estimator
This estimator is an unconditional version of the intermediate conditional expectile estimator introduced in Girard et al. (2022). A straightforward calculation shows that this estimator is in fact also exactly the Least Asymmetrically Weighted Squares (LAWS) estimator studied in Daouia et al. (2018), that is, the unique solution of the empirical counterpart of the minimisation problem (1):
$$\begin{aligned} {\widehat{\xi }}_{\alpha _n}=\mathop {\mathrm {arg\,min}}\limits _{\theta \in {\mathbb {R}}} \sum _{i=1}^n \eta _{\alpha _n}(Y_i-\theta ). \end{aligned}$$
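An alternative estimator can be found in the class of heavy-tailed distributions we shall focus on hereafter. Recall that the distribution of Y is heavy-tailed if and only if there exists \(\gamma >0\) such that
$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{{\overline{F}}(ty)}{{\overline{F}}(t)}=y^{-1/\gamma } \quad \text{ for } \text{ all } y>0. \end{aligned}$$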
The tail index \(\gamma \) characterises the tail heaviness of the distribution of Y: if \(\gamma >a\) then \({\mathbb {E}}(Y^{1/a}\mathbb {1}{\{ Y>0 \}}) = \infty \) (a precise statement is Exercise 1.16 p. 35 in de Haan and Ferreira 2006). Our minimal working assumption throughout will therefore be that \(\gamma < 1\) and \({\mathbb {E}}(Y_-) < \infty \), where \(Y_-=\max (-Y,0)\), so as to ensure that \({\mathbb {E}}|Y|<\infty \) and thus that expectiles at any order exist indeed. In this case, we have the following asymptotic proportionality relationship between expectile and quantile:
$$\begin{aligned} \lim _{\alpha \uparrow 1} \frac{\xi _{\alpha }}{q_{\alpha }}=(\gamma ^{-1}-1)^{-\gamma }. \qquad (4) \end{aligned}$$
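This was first noted by Bellini et al. (2014). This connection suggests the class of indirect estimators
$$\begin{aligned} {\widetilde{\xi }}_{\alpha _n}=({\overline{\gamma }}^{-1}-1)^{-{\overline{\gamma }}} \, Y_{n-\lfloor n(1-\alpha _n) \rfloor ,n}, \end{aligned}$$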
where \({\overline{\gamma }}\) is a consistent estimator of \(\gamma \).
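The indirect construction can be sketched in a few lines, assuming it takes the form suggested by (4): an empirical intermediate quantile (an order statistic) multiplied by the estimated proportionality constant \(({\overline{\gamma }}^{-1}-1)^{-{\overline{\gamma }}}\). The function names are hypothetical, the tail index estimate `gamma_bar` is taken as given, and the intermediate level is parametrised by the number k of top order statistics (level \(1-k/n\)) to avoid floating-point issues with the floor function.

```python
def intermediate_quantile(ys, k):
    """Empirical quantile of level 1 - k/n, i.e. the order statistic Y_{n-k,n}."""
    ordered = sorted(ys)
    n = len(ordered)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    return ordered[n - k - 1]  # zero-based index of the (n-k)th smallest value


def indirect_expectile(ys, k, gamma_bar):
    """Indirect intermediate expectile estimate at level 1 - k/n.

    gamma_bar is any tail index estimate in (0, 1); it is an input here,
    not something this sketch estimates.
    """
    if not 0.0 < gamma_bar < 1.0:
        raise ValueError("gamma_bar must lie strictly between 0 and 1")
    constant = (1.0 / gamma_bar - 1.0) ** (-gamma_bar)
    return constant * intermediate_quantile(ys, k)


if __name__ == "__main__":
    sample = [float(i) for i in range(1, 11)]
    print(intermediate_quantile(sample, 2))
    print(indirect_expectile(sample, 2, 0.25))
```

The constant is smaller than 1 when the tail index estimate is below 1/2, so that the indirect expectile estimate then lies below the corresponding quantile estimate, and larger than 1 otherwise.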
Extreme level. The problem of most relevance in extreme value analysis is to consider the case of a level \(\beta _n\uparrow 1\) such that \(n(1-\beta _n)\rightarrow c<\infty \) as \(n\rightarrow \infty \). In this situation, purely empirical methods are typically no longer consistent: for extreme quantile estimation, in the particular example \(\beta _n>1-1/n\), this can be seen by combining Theorem 1.1.6 p.10, Corollary 1.2.4 p.21 and Theorem 2.1.1 p.38 in de Haan and Ferreira (2006). One therefore has to use information about the tail of the data in order to construct an extrapolation procedure. In the context of expectile estimation, this is made possible by the heavy tail assumption and convergence (4): these entail
$$\begin{aligned} \frac{\xi _{\beta _n}}{\xi _{\alpha _n}}\approx \frac{q_{\beta _n}}{q_{\alpha _n}}\approx \left( \frac{1-\beta _n}{1-\alpha _n} \right) ^{-\gamma } \quad \text{ as } n\rightarrow \infty . \end{aligned}$$
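We call this approximation the Weissman approximation, after the work of Weissman (1978) on extreme quantile estimation. This justifies introducing the class of semiparametric extrapolating estimators
$$\begin{aligned} {\overline{\xi }}_{\beta _n}^{\star }=\left( \frac{1-\beta _n}{1-\alpha _n} \right) ^{-{\overline{\gamma }}} {\overline{\xi }}_{\alpha _n}, \qquad (6) \end{aligned}$$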
where \({\overline{\xi }}_{\alpha _n}\) is any consistent estimator of \(\xi _{\alpha _n}\). One immediately deduces from (6) two subclasses of estimators, replacing \({\overline{\xi }}_{\alpha _n}\) by the LAWS estimator of \(\xi _{\alpha _n}\) or its indirect counterpart; in the latter, the estimator of \(\gamma \) can be chosen different from the estimator featured in the above extrapolation procedure, although we shall not pursue this here for the sake of simplicity. One may also construct a weighted combination of the LAWS-based and indirect estimators, as done in Daouia et al. (2021), although the finite-sample benefit of doing so can be marginal.
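A minimal sketch of the extrapolation step follows, assuming the Weissman-type form in which an intermediate estimate at level \(1-k/n\) is multiplied by \((k/(n(1-\beta _n)))^{{\overline{\gamma }}}\). The function name and argument names are hypothetical; the intermediate expectile estimate and the tail index estimate are inputs and may come from either the LAWS or the indirect construction.

```python
def extrapolate_expectile(xi_intermediate, n, k, beta, gamma_bar):
    """Extrapolate an intermediate expectile estimate to the extreme level beta.

    xi_intermediate : estimate of the expectile at level 1 - k/n
    n               : sample size
    k               : number of top order statistics (intermediate level 1 - k/n)
    beta            : target extreme level, with 1 - beta < k/n
    gamma_bar       : tail index estimate
    """
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    if not (1.0 - k / n) < beta < 1.0:
        raise ValueError("beta must exceed the intermediate level 1 - k/n")
    # Ratio (1 - alpha_n) / (1 - beta_n), written with k = n (1 - alpha_n)
    extrapolation_factor = (k / (n * (1.0 - beta))) ** gamma_bar
    return extrapolation_factor * xi_intermediate


if __name__ == "__main__":
    # Intermediate estimate 10 at level 0.9 pushed out to level 0.999
    print(extrapolate_expectile(10.0, n=1000, k=100, beta=0.999, gamma_bar=0.5))
```

Since the extrapolation factor is raised to the power of the tail index estimate, any error in that estimate is amplified by the logarithm of \(k/(n(1-\beta _n))\), which is why the tail index estimator drives the behaviour of the extrapolated estimator.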
The extrapolated LAWS-based and indirect estimators unfortunately suffer from a sizeable amount of finite-sample bias. This is clear from, among others, Figs. 3 and 4 in Daouia et al. (2018), where it can be seen that even for the sample size \(n=1000\), and for certain distributions of interest in extreme value modelling, these estimators have a relative bias of the order of \(50\%\). This means that the estimator is on average \(50\%\) larger than the target extreme expectile. The contributions of this paper, which we gather in the next section, are a precise quantification of this bias using a standard second-order refinement of the heavy tail condition, and the introduction of automatic bias reduction procedures whose finite-sample performance will be examined in detail in Sects. 4 and 5.
Automatic bias reduction methodology for extreme expectile estimation
Rationale for our bias correction methods
The construction of the class of extrapolated estimators in (6) relies on the successive use of Eq. (4), in order to approximate a ratio of high expectiles by a ratio of high quantiles at corresponding levels, and an approximation of this ratio of high quantiles that is warranted by the heavy tail assumption. The magnitude of the bias of the extrapolated estimators will therefore be crucially driven by the rates of convergence and the error terms in these two approximations.
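The size of the error in the first of these approximations can be checked numerically in a case where everything is explicit. The sketch below is an illustration only: it assumes the exact Pareto distribution with survival function \(y^{-1/\gamma }\) for \(y\ge 1\), for which the Weissman approximation for quantiles is exact, and computes the expectile by bisection on the first-order condition. It then compares the ratio of expectile to quantile with its limit \((\gamma ^{-1}-1)^{-\gamma }\). The function names are hypothetical and the level is parametrised by the tail probability \(p=1-\alpha \).

```python
def pareto_expectile(p, gamma):
    """Expectile of level 1 - p (with p < 1/2) of the Pareto law y**(-1/gamma), y >= 1."""
    if not 0.0 < gamma < 1.0 or not 0.0 < p < 0.5:
        raise ValueError("need 0 < gamma < 1 and 0 < p < 1/2")
    mean = 1.0 / (1.0 - gamma)

    def mean_excess(xi):
        # E[(Y - xi)_+], the integral of the survival function over (xi, inf), xi >= 1
        return gamma * xi ** (1.0 - 1.0 / gamma) / (1.0 - gamma)

    def condition(xi):
        # First-order condition p (xi - E[Y]) = (1 - 2p) E[(Y - xi)_+], increasing in xi
        return p * (xi - mean) - (1.0 - 2.0 * p) * mean_excess(xi)

    low, high = mean, 2.0 * mean  # condition(mean) < 0 by construction
    while condition(high) < 0.0:
        high *= 2.0
    for _ in range(200):  # plain bisection
        mid = 0.5 * (low + high)
        if condition(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def proportionality_error(p, gamma):
    """Relative gap between expectile/quantile at level 1 - p and its limit."""
    quantile = p ** (-gamma)
    limit = (1.0 / gamma - 1.0) ** (-gamma)
    return pareto_expectile(p, gamma) / quantile / limit - 1.0


if __name__ == "__main__":
    for tail_probability in (1e-1, 1e-2, 1e-4, 1e-8):
        print(tail_probability, proportionality_error(tail_probability, 0.25))
```

Even in this ideal Pareto case the gap vanishes only slowly as the level increases, which suggests that the proportionality relationship alone can contribute a non-negligible part of the bias at the intermediate levels used in practice.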
To simplify the exposition, we assume from now on that \(k_n = n(1-\alpha _n)\) is a sequence of positive integers, and we rewrite the assumptions \(\alpha _n\uparrow 1\) and \(n(1-\alpha _n)\rightarrow \infty \) as \(k_n\rightarrow \infty \) and \(k_n/n\rightarrow 0\). This choice is motivated by the fact that in quantile estimation, the quantity \(k_n\) denotes the effective sample size, i.e. the number of top order statistics eventually used for the estimation. Adopting this convention will make it easier to state and compare our results with existing results in the extreme value analysis of heavy tails. This is not a restriction in practice since our estimators of extreme expectiles, having order \(\beta _n\uparrow 1\) such that \(n(1-\beta _n)\rightarrow c<\infty \), are built on intermediate expectile estimators whose level \(\alpha _n\) we are free to choose, and integer values for \(k_n = n (1 - \alpha _n)\) induce a sufficient set of levels \(\alpha _n\) to work with.
A classical device in extreme value analysis for bias quantification is the following second-order regular variation condition that refines our initial heavy tail assumption.
Definition 1
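(Class \({\mathcal {C}}_2(\gamma ,\rho ,A)\)). The survival function \({\overline{F}}\) is said to belong to the class \({\mathcal {C}}_2(\gamma ,\rho ,A)\) of second-order regularly varying functions with index \(-1/\gamma <0\), second-order parameter \(\rho \le 0\) and a measurable auxiliary function A having constant sign and converging to 0 at infinity, if
$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{1}{A(1/{\overline{F}}(t))} \left[ \frac{{\overline{F}}(ty)}{{\overline{F}}(t)}-y^{-1/\gamma } \right] =y^{-1/\gamma } \, \frac{y^{\rho /\gamma }-1}{\gamma \rho } \end{aligned}$$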
for all \(y>0\). Here and throughout the ratio \((y^a-1)/a\) should be read as \(\log y\) when \(a=0\).
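An equivalent condition on the tail quantile function \(t\mapsto q_{1-t^{-1}}\) is that
$$\begin{aligned} \lim _{t\rightarrow \infty } \frac{1}{A(t)} \left[ \frac{q_{1-(ty)^{-1}}}{q_{1-t^{-1}}}-y^{\gamma } \right] =y^{\gamma } \, \frac{y^{\rho }-1}{\rho } \quad \text{ for } \text{ all } y>0. \end{aligned}$$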
See de Haan and Ferreira (2006, Theorem 2.3.9 p. 48). Numerous examples of commonly used distributions that satisfy this assumption can be found in Beirlant et al. (2004).
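The second-order condition on the tail quantile function can be illustrated numerically. The sketch below assumes a Burr distribution with survival function \((1+y^{-\rho /\gamma })^{1/\rho }\), \(y>0\), \(\rho <0\), whose quantiles are explicit and for which a direct expansion gives the auxiliary function \(A(t)=\gamma t^{\rho }\). It compares the exact relative error of the Weissman approximation for quantiles, \(q_{\beta }/q_{1-k/n}\) against \((k/(n(1-\beta )))^{\gamma }\), with the first-order prediction \(A(n/k)(x^{\rho }-1)/\rho \), where \(x=k/(n(1-\beta ))\). The parametrisation and all function names are assumptions of this example.

```python
def burr_quantile(p, gamma, rho):
    """Quantile of level 1 - p of the Burr law with survival function
    (1 + y**(-rho/gamma))**(1/rho), where gamma > 0 and rho < 0."""
    if not 0.0 < p < 1.0 or gamma <= 0.0 or rho >= 0.0:
        raise ValueError("need 0 < p < 1, gamma > 0 and rho < 0")
    return (p ** rho - 1.0) ** (-gamma / rho)


def weissman_relative_error(n, k, beta, gamma, rho):
    """Exact relative error of the Weissman approximation for Burr quantiles."""
    x = k / (n * (1.0 - beta))
    exact_ratio = burr_quantile(1.0 - beta, gamma, rho) / burr_quantile(k / n, gamma, rho)
    return exact_ratio / x ** gamma - 1.0


def second_order_prediction(n, k, beta, gamma, rho, b=1.0):
    """Prediction A(n/k) (x**rho - 1) / rho with A(t) = b * gamma * t**rho."""
    x = k / (n * (1.0 - beta))
    auxiliary = b * gamma * (n / k) ** rho
    return auxiliary * (x ** rho - 1.0) / rho


if __name__ == "__main__":
    settings = dict(n=1000, k=100, beta=0.999, gamma=0.3)
    for rho in (-2.0, -1.0, -0.5):
        print(rho,
              weissman_relative_error(rho=rho, **settings),
              second_order_prediction(rho=rho, **settings))
```

In this example the error grows as \(\rho \) approaches 0, in line with the idea that distributions with a second-order parameter close to 0 are far from the Pareto model on which the extrapolation is based.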
The fundamental argument behind our methodology is that the direct and indirect extrapolated estimators, i.e.
where \({\overline{\gamma }}\) is a \(\sqrt{k_n}\)-consistent estimator of \(\gamma \), satisfy
Under condition \({\mathcal {C}}_2(\gamma ,\rho ,A)\) and standard technical assumptions on \(k_n\) and \(\beta _n\), \({\widehat{\xi }}_{1-k_n/n}\) and \(Y_{n-k_n,n}\) have the same rate of convergence \(\sqrt{k_n}\), which is also the rate of convergence of the final (pure bias) nonrandom term, see e.g. Theorem 5 in Daouia et al. (2020) and its proof. Since \(\log ( k_n/(n(1-\beta _n)) ) \rightarrow \infty \), the first term dominates, leading to a common asymptotic distribution for \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widetilde{\xi }}_{\beta _n}^{\star }\): if \(\sqrt{k_n} ({\overline{\gamma }}-\gamma ) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma \), then
A naive bias-correction strategy would thus focus on the bias incurred in the estimation of \(\gamma \), identifiable through the asymptotic distribution \(\varGamma \). However, Eqs. (7) and (8) reveal another source of bias, which is the use of the Weissman approximation itself to control the final terms in Eqs. (7) and (8) [note that the second term in Eq. (7) and the third term in Eq. (8) are asymptotically unbiased, see Theorem 2 in Daouia et al. (2018) and Theorem 2.4.8 p. 52 in de Haan and Ferreira (2006)]. Our contribution hereafter is to design fully bias-reduced extreme expectile estimators by eliminating both of these sources of bias.
Construction of bias-reduced extreme expectile estimators
We construct bias-reduced versions of the extrapolated estimators \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widetilde{\xi }}_{\beta _n}^{\star }\) in two steps. Our methods will feature estimators of the second-order parameter \(\rho \), but also estimators of the auxiliary function A. Estimating this function without making any further assumption can be a difficult task; however, for most of the distributions satisfying condition \({\mathcal {C}}_2(\gamma ,\rho ,A)\) used for modelling purposes, the function A takes the form \(A(t)=b \gamma t^{\rho }\), for a certain nonzero constant b and \(\rho <0\). We assume in what follows that the function A is indeed of this form, which amounts to assuming that the underlying distribution belongs to the Hall–Welsh class in the sense of Gomes and Pestana (2007). We give a list of examples of classical heavy-tailed distributions in Table 1, containing among others the distributions we shall work with in our simulation study, with their respective values of \(\gamma \), \(\rho \) and b. The results in Table 1 can be checked in a straightforward manner using a general result on the link between second-order regular variation and asymptotic expansions of the probability density function; see Lemma 1 in Sect. A of the Supplementary Material document.
The function A can then be estimated using consistent estimators \({\overline{\gamma }}\), \({\overline{b}}\) and \({\overline{\rho }}\) of \(\gamma \), b and \(\rho \), respectively. We assume in Sect. 3.2.1 that such estimators are given; we shall explain in detail in Sect. 3.2.2 which estimators \({\overline{\gamma }}\) we consider. The estimators \({\overline{b}}\) and \({\overline{\rho }}\) are calculated directly for all procedures considered in this paper using the R package evt0 (see Manjunath and Caeiro (2013) and Sect. A in the Supplementary Material document for a brief summary of how these estimators are constructed). We start by dealing with the bias due to the extrapolation procedure itself, contained in the final terms of Eqs. (7) and (8).
Bias due to the extrapolation procedure
We deal with the nonrandom bias term in Eq. (7), and we write
By Theorem 2.3.9 p. 48 in de Haan and Ferreira (2006), the bias term \(B_{1,n}\) can be written as
We now focus on the other two bias terms \(B_{2,n}\) and \(B_{3,n}\) linking an expectile to its quantile counterpart at intermediate and extreme levels, respectively. It follows from the proof of Proposition 1 in Daouia et al. (2018) that
as \(\alpha \uparrow 1\). Using Lemma 1 in Daouia et al. (2020) together with the heavy tail assumption then entails
as \(\alpha \uparrow 1\). With \(\alpha =1-k_n/n\) and \(\alpha =\beta _n\), this yields
Each of these bias terms can be estimated. Recall our assumption that \(A(t)=b \gamma t^{\rho }\), and estimate \(B_{1,n}\) by
Let further \({\overline{Y}}_n\) denote the sample mean of \(Y_1,\ldots ,Y_n\), \({\overline{\xi }}_{1-k_n/n}\) be either the LAWS or indirect intermediate expectile estimator, and \({\overline{\xi }}_{\beta _n}^{\star }\) be the related extrapolated estimator (in our current implementation we use the LAWS estimator for \({\overline{\xi }}_{1-k_n/n}\) and its extrapolated version for \({\overline{\xi }}_{\beta _n}^{\star }\)). The remainder terms \(r(1-k_n/n)\) and \(r(\beta _n)\) are estimated by
This yields estimators of \(B_{2,n}\) and \(B_{3,n}\) as
We deduce from (7) and (10) that a version of the direct extrapolated estimator \({\widehat{\xi }}_{\beta _n}^{\star }\), corrected for the bias exclusively due to the heavy-tailed extrapolation, is
A correction for the extrapolation bias in the indirect estimator is simpler. Indeed,
From (14), a version of the indirect extrapolated estimator \({\widetilde{\xi }}_{\beta _n}^{\star }\), corrected for the bias exclusively due to the heavy-tailed extrapolation, is then
The methodology introduced here differs from the earlier bias reduction technique introduced in Girard et al. (2022) in a regression setup. In Girard et al. (2022), the bias term proportional to \(A(n/k_n)\) is not corrected because it is very difficult to correct this source of bias accurately in the conditional, nonparametric setup on which that paper focuses. In addition, the correction terms in the aforementioned paper rely on linearising the bias terms, whereas we keep the structure of the bias as intact as possible. This makes our correction term \(B_{2,n}\) more accurate than in this earlier attempt. The inclusion of the term \(B_{3,n}\) is also new; while it could be expected that this term only has a small influence because it relies on quantities calculated at a higher asymptotic order, it is our experience that its inclusion substantially improves finite-sample performance.
We now concentrate on reducing the bias due to the estimation of \(\gamma \). Combined with the general corrections in (13) and (15), this will result in a fully bias-corrected extrapolated estimator.
Bias reduction for tail index estimation: an expectile-based method
Numerous tail index estimators have been introduced and studied in the literature; a review of some of the most important estimators is given in de Haan and Ferreira (2006, Chapter 3). There are various techniques for the reduction of bias of such estimators, an excellent summary being given in the Introduction of Cai et al. (2013). Here our contribution is to propose a bias-reduced version of a purely expectile-based tail index estimator, our procedure being partly inspired by a method developed in, among others, Caeiro et al. (2005). To make the construction of this estimator easier, we start by briefly recalling how the technique of Caeiro et al. (2005) works. Consider the classical Hill estimator (Hill 1975):
$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {H}}=\frac{1}{k_n} \sum _{i=1}^{k_n} \log \frac{Y_{n-i+1,n}}{Y_{n-k_n,n}}. \end{aligned}$$
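It is known that under \({\mathcal {C}}_2(\gamma ,\rho ,A)\), and if, in addition, \(\sqrt{k_n} A(n/k_n)\rightarrow \lambda \in \mathbb {R}\), then (see Theorem 3.2.5 p. 74 in de Haan and Ferreira 2006)
$$\begin{aligned} \sqrt{k_n} \left( {\widehat{\gamma }}_{k_n}^{\mathrm {H}}-\gamma \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}{\mathcal {N}}\left( \frac{\lambda }{1-\rho },\gamma ^2 \right) . \end{aligned}$$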
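In finite samples \(\lambda \approx \sqrt{k_n} A(n/k_n) = \sqrt{k_n} b \gamma (n/k_n)^{\rho }\), meaning that the pseudo-estimator (depending on the true unknown values of b and \(\rho \))
$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {H}} \left( 1-\frac{b}{1-\rho } \left( \frac{n}{k_n} \right) ^{\rho } \right) \end{aligned}$$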
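should be asymptotically unbiased with the same variance as the Hill estimator. Caeiro et al. (2005) then plug in consistent estimators \({\overline{b}}\) of b and \({\overline{\rho }}\) of \(\rho \) and arrive at the bias-reduced Hill estimator
$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}={\widehat{\gamma }}_{k_n}^{\mathrm {H}} \left( 1-\frac{{\overline{b}}}{1-{\overline{\rho }}} \left( \frac{n}{k_n} \right) ^{{\overline{\rho }}} \right) . \end{aligned}$$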
Theorem 3.1 of Caeiro et al. (2005) shows that \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) is indeed \(\sqrt{k_n}\)-asymptotically Gaussian with expectation zero and variance \(\gamma ^2\). The construction of this estimator essentially hinges on eliminating the bias by multiplying the original estimator by a quantity cancelling this bias.
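The multiplicative correction can be sketched as follows, assuming the usual form of the Hill estimator (the average of log-excesses over the order statistic \(Y_{n-k_n,n}\)) and of its corrected version, in which the Hill estimate is multiplied by \(1-{\overline{b}}(n/k_n)^{{\overline{\rho }}}/(1-{\overline{\rho }})\). In this sketch the values of b and \(\rho \) are simply passed as arguments; in practice they would be estimated, for instance with the R package evt0, which is not reproduced here. Function names are hypothetical.

```python
import math


def hill(ys, k):
    """Hill estimator based on the k largest observations."""
    ordered = sorted(ys)
    n = len(ordered)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    threshold = ordered[n - k - 1]  # the order statistic Y_{n-k,n}
    if threshold <= 0.0:
        raise ValueError("the threshold Y_{n-k,n} must be positive")
    return sum(math.log(y / threshold) for y in ordered[n - k:]) / k


def hill_bias_reduced(ys, k, b, rho):
    """Hill estimate multiplied by the factor cancelling its leading bias term.

    b and rho are the (estimated) parameters of A(t) = b * gamma * t**rho,
    with rho < 0; they are inputs of this sketch.
    """
    if rho >= 0.0:
        raise ValueError("rho must be negative")
    n = len(ys)
    correction = 1.0 - b / (1.0 - rho) * (n / k) ** rho
    return hill(ys, k) * correction


if __name__ == "__main__":
    toy_sample = [math.exp(i) for i in range(4)]  # 1, e, e^2, e^3
    print(hill(toy_sample, 2))
    print(hill_bias_reduced(toy_sample, 2, b=1.0, rho=-1.0))
```

Because the correction is a deterministic factor tending to 1 as \(k_n/n\rightarrow 0\), it changes the bias of the estimator without affecting its asymptotic variance.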
Here we adapt this construction to propose a bias-reduction procedure for the estimator
The rationale behind this estimator, studied in a different context by Girard et al. (2022), is that, from (11),
as \(\alpha \uparrow 1\). To find a bias-reduced version of this asymptotic proportionality tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\), we follow the above idea and use the sample counterpart \({\overline{r}}(1-k_n/n)\) of \(r(1-k_n/n)\) defined in Sect. 3.2.1: this yields a bias-reduced asymptotic proportionality tail index estimator as
In our current implementation we take \({\overline{\xi }}_{1-k_n/n} = {\widehat{\xi }}_{1-k_n/n}\) and \({\overline{\gamma }} = {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) inside \({\overline{r}}(1-k_n/n)\), specifically for the calculation of this bias-reduced version.
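The idea behind an expectile-based tail index estimator can be sketched under one assumption, stated here explicitly: that the uncorrected estimator is obtained by inverting the relation \({\overline{F}}(\xi _{\alpha })/(1-\alpha )\rightarrow \gamma ^{-1}-1\), which follows from the asymptotic proportionality relationship and the heavy tail assumption, with the survival function and the expectile replaced by their empirical versions. The bias-reduced version, which additionally involves the estimated remainder term, is not reproduced. Function names are hypothetical.

```python
def sample_expectile(ys, alpha, tol=1e-12, max_iter=1000):
    """Sample expectile via the weighted-mean fixed-point iteration."""
    theta = sum(ys) / len(ys)
    for _ in range(max_iter):
        weights = [alpha if y > theta else 1.0 - alpha for y in ys]
        new_theta = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        if abs(new_theta - theta) <= tol * max(1.0, abs(theta)):
            return new_theta
        theta = new_theta
    return theta


def expectile_tail_index(ys, k):
    """Tail index estimate 1 / (1 + F_bar_n(xi_hat) / (k/n)) at level 1 - k/n.

    F_bar_n is the empirical survival function and xi_hat the sample
    expectile of level 1 - k/n. This form is an assumption of the sketch.
    """
    n = len(ys)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    xi_hat = sample_expectile(ys, 1.0 - k / n)
    exceedance_fraction = sum(1 for y in ys if y > xi_hat) / n
    return 1.0 / (1.0 + exceedance_fraction / (k / n))


if __name__ == "__main__":
    print(expectile_tail_index([1.0, 2.0, 3.0, 4.0, 10.0], 1))
```

The estimate always lies in (0, 1], and values below 1/2 correspond to the sample expectile sitting below the empirical quantile of the same level, so that more than \(k_n\) observations exceed it.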
Our first main theoretical result gives the asymptotic normality and unbiasedness of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\).
Theorem 1
Suppose that \({\mathbb {E}}Y_{-}^{2+\delta }<\infty \) for some \(\delta >0\). Assume further that \({\mathcal {C}}_2(\gamma ,\rho ,A)\) holds with \(0<\gamma <1/2\), \(\rho <0\) and \(A(t)=b \gamma t^{\rho }\), and let \(k_n\) be a sequence such that \(k_n \rightarrow \infty \) and \(k_n/n \rightarrow 0\) as \(n \rightarrow \infty \). If \(\sqrt{k_n} A (n/k_n) \rightarrow \lambda _1 \in {\mathbb {R}}\), \(\sqrt{k_n}/q(1-k_n/n) \rightarrow \lambda _2 \in {\mathbb {R}}\), and \({\overline{\gamma }}\), \({\overline{\rho }}\) and \({\overline{b}}\) are consistent estimators of \(\gamma \), \(\rho \) and b such that \(({\overline{\rho }}-\rho ) \log (n)={\text {o}}_{{\mathbb {P}}}(1)\), then
We note that the asymptotic variance in Theorem 1 is an unconditional version of the asymptotic variance found for a conditional analogue of \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\), see Theorem 4 in Girard et al. (2022).
We provide a comparison of the two tail index estimators \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) in terms of variance in Fig. 1. The asymptotic variance of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is substantially smaller than that of \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) when \(\gamma \) is less than 0.35. The variance of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) explodes, however, as \(\gamma \uparrow 1/2\), which is to be expected since the estimators \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) are based on the intermediate LAWS estimator \({\widehat{\xi }}_{1-k_n/n}\), itself known to be asymptotically normal only when \(\gamma <1/2\) (see Daouia et al. 2018). In our implementation (and in particular in the calculation of the \({\overline{B}}_{j,n}\) in Sect. 3.2.1), one can choose either \({\overline{\gamma }}={\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) or \({\overline{\gamma }}={\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\).
Fully bias-reduced estimators and their asymptotic properties
Our final, fully bias-reduced estimators are members of the following two classes. The first class, which extrapolates the intermediate LAWS estimator, is derived from Eq. (13) and is defined as
where \({\overline{\gamma }}\) is any bias-reduced estimator of the tail index \(\gamma \), and \({\overline{B}}_{1,n}\), \({\overline{B}}_{2,n}\) and \({\overline{B}}_{3,n}\) are defined in Sect. 3.2.1. The second class, based on extrapolating the indirect estimator, is derived from Eq. (15) and is
where again \({\overline{\gamma }}\) is any bias-reduced estimator of the tail index \(\gamma \). It should be noted that this second class of estimators has a nice interpretation in terms of a bias-reduced version for the Weissman extreme quantile estimator (Weissman 1978; Gomes and Pestana 2007): this estimator is, with our notation,
It follows that an equivalent expression for \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) is
In other words, this estimator is obtained by using the asymptotic proportionality relationship (4) at level \(\alpha =\beta _n\), plugging in bias-reduced estimators of the tail index and extreme quantile involved, and finally correcting directly for the bias incurred using this proportionality relationship at the level \(\beta _n\) only.
We briefly explore the asymptotic properties of these bias-reduced estimators \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\). It was highlighted in Sect. 3.1 [see limit (9)] that extrapolated expectile estimators have a limiting distribution controlled by their tail index estimator. This is also true for their bias-reduced versions, as the following result shows.
Theorem 2
Assume that \({\mathcal {C}}_2(\gamma ,\rho ,A)\) holds with \(\rho <0\) and \(A(t)=b \gamma t^{\rho }\), and let \(k_n\), \(\beta _n\) be two sequences such that \(k_n \rightarrow \infty \), \(k_n/n \rightarrow 0\), \(n(1-\beta _n)/k_n \rightarrow 0\) and \(\log ( k_n/(n(1-\beta _n)) ) / \sqrt{k_n} \rightarrow 0\) as \(n \rightarrow \infty \). Assume further that \(\sqrt{k_n} A (n/k_n) \rightarrow \lambda _1 \in {\mathbb {R}}\), \(\sqrt{k_n}/q(1-k_n/n) \rightarrow \lambda _2 \in {\mathbb {R}}\), \(\sqrt{k_n} ({\overline{\gamma }}-\gamma ) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma \) where \(\varGamma \) is a nondegenerate distribution, and \({\overline{\rho }}\) and \({\overline{b}}\) are consistent estimators of \(\rho \) and b such that \(({\overline{\rho }}-\rho ) \log (n)={\text {o}}_{{\mathbb {P}}}(1)\).

(i)
If \({\mathbb {E}}Y_-^2<\infty \) and \(0<\gamma <1/2\), then the extrapolated LAWS estimator satisfies
$$\begin{aligned} \displaystyle \frac{\sqrt{k_n}}{\log (k_n/(n(1-\beta _n)))} \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}}{\xi _{\beta _n}}-1 \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma . \end{aligned}$$
(ii)
If \({\mathbb {E}}|Y_-|<\infty \) and \(0<\gamma <1\), then the extrapolated indirect estimator satisfies
$$\begin{aligned} \displaystyle \frac{\sqrt{k_n}}{\log (k_n/(n(1-\beta _n)))} \left( \frac{{\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}}{\xi _{\beta _n}}-1 \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma . \end{aligned}$$
It follows from our Theorem 2 and Corollaries 3 and 4 in Daouia et al. (2018) that the bias-reduced extrapolated expectile estimators have the same rates of convergence and weak limits as their standard counterparts. Note that the indirect estimator is applicable in a wider range of situations since it only requires a finite first moment of Y, contrary to the extrapolated LAWS estimator which essentially requires a finite second moment of Y.
We shall illustrate in Sect. 4 that, even though the bias-reduced versions have the same asymptotic properties as their standard counterparts, they generally have much better finite-sample properties. Before that, we explain how to select the important tuning parameter \(k_n\) appearing in the estimation of the tail index, intermediate expectile level, and bias correction terms.
Choice of the intermediate level \(k_n\)
The choice of the sequence \(k_n\) is a crucial point: a low \(k_n\) translates into a large variance, and a high \(k_n\) translates into a large bias. Choosing \(k_n\) therefore amounts to solving a trade-off between the bias and variance of the tail index estimator to be used. In order to find the right balance, de Haan and Ferreira (2006, pp. 77–82) proposed a choice of \(k_n\) for the Hill estimator as a minimiser of an estimate of its Asymptotic Mean Squared Error. We propose to develop our own selection rule for \(k_n\), to be used in conjunction with the expectile-based bias-reduced asymptotic proportionality estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\), based on the ordinary asymptotic proportionality estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\) also presented in Sect. 3.2.2. For this estimator, it holds that
See Proposition 1 in Sect. B of the Supplementary Material document. Consequently, there are two sources of bias in the expectile-based estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\): one proportional to \(A( n/k_n )\), having order \((n/k_n)^{\rho }\), and another proportional to \(1/q( 1-k_n/n )\), having order \((n/k_n)^{-\gamma }\). The leading term of bias will thus depend on \(\rho \) and \(\gamma \). The second source of bias, proportional to \(1/q( 1-k_n/n )\), can very accurately be eliminated; indeed, its expression only features \(\gamma \), \({\mathbb {E}}(Y)\) and \(q( 1-k_n/n )\), of which we have good estimators that converge at the rate \(\sqrt{k_n}\) or faster. By contrast, the first source of bias features second-order parameters from the distribution of Y, whose estimators converge slowly (see e.g. p. 298 in Gomes et al. 2009 and p. 2638 in Goegebeur et al. 2010), and thus is more difficult to remove. In practice, this means that the trade-off to be solved when using the expectile-based asymptotic proportionality tail index estimator will essentially be between the bias due to the second-order quantity \(A( n/k_n )\) and the variance of the estimator. This gives us the idea of minimising the Partial Asymptotic Mean Squared Error
It is readily checked that this function of \(k_n\) has a unique minimum, and that (viewing \(\mathrm {PAMSE}\) as a differentiable function of a single real variable) cancelling its first derivative leads to
This optimal value depends on the unknown \(\gamma \), \(\rho \) and b; in practice we use the estimated value \({\widehat{k}}_n^{\mathrm {E}} = \)
In Eq. (16), the estimators \({\overline{\rho }}\) and \({\overline{b}}\) are the same as in Sect. 3.2, and \({\overline{\gamma }}\) is the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) with the corresponding estimated AMSE-optimal choice of \(k_n\), that is
The fact that we force our selected \({\widehat{k}}_n^{\mathrm {E}}\) to be less than \(\lfloor n/2 \rfloor \) is due to the presence of the multiplicative term \((1-2k_n/n)^{-1}\) in our bias reduction methodology (featuring in \({\overline{r}}(1-k_n/n)\)). Since \(n^{-2\rho /(1-2\rho )} = {\text {o}}(n)\) for any \(\rho <0\), this restriction disappears with arbitrarily high probability as \(n\rightarrow \infty \). We recommend this choice \({\widehat{k}}_n^{\mathrm {E}}\) when the expectile-based asymptotic proportionality estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is used.
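To make the role of \(k_n\) concrete, the following Python sketch computes the classical Hill estimator from the k largest observations of a sample. It is an illustration only: it implements neither the bias-reduced estimators nor the selection rules \({\widehat{k}}_n^{\mathrm {H}}\) and \({\widehat{k}}_n^{\mathrm {E}}\) discussed in this section, and the names `hill_estimator` and `pareto_sample` are hypothetical.

```python
import math
import random


def hill_estimator(sample, k):
    """Classical Hill estimator of the tail index, using the k largest observations."""
    y = sorted(sample)
    n = len(y)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = y[n - k - 1]  # order statistic Y_{n-k,n}
    if threshold <= 0:
        raise ValueError("the threshold order statistic must be positive")
    # Average of the log-excesses of the k largest observations over the threshold
    return sum(math.log(v / threshold) for v in y[n - k:]) / k


def pareto_sample(n, gamma, rng):
    """Sample from the Pareto distribution with survival function y^(-1/gamma), y >= 1."""
    # 1 - rng.random() lies in (0, 1], which avoids raising 0 to a negative power
    return [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]


rng = random.Random(0)
data = pareto_sample(20000, gamma=0.5, rng=rng)
for k in (50, 500, 2000):
    print(k, round(hill_estimator(data, k), 3))
```

On exact Pareto data the Hill estimator carries no second-order bias, so this toy example mostly displays the variance side of the trade-off (more stable estimates as k grows); bias only becomes visible for distributions with \(\rho \) close to 0.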
The Expectrem package
We have implemented our methods in an R package called Expectrem, freely downloadable at https://github.com/AntoineUC/Expectrem. This package contains the following functions and data sets relevant to this paper:

Basic functions for the estimation: Fbarhat returns the empirical estimator of the survival function, and expect provides the empirical LAWS expectile estimator at a given level. Basic population expectile calculations: enorm, et, elog, epareto, egpd and eburr respectively return the expectiles of the normal, Student, logistic, Pareto, Generalised Pareto (GP) and Burr distributions.

Tail index estimation: The estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\) is implemented in the function tindexp with argument br=FALSE (default). If br=TRUE, the bias-reduced estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is returned. The other optional argument is the intermediate level k, set at \(k={\widehat{k}}_n^{\mathrm {E}}\) by default.

Extreme expectile estimation: The direct and indirect extreme expectile estimators \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widetilde{\xi }}_{\beta _n}^{\star }\) are computed in the function extExpect with arguments method="direct" and method="indirect" respectively, and br=FALSE (default). If br=TRUE, the bias-reduced estimators \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) are returned instead. Argument estim="Hill" (default) calls the Hill estimator (function mop in the package evt0), and estim="tindexp" calls tindexp; setting br=TRUE calls the bias-reduced versions of these estimators. The choice of \(k_n\) is also an option through k, and by default \(k_n={\widehat{k}}_n^{\mathrm {H}}\) (if estim="Hill") or \(k_n={\widehat{k}}_n^{\mathrm {E}}\) (if estim="tindexp").

Extreme quantile estimation: The estimator \({\widetilde{q}}_{\beta _n}^{\star }\) is computed in the function extQuant with argument br=FALSE (default). If br=TRUE, the bias-reduced estimator \({\widetilde{q}}_{\beta _n}^{\star ,\mathrm {RB}}\) is returned instead. The other arguments are those of extExpect.

Data sets: austria, belgium, commerzbank, finland, france, greece, italy, namibia, netherlands, newzealand, secura and southafrica, as described in Sect. 5, with URLs pointing to the sources for these data sets.
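The package itself is written in R. For readers who want to see what an empirical LAWS expectile computation involves, here is a minimal Python sketch; it is not the code of the package function expect, and the name `sample_expectile` is hypothetical. It solves the sample first-order condition \(\alpha \sum _i (Y_i-\theta )_+ = (1-\alpha ) \sum _i (\theta -Y_i)_+\) by iteratively reweighted averaging, with weight \(\alpha \) for observations above the current value and \(1-\alpha \) otherwise.

```python
def sample_expectile(sample, alpha, tol=1e-12, max_iter=1000):
    """Empirical expectile of level alpha via iteratively reweighted means."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(sample) == 0:
        raise ValueError("sample must be non-empty")
    theta = sum(sample) / len(sample)  # the mean is the expectile of level 1/2
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for y in sample:
            # Asymmetric least squares weight: alpha above theta, 1 - alpha otherwise
            w = alpha if y > theta else 1.0 - alpha
            num += w * y
            den += w
        new_theta = num / den
        if abs(new_theta - theta) <= tol * max(1.0, abs(theta)):
            return new_theta
        theta = new_theta
    return theta


claims = [1.0, 2.0, 3.0, 4.0, 10.0]
print(sample_expectile(claims, 0.5), sample_expectile(claims, 0.9))
```

The iteration is a Newton-type scheme on a piecewise linear equation, so on a finite sample it typically stops after a handful of passes.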
Simulation study
We study the finite-sample performance of our estimators on simulated data in order to assess the importance of bias reduction in extreme expectile estimation. For that purpose and in order to get a good overview of practical performance, we consider the following heavy-tailed distributions for Y (see Table 1):

A Burr distribution with tail index \(\gamma >0\) and second-order parameter \(\rho <0\), i.e. \({\overline{F}}(y)=( 1+y^{-\rho /\gamma } )^{1/\rho }\) for \(y>0\). The interesting point here is that the choices of \(\gamma \) and \(\rho \) are free, meaning that for a fixed \(\gamma \) we can make the second-order parameter \(\rho \) vary in order to generate scenarios with various degrees of difficulty in the estimation. We consider here \(\rho =-5,-1,-0.5\), corresponding respectively to an easy, medium, and hard estimation problem.

A Generalised Pareto Distribution (GPD) with tail index \(\gamma >0\) and unit scale, i.e. \({\overline{F}}(y)= ( 1+\gamma y )^{-1/\gamma }\) for \(y>0\). Here \(\rho =-\gamma \). For \(\gamma \) close to 0, \(\rho \) will then also be close to 0, meaning that we expect the estimation problem to be difficult when the data are generated from this distribution.
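Both survival functions can be inverted in closed form, so data from these two models can be simulated by inverse transform sampling. The sketch below is one possible way of doing so, under the parametrisations \({\overline{F}}(y)=(1+y^{-\rho /\gamma })^{1/\rho }\) (Burr) and \({\overline{F}}(y)=(1+\gamma y)^{-1/\gamma }\) (GPD); the function names are hypothetical and this is not claimed to be the simulation code used for the study.

```python
import random


def burr_survival(y, gamma, rho):
    """Burr survival function (1 + y^(-rho/gamma))^(1/rho), with gamma > 0 and rho < 0."""
    return (1.0 + y ** (-rho / gamma)) ** (1.0 / rho)


def burr_quantile(p, gamma, rho):
    """Quantile of level p, obtained by solving burr_survival(y) = 1 - p."""
    u = 1.0 - p
    return (u ** rho - 1.0) ** (-gamma / rho)


def gpd_survival(y, gamma):
    """Unit-scale Generalised Pareto survival function (1 + gamma*y)^(-1/gamma)."""
    return (1.0 + gamma * y) ** (-1.0 / gamma)


def gpd_quantile(p, gamma):
    """Quantile of level p, obtained by solving gpd_survival(y) = 1 - p."""
    u = 1.0 - p
    return (u ** (-gamma) - 1.0) / gamma


def simulate(quantile_fn, n, rng, **params):
    """Inverse transform sampling: apply the quantile function to uniform draws."""
    return [quantile_fn(rng.random(), **params) for _ in range(n)]


rng = random.Random(1)
burr_data = simulate(burr_quantile, 1000, rng, gamma=0.3, rho=-1.0)
gpd_data = simulate(gpd_quantile, 1000, rng, gamma=0.3)
print(max(burr_data), max(gpd_data))
```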
For each of these distributions (three Burr distributions and one GPD), we consider the cases \(\gamma =0.1,0.2,0.3,0.4\). This gives 16 cases in total. In each case, we simulate \(N=1000\) data sets \((Y_1,\ldots ,Y_n)\) of \(n=1000\) independent realisations of Y, with survival distribution function \({\overline{F}}\). We estimate the expectile of level \(\beta _n=1-5/n=0.995\) using five methodologies:

(i)
The bias-reduced extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the expectile-based, bias-reduced tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\),

(ii)
The bias-reduced extrapolated indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the expectile-based, bias-reduced tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\),

(iii)
The bias-reduced extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\),

(iv)
The bias-reduced extrapolated indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\),

(v)
(As a benchmark) The extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star }\) (without bias reduction) with the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\).
Comparing methods (i) and (iii) on the one hand, and (ii) and (iv) on the other hand, allows us to see the influence of the choice of tail index estimator. Comparing (iii), (iv) and (v) makes it possible to assess the benefit of the bias reduction method in the expectile extrapolation. To get a further idea of the difference between using the bias-reduced tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) introduced in the current paper and the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\), we also record the values of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\).
We assess finite-sample performance by computing the following quantities:

The relative bias, variance and mean-squared error of the extrapolated expectile estimators. For the bias-reduced extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\), that is
$$\begin{aligned}&{\text {RBias}}({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}) = \frac{1}{N} \sum _{j=1}^N \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB},(j)}}{\xi _{\beta _n}} - 1 \right) \\&\text{ and } {\text {RMSE}}({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}) = \frac{1}{N} \sum _{j=1}^N \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB},(j)}}{\xi _{\beta _n}} - 1 \right) ^2 \end{aligned}$$ (where \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB},(j)}\) is calculated on the \(j\)th sample) and \({\text {RVar}}={\text {RMSE}}-{\text {RBias}}^2\) is the relative variance. Similarly for the indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and the non-bias-reduced estimator \({\widehat{\xi }}_{\beta _n}^{\star }\).

The (classical) bias, variance and mean-squared error of the estimators \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\).
All these quantities are calculated for \(k_n\) chosen to be (in each sample) the selected value \({\widehat{k}}_n^{\mathrm {E}}\) or \({\widehat{k}}_n^{\mathrm {H}}\) as appropriate, depending on whether the estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) or \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) is used. The results, referring to our experiments with the Burr distribution with second-order parameter \(\rho =-5, -1, -0.5\) and the Generalised Pareto distribution, are reported in Tables C.1, C.2, C.3 and C.4 respectively (see Sect. C of the Supplementary Material document). For each value of \(k_n\in \{2,3,\ldots ,450\}\), we also record, and report, median expectile and tail index estimates across all N replicates, along with the corresponding log-mean-squared errors, in Figs. C.1, C.2, C.3, C.4 and C.5 in this same Sect. C of the Supplementary Material document.
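The relative performance measures can be computed from the N replicated estimates in a few lines. The following Python sketch is one way to do it; the function name `relative_performance` is hypothetical, and RMSE here denotes the relative mean-squared error (no square root is taken), with \({\text {RVar}}={\text {RMSE}}-{\text {RBias}}^2\).

```python
def relative_performance(estimates, true_value):
    """Return (RBias, RMSE, RVar) of replicated estimates of a nonzero true value."""
    if len(estimates) == 0:
        raise ValueError("at least one estimate is required")
    # Relative errors: estimate / truth - 1, one per replicate
    rel_errors = [e / true_value - 1.0 for e in estimates]
    n_rep = len(rel_errors)
    rbias = sum(rel_errors) / n_rep
    rmse = sum(r * r for r in rel_errors) / n_rep
    rvar = rmse - rbias ** 2
    return rbias, rmse, rvar


# Toy usage: two replicates that over- and under-estimate the truth by 10%
print(relative_performance([2.2, 1.8], true_value=2.0))
```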
We conclude from this simulation study that, on our tested cases, the bias reduction scheme is very effective: as a consequence, the RMSE of the bias-reduced estimators is often one and sometimes two orders of magnitude lower than the RMSE of the standard extrapolated estimators. This is true across a wide range of values of \(k_n\), as shown in Figs. C.1, C.2 and C.3, where it is seen that overall the bias-reduced estimators based on \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) seem to have an advantage in terms of bias, while those based on \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) have a lower MSE overall. There does not seem to be a clear winner between (bias-reduced) direct and indirect estimators. Similar results, not reported here for the sake of brevity, were observed when estimating extreme expectiles at the more extreme level \(1-1/(2n)=0.9995\).
It also appears (from considering the case of the Burr distribution with \(\rho =-0.5\) and the GPD) that the bias-reduced estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is particularly interesting, including in terms of MSE, for values of \(\rho \) close to 0, and is competitive otherwise; note that for large \(|\rho |\), the Burr distribution gets very close to the Pareto distribution for which the Hill estimator is the Maximum Likelihood estimator and known to be optimal, so it is not reasonable to expect that the estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) would be more accurate than the bias-reduced Hill estimator in such cases. The estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) appears to be a useful complement to available tail index estimation devices.
As a complement, and in order to evaluate specifically the influence of a change in the bulk of the distribution upon the compared estimators, we examine the following model for Y:

A Fisher distribution with \((\nu _1,\nu _2)\) degrees of freedom, i.e. having probability density function
$$\begin{aligned} f(t) = \dfrac{(\nu _1/\nu _2)^{\nu _1/2}}{B(\nu _1/2,\nu _2/2)} t^{\nu _1/2-1}(1+\nu _1 t/\nu _2)^{-(\nu _1+\nu _2)/2} \end{aligned}$$ for \(t>0\). This distribution has tail index \(\gamma =2/\nu _2\), which varies in \(\{ 0.1,0.2,0.3,0.4 \}\), and second-order parameter \(\rho =-2/\nu _2\). Here, even though the tail index and second-order parameter are constant in \(\nu _1\), the actual bias component is not because b is a non-trivial function of \(\nu _1\), see Table 1. We consider here \(\nu _1 = 1,2,5,10\), with smaller values of \(\nu _1\) corresponding to larger positive values of b and hence to larger amounts of bias (note that the quantity \(b=b(\nu _1,\nu _2)\) diverges to infinity as \(\nu _1\downarrow 0\) when \(\nu _2\) is fixed).
We again simulate \(N=1000\) data sets \((Y_1,\ldots ,Y_n)\) of \(n=1000\) independent realisations of Y and we estimate the expectile of level \(\beta _n=1-5/n=0.995\). For each value of \(k_n\in \{2,3,\ldots ,450\}\), we record and report median expectile estimates across all N replicates in Figs. C.6 and C.7, see Sect. C of the Supplementary Material document. It is readily seen that our proposed estimators provide a very substantial improvement upon the standard alternative without bias reduction. The indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) based on \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) seems to perform especially well, even when \(\nu _1\le 2\). This difficult setup corresponds to those situations when the shape of the Fisher distribution is very dissimilar to the shape of the Pareto distribution upon which the extrapolation methodology is based. In such cases where the bias is very large, using the bias-reduced expectile-based tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\), rather than the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\), appears to be particularly beneficial. It was pointed out to us by a referee that in each tested case (Burr, GPD, Fisher), the bias component b was positive; we defer to future research the question of studying the performance of the proposed bias reduction approaches depending on the sign of b.
Applications on real data
We apply our methodology to three data sets, from insurance, economics, and finance, as a way to illustrate the applicability of our expectile and tail index estimators.
Reinsurance premium estimation
Reinsurance is a very important way of mitigating risk associated with high-impact events such as extreme climate episodes. Reinsurance contracts typically involve two insurance companies A and B; by the terms of the contract, company A transfers to company B (totally or partially) the risk associated with events involving large claims. Here we focus on the case when risk is totally transferred, which is also called excess-of-loss reinsurance. Under such a policy, when a claim occurs, company A pays the claim amount up to a certain amount R decided in the reinsurance contract, called retention level, and company B underwrites all losses above that amount R. In other words, if the total claim amount is Y, company A pays \(\min (Y,R)\), and company B pays \(\max (Y-R,0)\). A crucial task to decide the terms of a reinsurance contract is to accurately price this contract, which leads to the calculation of the so-called reinsurance premium. A first, natural approach to do this is to use the net premium principle (see e.g. Chapters 4 and 5 in Kaas et al. 2008), namely
$$\begin{aligned} \varPi (R) = {\mathbb {E}}(\max (Y-R,0)) = \int _R^{\infty } {\overline{F}}(x) \, \mathrm {d}x, \end{aligned}$$
where \({\overline{F}}\) is the survival function of Y. However, paying company B this net average premium would not protect that company from a catastrophic loss much higher than its average value. A solution to this problem developed in the actuarial literature over the last 25 years has been to consider more conservative premium principles, including the distorted premiums introduced in Wang (1996):
$$\begin{aligned} \varPi _g(R) = \int _R^{\infty } g({\overline{F}}(x)) \, \mathrm {d}x, \end{aligned}$$
where \(g : [0,1] \rightarrow [0,1]\) is a non-decreasing concave function such that \(g(0)=0\) and \(g(1)=1\), called the distortion function. The choice \(g(x)=x\) leads to the net premium principle, but there are several reasonable ways of choosing a function g leading to a more conservative (i.e. higher) premium, such as the Dual Power function, or the Proportional Hazards function (we refer to, among others, Wang 1995 and Chapter 3 of Dickson 2016).
Since in reinsurance the retention level R should be considered as a high (and therefore rarely observed) level of claim amount, the calculation of the reinsurance premium is very closely linked to the right tail of the distribution of the claim amount Y. This motivated Vandewalle and Beirlant (2006) to develop an extreme value theory-based method for the estimation of \(\varPi _g(R)\) (a different, somewhat linked theory for Wang distortion risk measures conditionally on the loss being high is provided in El Methni and Stupfler 2017). In particular, they proved that if Y is heavy-tailed with tail index \(\gamma \) and \(g(1/\cdot )\) is regularly varying with index \(-\delta \), i.e. \(g(1/(ty))/g(1/t) \rightarrow y^{-\delta }\) as \(t \rightarrow \infty \), where \(-\delta<-\gamma <0\), then the following limiting relationship holds between the distorted premium, the retention level and the probability of exceeding that level:
$$\begin{aligned} \varPi _g(R) \sim \frac{\gamma }{\delta -\gamma } \, R \, g({\overline{F}}(R)) \quad \text{ as } R \rightarrow \infty . \end{aligned}$$
An interesting version of that relationship is found when \(R=\xi _{\beta }\) for \(\beta \uparrow 1\). In that case, using Eq. (11), we get
$$\begin{aligned} \varPi _g(\xi _{\beta }) \sim \frac{\gamma (\gamma ^{-1}-1)^{\delta }}{\delta -\gamma } \, g(1-\beta ) \, \xi _{\beta } \quad \text{ as } \beta \uparrow 1. \qquad (17) \end{aligned}$$
In the case of the net premium, \(g(x)=x\), so \(\delta =1\) and we find \(\varPi _g(\xi _{\beta }) = \varPi (\xi _{\beta }) \sim (1-\beta ) \xi _{\beta }\); in this asymptotic equivalence, the tail index \(\gamma \) does not appear anymore, and the expectile is in some sense an asymptotic inverse of the function \(R\mapsto \varPi (R)/R\) that represents the proportion of the retention level R paid on average per claim by company B (in other words, if \(\varPi (R)/R = \pi \), then company B contributes \(\pi R\) to the payment of the average claim).
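The net premium case can be checked numerically on a toy model. The first-order condition defining the expectile, \(\beta \, {\mathbb {E}}(Y-\xi _{\beta })_+ = (1-\beta ) \, {\mathbb {E}}(\xi _{\beta }-Y)_+\), rearranges into the exact identity \((2\beta -1) \varPi (\xi _{\beta }) = (1-\beta )(\xi _{\beta }-{\mathbb {E}}(Y))\), from which \(\varPi (\xi _{\beta }) \sim (1-\beta ) \xi _{\beta }\) follows as \(\beta \uparrow 1\). The Python sketch below illustrates this for a Pareto distribution with survival function \(y^{-1/\gamma }\), \(y\ge 1\); the Pareto model and the function names are assumptions made for the example only.

```python
def pareto_net_premium(r, gamma):
    """Net premium E[max(Y - r, 0)] when P(Y > y) = y^(-1/gamma) for y >= 1 (r >= 1, gamma < 1)."""
    return gamma / (1.0 - gamma) * r ** (1.0 - 1.0 / gamma)


def pareto_expectile(beta, gamma, rel_tol=1e-12):
    """Expectile of level beta in [1/2, 1), found by bisection on the first-order condition."""
    if not (0.5 <= beta < 1.0 and 0.0 < gamma < 1.0):
        raise ValueError("need 0.5 <= beta < 1 and 0 < gamma < 1")
    mean = 1.0 / (1.0 - gamma)

    def excess(xi):
        # Decreasing in xi; its root is the expectile (valid for xi >= 1)
        return (2.0 * beta - 1.0) * pareto_net_premium(xi, gamma) - (1.0 - beta) * (xi - mean)

    lo, hi = mean, 2.0 * mean
    while excess(hi) > 0.0:  # expand the bracket until the root is enclosed
        lo, hi = hi, 2.0 * hi
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


gamma = 0.3
for beta in (0.99, 0.9999, 1.0 - 1e-6):
    xi = pareto_expectile(beta, gamma)
    # Ratio of the net premium to (1 - beta) * expectile; it approaches 1 as beta increases to 1
    print(beta, pareto_net_premium(xi, gamma) / ((1.0 - beta) * xi))
```

The convergence of the ratio to 1 is slow, since the relative error is of order \({\mathbb {E}}(Y)/\xi _{\beta }\).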
Given this relevance of expectiles to premium calculation for large claims, we propose to estimate the reinsurance premium \(\varPi _g(R)\) (for a large retention level R) using Eq. (17) and our bias-reduced extreme expectile estimation methodology. For that purpose, we consider the well-known Secura Belgian Re data used in Vandewalle and Beirlant (2006), available in our package Expectrem as well as several other R packages such as ReIns (Reynkens et al. 2020), CASdatasets (Dutang and Charpentier 2019) and ltmix (Blostein and Miljkovic 2019). This data set contains \(n=370\) inflation-adjusted automobile claim amounts (from 1988 to 2001), larger than . We consider two premium principles: the net premium (\(g(x)=x\)) and Dual Power (\(g(x)=1-(1-x)^{\kappa }\)) principles, and to allow us to compare our results with those of Vandewalle and Beirlant (2006), we take \(\kappa =1.366\). This choice was already recommended in Wang (1996). We estimate the associated premiums \(\varPi _g(R)\) with the statistic
Here \({\widehat{\xi }}_{\beta }^{\star ,\mathrm {RB}}\) denotes the bias-reduced LAWS extrapolated estimator calculated using either our bias-reduced asymptotic proportionality tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) or its bias-reduced Hill counterpart \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\), and \({\overline{\gamma }}\) is taken to be the same tail index estimator as the one used in the extrapolation. We also compare this estimator with the one in which the bias-reduced extreme expectile estimator is replaced by the standard, non-bias-reduced extrapolated LAWS estimator calculated with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (and thus \({\overline{\gamma }}={\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) everywhere). The estimated reinsurance premiums are represented in Fig. 2 on a fine grid of values of \(\beta _n\); this yields curves of estimates of \(\varPi _g(R)\) in the tail region \(R\rightarrow \infty \), that we compare to the premium curve obtained by Vandewalle and Beirlant (2006). Indirect estimators are here almost identical to their LAWS counterparts, and are not reported for the sake of readability.
We draw two conclusions from Fig. 2. First, our bias-reduced expectile-based estimators constructed from Formula (17) are at first quite conservative, but become very close to those of Vandewalle and Beirlant (2006) when the retention level is large, confirming the accuracy of our (bias-reduced) extreme expectile estimators. Interestingly, the point estimate based on our proposed tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is slightly less conservative than the others for the largest values of R that we consider. While policymakers and regulators would favour higher (i.e. more pessimistic) estimates such as those given by Vandewalle and Beirlant (2006), more optimistic (i.e. lower) assessments of risk may be interesting for insurance companies, because lower premiums paid by consumers translate into improved competitiveness on insurance markets. Second, it is clearly seen that without the bias reduction scheme that we propose, the expectile-based estimates seem to be very poor and a long way off the other estimates we consider. This example therefore clearly emphasises the importance of the bias reduction methodology proposed in this paper as far as expectile-based estimation is concerned.
Approximation of the Gini index
We now showcase how our methodology can be applied in economics through the example of the estimation of the Gini index. This economic indicator measures the statistical dispersion (and therefore inequality) of income within a country: the Gini index of a country with n workers having respective incomes \(Y_1,\ldots ,Y_n\) is given by
$$\begin{aligned} G = \frac{\sum _{i=1}^n \sum _{j=1}^n |Y_i-Y_j|}{2n \sum _{i=1}^n Y_i}. \end{aligned}$$
A higher Gini index means higher inequality of income within the sampled population. Of course, in practice n is very large (of the order of millions, if not a billion) and income data is typically very sensitive, so to estimate the Gini index of a country using the above formula, it is generally the case that a representative survey of incomes representing all the categories of workers is carried out. Ensuring representativity in this context can be extremely difficult as well as time and labour-intensive. In particular, it is the case that substantial left-censoring or left-truncation can be present, as it is reasonable to imagine that accurately sampling from the lowest-paid workers is hard, for example because of job instability, or labour market law violations from employers including minimum wage underpayment or illegal employment of foreign workers. Accurately representing the left tail of the income distribution, which is key if the above definition of the Gini index is to be used, can therefore be a tall order.
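For reference, the sample version of the Gini index, taken here in its standard form as the mean absolute difference between all pairs of incomes divided by twice the mean income, can be computed from sorted data without a double loop. The Python sketch below assumes this standard definition; the function name `gini_index` is hypothetical.

```python
def gini_index(incomes):
    """Sample Gini index: sum_{i,j} |y_i - y_j| / (2 * n * sum_i y_i)."""
    y = sorted(incomes)
    n = len(y)
    total = sum(y)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with positive total income")
    # For sorted data, sum_{i,j} |y_i - y_j| = 2 * sum_i (2i - n - 1) * y_(i), i = 1..n
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(y, start=1))
    return weighted / (n * total)


print(gini_index([1.0, 1.0, 1.0]))   # perfect equality
print(gini_index([0.0, 0.0, 1.0]))   # all income held by one of three workers
```

Every income enters this formula with a weight depending on its rank, which is why misrepresenting the left tail of the sample affects the result.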
An alternative solution putting more weight on the right tail is to model the distribution of income within a country by a heavy-tailed distribution (see for example Gardes and Girard 2021). This kind of approach has a long history in labour economics, see for instance Singh and Maddala (1976) and McDonald (1984). A particularly interesting model uses the Burr distribution with parameters \(\gamma \) (the tail index) and \(\rho \) (the second-order parameter). It can then be shown that the Gini index should be
$$\begin{aligned} G(\gamma ,\rho ) = 1 - \frac{\varGamma (-1/\rho ) \, \varGamma ((\gamma -2)/\rho )}{\varGamma ((\gamma -1)/\rho ) \, \varGamma (-2/\rho )}. \qquad (18) \end{aligned}$$
Here \(\varGamma (\cdot )\) is Euler’s Gamma function; see e.g. Chotikapanich and Griffiths (2000). This way of modelling the Gini index has the advantage of using only the right tail of the distribution of income, and is thus more robust to sampling inaccuracies in the left tail.
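As an illustration, the parametric Gini index of a Burr distribution can be evaluated with log-Gamma functions. The sketch below assumes the Burr survival function \((1+y^{-\rho /\gamma })^{1/\rho }\) with \(0<\gamma <1\) and \(\rho <0\), which is a Singh-Maddala distribution, and uses the classical Singh-Maddala closed form rewritten in terms of \((\gamma ,\rho )\); the function name `burr_gini` is hypothetical.

```python
import math


def burr_gini(gamma, rho):
    """Gini index of the Burr distribution with tail index gamma in (0, 1) and rho < 0.

    Uses 1 - Gamma(q) * Gamma((2 - gamma) * q) / (Gamma((1 - gamma) * q) * Gamma(2 * q)),
    where q = -1 / rho > 0.
    """
    if not (0.0 < gamma < 1.0 and rho < 0.0):
        raise ValueError("need 0 < gamma < 1 and rho < 0")
    q = -1.0 / rho
    # All arguments are positive, so lgamma returns the log of the Gamma function itself
    log_ratio = (math.lgamma(q) + math.lgamma((2.0 - gamma) * q)
                 - math.lgamma((1.0 - gamma) * q) - math.lgamma(2.0 * q))
    return 1.0 - math.exp(log_ratio)


print(burr_gini(0.5, -1.0))
print(burr_gini(0.3, -1e6), 0.3 / (2.0 - 0.3))
```

Two sanity checks support this expression: for \(\gamma =0.5\) and \(\rho =-1\) the model is a log-logistic distribution with shape 2, whose Gini index is 1/2, and as \(\rho \rightarrow -\infty \) the value approaches the Pareto Gini index \(\gamma /(2-\gamma )\).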
We propose here to estimate the Gini index for several countries using Formula (18) and the bias-reduced tail index estimates, and to compare our results with official Gini indices calculated by intelligence or economic agencies. The countries and data considered are the following:

The synthetic Eurostat data set of incomes for Austria (\(n=5977\)), Belgium (\(n=6159\)), France (\(n=11{,}131\)), Finland (\(n=11{,}370\)), Greece (\(n=7439\)) and the Netherlands (\(n=10{,}131\)).

A data set of \(n=8156\) Italian incomes for the year 2014, available from the Bank of Italy.

A survey of \(n=9656\) wages in Namibia during the period 2009–2010, provided by the Namibia Statistics Agency.

A synthetic data set of \(n=11{,}315\) incomes in New Zealand during the year 2003, collected by the official Statistics New Zealand agency.

The Filipino Family Income and Expenditure data set of \(n=41{,}544\) incomes measured in the Philippines in 2015 by the Philippine Statistics Authority.

The Living Conditions Survey 2014–2015 in South Africa, containing \(n=19{,}286\) wages.
The Gini coefficient is estimated with
$$\begin{aligned} {\widehat{G}}({\overline{\gamma }}) = G({\overline{\gamma }},{\overline{\rho }}), \quad {\overline{\gamma }} \in \left\{ {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}, {\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}} \right\} , \end{aligned}$$
where the parametric form \(G(\gamma ,\rho )\) of the Gini index is defined in (18) and \({\overline{\rho }}\) is the second-order parameter estimator we use throughout in our bias reduction scheme (see Sect. A of the Supplementary Material document). Our estimates are reported in Table 2, where they are compared with official Gini indices calculated by the World Bank, CIA and/or Eurostat as appropriate, as well as with their versions \({\widehat{G}} \left( {\widehat{\gamma }}_{k_n}^{\mathrm {H}} \right) \) and \({\widehat{G}}\left( {\widehat{\gamma }}_{k_n}^{\mathrm {E}}\right) \) that do not feature any bias reduction.
The first conclusion we can draw is that the bias reduction scheme applied to the asymptotic proportionality tail index estimator is very effective: the non-bias-reduced estimates are typically far from their bias-reduced counterparts as well as from official estimates, and when the non-bias-reduced estimate is sensible (in the example of the Philippines and South Africa), the bias-reduced estimate is not unreasonable either. The second conclusion is that the bias-reduced estimates based on \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) are competitive and sometimes even closer to the average Gini index (computed using the average Gini indices from our official sources) than the estimate using the bias-reduced Hill estimator. This shows that, if the Burr model is to be applied on this data set, it is likely that the bias-reduced approach we propose is beneficial, and therefore our bias-reduced asymptotic proportionality tail index estimator is a valuable resource in practical applications. Of course, the validity of the Burr model is itself an assumption; a more principled method, such as a censored likelihood approach, may in general fit the data better by virtue of being more flexible. Unlike the method adopted in this illustration, this would not allow one to plug in the tail index and second-order parameter estimators in a simple fashion.
An analysis of financial returns
Our final real data example focuses on financial returns. The fact that expectiles can, in financial contexts, be interpreted in terms of the gain-loss ratio, makes them interesting in portfolio management. Recently Bellini and Di Bernardino (2017) have shown the practical interest of estimating extreme expectiles of series of financial log-returns; in particular, they recommend the estimation of the expectile of level \(\beta =0.99855\), following their observation that it coincides with the quantile of level \(\beta '=0.99\) in the standard Gaussian case. In this example, we consider the series of the daily negative log-returns of the Commerzbank stock prices on the DAX30 stock exchange between March 6, 2012 and July 28, 2016, resulting in a sample \(Y_1,\ldots ,Y_n\) of size \(n=1{,}048\) plotted in the top left panel of Fig. 3. To reduce the serial dependence in the observations, we filter our time series using an ARMA(1,1)-GARCH(1,1) model:
see Sect. 5.2 p. 100 of Francq and Zakoïan (2010). Here \(\mu , \phi ,\theta \in {\mathbb {R}}\) and \({\mathfrak {a}},{\mathfrak {b}},{\mathfrak {c}}>0\) are unknown coefficients, and \((\varepsilon _t)\) is an unobserved independent non-constant white noise sequence, i.e. such that \(\mathbb {E}(\varepsilon )=0\), \(\mathbb {E}(\varepsilon ^2)=1\) and \(\mathbb {P}(\varepsilon ^2=1)<1\). When \(|\phi |,|\theta |<1\), under suitable conditions, this model has a stationary, non-anticipative solution, see Theorem 2.4 p. 30 of Francq and Zakoïan (2010); this is in particular the case if \({\mathfrak {a}}+{\mathfrak {b}}<1\). In this case, at time t, \(\sigma _t\) is a function of the past of the process (up to time \(t-1\)) only; we let \({\mathcal {F}}_n\) be the sigma-algebra generated by the ARMA-GARCH process up to time n. By positive homogeneity of expectiles, the conditional expectile for the next day given the observations up to time n is then
We estimate here an extreme conditional expectile for tomorrow given our knowledge of today, that is, the quantity \(\xi _{\beta _n}(Y_{n+1}\,|\,{\mathcal {F}}_n)\) with \(\beta _n=\beta =0.99855\). We first estimate all the parameters and predict the residuals \({\widehat{u}}_i\) using the function garchFit in the R package fGarch (Wuertz 2020) and the option @residuals, and then we obtain predictions \({\widehat{\varepsilon }}_i\) of the innovations by fitting a pure GARCH model directly to the \({\widehat{u}}_i\) and applying garchFit(...)@residuals. Following the theory developed in Girard et al. (2021), we treat the residuals \({\widehat{\varepsilon }}_i\) from the model as independent and identically distributed copies of \(\varepsilon \) for the estimation of the tail index \(\gamma \) of \(\varepsilon \), and \(\xi _{\beta _n} (\varepsilon )\). The independence assumption was checked using a series of Ljung-Box independence tests on residuals and their squares, with the lowest p-value being 0.39. Evidence that \(\varepsilon \) is indeed heavy-tailed is gathered in the two bottom panels of Fig. 3 using exponential QQ-plots of the log-spacings. The estimated tail indices of the residuals are very similar: \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}} \approx 0.321\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}} \approx 0.345\) (selected \(k_n\) values are \({\widehat{k}}_n^{\mathrm {H}}=81\) and \({\widehat{k}}_n^{\mathrm {E}}=49\) respectively). The garchFit routine also provides an estimate \({\widehat{\sigma }}_n\) of the conditional standard deviation on day n. For a prediction on day \(n+1\), we estimate the volatility by
This eventually yields the conditional extreme expectile estimates
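Assuming the ARMA(1,1) mean structure implied by the residual formula \({\widehat{u}}_n= Y_n - {\widehat{\phi }} Y_{n-1} - {\widehat{\theta }} {\widehat{u}}_{n-1} - {\widehat{\mu }}\), a plug-in version of these estimates can be sketched as

\[
\begin{aligned}
{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1} \mid {\mathcal {F}}_n) &= {\widehat{\mu }} + {\widehat{\phi }} Y_n + {\widehat{\theta }} {\widehat{u}}_n + {\widehat{\sigma }}_{n+1}\, {\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(\varepsilon ), \\
{\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1} \mid {\mathcal {F}}_n) &= {\widehat{\mu }} + {\widehat{\phi }} Y_n + {\widehat{\theta }} {\widehat{u}}_n + {\widehat{\sigma }}_{n+1}\, {\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(\varepsilon ),
\end{aligned}
\]

where the extreme expectile estimates of \(\varepsilon \) on the right-hand side are computed from the residuals \({\widehat{\varepsilon }}_i\).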
Our key observation now is that, if \({\widehat{\mu }}\), \({\widehat{\phi }}\), \({\widehat{\theta }}\) and the past \((Y_t)_{t\le n-1}\) of the process are considered as fixed, then \({\widehat{u}}_n= Y_n - {\widehat{\phi }} Y_{n-1} - {\widehat{\theta }} {\widehat{u}}_{n-1} - {\widehat{\mu }}\) is an affine function of \(Y_n\), and \({\widehat{\sigma }}_{n+1}^2\) is a quadratic function of \({\widehat{u}}_n\) and hence of \(Y_n\). Each of our extreme expectile estimates can therefore be considered as a function of the observation \(Y_n=y_n\); these functions are represented in the top right panel of Fig. 3 as a way of evaluating the influence of the nth observation on the dynamic extreme expectile prediction for the next day. Our estimates \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1} \mid {\mathcal {F}}_n)\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1} \mid {\mathcal {F}}_n)\) are calculated using either \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) or \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\), and are compared to their counterpart using the standard, non-bias-reduced Weissman estimate \({\widehat{\xi }}_{\beta _n}^{\star }(Y_{n+1} \mid {\mathcal {F}}_n)\) extrapolated with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\). They are also compared with dynamic bias-reduced Weissman quantile estimates of level \(\beta '_n=\beta '=0.99\), calculated using either \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) or \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\):
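By analogy with the expectile case, and under the same assumed ARMA(1,1) mean structure, such a dynamic quantile estimate can be sketched as

\[
{\widetilde{q}}_{\beta '_n}^{\star ,\mathrm {RB}}(Y_{n+1} \mid {\mathcal {F}}_n) = {\widehat{\mu }} + {\widehat{\phi }} Y_n + {\widehat{\theta }} {\widehat{u}}_n + {\widehat{\sigma }}_{n+1}\, {\widetilde{q}}_{\beta '_n}^{\star ,\mathrm {RB}}(\varepsilon ),
\]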
where the expression of \({\widetilde{q}}_{\beta '_n}^{\star ,\mathrm {RB}} (\varepsilon )\) (adapted here with the use of residuals) is given in Sect. 3.3.
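The dependence of the dynamic prediction on the last observation \(Y_n=y_n\), described above, can be illustrated numerically. The following is a minimal Python sketch, not the authors' R code: the function name, the parameter values and the plug-in value xi_eps for the extreme expectile of \(\varepsilon \) are all hypothetical, and the GARCH(1,1) recursion with intercept c and coefficients a, b is an assumed form of the variance equation.

```python
import math


def next_day_expectile(y_n, y_prev, u_prev, sigma_n, params, xi_eps):
    """One-step-ahead conditional expectile, viewed as a function of y_n.

    params holds hypothetical ARMA(1,1)-GARCH(1,1) estimates; xi_eps is a
    hypothetical estimate of the extreme expectile of the innovation.
    """
    mu, phi, theta = params["mu"], params["phi"], params["theta"]
    a, b, c = params["a"], params["b"], params["c"]
    # ARMA(1,1) residual on day n: affine in y_n.
    u_n = y_n - phi * y_prev - theta * u_prev - mu
    # Assumed GARCH(1,1) update: the predicted variance is quadratic in u_n.
    sigma_next = math.sqrt(c + a * u_n ** 2 + b * sigma_n ** 2)
    # Conditional location for day n+1.
    location_next = mu + phi * y_n + theta * u_n
    # Location shift plus volatility times the innovation expectile.
    return location_next + sigma_next * xi_eps


# Hypothetical values, chosen only so that a + b < 1.
params = {"mu": 0.0, "phi": 0.1, "theta": 0.05, "a": 0.1, "b": 0.85, "c": 0.05}
grid = [-5.0 + 0.5 * i for i in range(21)]  # candidate values of y_n
curve = [(y, next_day_expectile(y, 0.2, -0.1, 1.0, params, 4.0)) for y in grid]
```

Plotting the pairs in curve against the grid gives the kind of influence curve discussed for the top right panel of Fig. 3: the location term is affine in \(y_n\), while the volatility term grows with the absolute value of the residual.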
Estimated expectiles are substantially larger than estimated quantiles; this further supports the non-Gaussian behaviour of the returns, and also means that a risk assessment based on expectiles using the guidelines provided by Bellini and Di Bernardino (2017) in the Gaussian case would be more conservative than if it were based on quantiles. The bias-reduced expectile estimates give similar results, and their non-bias-reduced counterpart is visibly larger, suggesting that bias plays some role in this example: the bias-reduced estimates strike a middle ground between the assessment of risk provided by extreme quantiles, which would be liberal, and the conservative assessment of financial risk given by the non-bias-reduced extreme expectile estimates.
References
Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.: Coherent measures of risk. Math. Finance 9(3), 203–228 (1999)
Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.: Statistics of Extremes: Theory and Applications. Wiley, New York (2004)
Bellini, F., Di Bernardino, E.: Risk management with expectiles. Eur. J. Finance 23(6), 487–506 (2017)
Bellini, F., Klar, B., Müller, A., Gianin, E.R.: Generalized quantiles as risk measures. Insur. Math. Econ. 54, 41–48 (2014)
Blostein, M., Miljkovic, T.: ltmix: Left-Truncated Mixtures of Gamma, Weibull, and Lognormal Distributions. R package version 0.2.0 (2019)
Caeiro, F., Gomes, M.I., Pestana, D.: Direct reduction of bias of the classical Hill estimator. Revstat 3(2), 113–136 (2005)
Cai, J.J., de Haan, L., Zhou, C.: Bias correction in extreme value statistics with index around zero. Extremes 16(2), 173–201 (2013)
Chotikapanich, D., Griffiths, W.E.: Posterior distributions for the Gini coefficient using grouped data. Aust. N. Z. J. Stat. 42(4), 383–392 (2000)
Daouia, A., Girard, S., Stupfler, G.: Estimation of tail risk based on extreme expectiles. J. R. Stat. Soc. B 80(2), 263–292 (2018)
Daouia, A., Girard, S., Stupfler, G.: Extreme M-quantiles as risk measures: from \(L^1\) to \(L^p\) optimization. Bernoulli 25(1), 264–309 (2019)
Daouia, A., Girard, S., Stupfler, G.: Tail expectile process and risk assessment. Bernoulli 26(1), 531–556 (2020)
Daouia, A., Girard, S., Stupfler, G.: ExpectHill estimation, extreme risk and heavy tails. J. Econom. 221(1), 97–117 (2021)
de Haan, L., Ferreira, A.: Extreme Value Theory: An Introduction. Springer, New York (2006)
Dickson, D.C.: Insurance risk and ruin. Cambridge University Press, Cambridge (2016)
Dutang, C., Charpentier, A.: CASdatasets: Insurance Datasets. R package version 1.0-10 (2019)
Ehm, W., Gneiting, T., Jordan, A., Krüger, F.: Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings. J. R. Stat. Soc. B 78(3), 505–562 (2016)
El Methni, J., Stupfler, G.: Extreme versions of Wang risk measures and their estimation for heavy-tailed distributions. Stat. Sin. 27(2), 907–930 (2017)
Embrechts, P., Klüppelberg, C., Mikosch, T.: Modelling Extremal Events for Insurance and Finance. Springer, Berlin (1997)
Francq, C., Zakoïan, J.-M.: GARCH Models: Structure, Statistical Inference and Financial Applications. Wiley, New York (2010)
Gardes, L., Girard, S.: On the estimation of the variability in the distribution tail. TEST 30(4), 884–907 (2021)
Girard, S., Stupfler, G., Usseglio-Carleve, A.: Extreme conditional expectile estimation in heavy-tailed heteroscedastic regression models. Ann. Stat. 49(6), 3358–3382 (2021)
Girard, S., Stupfler, G., Usseglio-Carleve, A.: Nonparametric extreme conditional expectile estimation. Scand. J. Stat. 49(1), 78–115 (2022)
Gneiting, T.: Making and evaluating point forecasts. J. Am. Stat. Assoc. 106(494), 746–762 (2011)
Goegebeur, Y., Beirlant, J., de Wet, T.: Kernel estimators for the second order parameter in extreme value statistics. J. Stat. Plan. Inference 140(9), 2632–2652 (2010)
Gomes, M.I., Guillou, A.: Extreme value theory and statistics of univariate extremes: a review. Int. Stat. Rev. 83(2), 263–292 (2015)
Gomes, M.I., Pestana, D.: A sturdy reduced-bias extreme quantile (VaR) estimator. J. Am. Stat. Assoc. 102(477), 280–292 (2007)
Gomes, M.I., Pestana, D., Caeiro, F.: A note on the asymptotic variance at optimal levels of a bias-corrected Hill estimator. Stat. Probab. Lett. 79(3), 295–303 (2009)
Hall, P., Welsh, A.H.: Adaptive estimates of parameters of regular variation. Ann. Stat. 13(1), 331–341 (1985)
Hill, B.M.: A simple general approach to inference about the tail of a distribution. Ann. Stat. 3(5), 1163–1174 (1975)
Holzmann, H., Klar, B.: Expectile asymptotics. Electron. J. Stat. 10(2), 2355–2371 (2016)
Hua, L., Joe, H.: Second order regular variation and conditional tail expectation of multiple risks. Insur. Math. Econ. 49(3), 537–546 (2011)
Jones, M.C.: Expectiles and M-quantiles are quantiles. Stat. Probab. Lett. 20(2), 149–153 (1994)
Kaas, R., Goovaerts, M., Dhaene, J., Denuit, M.: Modern Actuarial Risk Theory—Using R, vol. 128. Springer, Berlin (2008)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46(1), 33–50 (1978)
Krätschmer, V., Zähle, H.: Statistical inference for expectile-based risk measures. Scand. J. Stat. 44(2), 425–454 (2017)
Manjunath, B.G., Caeiro, F.: evt0: Mean of Order p, Peaks Over Random Threshold Hill and High Quantile Estimates. R package version 1.1-3 (2013)
McDonald, J.B.: Some generalized functions for the size distribution of income. Econometrica 52(3), 647–665 (1984)
Newey, W.K., Powell, J.L.: Asymmetric least squares estimation and testing. Econometrica 55(4), 819–847 (1987)
Resnick, S.: Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. Springer, Berlin (2007)
Reynkens, T., Verbelen, R., Bardoutsos, A., Cornilly, D., Goegebeur, Y., Herrmann, K.: ReIns: Functions from “Reinsurance: Actuarial and Statistical Aspects.” R package version 1.0.10 (2020)
Singh, S.K., Maddala, G.S.: A function for size distribution of incomes. Econometrica 44(5), 963–970 (1976)
Sobotka, F., Kneib, T.: Geoadditive expectile regression. Comput. Stat. Data Anal. 56(4), 755–767 (2012)
Vandewalle, B., Beirlant, J.: On univariate extreme value statistics and the estimation of reinsurance premiums. Insur. Math. Econ. 38(3), 441–459 (2006)
Wang, S.: Insurance pricing and increased limits ratemaking by proportional hazards transforms. Insur. Math. Econ. 17(1), 43–54 (1995)
Wang, S.: Premium calculation by transforming the layer premium density. ASTIN Bull. 26(1), 71–92 (1996)
Weissman, I.: Estimation of parameters and large quantiles based on the \(k\) largest observations. J. Am. Stat. Assoc. 73(364), 812–815 (1978)
Wuertz, D.: fGarch: Rmetrics - Autoregressive Conditional Heteroskedastic Modelling. R package version 3042.83.2 (2020)
Ziegel, J.F.: Coherence and elicitability. Math. Finance 26(4), 901–918 (2016)
Acknowledgements
This research is supported by the French National Research Agency under the ExtremReg Project, Grant ANR-19-CE40-0013. S. Girard also acknowledges the support of the Chair Stress Test, Risk Management and Financial Steering, led by the French Ecole Polytechnique and its Foundation and sponsored by BNP Paribas, as well as the support of the French National Research Agency in the framework of the Investissements d’Avenir program (ANR-15-IDEX-02). G. Stupfler also acknowledges support from an AXA Research Fund Award on “Mitigating risk in the wake of the COVID-19 pandemic”.
Cite this article
Girard, S., Stupfler, G. & Usseglio-Carleve, A. On automatic bias reduction for extreme expectile estimation. Stat Comput 32, 64 (2022). https://doi.org/10.1007/s11222-022-10118-x
Keywords
 Asymmetric least squares
 Bias reduction
 Expectiles
 Extremes
 Extrapolation
 Heavy tails
 Second-order parameter