The \(\alpha \)th expectile \(\xi _{\alpha }\) of an integrable random variable Y is defined as

$$\begin{aligned} \xi _{\alpha } = \underset{{\theta \in \mathbb {R}}}{{{\,\mathrm{arg~min}\,}}}\, \mathbb {E}(\eta _{\alpha }(Y-\theta )-\eta _{\alpha }(Y)), \end{aligned}$$

where \(\eta _{\alpha }(u)=|\alpha -\mathbb {1}\{ u\le 0 \}| u^2\) is the so-called expectile check function and \(\mathbb {1}\{ \cdot \}\) the indicator function. Expectiles are \(L^2\)-analogues of quantiles, the latter being obtained by minimising asymmetrically weighted mean absolute deviations (Koenker and Bassett 1978):

$$\begin{aligned} q_{\alpha } \in \underset{q\in \mathbb {R}}{{{\,\mathrm{arg~min}\,}}}\, \mathbb {E}(\rho _{\alpha }(Y-q)-\rho _{\alpha }(Y)), \end{aligned}$$

where \(\rho _{\alpha }(u)=|\alpha -\mathbb {1}\{ u\le 0 \}| |u|\) is the quantile check function. Expectiles, originally introduced by Newey and Powell (1987) in the context of testing for homoscedasticity and conditional symmetry of the error distribution in linear regression, are always uniquely defined by their convex optimisation problem unlike quantiles, for which uniqueness is only guaranteed if the underlying distribution function is strictly increasing. Expectiles satisfy

$$\begin{aligned} \alpha =\mathbb {E}( |Y-\xi _{\alpha }|\mathbb {1}\{ Y\le \xi _{\alpha }\} )/\mathbb {E}|Y-\xi _{\alpha }|. \end{aligned}$$

In particular, contrary to quantiles, expectiles are determined by tail expectations rather than tail probabilities.

Expectiles induce risk measures which have recently gained traction in the risk management context, for several axiomatic and practical reasons, including the fact that they induce the only risk measure, apart from the simple expectation, which is law-invariant, coherent (Artzner et al. 1999) and elicitable (Gneiting 2011), see Bellini et al. (2014) and Ziegel (2016). As such, a natural backtesting methodology exists for expectiles, in the sense that a strictly consistent scoring function exists for expectiles, providing a properly justified decision-theoretic tool which allows one to rank expectile forecasts by their accuracy (see Theorem 10 in Gneiting 2011). Quantiles are indeed elicitable, but not coherent in general, and are often criticised for missing out on relevant information about distribution tails because their calculation only depends on the frequency of tail events and not on their sizes. The Expected Shortfall, meanwhile, takes into account the actual values of the risk variable on the tail event and is a coherent risk measure, but is not elicitable. In financial applications specifically, expectiles are linked through Formula (2) to the notion of gain-loss ratio, which is well known in the literature on no good deal valuation in incomplete markets and is a popular performance measure in portfolio management (see Bellini and Di Bernardino 2017, and references therein). Further axiomatic, theoretical and practical justification for the use of expectiles alongside or instead of the quantile and Expected Shortfall can be found in, among others, Ehm et al. (2016) and Bellini and Di Bernardino (2017).

Expectile estimation was first considered in Newey and Powell (1987) in the context of linear regression, and has been developing since then; recent contributions include Sobotka and Kneib (2012) as well as Holzmann and Klar (2016) and Krätschmer and Zähle (2017) for the estimation of central, non-tail expectiles of fixed order \(\alpha \). By contrast, probabilistic aspects of extreme expectiles, with \(\alpha \uparrow 1\), were first considered by Bellini et al. (2014) and later Bellini and Di Bernardino (2017). The estimation of extreme expectiles has been considered even more recently in Daouia et al. (2018, 2019, 2020), where it is shown that extreme expectiles can be estimated in several ways. The construction of each of the estimators uses a combination of the heavy-tailed distributional assumption (representing the tail structure of many financial and actuarial data examples fairly well, see e.g. p. 9 of Embrechts et al. 1997 and p. 1 of Resnick 2007) and a remarkable asymptotic proportionality relationship linking extreme expectiles to their quantile counterparts.

An inspection of the finite-sample results of Daouia et al. (2018, 2020) reveals that these estimators suffer from substantial finite-sample bias, even though this is not clear from the asymptotic normality results presented therein. This is of course an issue if expectiles are to be used widely in the management of extreme risk. A partial answer to this bias problem is presented in Girard et al. (2022) in a regression setup; however, the method of Girard et al. (2022) is designed to eliminate the source of bias due to the amount of tail heaviness (which has an important influence in the asymptotic proportionality relationship), but cannot handle the bias purely due to the second-order framework. It will therefore perform poorly when this particular source of bias dominates, that is, when the underlying heavy-tailed distribution is far from the standard Pareto distribution on which the extrapolation procedure is based. The problem of having to account for this discrepancy between the actual distribution and the ideal Pareto distribution in the right tail is well known and important in extreme value analysis, although historically, attention seems to have been mostly restricted to tail index estimation: a reasonably recent account of bias reduction methods in this context is given in Sect. 5.3 of Gomes and Guillou (2015).

The contribution of this paper is to provide a wide class of automatic, data-driven, second-order fully bias-reduced versions of the extreme expectile estimators currently available in the literature. This is done in three steps. First, we briefly recall the construction of extreme expectile estimators at a level \(\beta =\beta _n\uparrow 1\) as \(n\rightarrow \infty \), where n denotes sample size. This construction is based on the extrapolation of purely empirical expectile estimators at a much lower, intermediate level \(\alpha _n\uparrow 1\), with the help of an appropriate tail index estimator, and we highlight how bias may appear from tail index estimation and from the extrapolation procedure itself through a specific bias term. Second, in the Hall–Welsh subclass of heavy-tailed models (Hall and Welsh 1985) which contains most of the heavy-tailed distributions typically encountered in extreme value analysis, we provide estimators of this bias term that we then use to define versions of extrapolated extreme expectile estimators that are fully corrected for extrapolation bias. Third and last, we discuss the use of bias-reduced estimators of the tail index as a way to complete the elimination of the bias of our extrapolated estimators.

As we shall see in our simulation study, the expectile-based tail index estimator allows us to gain accuracy mostly in those difficult situations where the so-called second-order parameter is close to 0, that is, when the underlying heavy-tailed distribution is far from the standard Pareto distribution on which the extrapolation procedure is based. In this sense, we make substantial further gains compared to the partial bias-correction procedure discussed in Girard et al. (2022) that cannot handle this case. The combination of these second and third steps results in a fully bias-reduced class of extrapolated extreme expectile estimators in our heavy-tailed setting. To make these estimators completely automatic, we introduce a selection rule for the Asymptotic Mean Squared Error-optimal value of the tuning parameter \(\alpha _n\), which represents the upper sample fraction used in our tail index and expectile estimators as well as in the extrapolation bias correction term. This results in estimators whose finite-sample performance is superior to that of previously considered estimators in the extreme expectile estimation literature, as we shall illustrate in our simulation study.

The paper is organised in the following way. Section 2 gives details on our estimation framework and reviews currently available extreme expectile estimators. Section 3 contains the main contributions of the paper on the bias-reduced estimation of extreme expectiles; all our methods and samples of real data are incorporated into the R package Expectrem. Section 4 examines the performance of our estimators with a simulation study. We finally illustrate the practical applicability of our procedures on real samples of economic, actuarial and financial data in Sect. 5. More details about the implementation of our methods, mathematical proofs and a complete set of numerical results from our simulation study are relegated to the Supplementary Material document.

State of the art on extreme expectile estimation

We start by describing the existing techniques in extreme expectile estimation. Suppose throughout that the available data \((Y_1,\ldots ,Y_n)\) is made of independent realisations of the random variable Y with cumulative distribution function F (resp. survival function \({\overline{F}}=1-F\)). It is assumed that \({\mathbb {E}}|Y|<\infty \), so that expectiles of Y of any order exist indeed. Our goal is to estimate an extreme expectile of Y, i.e. whose order tends to 1 as \(n\rightarrow \infty \).

Intermediate level    We start with the case of a so-called intermediate level \(\alpha _n\uparrow 1\), namely such that \(n(1-\alpha _n)\rightarrow \infty \) as \(n\rightarrow \infty \). Intermediate levels tend to 1 slowly enough that the corresponding expectiles lie well within the sample and can thus be estimated by purely empirical methods. It was observed by Jones (1994) that the \(\alpha _n\)th expectile is actually the quantile of level \(\alpha _n\) associated with the distribution function E defined by

$$\begin{aligned} {\overline{E}}(y) = 1-E(y) = \frac{{\mathbb {E}} ( ( Y-y ) \mathbb {1}_{\{ Y > y \}} )}{{\mathbb {E}} | Y-y |}. \end{aligned}$$

Recall that the quantile at level \(\alpha _n\) of the distribution function F is defined as \(q_{\alpha _n} = \inf \{ y\in {\mathbb {R}} \, | \, F(y)\ge \alpha _n \} = \inf \{ y\in {\mathbb {R}} \, | \, {\overline{F}}(y)\le 1-\alpha _n \}\). Intermediate quantiles of F may then be estimated by inverting the empirical survival function:

$$\begin{aligned}&{\widehat{q}}_{\alpha _n}= \inf \left\{ y \in {\mathbb {R}} \, | \, \widehat{{\overline{F}}}_n(y) \le 1-\alpha _n \right\} = Y_{n-\lfloor n(1-\alpha _n) \rfloor ,n} \\&\text {with } \ \widehat{{\overline{F}}}_n(y) = \frac{1}{n} \sum _{i=1}^n \mathbb {1}_{\{ Y_i>y \}}. \end{aligned}$$

Here \(Y_{1,n}\le Y_{2,n}\le \cdots \le Y_{n,n}\) are the order statistics associated with \((Y_1,\ldots ,Y_n)\) and \(\lfloor \cdot \rfloor \) denotes the floor function. We apply the same principle to the estimation of the intermediate expectile: replacing population averaging with sample averaging in the definition of the distribution function E results in the estimator

$$\begin{aligned}&{\widehat{\xi }}_{\alpha _n}= \inf \left\{ y \in {\mathbb {R}} \, | \, \widehat{{\overline{E}}}_n(y) \le 1- \alpha _n \right\} \\&\text {with } \ \widehat{{\overline{E}}}_n(y)= \frac{\sum _{i=1}^n (Y_i-y) \mathbb {1}_{\{ Y_i>y \}}}{\sum _{i=1}^n |Y_i-y|}. \end{aligned}$$

This estimator is an unconditional version of the intermediate conditional expectile estimator introduced in Girard et al. (2022). A straightforward calculation shows that this estimator is in fact also exactly the Least Asymmetrically Weighted Squares (LAWS) estimator studied in Daouia et al. (2018), that is, the unique solution of the empirical counterpart of the minimisation problem (1):

$$\begin{aligned} {\widehat{\xi }}_{\alpha _n} = \underset{\theta \in \mathbb {R}}{{{\,\mathrm{arg~min}\,}}}\, \sum _{i=1}^n \eta _{\alpha _n}(Y_i-\theta ). \end{aligned}$$
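For illustration, the LAWS estimator can be computed by iteratively reweighted averaging: the first-order condition of the minimisation problem states that the sample expectile is a weighted mean of the data, with weight \(\alpha _n\) on the observations above it and \(1-\alpha _n\) on the others. The Python sketch below is not taken from the Expectrem package; the names `laws_expectile` and `empirical_E_bar` are hypothetical, and the fixed-point scheme is only one of several possible solvers. It also checks numerically that \(\widehat{{\overline{E}}}_n\) evaluated at the LAWS estimate equals \(1-\alpha _n\).

```python
def laws_expectile(sample, alpha, tol=1e-12, max_iter=1000):
    """Sample expectile of order alpha via iteratively reweighted averaging.

    Hypothetical helper: solves theta = sum(w_i * y_i) / sum(w_i), where
    w_i = alpha if y_i > theta and 1 - alpha otherwise.
    """
    theta = sum(sample) / len(sample)  # the mean is the expectile of order 1/2
    for _ in range(max_iter):
        weights = [alpha if y > theta else 1.0 - alpha for y in sample]
        new_theta = sum(w * y for w, y in zip(weights, sample)) / sum(weights)
        if abs(new_theta - theta) <= tol * max(1.0, abs(theta)):
            return new_theta
        theta = new_theta
    return theta


def empirical_E_bar(sample, y):
    """Empirical counterpart of the survival function associated with E."""
    numerator = sum(v - y for v in sample if v > y)
    denominator = sum(abs(v - y) for v in sample)
    return numerator / denominator


# Toy data, for illustration only
toy_sample = [1.0, 2.0, 3.0, 4.0, 10.0]
xi_hat = laws_expectile(toy_sample, alpha=0.9)
print(xi_hat, empirical_E_bar(toy_sample, xi_hat))
```

The iteration is a Newton-type scheme on a piecewise quadratic convex criterion, so it typically stops after a handful of steps.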

An alternative estimator can be found in the class of heavy-tailed distributions we shall focus on hereafter. Recall that the distribution of Y is heavy-tailed if and only if there exists \(\gamma >0\) such that

$$\begin{aligned}&\forall y>0, \ \lim _{t\rightarrow \infty } \dfrac{{\overline{F}}(ty)}{{\overline{F}}(t)} = y^{-1/\gamma } \\&\text{ or } \text{ equivalently } \lim _{t\rightarrow \infty }\frac{q_{1-(ty)^{-1}}}{q_{1-t^{-1}}} = y^{\gamma }. \end{aligned}$$

The tail index \(\gamma \) characterises the tail heaviness of the distribution of Y: if \(\gamma >a\) then \({\mathbb {E}}(Y^{1/a}\mathbb {1}{\{ Y>0 \}}) = \infty \) (a precise statement is Exercise 1.16 p. 35 in de Haan and Ferreira 2006). Our minimal working assumption throughout will therefore be that \(\gamma < 1\) and \({\mathbb {E}}(Y_-) < \infty \), where \(Y_-=\max (-Y,0)\), so as to ensure that \({\mathbb {E}}|Y|<\infty \) and thus that expectiles at any order exist indeed. In this case, we have the following asymptotic proportionality relationship between expectile and quantile:

$$\begin{aligned} \lim _{\alpha \uparrow 1} \frac{\xi _{\alpha }}{q_{\alpha }} = \left( \gamma ^{-1}-1\right) ^{-\gamma }. \end{aligned}$$

This was first noted by Bellini et al. (2014). This connection suggests the class of indirect estimators

$$\begin{aligned} {\widetilde{\xi }}_{\alpha _n}= & {} \left( {\overline{\gamma }}^{-1}-1\right) ^{-{\overline{\gamma }}} {\widehat{q}}_{\alpha _n} = \left( {\overline{\gamma }}^{-1}-1\right) ^{-{\overline{\gamma }}}\nonumber \\&\quad \times Y_{n-\lfloor n(1-\alpha _n) \rfloor ,n} \end{aligned}$$

where \({\overline{\gamma }}\) is a consistent estimator of \(\gamma \).
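A minimal Python sketch of the indirect estimator follows. It is parametrised by the integer \(k=\lfloor n(1-\alpha _n) \rfloor \) rather than by \(\alpha _n\) itself, which avoids floating-point issues with the floor function; the tail index estimate `gamma_bar` is assumed to be supplied by the user, and the function name `indirect_expectile` is hypothetical.

```python
def indirect_expectile(sample, k, gamma_bar):
    """Indirect intermediate expectile estimator at level 1 - k/n.

    sample    : list of observations
    k         : number of observations above the threshold, 1 <= k < n
    gamma_bar : tail index estimate, assumed given, with 0 < gamma_bar < 1
    """
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    if not 0.0 < gamma_bar < 1.0:
        raise ValueError("gamma_bar must lie in (0, 1)")
    ordered = sorted(sample)
    q_hat = ordered[n - k - 1]  # order statistic Y_{n-k,n} (0-based index)
    return (1.0 / gamma_bar - 1.0) ** (-gamma_bar) * q_hat


# Toy data, for illustration only
toy_sample = [float(i) for i in range(1, 11)]
print(indirect_expectile(toy_sample, k=2, gamma_bar=0.25))
```

The proportionality constant is smaller than 1 when `gamma_bar` is below 1/2 and larger than 1 when it is above 1/2, so the estimated expectile lies below or above the empirical quantile accordingly.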

Extreme level    The problem of most relevance in extreme value analysis is to consider the case of a level \(\beta _n\uparrow 1\) such that \(n(1-\beta _n)\rightarrow c<\infty \) as \(n\rightarrow \infty \). In this situation, purely empirical methods are typically no longer consistent: for extreme quantile estimation, in the particular example \(\beta _n>1-1/n\), this can be seen by combining Theorem 1.1.6 p.10, Corollary 1.2.4 p.21 and Theorem 2.1.1 p.38 in de Haan and Ferreira (2006). One therefore has to use information about the tail of the data in order to construct an extrapolation procedure. In the context of expectile estimation, this is made possible by the heavy tail assumption and convergence (4): these entail

$$\begin{aligned} \frac{\xi _{\beta _n}}{\xi _{\alpha _n}} \approx \frac{q_{\beta _n}}{q_{\alpha _n}} \approx \left( \frac{1-\beta _n}{1-\alpha _n} \right) ^{-\gamma } \ \text{ as } n\rightarrow \infty . \end{aligned}$$

We call this approximation the Weissman approximation, after the work of Weissman (1978) on extreme quantile estimation. This justifies introducing the class of semiparametric extrapolating estimators

$$\begin{aligned} {\overline{\xi }}_{\beta _n}^{\star } = \left( \frac{1-\beta _n}{1-\alpha _n} \right) ^{-{\overline{\gamma }}} {\overline{\xi }}_{\alpha _n} \end{aligned}$$

where \({\overline{\xi }}_{\alpha _n}\) is any consistent estimator of \(\xi _{\alpha _n}\). One immediately deduces from (6) two subclasses of estimators, replacing \({\overline{\xi }}_{\alpha _n}\) by the LAWS estimator of \(\xi _{\alpha _n}\) or its indirect counterpart; in the latter, the estimator of \(\gamma \) can be chosen different from the estimator featured in the above extrapolation procedure, although we shall not pursue this here for the sake of simplicity. One may also construct a weighted combination of the LAWS-based and indirect estimators, as done in Daouia et al. (2021), although the finite-sample benefit of doing so can be marginal.
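The extrapolation step itself is a one-line computation. The sketch below, with the hypothetical names `weissman_extrapolate` and `pareto_quantile`, writes \((1-\beta _n)/(1-\alpha _n)\) as \(n(1-\beta _n)/k\) with \(k=n(1-\alpha _n)\), and illustrates on the quantile function of a pure Pareto distribution that the Weissman approximation is exact in that ideal case. For other heavy-tailed distributions the approximation carries an error, which is the source of the extrapolation bias.

```python
def weissman_extrapolate(xi_intermediate, gamma_bar, n, k, beta):
    """Extrapolate an estimate at level 1 - k/n to the extreme level beta."""
    if not (0 < k < n and 0.0 < beta < 1.0):
        raise ValueError("need 0 < k < n and 0 < beta < 1")
    return (n * (1.0 - beta) / k) ** (-gamma_bar) * xi_intermediate


def pareto_quantile(level, gamma):
    """Quantile of the Pareto distribution with survival function y^(-1/gamma), y >= 1."""
    return (1.0 - level) ** (-gamma)


# Illustration with the true tail index and the true intermediate quantile
gamma, n, k, beta = 0.3, 1000, 50, 0.9995
q_intermediate = pareto_quantile(1.0 - k / n, gamma)
q_extrapolated = weissman_extrapolate(q_intermediate, gamma, n, k, beta)
print(q_extrapolated, pareto_quantile(beta, gamma))
```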

The extrapolated LAWS-based and indirect estimators unfortunately suffer from a sizeable amount of finite-sample bias. This is clear from, among others, Figs. 3 and 4 in Daouia et al. (2018), where it can be seen that even for the sample size \(n=1000\), and for certain distributions of interest in extreme value modelling, these estimators have a relative bias of the order of \(50\%\). This means that the estimator is on average \(50\%\) larger than the target extreme expectile. The contributions of this paper, which we gather in the next section, are a precise quantification of this bias using a standard second-order refinement of the heavy tail condition, and the introduction of automatic bias reduction procedures whose finite-sample performance will be examined in detail in Sects. 4 and 5.

Automatic bias reduction methodology for extreme expectile estimation

Rationale for our bias correction methods

The construction of the class of extrapolated estimators in (6) relies on the successive use of Eq. (4), in order to approximate a ratio of high expectiles by a ratio of high quantiles at corresponding levels, and an approximation of this ratio of high quantiles that is warranted by the heavy tail assumption. The magnitude of the bias of the extrapolated estimators will therefore be crucially driven by the rates of convergence and the error terms in these two approximations.

To simplify the exposition, we assume from now on that \(k_n = n(1-\alpha _n)\) is a sequence of positive integers, and we rewrite the assumptions \(\alpha _n\uparrow 1\) and \(n(1-\alpha _n)\rightarrow \infty \) as \(k_n\rightarrow \infty \) and \(k_n/n\rightarrow 0\). This choice is motivated by the fact that in quantile estimation, the quantity \(k_n\) denotes the effective sample size, i.e. the number of top order statistics eventually used for the estimation. Adopting this convention will make it easier to state and compare our results with existing results in the extreme value analysis of heavy tails. This is not a restriction in practice since our estimators of extreme expectiles, having order \(\beta _n\uparrow 1\) such that \(n(1-\beta _n)\rightarrow c<\infty \), are built on intermediate expectile estimators whose level \(\alpha _n\) we are free to choose, and integer values for \(k_n = n (1 - \alpha _n)\) induce a sufficient set of levels \(\alpha _n\) to work with.

Table 1 A list of standard continuous heavy-tailed distributions satisfying \({\mathcal {C}}_2(\gamma ,\rho ,A)\) with \(A(t)=b\gamma t^{\rho }\), with the associated values of \(\gamma \), \(\rho \) and b

A classical device in extreme value analysis for bias quantification is the following second-order regular variation condition that refines our initial heavy tail assumption.

Definition 1

(Class \({\mathcal {C}}_2(\gamma ,\rho ,A)\)). The survival function \({\overline{F}}\) is said to belong to the class \({\mathcal {C}}_2(\gamma ,\rho ,A)\) of second-order regularly varying functions with index \(-1/\gamma <0\), second-order parameter \(\rho \le 0\) and a measurable auxiliary function A having constant sign and converging to 0 at infinity, if

$$\begin{aligned} \ \lim _{t \rightarrow \infty } \frac{1}{A( 1/{\overline{F}}(t) )} \left( \frac{{\overline{F}} ( ty )}{{\overline{F}} ( t )}-y^{-1/\gamma }\right) =y^{-1/\gamma } \frac{y^{\rho /\gamma }-1}{\gamma \rho }, \end{aligned}$$

for all \(y>0\). Here and throughout the ratio \((y^a-1)/a\) should be read as \(\log y\) when \(a=0\).

An equivalent condition on the tail quantile function \(t\mapsto q_{1-t^{-1}}\) is that

$$\begin{aligned} \forall y>0, \ \lim _{t\rightarrow \infty }\frac{1}{A(t)} \left( \frac{q_{1-(ty)^{-1}}}{q_{1-t^{-1}}} - y^{\gamma } \right) = y^{\gamma } \frac{y^{\rho }-1}{\rho }. \end{aligned}$$

See de Haan and Ferreira (2006, Theorem 2.3.9 p. 48). Numerous examples of commonly used distributions that satisfy this assumption can be found in Beirlant et al. (2004).
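The quantile version of the second-order condition can be checked numerically on a concrete example. The sketch below assumes the Burr parametrisation \({\overline{F}}(y)=(1+y^{-\rho /\gamma })^{1/\rho }\), \(y>0\), for which a direct expansion of the tail quantile function \(q_{1-t^{-1}}=(t^{-\rho }-1)^{-\gamma /\rho }\) gives \(A(t)=\gamma t^{\rho }\). This parametrisation and the function names are choices made for the illustration.

```python
def burr_tail_quantile(t, gamma, rho):
    """q_{1-1/t} for the Burr survival function (1 + y^(-rho/gamma))^(1/rho)."""
    return (t ** (-rho) - 1.0) ** (-gamma / rho)


def second_order_ratio(t, y, gamma, rho):
    """Left-hand side of the quantile second-order condition at finite t."""
    a_t = gamma * t ** rho  # auxiliary function A(t) = gamma * t^rho (assumed form)
    ratio = burr_tail_quantile(t * y, gamma, rho) / burr_tail_quantile(t, gamma, rho)
    return (ratio - y ** gamma) / a_t


def second_order_limit(y, gamma, rho):
    """Right-hand side y^gamma * (y^rho - 1) / rho, for rho < 0."""
    return y ** gamma * (y ** rho - 1.0) / rho


gamma, rho, y = 0.5, -1.0, 2.0
for t in (1e2, 1e4, 1e6):
    print(t, second_order_ratio(t, y, gamma, rho), second_order_limit(y, gamma, rho))
```

The finite-\(t\) ratio approaches the limit as \(t\) grows, at a speed governed by \(\rho \): the closer \(\rho \) is to 0, the slower the convergence.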

The fundamental argument behind our methodology is that the direct and indirect extrapolated estimators, i.e.

$$\begin{aligned}&{\widehat{\xi }}_{\beta _n}^{\star } = \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} {\widehat{\xi }}_{1-k_n/n} \\&\text{ and } \ {\widetilde{\xi }}_{\beta _n}^{\star } = \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} ({\overline{\gamma }}^{-1}-1)^{-{\overline{\gamma }}} Y_{n-k_n,n} \end{aligned}$$

where \({\overline{\gamma }}\) is a \(\sqrt{k_n}\)-consistent estimator of \(\gamma \), satisfy

$$\begin{aligned} \nonumber&\log \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star }}{\xi _{\beta _n}} \right) = ({\overline{\gamma }}-\gamma ) \log \left( \frac{k_n}{n(1-\beta _n)} \right) \\&\quad + \log \left( \frac{{\widehat{\xi }}_{1-k_n/n}}{\xi _{1-k_n/n}} \right) - \log \left( {\left( \frac{n(1-\beta _n)}{k_n} \right) ^{\gamma }} \frac{\xi _{\beta _n}}{\xi _{1-k_n/n}} \right) \end{aligned}$$
$$\begin{aligned} \nonumber \text{ and }&\log \left( \frac{{\widetilde{\xi }}_{\beta _n}^{\star }}{\xi _{\beta _n}} \right) = ({\overline{\gamma }}-\gamma ) \log \left( \frac{k_n}{n(1-\beta _n)} \right) \\ \nonumber&\quad + \log \left( \frac{({\overline{\gamma }}^{-1}-1)^{-{\overline{\gamma }}}}{(\gamma ^{-1}-1)^{-\gamma }} \right) + \log \left( \frac{Y_{n-k_n,n}}{q_{1-k_n/n}} \right) \\&\quad - \log \left( \left( \frac{n(1-\beta _n)}{k_n} \right) ^{\gamma } \left( \gamma ^{-1}-1 \right) ^{\gamma } \frac{\xi _{\beta _n}}{q_{1-k_n/n}} \right) . \end{aligned}$$

Under condition \({\mathcal {C}}_2(\gamma ,\rho ,A)\) and standard technical assumptions on \(k_n\) and \(\beta _n\), \({\widehat{\xi }}_{1-k_n/n}\) and \(Y_{n-k_n,n}\) have the same rate of convergence \(\sqrt{k_n}\), which is also the rate of convergence of the final (pure bias) nonrandom term, see e.g. Theorem 5 in Daouia et al. (2020) and its proof. Since \(\log ( k_n/(n(1-\beta _n)) ) \rightarrow \infty \), the first term dominates, leading to a common asymptotic distribution for \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widetilde{\xi }}_{\beta _n}^{\star }\): if \(\sqrt{k_n} ({\overline{\gamma }}-\gamma ) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma \), then

$$\begin{aligned} \nonumber \frac{\sqrt{k_n}}{\log ( k_n/(n(1-\beta _n)) )} \log \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star }}{\xi _{\beta _n}}\right)&{\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma \\ \text{ and } \ \frac{\sqrt{k_n}}{\log ( k_n/(n(1-\beta _n)) )} \log \left( \frac{{\widetilde{\xi }}_{\beta _n}^{\star }}{\xi _{\beta _n}}\right)&{\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma . \end{aligned}$$

A naive bias-correction strategy would thus focus on the bias incurred in the estimation of \(\gamma \), identifiable through the asymptotic distribution \(\varGamma \). However, Eqs. (7) and (8) reveal another source of bias, which is the use of the Weissman approximation itself to control the final terms in Eqs. (7) and (8) [note that the second term in Eq. (7) and the third term in Eq. (8) are asymptotically unbiased, see Theorem 2 in Daouia et al. (2018) and Theorem 2.4.8 p. 52 in de Haan and Ferreira (2006)]. Our contribution hereafter is to design fully bias-reduced extreme expectile estimators by eliminating both of these sources of bias.

Construction of bias-reduced extreme expectile estimators

We construct bias-reduced versions of the extrapolated estimators \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widetilde{\xi }}_{\beta _n}^{\star }\) in two steps. Our methods will feature estimators of the second-order parameter \(\rho \), but also estimators of the auxiliary function A. Estimating this function without making any further assumption can be a difficult task; however, for most of the distributions satisfying condition \({\mathcal {C}}_2(\gamma ,\rho ,A)\) used for modelling purposes, the function A takes the form \(A(t)=b \gamma t^{\rho }\), for a certain nonzero constant b and \(\rho <0\). We assume in what follows that the function A is indeed of this form, which amounts to assuming that the underlying distribution belongs to the Hall–Welsh class in the sense of Gomes and Pestana (2007). We give a list of examples of classical heavy-tailed distributions in Table 1, containing among others the distributions we shall work with in our simulation study, with their respective values of \(\gamma \), \(\rho \) and b. The results in Table 1 can be checked in a straightforward manner using a general result on the link between second-order regular variation and asymptotic expansions of the probability density function; see Lemma 1 in Sect. A of the Supplementary Material document.

The function A can then be estimated using consistent estimators \({\overline{\gamma }}\), \({\overline{b}}\) and \({\overline{\rho }}\) of \(\gamma \), b and \(\rho \), respectively. We assume in Sect. 3.2.1 that such estimators are given; we shall explain in detail in Sect. 3.2.2 which estimators \({\overline{\gamma }}\) we consider. The estimators \({\overline{b}}\) and \({\overline{\rho }}\) are calculated directly for all procedures considered in this paper using the R package evt0 (see Manjunath and Caeiro (2013) and Sect. A in the Supplementary Material document for a brief summary of how these estimators are constructed). We start by dealing with the bias due to the extrapolation procedure itself, contained in the final terms of Eqs. (7) and (8).

Bias due to the extrapolation procedure

We deal with the nonrandom bias term in Eq. (7), and we write

$$\begin{aligned}&\left( \frac{n(1-\beta _n)}{k_n} \right) ^{\gamma } \frac{\xi _{\beta _n}}{\xi _{1-k_n/n}} = \underbrace{\left( \frac{n(1-\beta _n)}{k_n} \right) ^{\gamma } \frac{q_{\beta _n}}{q_{1-k_n/n}}}_{{\normalsize 1+B_{1,n}}} \nonumber \\&\quad \times \underbrace{\left( \gamma ^{-1}-1 \right) ^{-\gamma } \frac{q_{1-k_n/n}}{\xi _{1-k_n/n}}}_{{\normalsize 1+B_{2,n}}} \times \underbrace{\left( \gamma ^{-1}-1 \right) ^{\gamma } \frac{\xi _{\beta _n}}{q_{\beta _n}}}_{{\normalsize 1+B_{3,n}}}. \end{aligned}$$

By Theorem 2.3.9 p. 48 in de Haan and Ferreira (2006), the bias term \(B_{1,n}\) can be written as

$$\begin{aligned} B_{1,n} = \frac{( n(1-\beta _n)/k_n )^{-\rho } -1}{\rho } A( n/k_n ) (1+{\text {o}}(1)). \end{aligned}$$

We now focus on the other two bias terms \(B_{2,n}\) and \(B_{3,n}\) linking an expectile to its quantile counterpart at intermediate and extreme levels, respectively. It follows from the proof of Proposition 1 in Daouia et al. (2018) that

$$\begin{aligned} \frac{{\overline{F}}( \xi _{\alpha } )}{1-\alpha }&= \left( \gamma ^{-1}-1 \right) ( 1+r(\alpha ) ) \\ \nonumber \text{ with } 1+r(\alpha )&= \left( 1-\frac{{\mathbb {E}}(Y)}{\xi _{\alpha }} \right) \frac{1}{2 \alpha -1} \\ \nonumber&\quad \times \left( 1 +A \left( \frac{1}{{\overline{F}}( \xi _{\alpha } )} \right) \frac{(1+{\text {o}}(1))}{\gamma (1-\gamma -\rho )} \right) ^{-1} \end{aligned}$$

as \(\alpha \uparrow 1\). Using Lemma 1 in Daouia et al. (2020) together with the heavy tail assumption then entails

$$\begin{aligned} \frac{\xi _{\alpha }}{q_{\alpha }}= & {} \left( \gamma ^{-1}-1 \right) ^{-\gamma } ( 1+r(\alpha ) )^{-\gamma } \nonumber \\&\quad \times \left( 1+ \frac{ \frac{\left( \gamma ^{-1}-1 \right) ^{-\rho }}{( 1+r(\alpha ) )^{\rho }} -1}{\rho } A \left( \frac{1}{1-\alpha } \right) (1+{\text {o}}(1)) \right) \nonumber \\ \end{aligned}$$

as \(\alpha \uparrow 1\). With \(\alpha =1-k_n/n\) and \(\alpha =\beta _n\), this yields

$$\begin{aligned}&1+ B_{2,n} = ( 1+r(1-k_n/n) )^{\gamma } \\&\quad \times \left( 1+ \frac{ \frac{\left( \gamma ^{-1}-1 \right) ^{-\rho }}{\left( 1+r \left( 1-\frac{k_n}{n} \right) \right) ^{\rho }} -1}{\rho } A \left( \frac{n}{k_n} \right) (1+{\text {o}}(1)) \right) ^{-1} \\ \text{ and }&1+ B_{3,n} = ( 1+r(\beta _n) )^{-\gamma } \\&\quad \times \left( 1+ \frac{ \frac{\left( \gamma ^{-1}-1 \right) ^{-\rho }}{( 1+r(\beta _n) )^{\rho }} -1}{\rho } A \left( \frac{1}{1-\beta _n} \right) (1+{\text {o}}(1)) \right) . \end{aligned}$$

Each of these bias terms can be estimated. Recall our assumption that \(A(t)=b \gamma t^{\rho }\), and estimate \(B_{1,n}\) by

$$\begin{aligned} {\overline{B}}_{1,n} = \frac{( n(1-\beta _n)/k_n )^{-{\overline{\rho }}} -1}{{\overline{\rho }}} \; {\overline{b}} \; {\overline{\gamma }} (n/k_n)^{{\overline{\rho }}}. \end{aligned}$$
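The estimator \({\overline{B}}_{1,n}\) translates directly into code. In the Python sketch below, the estimates `gamma_bar`, `b_bar` and `rho_bar` are assumed to be given as inputs (in the paper the last two are obtained from the R package evt0), and the function name `b1_bar` is hypothetical.

```python
def b1_bar(n, k, beta, gamma_bar, b_bar, rho_bar):
    """Estimate of the Weissman-approximation bias term B_{1,n}.

    Assumes A(t) = b * gamma * t^rho with rho < 0; all three parameter
    estimates are treated as given inputs.
    """
    if rho_bar >= 0.0:
        raise ValueError("rho_bar must be negative")
    ratio = n * (1.0 - beta) / k  # equals (1 - beta_n) / (1 - alpha_n)
    a_bar = b_bar * gamma_bar * (n / k) ** rho_bar  # estimate of A(n / k_n)
    return (ratio ** (-rho_bar) - 1.0) / rho_bar * a_bar


print(b1_bar(n=1000, k=100, beta=0.999, gamma_bar=0.5, b_bar=1.0, rho_bar=-1.0))
```

Since \({\overline{\rho }}<0\) and \(n(1-\beta _n)/k_n<1\) at extreme levels, the first factor is positive, so the sign of \({\overline{B}}_{1,n}\) is that of \({\overline{b}}\).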

Let further \({\overline{Y}}_n\) denote the sample mean of \(Y_1,\ldots ,Y_n\), \({\overline{\xi }}_{1-k_n/n}\) be either the LAWS or indirect intermediate expectile estimator, and \({\overline{\xi }}_{\beta _n}^{\star }\) be the related extrapolated estimator (in our current implementation we use the LAWS estimator for \({\overline{\xi }}_{1-k_n/n}\) and its extrapolated version for \({\overline{\xi }}_{\beta _n}^{\star }\)). The remainder terms \(r(1-k_n/n)\) and \(r(\beta _n)\) are estimated by

$$\begin{aligned} 1+{\overline{r}}(1-k_n/n)&= \left( 1-\frac{{\overline{Y}}_n}{{\overline{\xi }}_{1-k_n/n}} \right) \frac{1}{1-2k_n/n} \\&\quad \times \left( 1+\frac{{\overline{b}} (\widehat{{\overline{F}}}_n ( {\overline{\xi }}_{1-k_n/n} ))^{-{\overline{\rho }}}}{1-{\overline{\gamma }}-{\overline{\rho }}} \right) ^{-1} \\ \text{ and } \ 1+ {\overline{r}}^{\star }(\beta _n)&= \left( 1-\frac{{\overline{Y}}_n}{{\overline{\xi }}_{\beta _n}^{\star }} \right) \frac{1}{2\beta _n-1} \\&\quad \times \left( 1+\frac{{\overline{b}} \left( {\overline{\gamma }}^{-1}-1 \right) ^{-{\overline{\rho }}}}{1-{\overline{\gamma }}-{\overline{\rho }}} (1-\beta _n)^{-{\overline{\rho }}} \right) ^{-1}. \end{aligned}$$

This yields estimators of \(B_{2,n}\) and \(B_{3,n}\) as

$$\begin{aligned} 1+ {\overline{B}}_{2,n}&= ( 1+{\overline{r}}(1-k_n/n) )^{{\overline{\gamma }}} \\&\quad \times \left( 1+ \frac{ \frac{\left( {\overline{\gamma }}^{-1}-1 \right) ^{-{\overline{\rho }}}}{\left( 1+{\overline{r}} \left( 1-\frac{k_n}{n} \right) \right) ^{{\overline{\rho }}}} -1}{{\overline{\rho }}} {\overline{b}} {\overline{\gamma }} \left( \frac{n}{k_n} \right) ^{{\overline{\rho }}} \right) ^{-1} \\ \text{ and } 1+ {\overline{B}}_{3,n}&= ( 1+{\overline{r}}^{\star }(\beta _n) )^{-{\overline{\gamma }}} \\&\quad \times \left( 1+ \frac{ \frac{\left( {\overline{\gamma }}^{-1}-1 \right) ^{-{\overline{\rho }}}}{( 1+{\overline{r}}^{\star }(\beta _n) )^{{\overline{\rho }}}} -1}{{\overline{\rho }}} {\overline{b}} {\overline{\gamma }} (1-\beta _n)^{-{\overline{\rho }}} \right) . \end{aligned}$$
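The following Python sketch assembles \(1+{\overline{B}}_{2,n}\) and \(1+{\overline{B}}_{3,n}\) from the displayed formulas. All function names are hypothetical, and the intermediate and extrapolated expectile estimates, as well as \({\overline{\gamma }}\), \({\overline{b}}\) and \({\overline{\rho }}\), are assumed to be given. The code further assumes that the estimated quantities \(1+{\overline{r}}\) are positive, which is required for the power functions to be well defined.

```python
def empirical_survival(sample, y):
    """Proportion of observations strictly exceeding y."""
    return sum(1 for v in sample if v > y) / len(sample)


def one_plus_r_intermediate(sample, xi_int, k, gamma_bar, b_bar, rho_bar):
    """Estimate of 1 + r(1 - k/n), using the empirical survival function."""
    n = len(sample)
    mean = sum(sample) / n
    surv = empirical_survival(sample, xi_int)
    second_order = 1.0 + b_bar * surv ** (-rho_bar) / (1.0 - gamma_bar - rho_bar)
    return (1.0 - mean / xi_int) / (1.0 - 2.0 * k / n) / second_order


def one_plus_r_extreme(sample, xi_ext, beta, gamma_bar, b_bar, rho_bar):
    """Estimate of 1 + r(beta), using the extrapolated expectile estimate."""
    mean = sum(sample) / len(sample)
    second_order = 1.0 + (
        b_bar * (1.0 / gamma_bar - 1.0) ** (-rho_bar)
        / (1.0 - gamma_bar - rho_bar) * (1.0 - beta) ** (-rho_bar)
    )
    return (1.0 - mean / xi_ext) / (2.0 * beta - 1.0) / second_order


def second_order_factor(one_plus_r, scale, gamma_bar, b_bar, rho_bar):
    """Bracketed factor shared by both terms, with A(scale) = b*gamma*scale^rho."""
    inner = ((1.0 / gamma_bar - 1.0) ** (-rho_bar) / one_plus_r ** rho_bar - 1.0) / rho_bar
    return 1.0 + inner * b_bar * gamma_bar * scale ** rho_bar


def bias_factors(sample, xi_int, xi_ext, k, beta, gamma_bar, b_bar, rho_bar):
    """Return (1 + B2_bar, 1 + B3_bar)."""
    n = len(sample)
    r_int = one_plus_r_intermediate(sample, xi_int, k, gamma_bar, b_bar, rho_bar)
    r_ext = one_plus_r_extreme(sample, xi_ext, beta, gamma_bar, b_bar, rho_bar)
    one_plus_b2 = r_int ** gamma_bar / second_order_factor(
        r_int, n / k, gamma_bar, b_bar, rho_bar)
    one_plus_b3 = r_ext ** (-gamma_bar) * second_order_factor(
        r_ext, 1.0 / (1.0 - beta), gamma_bar, b_bar, rho_bar)
    return one_plus_b2, one_plus_b3


# Toy inputs, for illustration only
toy_sample = [1.0, 2.0, 3.0, 4.0, 10.0]
print(bias_factors(toy_sample, xi_int=8.0, xi_ext=40.0, k=1, beta=0.99,
                   gamma_bar=0.5, b_bar=1.0, rho_bar=-1.0))
```

Setting `b_bar` to 0 switches off the second-order terms, in which case the two factors reduce to \((1+{\overline{r}}(1-k_n/n))^{{\overline{\gamma }}}\) and \((1+{\overline{r}}^{\star }(\beta _n))^{-{\overline{\gamma }}}\) with the mean and level corrections only.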

We deduce from (7) and (10) that a version of the direct extrapolated estimator \({\widehat{\xi }}_{\beta _n}^{\star }\), corrected for the bias exclusively due to the heavy-tailed extrapolation, is

$$\begin{aligned} \nonumber&{\widehat{\xi }}_{\beta _n}^{\star } (1+{\overline{B}}_{1,n})(1+{\overline{B}}_{2,n})(1+{\overline{B}}_{3,n}) \\ \nonumber&\quad = \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} {\widehat{\xi }}_{1-k_n/n} \\&\qquad \times (1+{\overline{B}}_{1,n})(1+{\overline{B}}_{2,n})(1+{\overline{B}}_{3,n}). \end{aligned}$$

A correction for the extrapolation bias in the indirect estimator is simpler. Indeed,

$$\begin{aligned}&\left( \frac{n(1-\beta _n)}{k_n} \right) ^{\gamma } \left( \gamma ^{-1}-1 \right) ^{\gamma } \frac{\xi _{\beta _n}}{q_{1-k_n/n}} \nonumber \\&\quad =\underbrace{\left( \frac{n(1-\beta _n)}{k_n} \right) ^{\gamma } \frac{q_{\beta _n}}{q_{1-k_n/n}}}_{{\normalsize 1+B_{1,n}}} \times \underbrace{\left( \gamma ^{-1}-1 \right) ^{\gamma } \frac{\xi _{\beta _n}}{q_{\beta _n}}}_{{\normalsize 1+B_{3,n}}}. \end{aligned}$$

From (14), a version of the indirect extrapolated estimator \({\widetilde{\xi }}_{\beta _n}^{\star }\), corrected for the bias exclusively due to the heavy-tailed extrapolation, is then

$$\begin{aligned} \nonumber&{\widetilde{\xi }}_{\beta _n}^{\star } (1+{\overline{B}}_{1,n})(1+{\overline{B}}_{3,n}) \\ \nonumber&\quad = \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} \times \left( {\overline{\gamma }}^{-1}-1 \right) ^{-{\overline{\gamma }}} Y_{n-k_n,n} \\&\qquad \times (1+{\overline{B}}_{1,n})(1+{\overline{B}}_{3,n}). \end{aligned}$$
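Once the bias factors are available, the two corrected estimators are simple products. The sketch below uses the hypothetical names `corrected_direct` and `corrected_indirect` and takes the intermediate LAWS estimate, the order statistic \(Y_{n-k_n,n}\), the tail index estimate and the factors \(1+{\overline{B}}_{j,n}\) as given inputs.

```python
def weissman_factor(n, k, beta, gamma_bar):
    """Extrapolation factor (n(1 - beta)/k)^(-gamma_bar)."""
    return (n * (1.0 - beta) / k) ** (-gamma_bar)


def corrected_direct(xi_laws, n, k, beta, gamma_bar,
                     one_plus_b1, one_plus_b2, one_plus_b3):
    """Direct (LAWS-based) extrapolated estimator, corrected for extrapolation bias."""
    return (weissman_factor(n, k, beta, gamma_bar) * xi_laws
            * one_plus_b1 * one_plus_b2 * one_plus_b3)


def corrected_indirect(y_threshold, n, k, beta, gamma_bar,
                       one_plus_b1, one_plus_b3):
    """Indirect (quantile-based) extrapolated estimator, corrected likewise.

    y_threshold is the order statistic Y_{n-k,n}.
    """
    proportionality = (1.0 / gamma_bar - 1.0) ** (-gamma_bar)
    return (weissman_factor(n, k, beta, gamma_bar) * proportionality * y_threshold
            * one_plus_b1 * one_plus_b3)


# Toy inputs, for illustration only
print(corrected_direct(2.0, 1000, 100, 0.999, 0.5, 1.1, 1.0, 1.0))
print(corrected_indirect(2.0, 1000, 100, 0.999, 0.5, 1.1, 1.0))
```

The indirect version needs no factor \(1+{\overline{B}}_{2,n}\) because it starts from an intermediate quantile rather than an intermediate expectile.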

The methodology introduced here differs from the earlier bias reduction technique introduced in Girard et al. (2022) in a regression setup. In Girard et al. (2022), the bias term proportional to \(A(n/k_n)\) is not corrected because it is very difficult to correct this source of bias accurately in the conditional, nonparametric setup on which that paper focuses. In addition, the correction terms in the aforementioned paper rely on linearising the bias terms, whereas we keep the structure of the bias as intact as possible. This makes our correction term \(B_{2,n}\) more accurate than in this earlier attempt. The inclusion of the term \(B_{3,n}\) is also new; while it could be expected that this term only has a small influence because it relies on quantities calculated at a higher asymptotic order, it is our experience that its inclusion substantially improves finite-sample performance.

We now concentrate on reducing the bias due to the estimation of \(\gamma \). Combined with the general corrections in (13) and (15), this will result in a fully bias-corrected extrapolated estimator.

Bias reduction for tail index estimation: an expectile-based method

Numerous tail index estimators have been introduced and studied in the literature; a review of some of the most important estimators is given in de Haan and Ferreira (2006, Chapter 3). There are various techniques for the reduction of bias of such estimators, an excellent summary being given in the Introduction of Cai et al. (2013). Here our contribution is to propose a bias-reduced version of a purely expectile-based tail index estimator, our procedure being partly inspired by a method developed in, among others, Caeiro et al. (2005). To make the construction of this estimator easier, we start by briefly recalling how the technique of Caeiro et al. (2005) works. Consider the classical Hill estimator (Hill 1975):

$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {H}} = \frac{1}{k_n} \sum _{i=1}^{k_n} \log \frac{Y_{n-i+1,n}}{Y_{n-k_n,n}}. \end{aligned}$$

It is known that if \({\mathcal {C}}_2(\gamma ,\rho ,A)\) holds and, in addition, \(\sqrt{k_n} A(n/k_n)\rightarrow \lambda \in \mathbb {R}\), then (see Theorem 3.2.5 p. 74 in de Haan and Ferreira 2006)

$$\begin{aligned} \sqrt{k_n} \left( {\widehat{\gamma }}_{k_n}^{\mathrm {H}} - \gamma \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}{\mathcal {N}}\left( \frac{\lambda }{1-\rho }, \gamma ^2 \right) . \end{aligned}$$

In finite samples \(\lambda \approx \sqrt{k_n} A(n/k_n) = \sqrt{k_n} b \gamma (n/k_n)^{\rho }\), meaning that the pseudo-estimator (depending on the true unknown values of b and \(\rho \))

$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {H}} \left( 1-\frac{b}{1-\rho } \left( \frac{n}{k_n} \right) ^{\rho } \right) \end{aligned}$$

should be asymptotically unbiased with the same variance as the Hill estimator. Caeiro et al. (2005) then plug in consistent estimators \({\overline{b}}\) of b and \({\overline{\rho }}\) of \(\rho \) and arrive at the bias-reduced Hill estimator

$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}} = {\widehat{\gamma }}_{k_n}^{\mathrm {H}} \left( 1-\frac{{\overline{b}}}{1-{\overline{\rho }}} \left( \frac{n}{k_n} \right) ^{{\overline{\rho }}} \right) . \end{aligned}$$

Theorem 3.1 of Caeiro et al. (2005) shows that \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) is indeed \(\sqrt{k_n}-\)asymptotically Gaussian with expectation zero and variance \(\gamma ^2\). The construction of this estimator essentially hinges on eliminating the bias by multiplying the original estimator by a quantity cancelling this bias.
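To make the multiplicative correction concrete, the following minimal Python sketch computes the Hill estimator and its bias-reduced version from the two displays above. It is an illustration only, not the implementation used in the Expectrem or evt0 packages: the function names `hill` and `hill_rb` are hypothetical, and the second-order quantities `b` and `rho` are assumed to be supplied by the user (in practice they would be replaced by consistent estimates \({\overline{b}}\) and \({\overline{\rho }}\)).

```python
import math


def hill(sample, k):
    """Hill estimator based on the k largest observations of the sample."""
    y = sorted(sample)
    n = len(y)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = y[n - k - 1]  # order statistic Y_{n-k,n}
    if threshold <= 0:
        raise ValueError("Y_{n-k,n} must be positive")
    # Average log-excess of the top k order statistics over the threshold
    return sum(math.log(v / threshold) for v in y[n - k:]) / k


def hill_rb(sample, k, b, rho):
    """Hill estimator multiplied by 1 - b / (1 - rho) * (n / k) ** rho.

    b and rho are user-supplied values (assumption of this sketch).
    """
    n = len(sample)
    return hill(sample, k) * (1.0 - b / (1.0 - rho) * (n / k) ** rho)
```

With `b = 0` the correction factor equals 1 and `hill_rb` reduces to `hill`, as one would expect for a distribution without second-order bias.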

Here we adapt this construction to propose a bias-reduction procedure for the estimator

$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {E}}= \left( 1+ \frac{n \widehat{{\overline{F}}}_n( {\widehat{\xi }}_{1-k_n/n} )}{k_n} \right) ^{-1}. \end{aligned}$$

The rationale behind this estimator, studied in a different context by Girard et al. (2022), is that, from (11),

$$\begin{aligned} \gamma = \left( 1+ \frac{{\overline{F}}( \xi _{\alpha } )}{1-\alpha } \frac{1}{1+r(\alpha )} \right) ^{-1} \approx \left( 1+ \frac{{\overline{F}}( \xi _{\alpha } )}{1-\alpha } \right) ^{-1} \end{aligned}$$

as \(\alpha \uparrow 1\). To find a bias-reduced version of this asymptotic proportionality tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\), we follow the above idea and use the sample counterpart \({\overline{r}}(1-k_n/n)\) of \(r(1-k_n/n)\) defined in Sect. 3.2.1: this yields a bias-reduced asymptotic proportionality tail index estimator as

$$\begin{aligned} {\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}} = \left( 1+ \frac{n \widehat{{\overline{F}}}_n( {\widehat{\xi }}_{1-k_n/n} )}{k_n} \frac{1}{1+{\overline{r}}(1-k_n/n)} \right) ^{-1}. \end{aligned}$$

In our current implementation we take \({\overline{\xi }}_{1-k_n/n} = {\widehat{\xi }}_{1-k_n/n}\) and \({\overline{\gamma }} = {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) inside \({\overline{r}}(1-k_n/n)\), specifically for the calculation of this bias-reduced version.
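As a rough illustration of the uncorrected estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\), the sketch below computes the empirical survival function, the LAWS expectile estimator and the resulting tail index estimate. It is a sketch under our own assumptions rather than the code of the Expectrem package: the function names are hypothetical, and the expectile is obtained by iterating the first-order condition of the asymmetric least squares problem (a weighted mean with weight \(\alpha \) above the current value and \(1-\alpha \) below it). The correction factor \(1/(1+{\overline{r}}(1-k_n/n))\) is not included because \({\overline{r}}\) is defined in Sect. 3.2.1.

```python
def empirical_survival(sample, x):
    """Proportion of observations strictly larger than x."""
    return sum(1 for v in sample if v > x) / len(sample)


def laws_expectile(sample, alpha, tol=1e-12, max_iter=1000):
    """Empirical expectile of level alpha by asymmetric least squares.

    Iterates theta = sum(w_i * y_i) / sum(w_i), where w_i = alpha if
    y_i > theta and w_i = 1 - alpha otherwise.
    """
    theta = sum(sample) / len(sample)  # the expectile of level 1/2 is the mean
    for _ in range(max_iter):
        num = den = 0.0
        for v in sample:
            w = alpha if v > theta else 1.0 - alpha
            num += w * v
            den += w
        new_theta = num / den
        if abs(new_theta - theta) <= tol * max(1.0, abs(theta)):
            return new_theta
        theta = new_theta
    return theta


def tail_index_expectile(sample, k):
    """Uncorrected estimator (1 + n * Fbar_n(xi_hat_{1-k/n}) / k) ** (-1)."""
    n = len(sample)
    xi_hat = laws_expectile(sample, 1.0 - k / n)
    return 1.0 / (1.0 + n * empirical_survival(sample, xi_hat) / k)
```

By construction the returned value always lies in \((0,1]\). On a Pareto-type sample at a moderate level it can sit well above the true \(\gamma \), which is the kind of finite-sample bias the correction is designed to reduce.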

Our first main theoretical result gives the asymptotic normality and unbiasedness of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\).

Theorem 1

Suppose that \({\mathbb {E}}|Y_{-}|^{2+\delta }<\infty \) for some \(\delta >0\). Assume further that \({\mathcal {C}}_2(\gamma ,\rho ,A)\) holds with \(0<\gamma <1/2\), \(\rho <0\) and \(A(t)=b \gamma t^{\rho }\), and let \(k_n\) be a sequence such that \(k_n \rightarrow \infty \) and \(k_n/n \rightarrow 0\) as \(n \rightarrow \infty \). If \(\sqrt{k_n} A (n/k_n) \rightarrow \lambda _1 \in {\mathbb {R}}\), \(\sqrt{k_n}/q(1-k_n/n) \rightarrow \lambda _2 \in {\mathbb {R}}\), and \({\overline{\gamma }}\), \({\overline{\rho }}\) and \({\overline{b}}\) are consistent estimators of \(\gamma \), \(\rho \) and b such that \(({\overline{\rho }}-\rho ) \log (n)={\text {o}}_{{\mathbb {P}}}(1)\), then

$$\begin{aligned} \sqrt{k_n} ( {\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}} - \gamma ) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}{\mathcal {N}} \left( 0, \frac{\gamma ^3 (1-\gamma )}{1-2\gamma } \right) . \end{aligned}$$

We note that the asymptotic variance in Theorem 1 is an unconditional version of the asymptotic variance found for a conditional analogue of \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\), see Theorem 4 in Girard et al. (2022).

We provide a comparison of the two tail index estimators \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) in terms of variance in Fig. 1. The asymptotic variance of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is substantially smaller than that of \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) when \(\gamma \) is less than 0.35. The variance of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) explodes, however, as \(\gamma \uparrow 1/2\), which is to be expected since the estimators \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) are based on the intermediate LAWS estimator \({\widehat{\xi }}_{1-k_n/n}\), itself known to be asymptotically normal only when \(\gamma <1/2\) (see Daouia et al. 2018). In our implementation (and in particular in the calculation of the \({\overline{B}}_{j,n}\) in Sect. 3.2.1), one can choose either \({\overline{\gamma }}={\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) or \({\overline{\gamma }}={\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\).

Fig. 1 Asymptotic variances of \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (black curve) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) (red curve) as functions of \(\gamma \in (0,1/2)\). (Color figure online)
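The variance comparison can be reproduced numerically from the two asymptotic variances \(\gamma ^2\) and \(\gamma ^3(1-\gamma )/(1-2\gamma )\). The short Python sketch below (function names are ours) evaluates both; equating them gives \(\gamma ^2-3\gamma +1=0\), so the two curves cross at \(\gamma =(3-\sqrt{5})/2\approx 0.382\).

```python
import math


def avar_hill_rb(gamma):
    """Asymptotic variance of the bias-reduced Hill estimator."""
    return gamma ** 2


def avar_expectile_rb(gamma):
    """Asymptotic variance in Theorem 1, defined for 0 < gamma < 1/2."""
    if not 0.0 < gamma < 0.5:
        raise ValueError("gamma must lie in (0, 1/2)")
    return gamma ** 3 * (1.0 - gamma) / (1.0 - 2.0 * gamma)


# Root of gamma^2 - 3 * gamma + 1 = 0 lying in (0, 1/2)
CROSSOVER = (3.0 - math.sqrt(5.0)) / 2.0
```

For instance, at \(\gamma =0.2\) the ratio of the expectile-based variance to the Hill variance is \(\gamma (1-\gamma )/(1-2\gamma )=0.16/0.6\approx 0.27\).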

Fully bias-reduced estimators and their asymptotic properties

Our final, fully bias-reduced estimators are members of the following two classes. The first class, which extrapolates the intermediate LAWS estimator, is derived from Eq. (13) and is defined as

$$\begin{aligned} {\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}= & {} \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} {\widehat{\xi }}_{1-k_n/n} \\&\times (1+{\overline{B}}_{1,n})(1+{\overline{B}}_{2,n})(1+{\overline{B}}_{3,n}) \end{aligned}$$

where \({\overline{\gamma }}\) is any bias-reduced estimator of the tail index \(\gamma \), and \({\overline{B}}_{1,n}\), \({\overline{B}}_{2,n}\) and \({\overline{B}}_{3,n}\) are defined in Sect. 3.2.1. The second class, based on extrapolating the indirect estimator, is derived from Eq. (15) and is

$$\begin{aligned}&{\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}} = \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} \left( {\overline{\gamma }}^{-1}-1 \right) ^{-{\overline{\gamma }}} Y_{n-k_n,n} \\&\quad \times (1+{\overline{B}}_{1,n})(1+{\overline{B}}_{3,n}) \end{aligned}$$

where again \({\overline{\gamma }}\) is any bias-reduced estimator of the tail index \(\gamma \). It should be noted that this second class of estimators has a nice interpretation in terms of a bias-reduced version for the Weissman extreme quantile estimator (Weissman 1978; Gomes and Pestana 2007): this estimator is, with our notation,

$$\begin{aligned} {\widetilde{q}}_{\beta _n}^{\star ,\mathrm {RB}} = \left( \frac{n(1-\beta _n)}{k_n} \right) ^{-{\overline{\gamma }}} Y_{n-k_n,n} \times \big (1+{\overline{B}}_{1,n}\big ). \end{aligned}$$

It follows that an equivalent expression for \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) is

$$\begin{aligned} {\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}} = \left( {\overline{\gamma }}^{-1}-1 \right) ^{-{\overline{\gamma }}} {\widetilde{q}}_{\beta _n}^{\star ,\mathrm {RB}} \times \big (1+{\overline{B}}_{3,n}\big ). \end{aligned}$$

In other words, this estimator is obtained by using the asymptotic proportionality relationship (4) at level \(\alpha =\beta _n\), plugging in bias-reduced estimators of the tail index and extreme quantile involved, and finally correcting directly for the bias incurred using this proportionality relationship at the level \(\beta _n\) only.
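The relationship between the Weissman-type quantile estimator and the indirect expectile estimator can be sketched as follows. This is a hypothetical illustration and not the extQuant or extExpect functions of the package: the tail index estimate `gamma` and the correction terms `b1` and `b3` (standing for \({\overline{B}}_{1,n}\) and \({\overline{B}}_{3,n}\), defined in Sect. 3.2.1) are assumed to be computed elsewhere and passed in. Setting them to zero gives the uncorrected estimators.

```python
def weissman_quantile(sample, k, beta, gamma, b1=0.0):
    """Extrapolated quantile (n(1-beta)/k)^(-gamma) * Y_{n-k,n} * (1 + b1)."""
    y = sorted(sample)
    n = len(y)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    anchor = y[n - k - 1]  # intermediate order statistic Y_{n-k,n}
    return (n * (1.0 - beta) / k) ** (-gamma) * anchor * (1.0 + b1)


def indirect_expectile(sample, k, beta, gamma, b1=0.0, b3=0.0):
    """Indirect extreme expectile: (1/gamma - 1)^(-gamma) * quantile * (1 + b3)."""
    q = weissman_quantile(sample, k, beta, gamma, b1)
    return (1.0 / gamma - 1.0) ** (-gamma) * q * (1.0 + b3)
```

Note that for \(\gamma =1/2\) the proportionality constant \((\gamma ^{-1}-1)^{-\gamma }\) equals 1, so the extrapolated expectile and quantile coincide when `b3 = 0`; for \(\gamma <1/2\) the expectile is smaller than the quantile.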

We briefly explore the asymptotic properties of these bias-reduced estimators \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\). It was highlighted in Sect. 3.1 [see limit (9)] that extrapolated expectile estimators have a limiting distribution controlled by their tail index estimator. This is also true for their bias-reduced versions, as the following result shows.

Theorem 2

Assume that \({\mathcal {C}}_2(\gamma ,\rho ,A)\) holds with \(\rho <0\) and \(A(t)=b \gamma t^{\rho }\), and let \(k_n\), \(\beta _n\) be two sequences such that \(k_n \rightarrow \infty \), \(k_n/n \rightarrow 0\), \(n(1-\beta _n)/k_n \rightarrow 0\) and \(\log ( k_n/(n(1-\beta _n)) ) / \sqrt{k_n} \rightarrow 0\) as \(n \rightarrow \infty \). Assume further that \(\sqrt{k_n} A (n/k_n) \rightarrow \lambda _1 \in {\mathbb {R}}\), \(\sqrt{k_n}/q(1-k_n/n) \rightarrow \lambda _2 \in {\mathbb {R}}\), \(\sqrt{k_n} ({\overline{\gamma }}-\gamma ) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma \) where \(\varGamma \) is a nondegenerate distribution, and \({\overline{\rho }}\) and \({\overline{b}}\) are consistent estimators of \(\rho \) and b such that \(({\overline{\rho }}-\rho ) \log (n)={\text {o}}_{{\mathbb {P}}}(1)\).

  (i) If \({\mathbb {E}}|Y_-|^2<\infty \) and \(0<\gamma <1/2\), then the extrapolated LAWS estimator satisfies

    $$\begin{aligned} \displaystyle \frac{\sqrt{k_n}}{\log (k_n/(n(1-\beta _n)))} \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}}{\xi _{\beta _n}}-1 \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma . \end{aligned}$$

  (ii) If \({\mathbb {E}}|Y_-|<\infty \) and \(0<\gamma <1\), then the extrapolated indirect estimator satisfies

    $$\begin{aligned} \displaystyle \frac{\sqrt{k_n}}{\log (k_n/(n(1-\beta _n)))} \left( \frac{{\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}}{\xi _{\beta _n}}-1 \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}\varGamma . \end{aligned}$$

It follows from our Theorem 2 and Corollaries 3 and 4 in Daouia et al. (2018) that the bias-reduced extrapolated expectile estimators have the same rates of convergence and weak limits as their standard counterparts. Note that the indirect estimator is applicable in a wider range of situations since it only requires a finite first moment of Y, contrary to the extrapolated LAWS estimator which essentially requires a finite second moment of Y.

We shall illustrate in Sect. 4 that, even though the bias-reduced versions have the same asymptotic properties as their standard counterparts, they generally have much better finite-sample properties. Before that, we explain how to select the important tuning parameter \(k_n\) appearing in the estimation of the tail index, intermediate expectile level, and bias correction terms.

Choice of the intermediate level \(k_n\)

The choice of the sequence \(k_n\) is a crucial point: a low \(k_n\) translates into a large variance, and a high \(k_n\) translates into a large bias. Choosing \(k_n\) therefore leads to solving a trade-off between the bias and variance of the tail index estimator to be used. In order to find the right balance, de Haan and Ferreira (2006, pp. 77–82) proposed a choice of \(k_n\) for the Hill estimator as a minimiser of an estimate of its Asymptotic Mean Squared Error. We propose to develop our own selection rule for \(k_n\), to be used in conjunction with the expectile-based bias-reduced asymptotic proportionality estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\), based on the ordinary asymptotic proportionality estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\) also presented in Sect. 3.2.2. For this estimator, it holds that

$$\begin{aligned}&\sqrt{k_n} \left( {\widehat{\gamma }}_{k_n}^{\mathrm {E}}-\gamma -\frac{\gamma ( \gamma ^{-1}-1 )^{1-\rho }}{1-\gamma -\rho } A (n/k_n) \right. \\&\quad \left. -\frac{\gamma ^2 ( \gamma ^{-1}-1 )^{\gamma +1} {\mathbb {E}}(Y)}{q(1-k_n/n)} \right) {\mathop {\longrightarrow }\limits ^{\mathrm {d}}}{\mathcal {N}} \left( 0, \frac{\gamma ^3(1-\gamma )}{1-2\gamma } \right) . \end{aligned}$$

See Proposition 1 in Sect. B of the Supplementary Material document. Consequently, there are two sources of bias in the expectile-based estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\): one proportional to \(A( n/k_n )\), having order \((n/k_n)^{\rho }\), and another proportional to \(1/q( 1-k_n/n )\), having order \((n/k_n)^{-\gamma }\). The leading term of bias will thus depend on \(\rho \) and \(\gamma \). The second source of bias, proportional to \(1/q( 1-k_n/n )\), can very accurately be eliminated; indeed, its expression only features \(\gamma \), \({\mathbb {E}}(Y)\) and \(q( 1-k_n/n )\), of which we have good estimators that converge at the rate \(\sqrt{k_n}\) or more. By contrast, the first source of bias features second-order parameters from the distribution of Y, whose estimators converge slowly (see e.g. p. 298 in Gomes et al. 2009 and p. 2638 in Goegebeur et al. 2010), and thus is more difficult to remove. In practice, this means that the trade-off to be solved when using the expectile-based asymptotic proportionality tail index estimator will essentially be between the bias due to the second-order quantity \(A( n/k_n )\) and the variance of the estimator. This gives us the idea of minimising the Partial Asymptotic Mean Squared Error

$$\begin{aligned}&\mathrm {PAMSE}(k_n) \\&\quad = \left( \frac{\gamma ( \gamma ^{-1}-1 )^{1-\rho }}{1-\gamma -\rho } A (n/k_n) \right) ^2 + \frac{\gamma ^3(1-\gamma )}{1-2\gamma } \times \frac{1}{k_n} \\&\qquad \propto \frac{b^2 ( \gamma ^{-1}-1 )^{1-2\rho }}{(1-\gamma -\rho )^2} \times \left( \frac{n}{k_n} \right) ^{2\rho } \\&\qquad + \frac{1}{1-2\gamma } \times \frac{1}{k_n}. \end{aligned}$$

It is readily checked that this function of \(k_n\) has a unique minimum, and that (viewing \(\mathrm {PAMSE}\) as a differentiable function of a single real variable) cancelling its first derivative leads to

$$\begin{aligned} k_n^{\mathrm {E}} = \left( \frac{ ( \gamma ^{-1}-1 )^{2 \rho -1} ( 1-\gamma -\rho )^2}{-2 \rho b^2 (1-2\gamma )} \right) ^{1/(1-2\rho )} n^{-2\rho /(1-2\rho )}. \end{aligned}$$

This optimal value depends on the unknown \(\gamma \), \(\rho \) and b; in practice we use the estimated value \({\widehat{k}}_n^{\mathrm {E}} = \)

$$\begin{aligned}&\min \left( \left\lfloor \left( \frac{ ( {\overline{\gamma }}^{-1}-1 )^{2 {\overline{\rho }}-1} ( 1-{\overline{\gamma }}-{\overline{\rho }} )^2}{-2 {\overline{\rho }} {\overline{b}}^2 (1-2{\overline{\gamma }})} \right) ^{1/(1-2{\overline{\rho }})}\right. \right. \nonumber \\&\times \left. \left. n^{- \frac{2{\overline{\rho }}}{1-2{\overline{\rho }}}} \right\rfloor , \left\lfloor \frac{n}{2} \right\rfloor -1 \right) . \end{aligned}$$

In Eq. (16), the estimators \({\overline{\rho }}\) and \({\overline{b}}\) are the same as in Sect. 3.2, and \({\overline{\gamma }}\) is the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) with the corresponding estimated AMSE-optimal choice of \(k_n\), that is

$$\begin{aligned} {\widehat{k}}_n^{\mathrm {H}} = \left\lfloor \left( \frac{(1-{\overline{\rho }})^2}{-2 {\overline{\rho }} {\overline{b}}^2} \right) ^{1/(1-2 {\overline{\rho }})} n^{-2 {\overline{\rho }}/(1-2 {\overline{\rho }})} \right\rfloor . \end{aligned}$$

The fact that we force our selected \({\widehat{k}}_n^{\mathrm {E}}\) to be less than \(\lfloor n/2 \rfloor \) is due to the presence of the multiplicative term \((1-2k_n/n)^{-1}\) in our bias reduction methodology (featuring in \({\overline{r}}(1-k_n/n)\)). Since \(n^{-2\rho /(1-2\rho )} = {\text {o}}(n)\) for any \(\rho <0\), this restriction disappears with arbitrarily high probability as \(n\rightarrow \infty \). We recommend this choice \({\widehat{k}}_n^{\mathrm {E}}\) when the expectile-based asymptotic proportionality estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is used.
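The selection rules for \(k_n\) are simple closed-form expressions, and the following Python sketch evaluates them. It is meant as an illustration under stated assumptions: the function names are ours, and the arguments `gamma`, `rho` and `b` stand for the estimates \({\overline{\gamma }}\), \({\overline{\rho }}\) and \({\overline{b}}\), which are assumed to be available (with \(0<{\overline{\gamma }}<1/2\), \({\overline{\rho }}<0\) and \({\overline{b}}\ne 0\)). The helper `pamse` implements the proportional form of the Partial Asymptotic Mean Squared Error and can be used to check the minimiser numerically.

```python
import math


def k_expectile(n, gamma, rho, b):
    """Estimated PAMSE-optimal k for the expectile-based estimator,
    capped at floor(n / 2) - 1."""
    const = ((1.0 / gamma - 1.0) ** (2.0 * rho - 1.0)
             * (1.0 - gamma - rho) ** 2
             / (-2.0 * rho * b ** 2 * (1.0 - 2.0 * gamma)))
    k = math.floor(const ** (1.0 / (1.0 - 2.0 * rho))
                   * n ** (-2.0 * rho / (1.0 - 2.0 * rho)))
    return min(k, n // 2 - 1)


def k_hill(n, rho, b):
    """Estimated AMSE-optimal k for the Hill estimator."""
    const = (1.0 - rho) ** 2 / (-2.0 * rho * b ** 2)
    return math.floor(const ** (1.0 / (1.0 - 2.0 * rho))
                      * n ** (-2.0 * rho / (1.0 - 2.0 * rho)))


def pamse(k, n, gamma, rho, b):
    """PAMSE up to the positive factor gamma^3 * (1 - gamma)."""
    squared_bias = (b ** 2 * (1.0 / gamma - 1.0) ** (1.0 - 2.0 * rho)
                    / (1.0 - gamma - rho) ** 2 * (n / k) ** (2.0 * rho))
    variance = 1.0 / ((1.0 - 2.0 * gamma) * k)
    return squared_bias + variance
```

As an example, with \(n=1000\), \(\gamma =0.3\), \(\rho =-1\) and \(b=1\), the continuous minimiser of the PAMSE is approximately 65.8, so `k_expectile` returns 65, while `k_hill` returns 125.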

The Expectrem package

We have implemented our methods in an R package called Expectrem, freely downloadable at This package contains the following functions and data sets relevant to this paper:

  • Basic functions for the estimation: Fbarhat returns the empirical estimator of the survival function, and expect provides the empirical LAWS expectile estimator at a given level. Basic population expectile calculations: enorm, et, elog, epareto, egpd and eburr respectively return the expectiles of the normal, Student, logistic, Pareto, Generalised Pareto (GP) and Burr distributions.

  • Tail index estimation: The estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E}}\) is implemented in the function tindexp with argument br=FALSE (default). If br=TRUE, the bias-reduced estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is returned. The other optional argument is the intermediate level k, set at \(k={\widehat{k}}_n^{\mathrm {E}}\) by default.

  • Extreme expectile estimation: The direct and indirect extreme expectile estimators \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widetilde{\xi }}_{\beta _n}^{\star }\) are computed in the function extExpect with arguments method="direct" and method="indirect" respectively, and br=FALSE (default). If br=TRUE, the bias-reduced estimators \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) are returned instead. Argument estim="Hill" (default) calls the Hill estimator (function mop in the package evt0), and estim="tindexp" calls tindexp; setting br=TRUE calls the bias-reduced versions of these estimators. The choice of \(k_n\) is also an option through k, and by default \(k_n={\widehat{k}}_n^{\mathrm {H}}\) (if estim="Hill") or \(k_n={\widehat{k}}_n^{\mathrm {E}}\) (if estim="tindexp").

  • Extreme quantile estimation: The estimator \({\widetilde{q}}_{\beta _n}^{\star }\) is computed in the function extQuant with argument br=FALSE (default). If br=TRUE, the bias-reduced estimator \({\widetilde{q}}_{\beta _n}^{\star ,\mathrm {RB}}\) is returned instead. The other arguments are those of extExpect.

  • Data sets: austria, belgium, commerzbank, finland, france, greece, italy, namibia, netherlands, newzealand, secura and southafrica, as described in Sect. 5, with URLs pointing to the sources for these data sets.

Simulation study

We study the finite-sample performance of our estimators on simulated data in order to assess the importance of bias reduction in extreme expectile estimation. For that purpose and in order to get a good overview of practical performance, we consider the following heavy-tailed distributions for Y (see Table 1):

  • A Burr distribution with tail index \(\gamma >0\) and second-order parameter \(\rho <0\), i.e. \({\overline{F}}(y)=( 1+y^{-\rho /\gamma } )^{1/\rho }\) for \(y>0\). The interesting point here is that the choices of \(\gamma \) and \(\rho \) are free, meaning that for a fixed \(\gamma \) we can make the second-order parameter \(\rho \) vary in order to generate scenarios with various degrees of difficulty in the estimation. We consider here \(\rho =-5,-1,-0.5\), corresponding respectively to an easy, medium, and hard estimation problem.

  • A Generalised Pareto Distribution (GPD) with tail index \(\gamma >0\) and unit scale, i.e. \({\overline{F}}(y)= ( 1+\gamma y )^{-1/\gamma }\) for \(y>0\). Here \(\rho =-\gamma \). For \(\gamma \) close to 0, \(\rho \) will then also be close to 0, meaning that we expect the estimation problem to be difficult when the data are generated from this distribution.
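Both distributions have explicit survival functions that can be inverted in closed form, so samples can be generated by inverse transform. The Python sketch below is one possible way of doing so and is not taken from the paper's simulation code: the function names are ours, and inverting \({\overline{F}}(y)=u\) gives \(y=(u^{\rho }-1)^{-\gamma /\rho }\) for the Burr distribution and \(y=(u^{-\gamma }-1)/\gamma \) for the GPD.

```python
import random


def burr_sf(y, gamma, rho):
    """Burr survival function (1 + y^(-rho/gamma))^(1/rho) for y > 0."""
    return (1.0 + y ** (-rho / gamma)) ** (1.0 / rho)


def burr_inv_sf(u, gamma, rho):
    """Value y such that burr_sf(y) = u, for u in (0, 1]."""
    return (u ** rho - 1.0) ** (-gamma / rho)


def gpd_sf(y, gamma):
    """Unit-scale GPD survival function (1 + gamma * y)^(-1/gamma) for y > 0."""
    return (1.0 + gamma * y) ** (-1.0 / gamma)


def gpd_inv_sf(u, gamma):
    """Value y such that gpd_sf(y) = u, for u in (0, 1]."""
    return (u ** (-gamma) - 1.0) / gamma


def simulate(inv_sf, n, rng, *params):
    """Draw n independent values by inverse transform.

    1 - rng.random() lies in (0, 1], which avoids evaluating at u = 0.
    """
    return [inv_sf(1.0 - rng.random(), *params) for _ in range(n)]
```

A sample of size 1000 from the Burr distribution with \(\gamma =0.3\) and \(\rho =-1\) would then be obtained as `simulate(burr_inv_sf, 1000, random.Random(1), 0.3, -1.0)`, where the seed is arbitrary.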

For each of these distributions (three Burr distributions and one GPD), we consider the cases \(\gamma =0.1,0.2,0.3,0.4\). This gives 16 cases in total. In each case, we simulate \(N=1000\) data sets \((Y_1,\ldots ,Y_n)\) of \(n=1000\) independent realisations of Y, with survival distribution function \({\overline{F}}\). We estimate the expectile of level \(\beta _n=1-5/n=0.995\) using five methodologies:

  (i) The bias-reduced extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the expectile-based, bias-reduced tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\),

  (ii) The bias-reduced extrapolated indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the expectile-based, bias-reduced tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\),

  (iii) The bias-reduced extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\),

  (iv) The bias-reduced extrapolated indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) with the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\),

  (v) (As a benchmark) The extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star }\) (without bias reduction) with the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\).

Comparing methods (i) and (iii) on the one hand, and (ii) and (iv) on the other hand, allows us to see the influence of the choice of tail index estimator. Comparing (iii), (iv) and (v) makes it possible to assess the benefit of the bias reduction method in the expectile extrapolation. To get a further idea of the difference between using the bias-reduced tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) introduced in the current paper and the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\), we also record the values of \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\).

We assess finite-sample performance by computing the following quantities:

  • The relative bias, variance and mean-squared error of the extrapolated expectile estimators. For the bias-reduced extrapolated LAWS estimator \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\), that is

    $$\begin{aligned}&{\text {RBias}}({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}) = \frac{1}{N} \sum _{j=1}^N \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB},(j)}}{\xi _{\beta _n}} - 1 \right) \\&\text{ and } {\text {RMSE}}({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}) = \frac{1}{N} \sum _{j=1}^N \left( \frac{{\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB},(j)}}{\xi _{\beta _n}} - 1 \right) ^2 \end{aligned}$$

    (where \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB},(j)}\) is calculated on the jth sample) and \({\text {RVar}}={\text {RMSE}}-{\text {RBias}}^2\) is the relative variance. The same quantities are computed for the indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and the non-bias-reduced estimator \({\widehat{\xi }}_{\beta _n}^{\star }\).

  • The (classical) bias, variance and mean-squared error of the estimators \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\).
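The relative performance measures defined above are straightforward to compute from a vector of Monte Carlo estimates. The following Python sketch (with a hypothetical function name) returns the relative bias, relative variance and relative mean-squared error; note that RMSE denotes here the relative mean-squared error, with no square root taken.

```python
def relative_metrics(estimates, truth):
    """Return (RBias, RVar, RMSE) of estimates relative to the true value.

    RBias is the mean of estimate / truth - 1, RMSE is the mean of its
    square, and RVar = RMSE - RBias ** 2.
    """
    errors = [e / truth - 1.0 for e in estimates]
    n = len(errors)
    rbias = sum(errors) / n
    rmse = sum(err * err for err in errors) / n
    return rbias, rmse - rbias ** 2, rmse
```

For example, with estimates 9, 11 and 12 of a true value of 10, the relative errors are \(-0.1\), 0.1 and 0.2, so that \({\text {RBias}}=0.2/3\) and \({\text {RMSE}}=0.06/3=0.02\).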

All these quantities are calculated for \(k_n\) chosen to be (in each sample) the selected value \({\widehat{k}}_n^{\mathrm {E}}\) or \({\widehat{k}}_n^{\mathrm {H}}\) as appropriate, depending on whether the estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) or \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) is used. The results, referring to our experiments with the Burr distribution with second-order parameter \(\rho =-5, -1, -0.5\) and the Generalised Pareto distribution, are reported in Tables C.1, C.2, C.3 and C.4 respectively (see Sect. C of the Supplementary Material document). For each value of \(k_n\in \{2,3,\ldots ,450\}\), we also record, and report, median expectile and tail index estimates across all N replicates, along with the corresponding log-mean-squared errors, in Figs. C.1, C.2, C.3, C.4 and C.5 in this same Sect. C of the Supplementary Material document.

We conclude from this simulation study that, on our tested cases, the bias reduction scheme is very effective: as a consequence, the RMSE of the bias-reduced estimators is often one and sometimes two orders of magnitude lower than the RMSE of the standard extrapolated estimators. This is true across a wide range of values of \(k_n\), as shown in Figs. C.1, C.2 and C.3, where it is seen that overall the bias-reduced estimators based on \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) seem to have an advantage in terms of bias, while those based on \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) have a lower MSE overall. There does not seem to be a clear winner between (bias-reduced) direct and indirect estimators. Similar results, not reported here for the sake of brevity, were observed when estimating extreme expectiles at the more extreme level \(1-1/(2n)=0.9995\).

It also appears (from considering the case of the Burr distribution with \(\rho =-0.5\) and the GPD) that the bias-reduced estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is particularly interesting, including in terms of MSE, for values of \(\rho \) close to 0, and is competitive otherwise; note that for large \(|\rho |\), the Burr distribution gets very close to the Pareto distribution for which the Hill estimator is the Maximum Likelihood estimator and known to be optimal, so it is not reasonable to expect that the estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) would be more accurate than the bias-reduced Hill estimator in such cases. The estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) appears to be a useful complement to available tail index estimation devices.

As a complement, and in order to evaluate specifically the influence of a change in the bulk of the distribution upon the compared estimators, we examine the following model for Y:

  • A Fisher distribution with \((\nu _1,\nu _2)\) degrees of freedom, i.e. having probability density function

    $$\begin{aligned} f(t) = \dfrac{(\nu _1/\nu _2)^{\nu _1/2}}{B(\nu _1/2,\nu _2/2)} t^{\nu _1/2-1}(1+\nu _1 t/\nu _2)^{-(\nu _1+\nu _2)/2} \end{aligned}$$

    for \(t>0\). This distribution has tail index \(\gamma =2/\nu _2\), which varies in \(\{ 0.1,0.2,0.3,0.4 \}\), and second-order parameter \(\rho =-2/\nu _2\). Here, even though the tail index and second-order parameter are constant in \(\nu _1\), the actual bias component is not because b is a nontrivial function of \(\nu _1\), see Table 1. We consider here \(\nu _1 = 1,2,5,10\), with smaller values of \(\nu _1\) corresponding to larger positive values of b and hence to larger amounts of bias (note that the quantity \(b=b(\nu _1,\nu _2)\) diverges to infinity as \(\nu _1\downarrow 0\) when \(\nu _2\) is fixed).

We again simulate \(N=1000\) data sets \((Y_1,\ldots ,Y_n)\) of \(n=1000\) independent realisations of Y and we estimate the expectile of level \(\beta _n=1-5/n=0.995\). For each value of \(k_n\in \{2,3,\ldots ,450\}\), we record and report median expectile estimates across all N replicates in Figs. C.6 and C.7, see Sect. C of the Supplementary Material document. It is readily seen that our proposed estimators provide a very substantial improvement upon the standard, not bias-reduced alternative. The indirect estimator \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) based on \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) seems to perform especially well, even when \(\nu _1\le 2\). This difficult setup corresponds to those situations when the shape of the Fisher distribution is very dissimilar to the shape of the Pareto distribution upon which the extrapolation methodology is based. In such cases where the bias is very large, using the bias-reduced expectile-based tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\), rather than the bias-reduced Hill estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\), appears to be particularly beneficial. It was pointed out to us by a referee that in each tested case (Burr, GPD, Fisher), the bias component b was positive; we defer to future research the question of studying the performance of the proposed bias reduction approaches depending on the sign of b.

Applications on real data

We apply our methodology to three data sets, from insurance, economics, and finance, as a way to illustrate the applicability of our expectile and tail index estimators.

Reinsurance premium estimation

Reinsurance is a very important way of mitigating risk associated with high-impact events such as extreme climate episodes. Reinsurance contracts typically involve two insurance companies A and B; by the terms of the contract, company A transfers to company B (totally or partially) the risk associated with events involving large claims. Here we focus on the case when risk is totally transferred, which is also called excess-of-loss reinsurance. Under such a policy, when a claim occurs, company A pays the claim amount up to a certain amount R decided in the reinsurance contract, called the retention level, and company B underwrites all losses above that amount R. In other words, if the total claim amount is Y, company A pays \(\min (Y,R)\), and company B pays \(\max (Y-R,0)\). A crucial task in deciding the terms of a reinsurance contract is to price this contract accurately, which leads to the calculation of the so-called reinsurance premium. A first, natural approach to doing this is to use the net premium principle (see e.g. Chapters 4 and 5 in Kaas et al. 2008), namely

$$\begin{aligned} \varPi (R)={\mathbb {E}}( \max (Y-R,0) )=\int _R^{\infty } {\overline{F}}(x)\mathrm{{d}}x, \end{aligned}$$

where \({\overline{F}}\) is the survival function of Y. However, paying company B this net average premium would not protect that company from a catastrophic loss much higher than its average value. A solution to this problem developed in the actuarial literature over the last 25 years has been to consider more conservative premium principles, including the distorted premiums introduced in Wang (1996):

$$\begin{aligned} \varPi _g(R)=\int _R^{\infty } g({\overline{F}}(x))\mathrm{{d}}x, \end{aligned}$$

where \(g : [0,1] \rightarrow [0,1]\) is a nondecreasing concave function such that \(g(0)=0\) and \(g(1)=1\), called the distortion function. The choice \(g(x)=x\) leads to the net premium principle, but there are several reasonable ways of choosing a function g leading to a more conservative (i.e. higher) premium, such as the Dual Power function, or the Proportional Hazards function (we refer to, among others, Wang 1995 and Chapter 3 of Dickson 2016).

Since in reinsurance the retention level R should be considered as a high (and therefore rarely observed) level of claim amount, the calculation of the reinsurance premium is very closely linked to the right tail of the distribution of the claim amount Y. This motivated Vandewalle and Beirlant (2006) to develop an extreme value theory-based method for the estimation of \(\varPi _g(R)\) (a different, somewhat linked theory for Wang distortion risk measures conditionally on the loss being high is provided in El Methni and Stupfler 2017). In particular, they proved that if Y is heavy-tailed with tail index \(\gamma \) and \(g(1/\cdot )\) is regularly varying with tail index \(\delta \), i.e. \(g(1/(ty))/g(1/t) \rightarrow y^{\delta }\) as \(t \rightarrow \infty \), where \(\delta<-\gamma <0\), then the following limiting relationship holds between the distorted premium, the retention level and the probability of exceeding that level:

$$\begin{aligned} \lim _{R\rightarrow +\infty } \frac{\varPi _g(R)}{R \,g ( {\overline{F}} (R) )} = \frac{1}{-\delta \gamma ^{-1}-1}. \end{aligned}$$

An interesting version of that relationship is found when \(R=\xi _{\beta }\) for \(\beta \uparrow 1\). In that case, using Eq. (11), we get

$$\begin{aligned} \lim _{\beta \uparrow 1} \frac{\varPi _g(\xi _{\beta })}{ \xi _{\beta }\,g ( 1-\beta )} = \frac{(\gamma ^{-1}-1)^{-\delta } }{-\delta \gamma ^{-1}-1}. \end{aligned}$$

In the case of the net premium, \(g(x)=x\), so \(\delta =-1\) and we find \(\varPi _g(\xi _{\beta }) = \varPi (\xi _{\beta }) \sim (1-\beta ) \xi _{\beta }\); in this asymptotic equivalence, the tail index \(\gamma \) does not appear anymore, and the expectile is in some sense an asymptotic inverse of the function \(R\mapsto \varPi (R)/R\) that represents the proportion of the retention level R paid on average per claim by company B (in other words, if \(\varPi (R)/R = \pi \), then company B contributes \(\pi R\) to the payment of the average claim).
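The asymptotic equivalence \(\varPi (\xi _{\beta }) \sim (1-\beta ) \xi _{\beta }\) can be checked numerically in a simple case. The sketch below assumes a unit Pareto distribution with tail index \(\gamma \in (0,1)\) (an illustrative choice), for which \(\varPi (R)=\gamma R^{1-1/\gamma }/(1-\gamma )\) for \(R\ge 1\) and \(\mathbb {E}(Y)=1/(1-\gamma )\). It relies on the fact that the first-order condition defining the expectile can be rewritten as \((1-\beta )(\xi _{\beta }-\mathbb {E}(Y)) = (2\beta -1)\varPi (\xi _{\beta })\), and solves this equation by bisection; the helper names are hypothetical.

```python
def pareto_net_premium(R, gamma):
    """Net premium E(max(Y - R, 0)) for a unit Pareto law, valid for R >= 1."""
    return gamma / (1.0 - gamma) * R ** (1.0 - 1.0 / gamma)


def pareto_expectile(beta, gamma, tol=1e-12):
    """Expectile of level beta of a unit Pareto law (0 < gamma < 1), by bisection."""
    mean = 1.0 / (1.0 - gamma)

    def h(xi):
        # Increasing in xi; its unique root on [1, infinity) is the expectile
        return (1.0 - beta) * (xi - mean) - (2.0 * beta - 1.0) * pareto_net_premium(xi, gamma)

    lo, hi = 1.0, 2.0
    while h(hi) < 0.0:  # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def premium_ratio(beta, gamma):
    """Ratio Pi(xi_beta) / ((1 - beta) * xi_beta), expected to approach 1 as beta -> 1."""
    xi = pareto_expectile(beta, gamma)
    return pareto_net_premium(xi, gamma) / ((1.0 - beta) * xi)


for beta in (0.99, 0.999, 0.9999):
    print(beta, pareto_expectile(beta, 0.5), premium_ratio(beta, 0.5))
```

In this example the ratio gets closer to 1 as \(\beta \) increases, which is what the asymptotic equivalence predicts.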

Given this relevance of expectiles to premium calculation for large claims, we propose to estimate the reinsurance premium \(\varPi _g(R)\) (for a large retention level R) using Eq. (17) and our bias-reduced extreme expectile estimation methodology. For that purpose, we consider the well-known Secura Belgian Re data used in Vandewalle and Beirlant (2006), available in our package Expectrem as well as several other R packages such as ReIns (Reynkens et al. 2020), CASdatasets (Dutang and Charpentier 2019) and ltmix (Blostein and Miljkovic 2019). This data set contains \(n=370\) inflation-adjusted automobile claim amounts (from 1988 to 2001), larger than . We consider two premium principles: the net premium (\(g(x)=x\)) and Dual Power (\(g(x)=1-(1-x)^{\kappa }\)) principles, and to allow us to compare our results with those of Vandewalle and Beirlant (2006), we take \(\kappa =1.366\). This choice was already recommended in Wang (1996). We estimate the associated premiums \(\varPi _g(R)\) with the statistic

$$\begin{aligned} {\widehat{\varPi }}_g^{\star }(\xi _{\beta }) = \frac{({\overline{\gamma }}^{-1}-1)^{-\delta }}{-\delta {\overline{\gamma }}^{-1}-1} {\widehat{\xi }}_{\beta }^{\star ,\mathrm {RB}}\,g ( 1-\beta ). \end{aligned}$$

Here \({\widehat{\xi }}_{\beta }^{\star ,\mathrm {RB}}\) denotes the bias-reduced LAWS extrapolated estimator calculated using either our bias-reduced asymptotic proportionality tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) or its bias-reduced Hill counterpart \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\), and \({\overline{\gamma }}\) is taken to be the same tail index estimator as the one used in the extrapolation. We also compare this estimator with the one in which the bias-reduced extreme expectile estimator is replaced by the standard, non-bias-reduced extrapolated LAWS estimator calculated with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (and thus \({\overline{\gamma }}={\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) everywhere). The estimated reinsurance premiums are represented in Fig. 2 on a fine grid of values of \(\beta _n\); this yields curves of estimates of \(\varPi _g(R)\) in the tail region \(R\rightarrow \infty \), that we compare to the premium curve obtained by Vandewalle and Beirlant (2006). Indirect estimators are here almost identical to their LAWS counterparts, and are not reported for the sake of readability.
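The plug-in statistic above is straightforward to evaluate once an extreme expectile estimate and a tail index estimate are available. The following Python sketch is only an illustration: the function name and the numerical inputs are hypothetical and are not estimates from the Secura data. Note that, since \(1-(1-x)^{\kappa } \sim \kappa x\) as \(x \downarrow 0\), both the net premium and Dual Power principles correspond to \(\delta =-1\), in which case the multiplicative constant in the estimator equals 1.

```python
def premium_estimate(xi_hat, gamma_bar, beta, g, delta):
    """Plug-in premium: constant(gamma_bar, delta) * xi_hat * g(1 - beta).

    xi_hat    : extreme expectile estimate at level beta (supplied by the user)
    gamma_bar : tail index estimate
    g         : distortion function
    delta     : regular variation index of t -> g(1/t); must satisfy delta < -gamma_bar < 0
    """
    if not (delta < -gamma_bar < 0.0):
        raise ValueError("the formula requires delta < -gamma_bar < 0")
    inv = 1.0 / gamma_bar
    constant = (inv - 1.0) ** (-delta) / (-delta * inv - 1.0)
    return constant * xi_hat * g(1.0 - beta)


def net(x):
    return x


def dual_power(x, kappa=1.366):
    return 1.0 - (1.0 - x) ** kappa


# Hypothetical inputs, for illustration only
xi_hat, gamma_bar, beta = 5.0e6, 0.3, 0.995
print(premium_estimate(xi_hat, gamma_bar, beta, net, -1.0))
print(premium_estimate(xi_hat, gamma_bar, beta, dual_power, -1.0))
```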

Fig. 2

Secura Belgian Re insurance data, estimated distorted premium \(\varPi _g(R)\) as function of the retention level \(R=\xi _{\beta }\) for \(\beta \) ranging from \(1-10/n \approx 0.973\) to \(1-1/(8n) \approx 0.9997\) (here \(n=370\)). The premiums are estimated using \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (solid blue curve), \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) (solid red curve) and \({\widehat{\xi }}_{\beta _n}^{\star }\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (dotted blue curve). The black curve is constructed by linear interpolation using the estimates found in Vandewalle and Beirlant (2006). Left panel: net premium principle (\(g(x)=x\)). Right panel: Dual Power principle (\(g(x)=1-(1-x)^{\kappa }\)) with \(\kappa =1.366\). (Color figure online)

We draw two conclusions from Fig. 2. First, our bias-reduced expectile-based estimators constructed from Formula (17) are at first quite conservative, but become very close to those of Vandewalle and Beirlant (2006) when the retention level is large, confirming the accuracy of our (bias-reduced) extreme expectile estimators. Interestingly, the point estimate based on our proposed tail index estimator \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) is slightly less conservative than the others for the largest values of R that we consider. While policymakers and regulators would favour higher (i.e. more pessimistic) estimates such as those given by Vandewalle and Beirlant (2006), more optimistic (i.e. lower) assessments of risk may be of interest to insurance companies, because lower premiums paid by consumers translate into improved competitiveness on insurance markets. Second, without the bias reduction scheme that we propose, the expectile-based estimates appear to be very poor and a long way off the other estimates we consider. This example therefore clearly emphasises the importance of the bias reduction methodology proposed in this paper as far as expectile-based estimation is concerned.

Approximation of the Gini index

We now showcase how our methodology can be applied in economics through the example of the estimation of the Gini index. This economic indicator measures the statistical dispersion (and therefore inequality) of income within a country: the Gini index of a country with n workers having respective incomes \(Y_1,\ldots ,Y_n\) is given by

$$\begin{aligned} {\overline{G}}= \frac{\sum _{i=1}^n \sum _{j=1}^n |Y_i-Y_j|}{2 n \sum _{i=1}^n Y_i}. \end{aligned}$$

A higher Gini index means higher inequality of income within the sampled population. Of course, in practice n is very large (of the order of millions, if not a billion) and income data is typically very sensitive, so to estimate the Gini index of a country using the above formula, a representative survey of incomes covering all the categories of workers is generally carried out. Ensuring representativity in this context can be extremely difficult as well as time- and labour-intensive. In particular, substantial left-censoring or left-truncation can be present, as it is reasonable to imagine that accurately sampling from the lowest-paid workers is hard, for example because of job instability, or labour market law violations by employers including minimum wage underpayment or illegal employment of foreign workers. Accurately representing the left tail of the income distribution, which is key if the above definition of the Gini index is to be used, can therefore be a tall order.
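For concreteness, the sample formula for \({\overline{G}}\) can be evaluated as in the following Python sketch. The function names are ours; the first function uses the standard identity \(\sum _{i,j} |Y_i-Y_j| = 2\sum _{i=1}^n (2i-n-1) Y_{i,n}\), where \(Y_{1,n}\le \cdots \le Y_{n,n}\) are the ordered incomes, to avoid the double sum, and the second one implements the definition directly as a cross-check.

```python
def gini_index(incomes):
    """Sample Gini index computed from sorted incomes in O(n log n) time."""
    y = sorted(incomes)
    n = len(y)
    total = sum(y)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with positive total income")
    # sum_{i,j} |Y_i - Y_j| = 2 * sum_i (2i - n - 1) * Y_(i), so the factor 2 cancels
    weighted = sum((2 * i - n - 1) * v for i, v in enumerate(y, start=1))
    return weighted / (n * total)


def gini_index_bruteforce(incomes):
    """Direct O(n^2) implementation of the defining double sum."""
    n = len(incomes)
    double_sum = sum(abs(a - b) for a in incomes for b in incomes)
    return double_sum / (2 * n * sum(incomes))


sample = [3, 1, 4, 1, 5, 9, 2, 6]  # toy incomes, for illustration only
print(gini_index(sample), gini_index_bruteforce(sample))
```

A sample in which all incomes are equal has index 0, while a sample of four workers in which only one has a nonzero income has index 0.75.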

An alternative solution putting more weight on the right tail is to model the distribution of income within a country by a heavy-tailed distribution (see for example Gardes and Girard 2021). This kind of approach has a long history in labour economics, see for instance Singh and Maddala (1976) and McDonald (1984). A particularly interesting model uses the Burr distribution with parameters \(\gamma \) (the tail index) and \(\rho \) (the second-order parameter). It can then be shown that the Gini index should be

$$\begin{aligned} G=G(\gamma ,\rho )=1-\frac{\varGamma ( -1/\rho ) \varGamma ( (\gamma -2)/\rho )}{\varGamma ( -2/\rho ) \varGamma ( (\gamma -1)/\rho )}. \end{aligned}$$

Here \(\varGamma (\cdot )\) is Euler’s Gamma function; see e.g. Chotikapanich and Griffiths (2000). This way of modelling the Gini index has the advantage of using only the right tail of the distribution of income, and is thus more robust to sampling inaccuracies in the left tail.
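The parametric Gini index \(G(\gamma ,\rho )\) is easy to evaluate numerically. The Python sketch below is a minimal illustration with a hypothetical function name; it assumes \(0<\gamma <1\) (so that the mean income is finite) and \(\rho <0\), in which case all four arguments of the Gamma function are positive and the log-Gamma function can be used for numerical stability.

```python
import math


def burr_gini(gamma, rho):
    """Gini index G(gamma, rho) under the Burr model, for 0 < gamma < 1 and rho < 0."""
    if not (0.0 < gamma < 1.0) or rho >= 0.0:
        raise ValueError("this sketch assumes 0 < gamma < 1 and rho < 0")
    log_ratio = (
        math.lgamma(-1.0 / rho)
        + math.lgamma((gamma - 2.0) / rho)
        - math.lgamma(-2.0 / rho)
        - math.lgamma((gamma - 1.0) / rho)
    )
    return 1.0 - math.exp(log_ratio)


# Illustrative parameter values only (not estimates from the data sets considered here)
for gamma in (0.2, 0.4, 0.6):
    print(gamma, burr_gini(gamma, -1.0), burr_gini(gamma, -0.5))
```

As a sanity check, for \(\rho =-1\) the formula reduces to \(G(\gamma ,-1)=1-\varGamma (2-\gamma )/\varGamma (1-\gamma )=\gamma \).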

We propose here to estimate the Gini index for several countries using Formula (18) and the bias-reduced tail index estimates, and to compare our results with official Gini indices calculated by intelligence or economic agencies. The countries and data considered are the following:

  • The synthetic Eurostat data set of incomes for Austria (\(n=5977\)), Belgium (\(n=6159\)), France (\(n=11{,}131\)), Finland (\(n=11{,}370\)), Greece (\(n=7439\)) and the Netherlands (\(n=10{,}131\)).

  • A data set of \(n=8156\) Italian incomes for the year 2014, available from the Bank of Italy.

  • A survey of \(n=9656\) wages in Namibia during the period 2009–2010, provided by the Namibia Statistics Agency.

  • A synthetic data set of \(n=11{,}315\) incomes in New Zealand during the year 2003, collected by the official Statistics New Zealand agency.

  • The Filipino Family Income and Expenditure data set of \(n=41{,}544\) incomes measured in the Philippines in 2015 by the Philippine Statistics Authority.

  • The Living Conditions Survey 2014–2015 in South Africa, containing \(n=19{,}286\) wages.

The Gini coefficient is estimated with

$$\begin{aligned}&{\widehat{G}}\left( {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\right) = G\left( {\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}},{\overline{\rho }} \right) \\&\text{ and } {\widehat{G}}\left( {\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}} \right) = G \left( {\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}},{\overline{\rho }} \right) , \end{aligned}$$

where the parametric form \(G(\gamma ,\rho )\) of the Gini index is defined in (18) and \({\overline{\rho }}\) is the second-order parameter estimator we use throughout in our bias reduction scheme (see Sect. A of the Supplementary Material document). Our estimates are reported in Table 2, where they are compared with official Gini indices calculated by the World Bank, CIA and/or Eurostat as appropriate, as well as with their versions \({\widehat{G}} \left( {\widehat{\gamma }}_{k_n}^{\mathrm {H}} \right) \) and \({\widehat{G}}\left( {\widehat{\gamma }}_{k_n}^{\mathrm {E}}\right) \) that do not feature any bias reduction.

Table 2 Estimated Gini indices per country using the estimators \({\widehat{G}}({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}})\) (reported along with \({\widehat{G}}({\widehat{\gamma }}_{k_n}^{\mathrm {H}})\) between brackets) and \({\widehat{G}}({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}})\) (reported along with \({\widehat{G}}({\widehat{\gamma }}_{k_n}^{\mathrm {E}})\) between brackets), and official Gini indices (reported along with year of publication)
Fig. 3

Commerzbank daily log-returns data between March 6, 2012 and July 28, 2016, sample size \(n=1048\). In the top left panel, daily (positive) log-returns (black curve) and ARMA-GARCH log-volatility estimates \(t\mapsto \log {\widehat{\sigma }}_t\) (red bold curve, smoothed using the R function smooth.spline with smoothing parameter \(\lambda =5\times 10^{-7}\)). In the top right panel, dynamic extreme expectile (level \(\beta _n=0.99855 \approx 1-2/n\)) and quantile (level \(\beta '_n=0.99\)) estimates on day \(n+1\) given past observations as functions of \(Y_n=y_n\): estimates \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)\) computed with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (respectively solid and dotted black curves) and corresponding estimates computed with \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) (respectively solid and dotted blue curves), estimate \({\widehat{\xi }}_{\beta _n}^{\star }(Y_{n+1}|{\mathcal {F}}_n)\) computed with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (green curve), estimate \({\widetilde{q}}_{\beta _n'}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)\) computed with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) (red curve) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) (orange curve). Bottom left (respectively bottom right): Exponential QQ-plot of the log-spacings \(\log ( {\widehat{\varepsilon }}_{n-i+1,n}/{\widehat{\varepsilon }}_{n- k^*,n} )\), \(1\le i\le k^* =81\) (respectively \(1\le i\le k^* =49\)). The straight line has slope \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}=0.321\) (respectively \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}=0.345)\). (Color figure online)

The first conclusion we can draw is that the bias reduction scheme applied to the asymptotic proportionality tail index estimator is very effective: the non-bias-reduced estimates are typically far from their bias-reduced counterparts as well as from official estimates, and when the non-bias-reduced estimate is sensible (as in the examples of the Philippines and South Africa), the bias-reduced estimate is not unreasonable either. The second conclusion is that the bias-reduced estimates based on \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\) are competitive and sometimes even closer to the average Gini index (computed using the average Gini indices from our official sources) than the estimate using the bias-reduced Hill estimator. This shows that, if the Burr model is to be applied to these data sets, it is likely that the bias-reduced approach we propose is beneficial, and therefore our bias-reduced asymptotic proportionality tail index estimator is a valuable resource in practical applications. Of course, the validity of the Burr model is itself an assumption; a more principled method, such as a censored likelihood approach, may in general fit the data better by virtue of being more flexible. Unlike the method adopted in this illustration, however, such an approach would not allow one to plug in the tail index and second-order parameter estimators in a simple fashion.

An analysis of financial returns

Our final real data example focuses on financial returns. The fact that expectiles can, in financial contexts, be interpreted in terms of the gain-loss ratio, makes them interesting in portfolio management. Recently Bellini and Di Bernardino (2017) have shown the practical interest of estimating extreme expectiles of series of financial log-returns; in particular, they recommend the estimation of the expectile of level \(\beta =0.99855\), following their observation that it coincides with the quantile of level \(\beta '=0.99\) in the standard Gaussian case. In this example, we consider the series of the daily negative log-returns of the Commerzbank stock prices on the DAX30 stock exchange between March 6, 2012 and July 28, 2016, resulting in a sample \(Y_1,\ldots ,Y_n\) of size \(n=1{,}048\) plotted in the top left panel of Fig. 3. To reduce the serial dependence in the observations, we filter our time series using an ARMA(1, 1)-GARCH(1, 1) model:

$$\begin{aligned}&Y_t = \mu + \phi Y_{t-1} + u_t + \theta u_{t-1}, \\&\text{ where } u_t =\sigma _t \varepsilon _t \text{ is } \text{ such } \text{ that } \sigma _t^2 = {\mathfrak {c}}+{\mathfrak {a}} u_{t-1}^2+{\mathfrak {b}} \sigma _{t-1}^2, \end{aligned}$$

see Sect. 5.2 p. 100 of Francq and Zakoïan (2010). Here \(\mu , \phi ,\theta \in {\mathbb {R}}\) and \({\mathfrak {a}},{\mathfrak {b}},{\mathfrak {c}}>0\) are unknown coefficients, and \((\varepsilon _t)\) is an unobserved independent nonconstant white noise sequence, i.e. such that \(\mathbb {E}(\varepsilon )=0\), \(\mathbb {E}(\varepsilon ^2)=1\) and \(\mathbb {P}(\varepsilon ^2=1)<1\). When \(|\phi |,|\theta |<1\), under suitable conditions, this model has a stationary, nonanticipative solution, see Theorem 2.4 p. 30 of Francq and Zakoïan (2010); this is in particular the case if \({\mathfrak {a}}+{\mathfrak {b}}<1\). In this case, at time t, \(\sigma _t\) is a function of the past of the process (up to time \(t-1\)) only; we let \({\mathcal {F}}_n\) be the sigma-algebra generated by the ARMA-GARCH process up to time n. Since \(\sigma _{n+1}\) is then known given \({\mathcal {F}}_n\) while \(\varepsilon _{n+1}\) is independent of \({\mathcal {F}}_n\), translation equivariance and positive homogeneity of expectiles imply that the conditional expectile for the next day given the observations up to time n is

$$\begin{aligned} \xi _{\beta }(Y_{n+1}|{\mathcal {F}}_n)= \mu + \phi Y_n + \sigma _{n+1} \, \xi _{\beta } (\varepsilon ) + \theta u_n. \end{aligned}$$

We estimate here an extreme conditional expectile for tomorrow given our knowledge of today, that is, the quantity \(\xi _{\beta _n}(Y_{n+1}|{\mathcal {F}}_n)\) with \(\beta _n=\beta =0.99855\). We first estimate all the parameters and predict the residuals \({\widehat{u}}_i\) using the function garchFit in the R package fGarch (Wuertz 2020) and the option @residuals, and then we obtain predictions \({\widehat{\varepsilon }}_i\) of the innovations by fitting a pure GARCH model directly to the \({\widehat{u}}_i\) and applying garchFit(...)@residuals. Following the theory developed in Girard et al. (2021), we treat the residuals \({\widehat{\varepsilon }}_i\) from the model as independent and identically distributed copies of \(\varepsilon \) for the estimation of the tail index \(\gamma \) of \(\varepsilon \), and \(\xi _{\beta _n} (\varepsilon )\). The independence assumption was checked using a series of Ljung-Box independence tests on residuals and their squares, with the lowest p value being 0.39. Evidence that \(\varepsilon \) is indeed heavy-tailed is gathered in the two bottom panels of Fig. 3 using exponential QQ-plots of the log-spacings. The estimated tail indices of the residuals are very similar: \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}} \approx 0.321\) and \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}} \approx 0.345\) (selected \(k_n\) values are \({\widehat{k}}_n^{\mathrm {H}}=81\) and \({\widehat{k}}_n^{\mathrm {E}}=49\) respectively). The garchFit routine also provides an estimate \({\widehat{\sigma }}_n\) of the conditional standard deviation on day n. For a prediction on day \(n+1\), we estimate the volatility by

$$\begin{aligned} {{\widehat{\sigma }}_{n+1}^2 = {\widehat{\sigma }}_{n+1}^2({\widehat{u}}_n^2,{\widehat{\sigma }}_n^2) = \widehat{{\mathfrak {c}}}+\widehat{{\mathfrak {a}}} {\widehat{u}}_n^2+\widehat{{\mathfrak {b}}} {\widehat{\sigma }}_n^2.} \end{aligned}$$

This eventually yields the conditional extreme expectile estimates

$$\begin{aligned} {\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)&= {\widehat{\mu }} + {\widehat{\phi }} Y_n + {\widehat{\sigma }}_{n+1} \, {\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}} (\varepsilon ) + {\widehat{\theta }} {\widehat{u}}_n, \\ {\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)&= {\widehat{\mu }} + {\widehat{\phi }} Y_n + {\widehat{\sigma }}_{n+1} \, {\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}} (\varepsilon ) + {\widehat{\theta }} {\widehat{u}}_n . \end{aligned}$$

Our key observation now is that, if \({\widehat{\mu }}\), \({\widehat{\phi }}\), \({\widehat{\theta }}\) and the past \((Y_t)_{t\le n-1}\) of the process are considered as fixed, then \({\widehat{u}}_n= Y_n - {\widehat{\phi }} Y_{n-1} - {\widehat{\theta }} {\widehat{u}}_{n-1} - {\widehat{\mu }}\) is an affine function of \(Y_n\), and \({\widehat{\sigma }}_{n+1}^2\) is a quadratic function of \({\widehat{u}}_n\) and hence of \(Y_n\). Each of our extreme expectile estimates can therefore be considered as a function of the observation \(Y_n=y_n\); these functions are represented in the top right panel of Fig. 3 as a way of evaluating the influence of the nth observation on the dynamic extreme expectile prediction for the next day. Our estimates \({\widehat{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)\) and \({\widetilde{\xi }}_{\beta _n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)\) are calculated using either \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) or \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\), and are compared to their counterpart using the standard, non-bias-reduced Weissman estimate \({\widehat{\xi }}_{\beta _n}^{\star }(Y_{n+1}|{\mathcal {F}}_n)\) extrapolated with \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\). They are also compared with dynamic bias-reduced Weissman quantile estimates of level \(\beta '_n=\beta '=0.99\), calculated using either \({\widehat{\gamma }}_{k_n}^{\mathrm {H},\mathrm {RB}}\) or \({\widehat{\gamma }}_{k_n}^{\mathrm {E},\mathrm {RB}}\):

$$\begin{aligned} {\widetilde{q}}_{\beta '_n}^{\star ,\mathrm {RB}}(Y_{n+1}|{\mathcal {F}}_n)= {\widehat{\mu }} + {\widehat{\phi }} Y_n + {\widehat{\sigma }}_{n+1} \, {\widetilde{q}}_{\beta '_n}^{\star ,\mathrm {RB}} (\varepsilon ) + {\widehat{\theta }} {\widehat{u}}_n, \end{aligned}$$

where the expression of \({\widetilde{q}}_{\beta '_n}^{\star ,\mathrm {RB}} (\varepsilon )\) (adapted here with the use of residuals) is given in Sect. 3.3.
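The dynamic predictions above all share the same location-scale structure, and each can be viewed as a function of the last observation \(Y_n=y_n\). The following Python sketch illustrates this structure only: the function name, the coefficient values and the innovation risk measure value are hypothetical and are not the fitted Commerzbank quantities. The argument `innovation_risk` stands for any estimate of an extreme expectile or quantile of \(\varepsilon \).

```python
import math


def next_day_prediction(y_n, y_prev, u_prev, sigma2_n, coef, innovation_risk):
    """One-step-ahead conditional risk measure under an ARMA(1,1)-GARCH(1,1) filter.

    coef = (mu, phi, theta, c, a, b); y_prev, u_prev and sigma2_n stand for
    Y_{n-1}, the residual u_{n-1} and the squared volatility sigma_n^2.
    """
    mu, phi, theta, c, a, b = coef
    # Residual on day n: an affine function of y_n once the past is fixed
    u_n = y_n - mu - phi * y_prev - theta * u_prev
    # GARCH(1,1) volatility update: a quadratic function of u_n, hence of y_n
    sigma2_next = c + a * u_n ** 2 + b * sigma2_n
    return mu + phi * y_n + theta * u_n + math.sqrt(sigma2_next) * innovation_risk


# Hypothetical coefficients and inputs, for illustration only
coef = (0.0, 0.1, -0.05, 1e-5, 0.08, 0.9)
for y_n in (-0.05, 0.0, 0.05):
    print(y_n, next_day_prediction(y_n, 0.01, 0.0, 4e-4, coef, 4.0))
```

Evaluating this function on a grid of values of \(y_n\) gives a curve of the kind displayed in the top right panel of Fig. 3.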

Estimated expectiles are substantially larger than estimated quantiles; this provides further evidence of the non-Gaussian behaviour of the returns, and also means that a risk assessment based on expectiles using the guidelines provided by Bellini and Di Bernardino (2017) in the Gaussian case would be more conservative than one based on quantiles. The bias-reduced expectile estimates are close to one another, while their non-bias-reduced counterpart is visibly larger, suggesting that bias plays some role in this example: the bias-reduced estimates strike a middle ground between an assessment of risk provided by extreme quantiles, which would be liberal, and a conservative assessment of financial risk given by the non-bias-reduced extreme expectile estimates.
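The Gaussian correspondence between the expectile level \(\beta =0.99855\) and the quantile level \(\beta '=0.99\) used in this example can be checked numerically. The Python sketch below, whose function name is ours, uses the identity \(\alpha =\mathbb {E}( |Y-\xi _{\alpha }|\mathbb {1}\{ Y\le \xi _{\alpha }\} )/\mathbb {E}|Y-\xi _{\alpha }|\) together with the standard Gaussian formula \(\mathbb {E}(\max (Y-\xi ,0))=\varphi (\xi )-\xi (1-\varPhi (\xi ))\), where \(\varphi \) and \(\varPhi \) denote the standard Gaussian density and distribution function.

```python
from statistics import NormalDist


def gaussian_expectile_level(xi):
    """Level alpha such that xi is the alpha-th expectile of a standard Gaussian."""
    std = NormalDist()
    upper = std.pdf(xi) - xi * (1.0 - std.cdf(xi))  # E(max(Y - xi, 0))
    lower = xi + upper                               # E(max(xi - Y, 0)), since E(Y) = 0
    return lower / (lower + upper)


q99 = NormalDist().inv_cdf(0.99)  # standard Gaussian quantile of level 0.99
level = gaussian_expectile_level(q99)
print(q99, level)
```

The level obtained in this way is approximately 0.9985, in agreement with the value used above.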