Threshold Selection in Univariate Extreme Value Analysis

Threshold selection plays a key role in many aspects of statistical inference for rare events. Most classical approaches to this problem for heavy-tailed distributions depend crucially on tuning parameters or critical values that the practitioner must choose. To simplify the use of automated, data-driven threshold selection, we introduce two new procedures that require no manual parameter choices. The first method measures the deviation of the log-spacings from the exponential distribution and performs well in simulations for estimating high quantiles. The second smoothly estimates the asymptotic mean square error of the Hill estimator and performs consistently well over a wide range of distributions. Both methods are compared to existing procedures in an extensive simulation study and applied to a dataset of financial losses, where the underlying extreme value index is assumed to vary over time. This application strongly underlines the importance of reliable automated threshold selection.


Introduction
Extreme value analysis of heavy-tailed distributions plays an important role in various applications. In seismology and climatology, for example, statistics of extremes is used to study earthquakes (Beirlant et al., 2018) or heavy precipitation (Carreau et al., 2017). Another important field of research is the analysis of high financial losses, which becomes particularly interesting if the losses depend on covariates (Chavez-Demoulin et al., 2016; Hambuckers et al., 2018). In this situation an automated threshold selection procedure brings additional benefits by enabling the threshold to be selected depending on a covariate. We discuss this possibility in more detail in Section 5. To mathematically investigate the behaviour of heavy tails, we consider random variables from the domain of attraction (DoA) of a Fréchet distribution. Let X_1, ..., X_n be independent, identically distributed (i.i.d.) random variables with distribution function F, where F is in the DoA of an extreme value distribution (evd) G_γ with extreme value index γ > 0. This means there exist sequences a_n > 0 and b_n ∈ R such that

lim_{n→∞} F^n(a_n x + b_n) = G_γ(x) := exp(−x^{−1/γ}).
In this situation the following first order condition holds: the survival function 1 − F is regularly varying with index −1/γ. Distributions fulfilling this condition are called Pareto-type distributions, because they differ from the Pareto distribution only by a slowly varying function ℓ_F, i.e. 1 − F(x) = x^{−1/γ} ℓ_F(x). We can interpret the quotient in (1) as a conditional probability, and it follows directly that, for a sufficiently large threshold t, the data above this threshold can be modelled by a Pareto or an exponential distribution. In this article we concentrate on the exponential approximation and utilize it for inference on the extreme value index. It is common to consider the threshold t = X_(n−k,n) and to choose the sample fraction k instead of t, where X_(1,n) ≤ ... ≤ X_(n,n) denote the order statistics of a sample of size n. In this case a natural estimator for γ under the exponential approximation of the log-spacings Y_(i,k) := log(X_(n−i+1,n)) − log(X_(n−k,n)) is their mean, the Hill estimator (Hill, 1975),

γ̂_k := (1/k) Σ_{i=1}^{k} Y_(i,k).

The Hill estimator is still among the most popular and well-known estimators for the extreme value index, although its sample path as a function of k can be highly unstable, and estimation therefore depends crucially on the choice of the sample fraction k. This dependence highlights the difficulties in estimating γ: even for univariate i.i.d. observations from F ∈ DoA(G_γ), estimation is hard, since only few observations contain information about the extreme value distribution G_γ. Selecting a threshold above which the data can be used for statistical inference about the tail is one of the most fundamental problems in the field of extreme value analysis.
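In code, the Hill estimator amounts to averaging the log-spacings above the order statistic X_(n−k,n); a minimal sketch (function and variable names are our own):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the extreme value index gamma based on the
    k largest observations of the positive sample x (gamma > 0)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    # log-spacings Y_(i,k) = log X_(n-i+1,n) - log X_(n-k,n)
    y = np.log(x[n - k:]) - np.log(x[n - k - 1])
    return y.mean()

# Example: standard Pareto sample with gamma = 1
rng = np.random.default_rng(0)
sample = 1.0 / rng.uniform(size=10000)   # Pareto(1): 1 - F(x) = x**(-1)
print(hill_estimator(sample, k=500))     # close to 1
```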
Due to the importance of this task, the appropriate choice of the threshold has been discussed extensively in extreme value research over the last decades, and suggested solutions cover a variety of methodologies. We give a short summary of different types of approaches and stress the specific difficulties that arise, concentrating mainly on methods we compare in our simulation study in Section 4. More comprehensive reviews of threshold selection can be found in Scarrott and MacDonald (2012) and Dey and Yan (2016). One basic concept in threshold selection is data visualisation, which is discussed more deeply in Kratz and Resnick (1996) and Drees et al. (2000). Popular graphical diagnostics used in this context are the Zipf plot, Hill plot, QQ-plot and the mean-excess plot, to name a few. A major drawback of these methods is their subjectivity due to the necessarily personal interpretation of the plot. Further, it is a burden to choose each threshold manually, especially in high-dimensional settings or when analysing many samples. Easier ways to select the sample fraction are rules-of-thumb such as using the upper 10% of the data (DuMouchel, 1983) or k = √n (Ferreira et al., 2003). However, these suggestions are neither theoretically justified nor data driven. Reiss and Thomas (2007) present a procedure that tries to find a region of stability among the estimates of the extreme value index. Their method depends on a tuning parameter, whose choice is further analysed in Neves and Fraga Alves (2004). To our knowledge, no theoretical analysis exists for this approach. Besides these and similar heuristic approaches, there is a class of theoretically motivated procedures that target the optimal sample fraction for specific estimation tasks, such as quantile estimation (Ferreira et al., 2003), estimation of high probabilities (Hall and Weissman, 1997) or the Hill estimator, see below. We also mention two other methodologies.
First, there are approaches that compare the empirical distribution to the fitted generalized Pareto distribution (GPD) via goodness-of-fit tests (Bader et al., 2018) or by minimizing the distance between them (Pickands, 1975; Gonzalo and Olmo, 2004; Clauset et al., 2009), where the latter approach is theoretically analysed by Drees et al. (2018). Further, Goegebeur et al. (2008) propose a family of kernel statistics to test for exponentiality in order to select a threshold. Of particular interest to us are methods that aim to estimate the sample fraction k_opt which minimizes the asymptotic mean square error (AMSE) of the Hill estimator. To construct an estimator for k_opt, Drees and Kaufmann (1998) utilize the Lepskii method and an upper bound on the maximum random fluctuation of γ̂_k around γ. To apply their approach it is necessary to choose several tuning parameters and to obtain consistent initial estimates for γ and a second order parameter ρ. They recommend specific choices of the parameters based on a numerical study, and we employ their proposals in our simulations. However, the choice of these parameters is not data-driven. In Guillou and Hall (2001), a test statistic Q_k is constructed based on an accumulation of log-spacings, which takes values around 1 as long as the bias of the Hill estimator is not significantly large. Their statistic depends on a tuning parameter as well, and a critical value against which to test Q_k has to be chosen. Again we adopt the parameter choice suggested in their simulation study. Danielsson et al. (2001) introduce a double bootstrap approach to estimate the optimal sample fraction. They need to choose the number of bootstrap samples and a parameter n_1. For n_1, a data-driven but computationally expensive selection method is provided, where the whole bootstrap procedure is repeated for various possible values of n_1. Another estimator for k_opt is given by Beirlant et al.
(2002), which employs least squares estimates from an exponential regression approach. The method depends on an estimate for ρ and a sample fraction k_0. To avoid the choice of k_0 they suggest taking the median of the estimates over a range of values, e.g. k_0 ∈ {3, ..., n/2}. A different approach is taken by Goegebeur et al. (2008), who use the properties of a test statistic regarding bias estimation to construct an estimator for AMSE/γ and minimize it with respect to k. If one fixes ρ = −1, as they suggest in their simulation section, there is no further tuning parameter to be chosen. However, no result about consistency of k̂ in the sense of k̂/k_opt → 1 in probability is known, in contrast to the approaches in Drees and Kaufmann (1998), Guillou and Hall (2001), Danielsson et al. (2001) and Beirlant et al. (2002).
In this paper we contribute to the problem of threshold selection by introducing two new methods. The first one, presented in Section 2, is inspired by the idea of testing the exponential approximation. We estimate the integrated square error (ISE) of the exponential density under the assumption that the log-spacings are indeed exponentially distributed. The error functional we obtain, denoted the inverse Hill statistic (IHS), is very easy to compute and does not depend on any tuning parameters. Since this criterion is variable for small k, it can additionally be smoothed to improve performance. The minimizing sample fraction of IHS is asymptotically smaller than k_opt, as it is stricter against deviations from the exponential approximation. This estimator performs remarkably well for adaptive quantile estimation on finite samples, as illustrated in our simulation study. In our second approach we suggest a smooth estimator for the AMSE of the Hill estimator, called SAMSEE (smooth AMSE estimator). This estimator is constructed from a preliminary estimate of γ using the generalized Jackknife approach in Gomes et al. (2000) and a bias estimator for the Hill estimator introduced in Section 3. By minimizing SAMSEE we estimate the optimal sample fraction k_opt. For estimation, the choice of a large sample fraction K is necessary, for which we present a data-driven selection procedure in Section 3. SAMSEE utilizes the idea of fixing ρ = −1, which is justified by good performance in simulations and leads to a simpler and more robust estimator. However, the estimator can also be adjusted to any ρ by including a consistent estimator ρ̂, as described in Section 3.1. After introducing our two novel threshold selection methods in Sections 2 and 3, we compare them to various other approaches in a numerical analysis in Section 4.
In Section 5 the importance of automated threshold selection procedures is illustrated in an application, where we nonparametrically estimate an extreme value index that varies over time. The proof of Theorem 3, which describes the asymptotic behaviour of our bias estimator, and auxiliary theoretical results can be found in Appendix A.

IHS -The inverse Hill statistic
In this section we introduce the first threshold selection procedure by analysing the integrated square error (ISE) between the exponential density h_γ and its parametric estimator h_{γ̂_k} employing the Hill estimator,

ISE(k) := ∫_0^∞ (h_{γ̂_k}(y) − h_γ(y))² dy.

The first term of ISE is constant and thus plays no role in selecting k. The last term of ISE is known, but the second term is not. Therefore, we cannot minimize ISE directly. Instead, we want to estimate and minimize its expectation under the exponential approximation. This is based on the idea of considering the hypothesis H_0 that the log-spacings Y_(i,k) are indeed exponentially distributed. Under H_0 the Hill estimator is gamma distributed, see Lemma 1, and the mean of ISE (MISE) can be calculated explicitly. We observe that MISE is a decreasing function in k under the exponential approximation, where C(k) := 2 exp(k) k^k Γ(1 − k, k) and Γ(a, b) denotes the upper incomplete gamma function. The function C(k) converges to 1 very fast, which provides us with an unbiased estimator for the first term in (4) under H_0. However, due to the high variability for small k, we instead seek an estimator of the form w/γ̂_k, for some w depending on k, that minimizes the MSE under the exponential approximation. The approximation of this MSE depends on functions similar to C(k), which quickly become constant, and the MSE in (5) is minimized for w = (k − 2)/k. Thus, we suggest the inverse Hill statistic IHS(k) to estimate MISE(k) − (2γ)^{−1} and select the threshold by minimizing IHS:

k̂_IHS := argmin_{1<k<n} IHS(k).
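A sketch of the selection rule, with a caveat: the display defining IHS is not fully legible in our source, so the criterion below uses the weight w = (k − 2)/k from the text and takes IHS(k) proportional to −w/γ̂_k; any positive rescaling leaves the argmin, and hence the selected sample fraction, unchanged.

```python
import numpy as np

def hill_path(x):
    """Hill estimates for all k = 1, ..., n-1 (x: positive sample)."""
    logx = np.log(np.sort(np.asarray(x, dtype=float)))[::-1]  # descending
    csum = np.cumsum(logx[:-1])          # partial sums of top log-values
    k = np.arange(1, len(x))
    return csum / k - logx[1:]           # gamma_hat_k for k = 1..n-1

def ihs_select(x):
    """Threshold choice by minimizing an IHS-type criterion,
    here taken proportional to -(k-2)/(k * gamma_hat_k); the
    normalizing constant does not affect the argmin."""
    g = hill_path(x)
    k = np.arange(1, len(x))
    ihs = -(k - 2) / (k * g)
    valid = k > 2                        # criterion sensible for k > 2
    return k[valid][np.argmin(ihs[valid])]
```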
By minimizing IHS we select a sample fraction where IHS starts increasing and thus contradicts H_0, behaving contrarily to MISE under the exponential approximation. This criterion can be compared to hypothesis testing with a large significance level α, which corresponds to seeking high confidence when deciding not to reject H_0. Further properties of k̂_IHS are analysed theoretically in Section 2.1 and for finite samples in a numerical study in Section 4. Note that the performance of IHS relies on the bias of the Hill estimator being positive and increasing, see Section 2.1. However, the bias can be negative for some non-standard distributions. In case of a negative bias, we suggest using the modified statistic IHS^− instead. The two cases can easily be distinguished by inspecting the Hill estimator for large k. Both IHS and IHS^− are justified by asymptotic results in Section 2.1. Figure 1 illustrates that IHS is highly variable for small k, which makes automatic threshold choices more variable. To control this problematic behaviour we smooth the IHS. More specifically, we want to estimate E[IHS(k)] by considering the regression problem IHS(k) = m(k) + σ ε_k, where σ > 0 and E[ε_k] = 0. Due to the structure of the Hill estimator, the errors ε_k are highly dependent, which needs to be taken into account in estimation. In our simulations we apply a Bayesian nonparametric procedure introduced by Serra et al. (2018), which simultaneously estimates mean and covariance and is available in the R-package eBsc. The approach provides a smooth estimator of the expectation of IHS, denoted sIHS, with less variation for small k. This way we can improve performance by selecting a more suitable threshold, as illustrated in Figure 1. Of course, one can also use other smoothing procedures suitable for dependent data (Opsomer et al., 2001; Krivobokova and Kauermann, 2007; Lee et al., 2010).
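The eBsc smoother is an R package; purely as an illustration of the smoothing step, here is a crude stand-in (a centred running mean), which ignores the dependence structure that the procedure of Serra et al. (2018) models explicitly:

```python
import numpy as np

def smooth_running_mean(values, window=25):
    """Crude stand-in for the Bayesian smoothing in eBsc: a centred
    running mean that reduces the high variability of IHS for small k.
    Unlike Serra et al. (2018), this ignores the dependence between
    the Hill-based criterion values."""
    v = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    padded = np.pad(v, window // 2, mode="edge")  # edge-pad to keep length
    out = np.convolve(padded, kernel, mode="valid")
    return out[: len(v)]
```

Minimizing the smoothed criterion instead of the raw one then mimics the sIHS selection.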
We finally remark on the relation between IHS and ISE stated in equation (6). This equation points out that minimizing IHS does not minimize ISE, as IHS takes an additional bias term into account. If the bias of the Hill estimator is positive, IHS selects a smaller k (larger threshold) than ISE. This is not surprising, because we estimate the expectation of the ISE under the hypothesis that the exponential approximation holds. This is a much more conservative error functional, meaning it is stricter against deviations from the exponential distribution.
In conclusion, with IHS we do not aim to estimate k opt but to find a sample fraction where we can be very certain that the exponential approximation still holds. The impact of this consideration is illustrated in simulations and an application in Sections 4 and 5.

Theoretical analysis of IHS
In order to understand IHS asymptotically we consider the second order condition

lim_{t→∞} (U(tx)/U(t) − x^γ)/A(t) = x^γ (x^ρ − 1)/ρ    (7)

for x > 0 and with second order parameter ρ < 0, where U(t) := F^←(1 − 1/t) and F^← denotes the left inverse of the distribution function F. Here, A(t) denotes a function converging to zero as t goes to infinity, and |A| is regularly varying with index ρ. In this setting the following asymptotic normality statements for the Hill estimator γ̂_k hold.
Theorem 1 (Theorem 3.2.5 in de Haan and Ferreira (2006)). Let X_1, ..., X_n be i.i.d. random variables with distribution function F ∈ DoA(G_γ) for γ > 0. If (7) holds and k is an intermediate sequence, i.e. k → ∞ and k/n → 0 as n → ∞, with √k A(n/k) → λ ∈ R, then √k (γ̂_k − γ) converges in distribution to a normal random variable with mean λ/(1 − ρ) and variance γ².

Theorem 2. Under the conditions of Theorem 1, √k (1/γ̂_k − 1/γ) converges in distribution to a normal random variable with mean −λ/(γ²(1 − ρ)) and variance 1/γ².

Proof. Apply the delta method to Theorem 1.
Following the reasoning in de Haan and Ferreira (2006), page 78, the minimizing point of the AMSE can be found explicitly when A(t) = c t^ρ with c ≠ 0. In this special case the minimizing sample fraction can be expressed as

k_opt = ((1 − ρ)² γ² / (−2ρ c²))^{1/(1−2ρ)} n^{−2ρ/(1−2ρ)}.

Under the same assumption we can calculate the minimizing point k_IHS of the asymptotic expectations of IHS and IHS^−, where AE denotes the asymptotic expectation, referring to the expectation of the limiting distribution in Theorem 2. It is easy to check that the same formula holds for IHS^− if c is replaced by its absolute value. Further note that by Lemma 2 it is sufficient to consider intermediate sequences when determining the minimizing sequence.

[Figure 2: expression (9) plotted as a function of ρ for γ = c = 1 and n = 500 (solid), n = 5000 (dashed) and n = 50000 (dotted).]
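For A(t) = c t^ρ, minimizing AMSE(k) = γ²/k + (c (n/k)^ρ/(1 − ρ))² over k gives a closed form (a standard computation following de Haan and Ferreira, 2006). A quick numeric check that the closed form agrees with a grid minimization of the AMSE curve:

```python
import numpy as np

def amse(k, n, gamma, c, rho):
    """Asymptotic MSE of the Hill estimator for A(t) = c * t**rho."""
    return gamma**2 / k + (c * (n / k) ** rho / (1 - rho)) ** 2

def k_opt(n, gamma, c, rho):
    """Closed-form minimizer of the AMSE (requires rho < 0)."""
    base = (1 - rho) ** 2 * gamma**2 / (-2 * rho * c**2)
    return base ** (1 / (1 - 2 * rho)) * n ** (-2 * rho / (1 - 2 * rho))

n, gamma, c, rho = 5000, 1.0, 1.0, -1.0
ks = np.arange(1, n)
k_grid = ks[np.argmin(amse(ks, n, gamma, c, rho))]
print(k_grid, k_opt(n, gamma, c, rho))  # grid minimizer vs closed form
```

For ρ = −1 the formula reduces to k_opt = (2n²)^{1/3}, i.e. the familiar n^{2/3} rate.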
Comparing k_opt and k_IHS for a fixed ρ > −∞, we obtain that their ratio, given in (9), tends to zero as n → ∞, with a constant d depending on ρ, γ and c. This supports what equation (6) already suggested: minimizing IHS asymptotically gives a smaller k than k_opt. Thus, k_IHS is asymptotically suboptimal for the Hill estimator but still leads to a consistent sequence of estimates. For finite samples the ratio depends crucially on ρ, and k_IHS can even be larger than k_opt, as illustrated in Figure 2. The graphic presents the quotient of the two sample fractions as a function of the second order parameter ρ for different sample sizes. The parameters c and γ are fixed to 1, as they have a weaker impact on the proportion. It also holds that k_IHS/k_opt → 1 as ρ → −∞, since both sample fractions converge to n in this case.
Although k_IHS is of smaller order than k_opt asymptotically, the simulation study in Section 4 shows that k̂_IHS works remarkably well when used for quantile estimation. We consider the quantile estimator q̂_k(p) defined in (10). The sample fraction k_opt also minimizes the asymptotic relative MSE of q̂_k(p), see e.g. Theorem 4.3.8 in de Haan and Ferreira (2006). For finite samples, however, the quantile estimator seems to benefit from k_IHS. This has several reasons, two of which are illustrated in Figure 3: it shows the empirical expectation of IHS, the empirical version of the MSE of γ̂_k, and the relative MSE of the quantile estimator, as used in Theorem 4.3.8 in de Haan and Ferreira (2006). We observe that k_IHS (blue dot) is indeed smaller than k_opt (black), but so is the minimizer of MSEQ (pink). On the right we see a plot of the empirical E[IHS] and the MSE of γ̂_k for Loggamma distributed samples of size 5000. This graphic highlights the similarities between MSE and IHS in the boundary case ρ = 0. These observations indicate why k̂_IHS outperforms other methods that try to minimize the MSE of the Hill estimator when adaptively estimating q(p) by (10) for most of our exemplary distributions and sample sizes n = 500 and n = 5000, see Section 4.
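Assuming the display (10) is the standard Weissman-type plug-in (which the surrounding text is consistent with), the quantile estimator can be sketched as:

```python
import numpy as np

def weissman_quantile(x, k, p):
    """Weissman-type estimator of the (1-p)-quantile:
    q_hat = X_(n-k,n) * (k/(n*p))**gamma_hat_k,
    with gamma_hat_k the Hill estimator (sketch; assumed form of (10))."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    gamma_hat = np.log(x[n - k:]).mean() - np.log(x[n - k - 1])
    return x[n - k - 1] * (k / (n * p)) ** gamma_hat
```

Because γ̂_k enters in the exponent, small errors in γ̂_k translate into large relative errors of the quantile for γ > 1, which is the sensitivity discussed in Section 4.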

SAMSEE -The smooth AMSE estimator
In this section we illustrate a way to smoothly estimate the AMSE of the Hill estimator. By minimizing this AMSE estimator, called SAMSEE, we obtain an estimator for k_opt. By this means we extend previous methods that estimate k_opt via the AMSE itself. From Theorem 1 it is easy to see that the AMSE, being the asymptotic variance plus the asymptotic squared bias, equals

AMSE(k) = γ²/k + (A(n/k)/(1 − ρ))².    (12)

Thus, to estimate the AMSE as a function of k we employ two estimators, one for γ and one for the bias term as a combination of ρ and A. First we explain how we estimate γ, and then we define the bias estimator. This bias estimator has a rather smooth sample path in k, and it depends on the choice of a large sample fraction K, for which we afterwards provide a data-driven selection procedure. Note that, for the moment, we assume that the second order parameter ρ is equal to −1 to motivate the construction of the AMSE estimator. The idea of misspecifying ρ to simplify estimation, by avoiding the additional uncertainty of estimating ρ or selecting an influential tuning parameter, was already used, for example, by Gomes et al. (2000), Drees and Kaufmann (1998) and Goegebeur et al. (2008). It is also motivated by the simulations in Section 3.1. For γ we consider the generalized Jackknife estimator γ̂_GJ,k introduced by Gomes et al. (2000) (denoted γ̂^G_{1,n} there), defined in terms of the de Vries estimator γ̂_V,k := M_k^(2)/(2 M_k^(1)) with M_k^(j) := (1/k) Σ_{i=1}^k Y_(i,k)^j, where Y_(i,k) denotes the log-spacings as in equation (3). Note that γ̂_V,k is the de Vries estimator introduced under this name in de Haan and Peng (1998) and γ̂_k is the Hill estimator as above. The generalized Jackknife estimator has a reduced bias compared to the Hill estimator and is even asymptotically unbiased if ρ = −1, see (2.11) in Gomes et al. (2000). This property is useful here, since the bias estimator b̂_up,K,k defined in the following performs optimally for ρ = −1 as well. Furthermore, the same large sample fraction K can be used for γ̂_GJ,K and b̂_up,K,k.
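Using the standard expansions (Hill bias ≈ A/(1 − ρ), de Vries bias ≈ A/(1 − ρ)²), the combination 2γ̂_V,k − γ̂_k cancels the bias exactly when ρ = −1. A sketch, assuming this standard form of the Jackknife combination:

```python
import numpy as np

def log_spacings(x, k):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    return np.log(x[n - k:]) - np.log(x[n - k - 1])

def hill(x, k):
    return log_spacings(x, k).mean()

def de_vries(x, k):
    """de Vries estimator M2/(2*M1) (de Haan and Peng, 1998)."""
    y = log_spacings(x, k)
    return np.mean(y**2) / (2 * y.mean())

def jackknife(x, k):
    """Generalized Jackknife combination 2*gamma_V - gamma_Hill;
    asymptotically unbiased when rho = -1 (assumed form of the
    estimator attributed to Gomes et al., 2000)."""
    return 2 * de_vries(x, k) - hill(x, k)
```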
To construct this bias estimator, we study averages of Hill estimators γ̄_up,K,k for k < K. Plotting these averages illustrates how they smoothly frame the sample path of the Hill estimator, especially the upper mean γ̄_up,K,k. The estimator b̂_up,K,k is indeed a sensible estimator for a bias function, since the corresponding relation (15) follows for ρ = −1 from Theorem 3. Danielsson et al. (2001) use (γ̂_V,k − γ̂_k) to assess the bias of γ̂_k and apply a double bootstrap procedure to stabilize this highly varying estimate. We also use the difference of two estimators for γ, but consider averaging to smooth the bias estimate. The idea of averaging the Hill estimator in order to smooth the Hill plot and decrease the variance is also studied in Resnick and Stărică (1997). It remains to choose an appropriate K in order to complete SAMSEE and to estimate the optimal sample fraction k_opt. We need K to be large enough to allow for minimization over all relevant k and small enough to be an intermediate sequence itself (see Theorem 3 for this condition). To find such a K we use the relation (16) between the estimators, which provides us with a relatively stable function in k, γ̂_V,k + b̂_up,K,k, that has the same asymptotic expectation as the highly non-smooth Hill estimator.
We want to find an intermediate sequence K for which (16) holds and thus define the criterion E_2(K), which measures the deviation from approximation (16) uniformly over all k ≤ K.
Based on this, we suggest choosing K* as in (18), i.e. as the point where the asymptotic approximation (16) is most stable, since we minimize the local variation of E_2(K). Simulations suggest that this criterion is not sensitive to slightly increasing the region of stability {K − h, ..., K + h} from h = 2 to h = 5 or 10, depending on the sample size.
Now we finally combine the previously described estimators to approximate the AMSE in (12) under the assumption ρ = −1. With K* from (18) and the property of b̂_up,K,k in (15), we obtain the estimator SAMSEE(k) for the AMSE of the Hill estimator and, via

k̂_SAMSEE := argmin_{1<k<K*} SAMSEE(k),    (19)

an estimator for k_opt.

[Figure: on the left, SAMSEE for a Fréchet sample with parameters γ = 1/2 and ρ = −1; on the right, the Hill plot of the same sample for all k ≤ K* = 388.]
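A minimal sketch of the AMSE decomposition (12) with ρ fixed at −1. Two simplifications relative to the actual SAMSEE: the bias proxy 2(γ̂_k − γ̂_V,k) (consistent with the expansions above for ρ = −1) replaces the paper's smoothed b̂_up,K,k, and K is supplied by the user rather than selected via (18):

```python
import numpy as np

def samsee_sketch(x, K):
    """Rough AMSE-minimizing threshold in the spirit of SAMSEE with
    rho fixed at -1: variance term gamma_GJ(K)**2 / k plus the squared
    bias proxy (2*(hill(k) - de_vries(k)))**2, minimized over k < K.
    Simplified stand-in, not the paper's exact construction."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    logx = np.log(x)

    def estimates(k):                      # Hill and de Vries at k
        y = logx[n - k:] - logx[n - k - 1]
        m1, m2 = y.mean(), np.mean(y**2)
        return m1, m2 / (2 * m1)

    gH_K, gV_K = estimates(K)
    g_gj = 2 * gV_K - gH_K                 # Jackknife at K, unbiased if rho = -1
    ks = np.arange(5, K)
    amse = np.empty(len(ks))
    for j, k in enumerate(ks):
        gH, gV = estimates(k)
        amse[j] = g_gj**2 / k + (2 * (gH - gV)) ** 2
    return ks[np.argmin(amse)]
```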
This smooth estimate of the AMSE can be useful beyond the context of threshold selection. For extreme value mixture models or Bayesian threshold selection approaches, SAMSEE could be used to construct a transition function between bulk and tail distribution or an empirical prior for the threshold, respectively, see Scarrott and MacDonald (2012) for a review on mixture models.

SAMSEE if ρ ≠ −1
We next analyse SAMSEE in the broader context of an unknown second order parameter ρ. The first thing to note is that the generalized Jackknife estimator is no longer unbiased in this situation. Secondly, the behaviour of our bias estimator changes, as described in the following theorem.
Proof. The proof can be found at the end of Appendix A.
From Theorem 3 it follows that the bias estimator is distorted by a factor δ_ρ. For ρ = −1 the function δ_ρ is equal to 1. If ρ ≠ −1, δ_ρ bends our bias estimator, which will therefore increase slightly too fast or too slow. We can still apply SAMSEE in this situation and select K* from (18). However, approximation (16) no longer holds; instead, relation (20) holds. The absolute value of the error described by (20) is high if δ_ρ strongly differs from 1 and the bias term A is large. If ρ ≠ −1, δ_ρ indeed deviates from 1, and we minimize the error by minimizing the bias. This is why applying (18) in this case leads to a small K*. On the other hand, if δ_ρ = 1, the approximation stays valid for an increasing bias, and K* will typically be larger.
In this way we can construct an estimator for k opt in the general setting of Pareto-type distributions.
In Table 1, we present the results of a simulation study indicating for which distributions it is beneficial to use ρ̂ instead of ρ = −1. We estimate ρ using the estimator ρ̂^(1) suggested in Theorem 1 of Drees and Kaufmann (1998). The results indicate that it is generally sensible to fix ρ = −1 in SAMSEE, since only for the Cauchy distribution does using ρ̂ perform slightly better regarding bias and RMSE. This confirms the observation already made by others (Gomes et al., 2000; Drees and Kaufmann, 1998; Goegebeur et al., 2008) that it is often advisable to select ρ = −1 instead of allowing for further variability by including an additional estimator.

Simulation study
In the following we numerically analyse the performance of eight threshold selection methods on heavy-tailed distributions with very different tail behaviour. The simulation study is based on the following distributions:

• the Student-t distribution with 6 degrees of freedom, which corresponds to γ = 1/6 and ρ = −1/3,

• the Fréchet distribution with parameter α = 2 and distribution function F(x) = exp(−x^{−α}) for x > 0, which implies γ = 1/2 and ρ = −1,

• the standard Cauchy distribution, leading to tail behaviour with γ = 1 and ρ = −2,

• the Loggamma distribution with γ = 1 and ρ = 0,

• the Burr distribution with a parametrisation such that γ = 2 and ρ = −1,

• a logarithmically perturbed Pareto distribution of the random variable g(U) with γ = 1 and ρ = −1, where U ∼ Unif(0, 1) and g(x) = x^{−1}/log(x^{−1}). This distribution is denoted negBias due to the negative bias it induces in the Hill estimator.
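Several of these distributions can be sampled directly by inverse transform; a sketch for three of them (the parametrisations follow the list above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
u = rng.uniform(size=n)

# Fréchet(alpha=2): F(x) = exp(-x**-2)  ->  X = (-log U)**(-1/2), gamma = 1/2
frechet = (-np.log(u)) ** (-0.5)

# Standard Cauchy via its quantile function, gamma = 1
cauchy = np.tan(np.pi * (rng.uniform(size=n) - 0.5))

# negBias: g(U) with g(x) = x**-1 / log(x**-1), as defined in the list above
negbias = (1 / u) / np.log(1 / u)
```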
On these distributions we evaluate the methods by their root mean square error (RMSE) when adaptively estimating γ with the Hill estimator, relative to the RMSE obtained using k_opt,

EFF_γ := (E_n[(γ̂_k̂ − γ)²] / E_n[(γ̂_{k_opt} − γ)²])^{1/2},

where E_n denotes the empirical expectation. Such efficiency quotients are also used by, e.g., Guillou and Hall (2001), Gomes et al. (2000) and Drees and Kaufmann (1998). The smaller the quotient, the better the threshold selection procedure performs compared to the asymptotically optimal sample fraction k_opt. Furthermore, we study the analogously defined efficiency EFF_q in quantile estimation with the estimator defined in (10) for p = 0.001.
Since we do not know the true minimizer k_opt of the AMSE, we utilize an empirical version suggested by Gomes et al. (2000). Following their approach, we approximate k_opt by the mean of 20 independent replicates of k̂_opt, the minimizer of the empirical MSE based on 1000 samples. We compare these efficiency values for eight different threshold selection methods. Most of the considered approaches are constructed for adaptive estimation of γ with the Hill estimator. This includes one procedure that looks for a stable region among the Hill estimates, while the others aim to estimate k_opt. The only exception is the IHS approach discussed in Section 2, which is motivated by minimizing the deviation from the exponential approximation. We still evaluate this procedure in the same simulations, although it is not primarily tailored to these specific tasks. In total, the following methods are considered:

sIHS: IHS smoothed using the eBsc package, see Section 2,
SAM: SAMSEE procedure with ρ = −1 as defined by (19) in Section 3,
GH: method by Guillou and Hall (2001) utilizing c_crit = 1.25 and p = 1,
DK: procedure by Drees and Kaufmann (1998) with fixed ρ = −1,
GO: approach by Goegebeur et al. (2008) defined in their equation (3.3) with fixed ρ = −1,
DB: double bootstrap approach by Danielsson et al. (2001) with the choice n_1 = 120 if n = 500 and n_1 = 1000 if n = 5000,
B: method by Beirlant et al. (2002) with ρ = −1,
RT: method by Reiss and Thomas (2007) with β = 0 as suggested by Neves and Fraga Alves (2004).

When looking at the results for adaptively estimating γ for n = 500 and n = 5000 in Table 2, we observe a very diverse picture of which methods perform best. Overall we get the impression that SAMSEE, together with the approach by Goegebeur et al. (2008), performs most stably over the variety of distributions. This is interesting, because these are the methods that depend least on tuning parameters. The performance of the approaches GH, DK and B is comparable, but Table 2 shows that, on average over all distributions, the SAMSEE procedure is superior. For estimating a high quantile SAMSEE also performs convincingly, see Table 3, but additionally sIHS and the approach by Danielsson et al. (2001) show very good efficiency values. They are closely followed by B and GO. Looking at the average performance over all distributions, SAMSEE performs best again. However, if we exclude the negBias distribution, sIHS works best on average.
In conclusion, SAMSEE performs very efficiently and comparably to k_opt over all exemplary distributions. It works especially well for estimating a high quantile. Only when estimating γ for the Cauchy distribution does it perform worse than DK and GO, but it is still better than most other approaches. Recalling the results on the influence of ρ in Table 1 in Section 3.1, this is not very surprising: there, the Cauchy distribution is the only example we considered that benefits from estimating ρ instead of fixing it to −1.
From Table 3 we furthermore observe that sIHS is a strong choice when estimating high quantiles from small samples with n up to 5000. However, the performance when estimating γ is quite variable and it seems that sIHS does not perform particularly well for distributions with a small second order parameter (ρ ≤ −1). This behaviour is already discussed in Section 2.1 and highlighted in Figure 2: sIHS selects smaller k than optimal for the Hill estimator, especially if ρ is in the regime between −1 and −8.
The reason why some approaches perform worse on quantiles than they do on γ is that the estimatorq k (p) defined in (10) depends onγ k in the exponent and is thus very sensitive to overestimation in case of γ > 1. Hence, when estimating a high quantile, an estimateγ k that is too large will lead to an even stronger overestimation of the quantile. This is why a few outliers among the γ estimates can already cause much higher EFF q values.

Application to varying extreme value index
In this section we analyse our new procedures in a financial application, where we study operational losses of a bank. We are particularly interested in the distributional properties of very high losses. It has been argued before that it is reasonable to assume that the distribution of such extreme losses is heavy-tailed (Chavez-Demoulin et al., 2016; Moscadelli, 2004) and changes with the financial market over time (Hambuckers et al., 2018; Cope et al., 2012). In this context, we want to estimate how the extreme value index changes with the univariate covariate time. For this task, we utilize the approaches presented in Sections 2 and 3 for locally optimal threshold selection. The observations of interest are operational losses of the Italian bank UniCredit from 2005 to 2014. In Hambuckers et al. (2018) these data are analysed in a regularized generalized Pareto regression approach including several firm-specific, macroeconomic and financial indicators as covariates, describing the dependence of the GPD parameters on the covariates via parametric functions. We consider a simpler and more direct approach to study the temporal dependence of the extreme value index, without taking into account possible interference by other covariates. Our aim is to estimate the time-dependent extreme value index γ(t) non-parametrically with a simple ad hoc estimator that extends the estimator of de Haan and Zhou (2017) by employing our threshold selection procedures sIHS and SAMSEE. We present the estimator in Section 5.1 and the results obtained when applying it to the dataset of operational losses in Section 5.2.

Estimating a varying extreme value index
In de Haan and Zhou (2017), the authors already discussed estimating a trend in the extreme value index non-parametrically. They consider n independent random variables $X_i \sim F_{i/n}$, where $F_s \in \mathrm{DoA}(G_{\gamma(s)})$ for $s \in [0, 1]$.
To address this problem, they introduce the following estimator for γ(s), which applies the Hill estimator locally, based on a global sample fraction k:
$$\hat{\gamma}_k(s) := \frac{1}{\lfloor 2kh \rfloor} \sum_{j=1}^{\lfloor 2kh \rfloor} \left( \log X^{(s)}_{(m-j+1,m)} - \log X^{(s)}_{(m-\lfloor 2kh \rfloor,m)} \right),$$
where $X^{(s)}_{(1,m)} \leq \ldots \leq X^{(s)}_{(m,m)}$ denote the order statistics of the $m := |I_n(s)|$ observations with index in $I_n(s)$.

Figure 6: The true extreme value index γ(s) = 1 + s (red) next to the averaged estimators over 1000 samples. The estimator from de Haan and Zhou (2017) is in black and its empirical 95% confidence interval is dashed. Our modified estimator employing SAMSEE is blue with dotted empirical confidence bounds.
where $I_n(s)$ is the h-neighbourhood of s, i.e. $I_n(s) := \{i : |i/n - s| \leq h\}$. This estimator depends on the choice of the bandwidth h and the global sample fraction k, which is then rescaled to 2kh for the individual regions $I_n(s)$. A small bandwidth h leads to very high variability in $\hat{\gamma}_k(s)$, while a large value of h smooths out all interesting features. Thus, the choice of h should balance these two effects.
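A minimal sketch of this local estimation idea follows, with illustrative function names and a simplistic handling of the window edges (an assumption, not the authors' implementation). For each s it applies the Hill estimator to the $\lfloor 2kh \rfloor$ largest observations whose index lies in $I_n(s)$:

```python
import math
import random

def hill(values, k):
    """Hill estimator based on the k + 1 largest of the given values."""
    logs = [math.log(v) for v in sorted(values)[-(k + 1):]]
    return sum(l - logs[0] for l in logs[1:]) / k

def local_hill(sample, s, h, k):
    """Local Hill estimate of gamma(s): use observations X_i with
    |i/n - s| <= h and the locally rescaled fraction floor(2*k*h)."""
    n = len(sample)
    window = [x for i, x in enumerate(sample) if abs((i + 1) / n - s) <= h]
    k_local = max(1, math.floor(2 * k * h))
    return hill(window, min(k_local, len(window) - 1))

# Demonstration with varying index gamma(s) = 1 + s and Frechet margins:
# X ~ Frechet(1/gamma) can be simulated as (-log U)^(-gamma), U uniform.
random.seed(7)
n, k, h = 5000, 800, 0.15
sample = [(-math.log(random.random())) ** -(1 + (i + 1) / n) for i in range(n)]

g_low = local_hill(sample, 0.15, h, k)   # true gamma near 1.15
g_high = local_hill(sample, 0.85, h, k)  # true gamma near 1.85
print(round(g_low, 2), round(g_high, 2))
```

The two local estimates recover the increasing trend: the estimate around s = 0.85 clearly exceeds the one around s = 0.15, mirroring the behaviour shown in Figure 6.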
We suggest a modification of their estimator, where we locally estimate an optimal threshold $\hat{k}(s)$, i.e. we replace the rescaled fraction $\lfloor 2kh \rfloor$ by a threshold $\hat{k}(s)$ that is selected locally from the observations with index in $I_n(s)$.
To compare these two approaches, we repeat the simulation presented in Figure 2 (i) in de Haan and Zhou (2017) on samples of size n = 5000 with X i ∼ Fréchet(1/γ(i/n)) and γ(s) = 1 + s. Figure 6 illustrates the benefits of locally optimizing the threshold via SAMSEE from Section 3, as it strongly tightens the empirical confidence interval around the average, which is obtained from the 2.5% and 97.5% quantiles among 1000 estimates.

Functional extreme value index of operational losses
The operational losses in the dataset of UniCredit are grouped by the type of event that caused the specific loss. We consider the event type CPBP, which provides sufficient observations for our local estimation approach. The CPBP losses are caused by clients, products and business practices related to derivatives or other financial instruments.
First we want to test whether the extreme value index is constant over time. Using the test T4 from Einmahl et al. (2016), we can reject the null hypothesis with a p-value that is virtually zero and are thus confident that the extreme value index of the losses does indeed vary over time.
We apply the new methodology from (23) to these losses, estimating $\hat{k}$ with sIHS from Section 2 and with the SAMSEE approach from Section 3. Figure 7 shows the estimates we obtain for the event type CPBP. It is clearly visible that both procedures yield similar estimates for most time points and that the simple ad hoc estimators recover an increase in the severity of high losses during the financial and Euro crisis from 2008 to 2011. A similar overall trend in the extreme value index can also be identified in the estimates of Hambuckers et al. (2018) for CPBP. For a more extensive discussion of the data and the results of the more complex model including further covariates we refer to Hambuckers et al. (2018).

A Theoretical results and proof of Theorem 3
Lemma 1 (Distribution of the Hill estimator). Let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} F$ for $F \in \mathrm{DoA}(G_\gamma)$ with $\gamma > 0$. Then the following distributional representation for the Hill estimator holds,
$$\hat{\gamma}_k \overset{D}{=} G_k + b_{n,k},$$
where $G_k \sim \Gamma(k, \gamma/k)$ and for $b_{n,k}$ it holds that $b_{n,k} \to 0$ almost surely if $k/n \to 0$.

Proof. From the first order condition it follows that $U(x) := F^{\leftarrow}(1 - 1/x)$ is regularly varying with index γ, so there exists a slowly varying function $\tilde{U}$ such that $U(x) = x^\gamma \tilde{U}(x)$. Let $P_1, P_2, \ldots$ be i.i.d. random variables with distribution function $1 - 1/y$. Note that $U(P_i) \overset{D}{=} X_i$. We define $Y_{(k-i,k)} := \log(X_{(n-i,n)}) - \log(X_{(n-k,n)})$, for which it follows that
$$Y_{(k-i,k)} \overset{D}{=} \log \frac{U(P_{(n-i,n)})}{U(P_{(n-k,n)})} = \gamma \log \frac{P_{(n-i,n)}}{P_{(n-k,n)}} + \log \frac{\tilde{U}(P_{(n-i,n)})}{\tilde{U}(P_{(n-k,n)})}.$$
Note that $\log(P_i)$ is standard exponentially distributed. By Lemma 3.2.3 in de Haan and Ferreira (2006) it follows that the log-spacings $\{\log P_{(n-i,n)} - \log P_{(n-k,n)}\}_{i=0}^{k-1}$ are jointly distributed as the order statistics of k i.i.d. standard exponential random variables $E_1, \ldots, E_k$. Hence, we obtain for the Hill estimator that
$$\hat{\gamma}_k = \frac{1}{k} \sum_{i=0}^{k-1} Y_{(k-i,k)} \overset{D}{=} \frac{\gamma}{k} \sum_{i=1}^{k} E_i + \frac{1}{k} \sum_{i=0}^{k-1} \log \frac{\tilde{U}(P_{(n-i,n)})}{\tilde{U}(P_{(n-k,n)})} =: G_k + b_{n,k},$$
where $G_k \sim \Gamma(k, \gamma/k)$ as the normalized sum of i.i.d. exponentials and $b_{n,k}$ denotes the second average. If $k/n \to 0$, then $P_{(n-k,n)} \to \infty$ almost surely by Lemma 3.2.1 in de Haan and Ferreira (2006). Since $\tilde{U}$ is slowly varying, $b_{n,k}$ converges to zero almost surely.
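For an exact Pareto sample the slowly varying part of U is constant, so $b_{n,k} = 0$ and the representation $\hat{\gamma}_k \overset{D}{=} G_k \sim \Gamma(k, \gamma/k)$ is exact. The following Monte Carlo sketch (illustrative, pure Python) checks that the empirical mean and variance of the Hill estimator match $\mathbb{E}[G_k] = \gamma$ and $\mathrm{Var}(G_k) = \gamma^2/k$:

```python
import math
import random

def hill(sample, k):
    """Hill estimator from the k + 1 largest order statistics."""
    logs = [math.log(v) for v in sorted(sample)[-(k + 1):]]
    return sum(l - logs[0] for l in logs[1:]) / k

random.seed(3)
gamma, n, k, reps = 0.8, 2000, 100, 500
# Pareto sample: X = U^(-gamma) has quantile function U(t) = t^gamma
# exactly, so the slowly varying remainder b_{n,k} vanishes.
estimates = [hill([random.random() ** -gamma for _ in range(n)], k)
             for _ in range(reps)]

mean = sum(estimates) / reps
var = sum((e - mean) ** 2 for e in estimates) / reps
# Theoretical values for Gamma(k, gamma/k): mean 0.8, variance 0.0064.
print(round(mean, 3), round(var, 4))
```

The simulated mean and variance land close to the Gamma moments, consistent with the exactness of the representation in the pure Pareto case.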
Lemma 2. 1. If k is finite,

Proof. We prove these three alternative statements to obtain that the minimizing sequence $k = k_n$ is an intermediate sequence.
1. Let k be finite. By Lemma 1 it holds that $\hat{\gamma}_k \overset{D}{=} G_k + b_{n,k}$, where $G_k \sim \Gamma(k, \gamma/k)$ and $b_{n,k} \to 0$ as $n \to \infty$. Thus, as $n \to \infty$, $\hat{\gamma}_k$ converges in distribution to $G_k$.
2. The statement follows from the consistency of the Hill estimator and the continuous mapping theorem.
3. Let $k \to \infty$ and $k/n \to c > 0$; then it holds by Lemma 1 that $\hat{\gamma}_k \overset{D}{=} G_k + b_{n,k}$, where $b_{n,k}$ no longer vanishes asymptotically, since $P_{(n-k,n)}$ does not tend to infinity.

Lemma 3. Let $E_1, \ldots, E_n$ be i.i.d. standard exponential random variables and k such that $1 \leq k \leq n$ and $k \to \infty$ as $n \to \infty$. We define the random variables $P_{k,n}$, $Q_{k,n}$ and $R_{k,n}$.

Proof. We use the Cramér–Wold device, which gives a joint normal limit distribution if all linear combinations have a univariate normal limit distribution. For $a_1, a_2, a_3 \in \mathbb{R}$ we study $a_1 P_{k,n} + a_2 Q_{k,n} + a_3 R_{k,n}$.
Condition 3) holds as $k \to \infty$ and for a constant $c > 0$, since the exponential distribution has finite moments and $\frac{1}{k} \sum_{i=1}^{k} (e^k_{i+1})^3 \approx 6$. Thus, we obtain the limiting normal distribution, which is the limiting distribution of the sum in (24) and also follows from the joint normal distribution.
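For reference, the Cramér–Wold device invoked in the preceding proof can be stated in its standard form:

```latex
% Cramér–Wold device: convergence of all one-dimensional projections
% is equivalent to joint convergence in distribution.
\[
  \bigl(X_n^{(1)}, \ldots, X_n^{(d)}\bigr) \xrightarrow{D} \bigl(X^{(1)}, \ldots, X^{(d)}\bigr)
  \quad \Longleftrightarrow \quad
  \sum_{j=1}^{d} a_j X_n^{(j)} \xrightarrow{D} \sum_{j=1}^{d} a_j X^{(j)}
  \quad \text{for all } (a_1, \ldots, a_d) \in \mathbb{R}^d .
\]
```

In the proof above, $d = 3$ and the projections are the linear combinations $a_1 P_{k,n} + a_2 Q_{k,n} + a_3 R_{k,n}$.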
Lemma 4. Let $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} F$ for $F \in \mathrm{DoA}(G_\gamma)$ with $\gamma > 0$ and let $P_1, P_2, \ldots$ be i.i.d. random variables with distribution function $1 - 1/y$. We define the quantities $\hat{\gamma}$ and $Y_E$ used below. If the second order condition
$$\lim_{t \to \infty} \frac{\log U(tx) - \log U(t) - \gamma \log x}{A(t)} = \frac{x^{\rho} - 1}{\rho}$$
holds for $\rho < 0$ and $x > 0$, then (26)–(28) hold.

Proof. The results in (26) and (27) are already stated in the proof of Theorem 1 in de Haan and Peng (1998). To prove (28) we follow the proof of the asymptotic normality of the Hill estimator in de Haan and Ferreira (2006). Let $A_0$ be such that $A(t)/A_0(t) \to 1$ as $t \to \infty$. Then, for each $\varepsilon > 0$ there exists a $t_0$ such that for $t \geq t_0$ and $x \geq 1$ the inequality in Theorem B.2.18 in de Haan and Ferreira (2006) holds. For $t = P_{(n-k,n)}$ and $x = P_{(n-i,n)}/P_{(n-k,n)}$ we obtain the corresponding expansion; its second term can be approximated analogously, and the bound for the third term holds as $k \to \infty$. Note that for $E_1, \ldots, E_n$ i.i.d. standard exponential random variables it follows by Rényi's representation that
$$\left\{ \log \frac{P_{(n-i+1,n)}}{P_{(n-k,n)}} \right\}_{i=1}^{k} \overset{D}{=} \left\{ E_{(k-i+1,k)} \right\}_{i=1}^{k}.$$
This distributional equality enables the transformations yielding $E_i e^k_{i+1} - 1 = 2\gamma + \gamma \left( P_{k,n} + R_{k,n} \right)/\sqrt{k}$.

Combining the above arguments as in the proof of Theorem 3.2.5 in de Haan and Ferreira (2006) gives (28).
Lemma 5. For $k \to \infty$, $k/n \to 0$ and $k/K \to c$ with $0 < c < 1$, the following limits hold for $\mathrm{Cov}(R_{K,n}, R_{k,n})$ and $\mathrm{Cov}(R_{K,n}, P_{k,n})$, where $R_{k,n}$ and $P_{k,n}$ are defined in Lemma 3.
Proof. Let $E_1, E_2, \ldots$ be i.i.d. standard exponential random variables, so that $\mathrm{Cov}(E_i, E_j)$ is equal to 1 if $i = j$ and 0 otherwise. Then $\mathrm{Cov}(R_{K,n}, R_{k,n})$ can be computed directly, where $c_k \approx c$ again denotes that $c_k \to c$ as $k \to \infty$.
In the same way we obtain the limit of $\mathrm{Cov}(R_{K,n}, P_{k,n})$.

Theorem 4. Let $\bar{\gamma}_k := \frac{1}{k} \sum_{i=1}^{k} \hat{\gamma}_i$ denote the average over Hill estimates. Further let $X_1, \ldots, X_n$ be i.i.d. random variables with distribution function $F$ such that the second order condition with $U(x) := F^{\leftarrow}(1 - 1/x)$ holds, and let $k \to \infty$ and $k/n \to 0$ as $n \to \infty$, with $\lambda := \lim_{k \to \infty} \sqrt{k}\, A(n/k)$.
Proof. First we have to rewrite the average over the Hill estimators, where $Y_E$ is defined in Lemma 4. Following the proof of Lemma 4, the last term above is in distribution equal to
$$\frac{\gamma}{k} \log \frac{P_{(n,n)}}{P_{(n-k,n)}} + \frac{A(P_{(n-k,n)})}{k} \cdot \frac{\left( P_{(n,n)}/P_{(n-k,n)} \right)^{\rho} - 1}{\rho} + o_p(1)\, \frac{|A(P_{(n-k,n)})|}{k} \left( \frac{P_{(n,n)}}{P_{(n-k,n)}} \right)^{\rho + \varepsilon}.$$
Here the random variable $R_{k,n}$ is defined in Lemma 3 and we know that $R_{k,n}$ has a normal limit distribution. With Lemma 3 and Lemma 5 we obtain the following variance,