On a relationship between randomly and non-randomly thresholded empirical average excesses for heavy tails

Motivated by theoretical similarities between the classical Hill estimator of the tail index of a heavy-tailed distribution and one of its pseudo-estimator versions featuring a non-random threshold, we show a novel asymptotic representation of a class of empirical average excesses above a high random threshold, expressed in terms of order statistics, using their counterparts based on a suitable non-random threshold, which are sums of independent and identically distributed random variables. As a consequence, the analysis of the joint convergence of such empirical average excesses essentially boils down to a combination of Lyapunov's central limit theorem and the Cramér-Wold device. We illustrate how this allows us to improve upon, as well as produce conceptually simpler proofs of, very recent results about the joint convergence of marginal Hill estimators for a random vector with heavy-tailed marginal distributions. These results are then applied to the proof of a convergence result for a tail index estimator when the heavy-tailed variable of interest is randomly right-truncated. New results on the joint convergence of conditional tail moment estimators of a random vector with heavy-tailed marginal distributions are also obtained.


Introduction and motivation
Heavy-tailed random variables appear in numerous fields of statistical applications of extreme value analysis, such as insurance and finance (see e.g. p.9 of Embrechts et al. (1997)), geoscience (see Section 1.3.5 of Beirlant et al. (2004)) and analysis of teletraffic data (see Section 8 of Resnick (2007)). A typical goal of extreme value analysis, in such contexts, is the estimation of extreme quantiles of a univariate random variable of interest, such as the daily log-return of a stock market index, the magnitude of earthquakes in a given region, or the size of data packets transferred in a computer network. A well-established procedure for extreme quantile estimation is Weissman's extrapolation method (Weissman 1978), whose essential requirement is a consistent estimator of the tail index of the underlying heavy-tailed distribution. Recent studies geared towards insurance and finance have been advocating for the use and estimation of extreme versions of alternatives to quantiles, such as Conditional Tail Moments (El Methni et al. 2014), Wang distortion risk measures (introduced in Wang (1996), and studied recently in El Methni and Stupfler (2017, 2018)), $L^p$-quantiles (introduced in Chen (1996), and studied recently in Daouia et al. (2018b, 2019)) and extremiles (introduced and studied in Daouia et al. (2018a)). All these quantities, however, can be shown to have the heavy-tailed behaviour displayed by tail quantiles, and as such, their estimators at extreme levels can be constructed via straightforward adaptations of Weissman's method by relying on tail index estimators as well. Tail index estimation is therefore a central question in the statistical analysis of heavy-tailed distributions.
The most popular and well-known tail index estimator is arguably the Hill estimator (Hill 1975), which is also the maximum likelihood estimator if the underlying statistical model is purely Pareto. Assuming that $X_1, \ldots, X_n$ is a random sample of copies of a variable $X$ having a heavy-tailed distribution with tail index $\gamma$, the Hill estimator of $\gamma$ is
$$\hat\gamma(k) = \frac{1}{k} \sum_{i=1}^{k} \log \frac{X_{n-i+1,n}}{X_{n-k,n}}.$$
Here, k = k(n) is such that k → ∞ and k/n → 0, and X 1,n ≤ X 2,n ≤ · · · ≤ X n,n denote the order statistics related to X 1 , . . . , X n . It is well-known that, under an appropriate second-order condition on X controlling the gap between the underlying distribution and the Pareto distribution, this estimator is √ k−asymptotically normal; a variety of proofs of this result are available, including arguments based on Rényi's representation of order statistics (see pp.74-75 in de Haan and Ferreira (2006)), the tail empirical process (see Proposition 9.3 p.302 in Resnick (2007)), or a representation of log-spacings in terms of independent exponential random variables (see Section 4.4 in Beirlant et al. (2004)). A common aim of these approaches is to deal with the structure of the Hill estimator in terms of top order statistics, which are significantly harder to handle than the original X 1 , . . . , X n . Although this theoretical difficulty is now well-understood, it is further compounded if one wants to show the joint asymptotic normality of marginal Hill estimators for a random vector with heavy-tailed marginal distributions, since one then also needs to take into account the asymptotic dependence structure within the underlying multivariate distribution.
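As a concrete illustration of the estimator just defined, the following minimal sketch computes the Hill estimator from the top order statistics of a sample. The function names `hill_estimator` and `pareto_sample`, the sample size, the seed and the choice $k = 500$ are illustrative assumptions and do not come from the paper; the data are simulated from an exact Pareto distribution, for which the Hill estimator is known to behave well.

```python
import math
import random


def hill_estimator(sample, k):
    """Hill estimator based on the k largest values of a positive sample."""
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    ordered = sorted(sample)          # X_{1,n} <= ... <= X_{n,n}
    threshold = ordered[n - k - 1]    # X_{n-k,n} (0-based index n-k-1)
    top = ordered[n - k:]             # X_{n-k+1,n}, ..., X_{n,n}
    return sum(math.log(x / threshold) for x in top) / k


def pareto_sample(n, gamma, rng):
    """Exact Pareto sample with tail index gamma: X = V**(-gamma), V uniform on (0, 1]."""
    return [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]


# Illustrative (hypothetical) settings: n = 20000, gamma = 0.5, k = 500.
rng = random.Random(0)
x = pareto_sample(20000, 0.5, rng)
estimate = hill_estimator(x, 500)
print(estimate)
```

In this setting the estimate should fall close to the true value 0.5, with fluctuations of order $\gamma/\sqrt{k}$.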
Such joint convergence results have only been proven very recently by Dematteo and Clémençon (2016), Kinsvater et al. (2016) and Hoga (2018). The methods of proof therein rely on advanced theoretical methodologies, namely multivariate vague convergence in Dematteo and Clémençon (2016) and Kinsvater et al. (2016) and multivariate empirical process theory in Hoga (2018), as well as on various ad hoc technical conditions. These joint asymptotic results have been found useful for testing tail homogeneity across marginals of a multivariate heavy-tailed distribution in environmental or financial contexts (see Kinsvater et al. (2016) and Hoga (2018)) and constructing improved tail index estimators by pooling (see Section 3.2 in Dematteo and Clémençon (2016)).
The Hill estimator can be written in several different ways. When there are no ties in the sample $X_1, \ldots, X_n$ (this is for example the case when the distribution function of $X$ is continuous), one has $k = \sum_{i=1}^{n} 1\{X_i > X_{n-k,n}\}$. Noting then that
$$\hat\gamma(k) = \frac{\sum_{i=1}^{n} [\log X_i - \log X_{n-k,n}] \, 1\{X_i > X_{n-k,n}\}}{\sum_{i=1}^{n} 1\{X_i > X_{n-k,n}\}}$$
suggests that a major difficulty in the theoretical analysis of the Hill estimator lies in the fact that the high threshold $X_{n-k,n}$, used to guarantee the consistency of the estimator by retaining only the high values in the sample, is random. Indeed, if we could, in the asymptotic analysis of $\hat\gamma(k)$, replace the random quantity $X_{n-k,n}$, which is nothing but the empirical quantile at level $1-k/n$, by the unknown but non-random population quantile $q(1-k/n)$, we would find the pseudo-estimator
$$\tilde\gamma(k) = \frac{\sum_{i=1}^{n} [\log X_i - \log q(1-k/n)] \, 1\{X_i > q(1-k/n)\}}{\sum_{i=1}^{n} 1\{X_i > q(1-k/n)\}}$$
which is conceptually far easier to analyse than $\hat\gamma(k)$: for independent $X_1, \ldots, X_n$, this pseudo-estimator is a ratio of sums of independent variables constructed on the $X_i$, and thus can easily be handled by a combination of Lyapunov's central limit theorem and the Cramér-Wold device. The striking and perhaps unexpected point here is that the asymptotic distributions of $\hat\gamma(k)$ and $\tilde\gamma(k)$ are identical; this can be seen by comparing Theorem 3.2.5 in de Haan and Ferreira (2006) and Theorem 4.3.1 in Goldie and Smith (1987). On this basis, one may therefore ask whether a relationship of the form
$$\sqrt{k}\,(\hat\gamma(k) - \gamma) = \sqrt{k}\,(\tilde\gamma(k) - \gamma) + o_{\mathbb{P}}(1) \qquad (1)$$
can be shown. Such a relationship has not, to the best of our knowledge, been proven so far in the literature. Its validity is not obvious either, since $\tilde\gamma(k)$ is obtained from $\hat\gamma(k)$ by replacing $X_{n-k,n}$ with $q(1-k/n)$, and we know that $X_{n-k,n}/q(1-k/n) - 1$ only converges to 0 at the rate $\sqrt{k}$ (see Theorem 2.4.1 in de Haan and Ferreira (2006)), which is the common rate of convergence of $\hat\gamma(k)$ and $\tilde\gamma(k)$. Let us also point out straightaway that although Eq. 1 would give an additional proof of the convergence of the Hill estimator $\hat\gamma(k)$, this is not where its full value lies, since the asymptotic properties of the Hill estimator are well-known. It really becomes useful when, for instance, analysing the joint convergence of marginal Hill estimators, since, in contrast to the joint convergence of the randomly thresholded versions, the joint convergence of the non-randomly thresholded versions is easy to obtain (under an appropriate upper tail dependence condition) by a combination of standard central limit theory and the Cramér-Wold device.
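To make the comparison between the randomly and non-randomly thresholded versions tangible, here is a small simulation sketch. It assumes an exact Pareto model, so that the population quantile $q(1-k/n) = (k/n)^{-\gamma}$ is available in closed form; in practice this quantile is unknown, which is why the second quantity is only a pseudo-estimator. The function names, seed and parameter values are illustrative assumptions.

```python
import math
import random


def hill_estimator(sample, k):
    """Average log-excess above the random threshold X_{n-k,n}."""
    ordered = sorted(sample)
    n = len(ordered)
    threshold = ordered[n - k - 1]
    return sum(math.log(x / threshold) for x in ordered[n - k:]) / k


def pseudo_estimator(sample, threshold):
    """Average log-excess above a fixed, non-random threshold."""
    excesses = [math.log(x / threshold) for x in sample if x > threshold]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)


# Hypothetical settings for illustration only.
gamma, n, k = 0.5, 20000, 500
rng = random.Random(1)
sample = [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]

q = (k / n) ** (-gamma)  # population quantile q(1 - k/n) of the exact Pareto law
gamma_hat = hill_estimator(sample, k)
gamma_tilde = pseudo_estimator(sample, q)

# Scaled difference between the two versions on this particular sample.
scaled_gap = math.sqrt(k) * abs(gamma_hat - gamma_tilde)
print(gamma_hat, gamma_tilde, scaled_gap)
```

A single simulated sample proves nothing about the asymptotic statement, but it shows how the two quantities are built from the same data and differ only through the threshold.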
The proof of Eq. 1 is the motivation for this paper. More precisely, the objective of the paper is to embed the Hill estimator in a wider class of average excesses and then provide a simple representation of those empirical average excesses above a high random threshold in terms of their pseudo-estimator versions with a non-random threshold. The class of average excesses we consider includes the Hill estimator, and can be modified in a very simple way to encompass Conditional Tail Moments (CTMs). We show in particular how the results of Dematteo and Clémençon (2016), on the joint asymptotic normality of marginal Hill estimators for a random vector with heavy-tailed marginal distributions, can be recovered and generalised under weaker assumptions and by elementary techniques. We shall then highlight a couple of applications of this joint asymptotic normality result, including the derivation of the asymptotic properties of the tail index estimator introduced by Gardes and Stupfler (2015) when the variable of interest is randomly right-truncated. The question of the convergence of this estimator was considered first by Gardes and Stupfler (2015) under restrictive assumptions, and then by Benchaira et al. (2015) using a delicate theoretical argument based on the weighted tail copula process and a joint tail assumption on the observed pair. Our results will make it possible to unify and extend the results of Gardes and Stupfler (2015) and Benchaira et al. (2015), without resorting either to the former's technical conditions or to the latter's advanced methodology and joint dependence condition. In doing so, we will also be able to give a very simple expression of the asymptotic variance of the limiting normal distribution.
Finally, and motivated by the ideas of Hoga (2018), we shall apply our results to find joint convergence results on empirical CTMs that may be of independent interest, for instance in testing whether certain tail moments of two asymptotically dependent variables are equal. We will highlight how such results nicely complement earlier results proven by El Methni et al. (2014) and Stupfler (2017).
The outline of the paper is the following. Our framework and main results are stated in Section 2. Applications of our results, to the joint convergence of Hill and CTM estimators, and to the convergence of a tail index estimator under random right-truncation, are presented in Section 3. Section 4 concludes and discusses possible extensions. Proofs are deferred to a Supplementary Material document.

Framework and main results
We assume in this section that the data is made of independent copies $X_1, \ldots, X_n$ of a random variable $X$. We denote by $F$ the distribution function of $X$, by $\bar F = 1 - F$ the related survival function, and by $q$ the left-continuous inverse of $F$ (that is, $q(\tau) = \inf\{t \in \mathbb{R} \mid F(t) \ge \tau\}$). We assume for ease of presentation that the distribution function $F$ is continuous, so that, with probability 1, there are no ties in the sample $X_1, \ldots, X_n$. This is not restrictive for our purposes, as our applications of the main results presented here focus on the derivation of joint convergence results for several estimators of extreme value indicators pertaining to a multivariate random vector with heavy-tailed marginal distributions. Such an endeavour indeed typically requires the standardisation of the marginal distributions to uniform distributions, in order to obtain a meaningful description of the relevant extremal dependence structure. Our results can still be shown, at the expense of extra technical details, if $F$ is only continuous in a neighbourhood of infinity.
Our main assumption throughout is that $X$ has a heavy-tailed distribution. In other words, denoting by $U(t) := q(1 - t^{-1})$ the tail quantile function of $X$, we assume that $U$ is a regularly varying function with index $\gamma > 0$, namely
$$\forall x > 0, \quad \lim_{t \to \infty} \frac{U(tx)}{U(t)} = x^{\gamma}.$$
Since our motivating question is the derivation of the asymptotic representation (1), we shall work under a refinement of this regular variation condition that allows for the derivation of asymptotic properties of extreme value estimators. We choose here to work with the following second-order extreme value condition, which is extensively used in e.g. de Haan and Ferreira (2006):
$C_2(\gamma, \rho, A)$ The function $U$ is second-order regularly varying in a neighbourhood of $+\infty$ with index $\gamma > 0$, second-order parameter $\rho \le 0$ and an auxiliary function $A$ having constant sign and converging to 0 at infinity, that is,
$$\forall x > 0, \quad \lim_{t \to \infty} \frac{1}{A(t)} \left[ \frac{U(tx)}{U(t)} - x^{\gamma} \right] = x^{\gamma} \, \frac{x^{\rho} - 1}{\rho},$$
where the right-hand side should be read as $x^{\gamma} \log x$ when $\rho = 0$.
Let us now define the central concepts this paper will focus on. Again, let us recall our motivation, which is to link the Hill estimator $\hat\gamma(k)$ to its pseudo-estimator version $\tilde\gamma(k)$. One interpretation of the Hill estimator $\hat\gamma(k)$ is that it is a sample version of the average log-excess $E(\log X - \log U(n/k) \mid X > U(n/k))$ (see p.104 of Beirlant et al. (2004), and p.69 of de Haan and Ferreira (2006)); it should also be clear that $\tilde\gamma(k)$ is a pseudo-estimator of this average log-excess. This motivates us to carry forward the notion of average excess with the following definition.
Definition 1 Let $X$ be a heavy-tailed random variable and $f$ be a continuous function on a neighbourhood of infinity such that for some $t_0$, the quantity $E(|f(X)| \mid X > t_0)$ is well-defined and finite. For any $t \ge t_0$, we define the average $f$-excess of $X$ above level $t$ to be
$$AE_f(t) := E(f(X) - f(t) \mid X > t)$$
and the empirical average $f$-excess of $X$ above level $t$ to be
$$\widehat{AE}_f(t) := \frac{\sum_{i=1}^{n} [f(X_i) - f(t)] \, 1\{X_i > t\}}{\sum_{i=1}^{n} 1\{X_i > t\}}.$$
We also define the expected $f$-shortfall above level $t$ as
$$ES_f(t) := E(f(X) \mid X > t)$$
and the empirical expected $f$-shortfall above level $t$ as
$$\widehat{ES}_f(t) := \frac{\sum_{i=1}^{n} f(X_i) \, 1\{X_i > t\}}{\sum_{i=1}^{n} 1\{X_i > t\}}.$$
With this definition, letting $f = \log$, we find
$$\widehat{AE}_{\log}(X_{n-k,n}) = \hat\gamma(k) \quad \text{and} \quad \widehat{AE}_{\log}(U(n/k)) = \tilde\gamma(k).$$
The other main example allowed by Definition 1 which we will consider in this paper is obtained by choosing $f = f_a : x \mapsto x^a$ for some $a > 0$. We then get, for any $t \ge t_0$,
$$ES_{f_a}(t) = E(X^a \mid X > t) = CTM_a(t),$$
where $CTM_a$ is the Conditional Tail Moment of order $a$ introduced in El Methni et al. (2014). This definition makes sense for any $a < 1/\gamma$, since all conditional tail moments of $X$ of order smaller than $1/\gamma$ are finite (a rigorous statement is Exercise 1.16 in de Haan and Ferreira (2006)). The class of CTMs has recently been used by El Methni et al. (2014) and El Methni and Stupfler (2017, 2018) for extreme risk assessment purposes in environmental, financial, and actuarial contexts. It includes in particular, for $a = 1$, the expected shortfall, whose relevance to actuarial science and interpretation has been extensively discussed recently, see e.g. Brazauskas et al. (2008), Wüthrich and Merz (2013) and Emmer et al. (2015). An empirical counterpart of $CTM_a(U(n/k)) = ES_{f_a}(U(n/k))$ is then
$$\widehat{ES}_{f_a}(X_{n-k,n}) = \frac{1}{k} \sum_{i=1}^{k} X_{n-i+1,n}^{a}.$$
Common features of the above two examples are that they crucially hinge on the notion of average $f$-excess, and most importantly that the derivative of the function $f$ involved in their construction is a power function. This observation provides the motivation for our first main result, which provides an asymptotic relationship between $\widehat{AE}_f(X_{n-k,n})$ and $\widehat{AE}_f(U(n/k))$ when $f$ is regularly varying at infinity.
Assume finally that $f$ is continuously differentiable in a neighbourhood of infinity, ultimately increasing, and that its derivative $f'$ is regularly varying with index $a - 1$, where $0 \le 2a\gamma < 1$. Then we have: In particular Theorem 1 provides an asymptotic representation of the empirical average $f$-excess $\widehat{AE}_f(X_{n-k,n})$, whose expression hinges on top order statistics, in terms of sums of independent and identically distributed random variables constructed on the $X_i$ in a simple way. Note that, due to the continuity of $F$, we have $\bar F(U(n/k)) = k/n$; writing $\bar F(U(n/k))$ instead of $k/n$, as we will do in this section, is to emphasise that the second term on the right-hand side is centred. Assumptions $k = k(n) \to \infty$, $k/n \to 0$ and $\sqrt{k} A(n/k) = O(1)$ are standard for the asymptotic analysis of extreme value estimators; the assumption $0 \le 2a\gamma < 1$, meanwhile, ensures that $\widehat{AE}_f(U(n/k))$ is $\sqrt{k}$-asymptotically normal (see Lemma 4 in the Supplementary Material document), and therefore that Theorem 1 provides an expression of $\widehat{AE}_f(X_{n-k,n})$ in terms of $\widehat{AE}_f(U(n/k))$ and the empirical survival function $\widehat{\bar F}_n(U(n/k))$ that is meaningful for the asymptotic analysis.
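The empirical quantities discussed here can be sketched in a few lines of code. The sketch below assumes that the empirical average $f$-excess above $t$ is the mean of $f(X_i) - f(t)$ over the observations exceeding $t$, and that the empirical expected $f$-shortfall is the mean of $f(X_i)$ over the same observations; the function names and the toy data set are illustrative assumptions. With $f = \log$ and $t = X_{n-k,n}$ the first quantity is the Hill estimator, and with $f(x) = x^a$ the second is an empirical Conditional Tail Moment of order $a$.

```python
import math


def empirical_average_excess(sample, f, t):
    """Mean of f(X_i) - f(t) over the observations X_i exceeding t."""
    exceedances = [x for x in sample if x > t]
    if not exceedances:
        raise ValueError("no observation exceeds t")
    return sum(f(x) - f(t) for x in exceedances) / len(exceedances)


def empirical_expected_shortfall(sample, f, t):
    """Mean of f(X_i) over the observations X_i exceeding t."""
    exceedances = [x for x in sample if x > t]
    if not exceedances:
        raise ValueError("no observation exceeds t")
    return sum(f(x) for x in exceedances) / len(exceedances)


# Toy data without ties, chosen so that the results are easy to check by hand.
data = [1.0, 2.0, 4.0, 8.0, 16.0]
k = 3
threshold = sorted(data)[len(data) - k - 1]  # X_{n-k,n} = 2.0

# f = log at the random threshold X_{n-k,n}: this is the Hill estimator.
hill = empirical_average_excess(data, math.log, threshold)

# f(x) = x**a with a = 1: empirical conditional tail moment of order 1.
ctm_1 = empirical_expected_shortfall(data, lambda x: x ** 1.0, threshold)
print(hill, ctm_1)
```

On this toy sample the three exceedances are 4, 8 and 16, so the Hill value is $(\log 2 + \log 4 + \log 8)/3 = 2 \log 2$ and the order-1 tail moment is $28/3$.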
In particular, setting f = log (and therefore a = 0) in Theorem 1 allows us to show that the motivating representation (1), of the Hill estimator in terms of sums of independent and identically distributed random variables, holds indeed. This is the focus of the following corollary.

Corollary 1 Suppose that X satisfies condition
Corollary 1 can of course be used to provide yet another proof of the asymptotic normality of the Hill estimator, via the Lyapunov central limit theorem and the Cramér-Wold device applied to get the $\sqrt{k}$-asymptotic normality of $\tilde\gamma(k)$. This is not, however, where the value of Corollary 1 lies, if only because its proof uses an approximation of the tail empirical process $x \mapsto \widehat{\bar F}_n(xU(n/k))$ by a Gaussian process, which can itself be used to provide a direct proof of the asymptotic normality of the Hill estimator (see pp.162-163 of de Haan and Ferreira (2006)). A much more relevant impact of Corollary 1 lies in its potential for the analysis of the joint convergence of several Hill estimators. For instance, if $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ are independent copies of random variables $X$ and $Y$, which are both heavy-tailed and satisfy a second-order condition, then we have under suitable assumptions on $k$ that: The benefit of writing this is that while showing directly the joint convergence of the random pair on the left-hand side is difficult and appears to require advanced theoretical arguments (see Dematteo and Clémençon (2016) and Hoga (2018)), the convergence of the right-hand side is much easier to obtain since it is nothing but a pair of (ratios of) sums of independent and identically distributed random variables. We will return to this in Section 3 to show how this observation leads to conceptually simple proofs of the joint asymptotic normality of several Hill estimators.
Theorem 1 is somewhat tedious to apply if the focus is on an average shortfall, such as in the case of a CTM. The following theorem provides an analogue of Theorem 1 specifically dedicated to the analysis of ES f (X n−k,n ), under a slightly stronger condition on f .

Theorem 2 Work under the conditions of Theorem 1, under the additional assumption that
In particular Setting f = f a , we find back the asymptotic variance of the empirical estimator ES f a (X n−k,n ) of CTM a (U (n/k)); see Theorem 1 of El Methni et al. (2014) in the regression case. The condition 2aγ < 1, part of Theorems 1 and 2, also naturally appears in the asymptotic normality results of El Methni et al. (2014) and ensures in particular that the conditional tail variance of X a is finite. Again, the main value of the above result does not lie in the fact that it gives the asymptotic distribution of the empirical average expected shortfall, but rather in that it allows one to establish the joint convergence of several of those estimators with no conceptual difficulty. With an eye on the latter, we state the following corollary of Theorem 2, which is easier to use in practice.

Corollary 2 Work under the conditions of Theorem 2. Then:
Let us point out that Corollary 2 gives an asymptotic representation of ES f (X n−k,n ) as a single sum of independent, identically distributed and centred random variables. Its use does not even involve any linearisation (unlike Theorems 1 and 2), making it particularly simple to apply.
Our objective in the rest of this paper is to show how the main theoretical results of this section can be applied to finding a solution to two theoretical questions: the joint convergence of marginal Hill estimators, and the joint convergence of marginal Conditional Tail Moments. We shall also explore how our result on the joint convergence of Hill estimators can be fruitfully applied to solve the question of the convergence of a specific tail index estimator, introduced by Gardes and Stupfler (2015) to tackle the case when the variable of interest is randomly right-truncated. Before that, we conclude this section on a generalisation of Theorem 1 and Corollary 1 to the case when the order statistic X n−k,n is replaced by an arbitrary √ k−consistent estimator U(n/k) of the quantile U(n/k). In this context, AE f ( U(n/k)) cannot be represented by sums of independent, identically distributed and centred random variables anymore, but interestingly Corollary 1 still stands.
Theorem 3 Work under the conditions of Theorem 1. Then we have:

Corollary 3 Suppose that X satisfies condition
We observe that Theorem 3 and Corollary 3 are indeed generalisations of Theorem 1 and Corollary 1: when U(n/k) = X n−k,n , one has F n ( U(n/k)) = F n (X n−k,n ) = k/n = F (U(n/k)) by continuity of F and γ (k) = γ (k).

Joint convergence of marginal Hill estimators
Let $X_i = (X_i^{(1)}, X_i^{(2)}, \ldots, X_i^{(d)})$, $1 \le i \le n$, be a sample of independent copies of a random vector $X = (X^{(1)}, X^{(2)}, \ldots, X^{(d)})$ such that each component $X^{(j)}$ has a continuous distribution function $F_j$ and satisfies condition $C_2(\gamma_j, \rho_j, A_j)$. Our first goal is to establish a joint convergence result for the Hill estimators of the $\gamma_j$ built on this sample, that is:
$$\hat\gamma_j(k_j) = \frac{1}{k_j} \sum_{i=1}^{k_j} \log \frac{X^{(j)}_{n-i+1,n}}{X^{(j)}_{n-k_j,n}}, \quad 1 \le j \le d,$$
where $k_j = k_j(n) \to \infty$, with $k_j/n \to 0$. This theoretical question is addressed in Dematteo and Clémençon (2016) and further discussed in Kinsvater et al. (2016) under the assumption $\gamma_j = \gamma$ for all $j \in \{1, \ldots, d\}$.
Since each γ j (k j ) is built on top order statistics from the corresponding X (j ) , our objective of analysing the joint convergence of the γ j (k j ) calls for some sort of extremal dependence assumption between the X (j ) . We shall work under the following condition on the pairwise upper tail dependence between any two components X (j ) and X (l) : This condition appears, among others, in Cai et al. (2015) as well as, in a slightly different form, in Hoga (2018). It is a convenient way to describe the asymptotic dependence structure of the bivariate random vector (X (j ) , X (l) ), while being weaker than a pairwise bivariate regular variation assumption (in the sense of e.g. Resnick (1987)) and a fortiori weaker than a multivariate regular variation assumption on the random vector X such as the one of Dematteo and Clémençon (2016). The function R j,l is sometimes called the tail copula of (X (j ) , X (l) ) (see Schmidt and Stadtmüller (2006)). We mention that, as noted by Cai et al. (2015), the function R j,l characterises the stable tail dependence function of the pair (X (j ) , X (l) ) as defined by Drees and Huang (1998).
Second-order regular variation of each marginal distribution and the above pairwise upper tail dependence assumption turn out to be sufficient to obtain the joint convergence of the γ j (k j ), as the next result shows.
Set c = (c 1 , c 2 , . . . , c d ). Then we have: Let us briefly highlight that the structure of the asymptotic bias component b(c) is quite strongly constrained by the values of the second-order parameters ρ j : for all its components to be non-zero (or equivalently, for all the constants λ j to be non-zero), all the second-order parameters ρ j should be equal. This is due to the fact that the k j are proportional and each function |A j | is regularly varying with index ρ j . For the same reason, if ρ * = max 1≤j ≤d ρ j , then b j (c) = 0 whenever j is such that ρ j < ρ * .
Our Theorem 4 contains Theorem 3.3 and Corollary 3.4 in Dematteo and Clémençon (2016), which are stated under more restrictive conditions, including equality of all tail indices γ j , a multivariate regular variation assumption on X, von Mises conditions on each marginal distribution of X, and a uniform analogue of condition J (R) [note also that Corollary 3.4 in Dematteo and Clémençon (2016) should, with their notation, read i,j = c i c j ν i,j c 1/α i , c 1/α j in the case i < j; as stated, their covariance matrix may fail to be positive semi-definite, for instance for d = 2 and small c 2 . Compare with Proposition 1 in Kinsvater et al. (2016)]. Theorem 4 also contains Proposition 1 in Jiang et al. (2017), which is limited to the case d = 2 and X (2) = −X (1) . Another result related to Theorem 4 is Proposition 3 in Hoga (2018), although the present result and that of Hoga (2018) are more difficult to compare since the latter is stated within the particular context of time series analysis. Our proof of Theorem 4 is also conceptually less involved than that of Dematteo and Clémençon (2016), which rests upon a multivariate functional central limit theorem (see Theorem 7.1 and Corollary 7.2 therein). Finally, and without taking the time series framework into account, the result of Hoga (2018) is based on delicate arguments involving a multivariate, joint Skorokhod construction of Gaussian approximations for marginal tail empirical processes. Our proof, meanwhile, rests on the standard Lyapunov central limit theorem and Corollary 1, whose proof is based only on a univariate Gaussian approximation of the tail empirical process. It should nonetheless be made clear once again that Proposition 3 in Hoga (2018) holds in a framework of β−mixing time series, and as such is not restricted to independent and identically distributed observations, unlike our Theorem 4. 
Section 4 discusses a possible way of extending Theorem 4 to the time series context as well as the implications this would have on the other results presented within this paper.
Results such as Theorem 4 may be applied to define a test of tail homogeneity, that is, equality of tail indices across marginals. If there is no evidence to reject this assumption, one may then define improved estimators of the common value of the tail index by pooling together the marginal Hill estimators. These ideas are considered in Dematteo and Clémençon (2016) and Kinsvater et al. (2016). To illustrate how our results can be applied to such problems, we state a corollary of Theorem 4 in the case d = 2 of a bivariate distribution with heavy-tailed marginals.

Corollary 4 Suppose that X and Y have continuous distribution functions F X and F Y , which satisfy conditions C 2 (γ X , ρ X , A X ) and C 2 (γ Y , ρ Y , A Y ), respectively. Assume that there is a function
Let k X = k X (n), k Y = k Y (n) be such that k X , k Y → ∞ and:

Let also
denote the ranks of observations X i and Y i and R k (x, y) be the nonparametric estimator of R(x, y) defined by (see e.g. Drees and Huang (1998)). Then the following hold: (i) Unless c = 1 and (X, Y ) are asymptotically perfectly dependent (in the sense that R(x, y) = min(x, y)), we have: and has the minimal possible asymptotic variance among the class of convex combinations of γ X (k X ) and γ Y (k Y ).
Corollary 4(i) complements Proposition 2 in Kinsvater et al. (2016) in the case $d = 2$: the latter is stated under significantly stronger assumptions, although it includes the possibility of different sample sizes for the $X$ and $Y$ samples, which may be relevant in specific applied setups. Corollary 4(ii) provides, under the tail homogeneity condition $\gamma_X = \gamma_Y = \gamma$, a simple expression of the convex combination of $\hat\gamma_X(k_X)$ and $\hat\gamma_Y(k_Y)$ that is optimal for the estimation of $\gamma$ in terms of asymptotic variance. It therefore constitutes, in the case $d = 2$, an explicit version of the BEAR estimator of Dematteo and Clémençon (2016, pp.159-160). With our notation, this estimator can be written, for a general value of $d$, as and $\hat V$ is a consistent estimator of the asymptotic covariance matrix $V$ defined in Theorem 4. Let us highlight that the estimator $\hat\gamma(k_X, k_Y)$ is analysed here under weaker conditions than those of Dematteo and Clémençon (2016) and, due to the particular choice $d = 2$, without having to resort to a numerical optimisation routine for the calculation of the estimator. It is worth noting that the estimator analysed in Corollary 4(ii) indeed has an asymptotic variance which is lower than each of the asymptotic variances of $\hat\gamma_X(k_X)$ and $\hat\gamma_Y(k_Y)$, and we can quantify this improvement. More precisely, since the function $r \mapsto [\min(c, 1) - r]^2 / (1 + c - 2r)$ is decreasing on $[0, \min(c, 1)]$, the improvement in asymptotic variance brought by the use of the pooled estimator $\hat\gamma(k_X, k_Y)$ gets stronger as the asymptotic dependence structure of $(X, Y)$ gets closer to asymptotic independence.
In the case of asymptotic independence, we have This is rather intuitive: since γ (k X , k Y ) is then essentially the weighted average of two independent quantities, made respectively of k X independent log-excesses from X (with weighting c/(1 + c)) and k Y ≈ k X /c independent log-excesses from Y (with weighting 1/(1 + c)), each with individual variance γ 2 by Corollary 1, we should expect the total asymptotic variance to be just as we found through our rigorous asymptotic analysis.
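The heuristic above can be checked with a short calculation. The following is only a sketch, under the assumptions that the two marginal Hill estimators are asymptotically independent with approximate variances $\gamma^2/k_X$ and $\gamma^2/k_Y$, that $k_X/k_Y \to c$, and that the variance is read on the $\sqrt{k_X}$ scale; the symbol $w$ for the weight given to the $X$-based estimator is introduced here purely for illustration.

```latex
\begin{align*}
k_X \operatorname{Var}\bigl( w\,\hat\gamma_X(k_X) + (1-w)\,\hat\gamma_Y(k_Y) \bigr)
  &\approx w^2 \gamma^2 + (1-w)^2 \gamma^2 \, \frac{k_X}{k_Y}
   \approx \gamma^2 \bigl[ w^2 + c\,(1-w)^2 \bigr], \\
\frac{\mathrm{d}}{\mathrm{d}w} \bigl[ w^2 + c\,(1-w)^2 \bigr] = 0
  &\iff w = \frac{c}{1+c}, \\
\gamma^2 \left[ \frac{c^2}{(1+c)^2} + \frac{c}{(1+c)^2} \right]
  &= \gamma^2 \, \frac{c}{1+c}.
\end{align*}
```

The minimising weight is exactly the weighting $c/(1+c)$ mentioned in the text, and the resulting value $\gamma^2 c/(1+c)$ is smaller than both $\gamma^2$ and $c\gamma^2$, the corresponding values for the two individual estimators on the same scale.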

Convergence of a tail index estimator for right-truncated samples
We now illustrate how our results can be used to complete the analysis of the convergence of a tail index estimator when the available data is subject to random right-truncation. The context we consider is the following: let $(Y_1, T_1), \ldots, (Y_n, T_n)$ be $n$ independent copies of a random pair $(Y, T)$, where $Y$ and $T$ are independent, nonnegative, and have continuous marginal distribution functions $F_Y$ and $F_T$. Assume also that $Y$ and $T$ have heavy-tailed distributions with tail indices $\gamma_Y$ and $\gamma_T$. In the random right-truncation problem considered here, it is assumed that the pair $(Y_i, T_i)$ is observed if and only if $Y_i \le T_i$; otherwise, no information on this pair is available at all. The objective is to estimate $\gamma_Y$. This problem has only been considered very recently, starting with Gardes and Stupfler (2015), who were ultimately interested in the estimation of extreme quantiles of $Y$. Several studies have since then proposed alternative techniques for tail index estimation in this context; we refer to Benchaira et al. (2016a, b), Worms and Worms (2016) and Haouas et al. (2017). The random right-truncation context should not be mistaken for random right-censoring, where the available information is made of the pairs $(\min(Y_i, T_i), 1\{Y_i \le T_i\})$, $1 \le i \le n$. The latter context has received a substantial amount of attention over the last decade: we refer to Beirlant et al. (2007, 2010, 2016), Einmahl et al. (2008), Gomes and Neves (2011), Ndao et al. (2014), Sayah et al. (2014), Worms and Worms (2014), Brahimi et al. (2015), Ndao et al. (2016), Stupfler (2016), Dierckx et al. (2018) and Stupfler (2019).
Our focus in this section is to revisit the asymptotic properties of the estimator of Gardes and Stupfler (2015) using the tools we have developed in Sections 2 and 3.1. Let us introduce some notation beforehand: let $N$ be the total (random) number of observed pairs, and let $(Y^*, T^*)$ denote an observed pair, with marginal distribution functions $F_Y^*$ and $F_T^*$. It is straightforward to show the following:
• $N$ has a binomial distribution with parameters $n$ and $p := P(Y \le T)$, where we assume throughout this section that $p > 0$.
It is also reasonably easy to show that in this context, $F_Y^*$ and $F_T^*$ are heavy-tailed, with tail indices $\gamma_Y^* := \gamma_Y \gamma_T/(\gamma_Y + \gamma_T)$ and $\gamma_T$ (see Lemma 3 in Gardes and Stupfler (2015), as well as the Introduction of Benchaira et al. (2015)). Rewriting this equality as $\gamma_Y = \gamma_Y^* \gamma_T/(\gamma_T - \gamma_Y^*)$ suggests estimating $\gamma_Y$ by
$$\hat\gamma_Y(k_N, k'_N) = \frac{\hat\gamma_Y^*(k_N) \, \hat\gamma_T(k'_N)}{\hat\gamma_T(k'_N) - \hat\gamma_Y^*(k_N)},$$
where $\hat\gamma_Y^*(k_N)$ and $\hat\gamma_T(k'_N)$ are the Hill estimators built on the observed values of $Y^*$ and $T^*$. Here $(k_m)$ and $(k'_m)$ are two nonrandom sequences of integers and $(k_N, k'_N) := (k_m, k'_m)$ given $N = m$. Gardes and Stupfler (2015) examine the asymptotic distribution of this estimator, under the condition that $k_m/k'_m \to 0$ or $k'_m/k_m \to 0$ as $m \to \infty$. This technical restriction was imposed because the analysis of the dependence between the two Hill estimators $\hat\gamma_Y^*(k_N)$ and $\hat\gamma_T(k'_N)$ is difficult; assuming that either $k_m/k'_m \to 0$ or $k'_m/k_m \to 0$ ensures that one of the estimators converges at a slower rate than the other, and therefore imposes its asymptotic distribution on $\hat\gamma_Y(k_N, k'_N)$. Benchaira et al. (2015) subsequently studied the asymptotic distribution of this estimator when $k_m = k'_m$ and under a condition on the asymptotic dependence between $Y^*$ and $T^*$. The main result of this section, which we state now, examines the general case $k_m/k'_m \to c \in [0, \infty]$.
Theorem 5 Assume that $Y^*$ and $T^*$ respectively satisfy conditions $C_2(\gamma^*_Y, \rho^*_Y, A^*_Y)$ and $C_2(\gamma_T, \rho^*_T, A^*_T)$. Let $(k_m)$ and $(k'_m)$ be two sequences of integers tending to infinity such that

Then the following hold, as $n \to \infty$:

With convergences (i) and (ii), we essentially recover Theorem 3 in Gardes and Stupfler (2015), which was stated under a slightly different set of second-order extreme value conditions. The value of our result, however, mainly resides in the general convergence result (iii), whose proof, unlike that of Theorem 2.1 in Benchaira et al. (2015), does not hinge on a Gaussian construction for the weighted tail copula process. Most importantly for practical setups, and contrary to Theorem 2.1 in Benchaira et al. (2015), Theorem 5(iii) does not rely on any assumption on the form of asymptotic dependence between $Y^*$ and $T^*$. The reason for this is that the combination of the independence assumption between $Y$ and $T$ with the heavy-tailed framework is actually sufficient to ensure that $Y^*$ and $T^*$ are asymptotically independent (see the proof of Theorem 5). With this in mind, our result offers a simplified expression of the asymptotic variance of $\hat\gamma_Y(k_N, k'_N)$, compared to the one provided in Benchaira et al. (2015). We also point out that, taking this asymptotic independence result into account, the asymptotic distribution we find in the case $c = 1$ essentially coincides with that of Benchaira et al. (2015). Note, however, that the bias term $\mu$ therein should read as a difference rather than a sum (this is revealed by inspecting Equation (3.10) therein) and that their variance $\sigma^2$ should be divided by 2 (otherwise the Gaussian representation stated early in their Theorem 2.1 would contradict their asymptotic normality result).
Finally, our result includes the possibility of taking proportional sequences $k_N$ and $k'_N$, which is useful since one may obtain better finite-sample performance by selecting a value $k'_N$ different from $k_N$ if the marginal distributions of $Y^*$ and $T^*$ have very different second-order parameters $\rho^*_Y$ and $\rho^*_T$ (a related point about the estimation of a common tail index based on several samples of data is made in the Introduction of Dematteo and Clémençon (2016)).
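To fix ideas, the following minimal Python sketch simulates the random right-truncation scheme and computes a plug-in estimate of $\gamma_Y$ obtained by inverting the relation $\gamma^*_Y = \gamma_Y\gamma_T/(\gamma_Y+\gamma_T)$, that is, $\gamma_Y = \gamma^*_Y\gamma_T/(\gamma_T - \gamma^*_Y)$, with Hill estimators in place of $\gamma^*_Y$ and $\gamma_T$. The Pareto model for $Y$ and $T$, the sample size, the values of $k$ and $k'$, and the function names `hill`, `truncated_tail_index` and `simulate_truncated` are assumptions made for illustration only; this is not meant as a reference implementation of the estimator studied above.

```python
import math
import random


def hill(sample, k):
    """Hill estimator based on the k largest values of a positive sample."""
    xs = sorted(sample)
    n = len(xs)
    threshold = xs[n - k - 1]  # order statistic X_{n-k,n}
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def truncated_tail_index(y_obs, t_obs, k, k_prime):
    """Plug-in estimate of gamma_Y computed from the observed (truncated) pairs."""
    g_y_star = hill(y_obs, k)     # estimates gamma_Y* = gY * gT / (gY + gT)
    g_t = hill(t_obs, k_prime)    # estimates gamma_T
    return g_y_star * g_t / (g_t - g_y_star)


def simulate_truncated(n, gamma_y, gamma_t, rng):
    """Draw n independent Pareto pairs (Y, T) and keep only those with Y <= T."""
    pairs = []
    for _ in range(n):
        y = (1.0 - rng.random()) ** (-gamma_y)  # Pareto with tail index gamma_y
        t = (1.0 - rng.random()) ** (-gamma_t)  # Pareto with tail index gamma_t
        if y <= t:
            pairs.append((y, t))
    return pairs


# Hypothetical illustration: gamma_Y = 0.5 and gamma_T = 1, so gamma_Y* = 1/3.
rng = random.Random(0)
n_total = 200_000
pairs = simulate_truncated(n_total, gamma_y=0.5, gamma_t=1.0, rng=rng)
y_obs = [y for y, _ in pairs]
t_obs = [t for _, t in pairs]
estimate = truncated_tail_index(y_obs, t_obs, k=2000, k_prime=2000)
print(len(pairs) / n_total, estimate)
```

In this particular Pareto model, $p = P(Y \le T) = 2/3$, so the proportion of observed pairs should be close to that value and the estimate should be close to $\gamma_Y = 0.5$. Taking `k` and `k_prime` different is allowed by the sketch, in line with the discussion of proportional sequences above.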

Joint convergence of marginal conditional tail moments
Another consequence of our theoretical results can be formulated in terms of the joint convergence of high marginal Conditional Tail Moments (CTMs). As in Section 3.1, let $X_1, \ldots, X_n$ be a sample of independent copies of a random vector $X = (X^{(1)}, X^{(2)}, \ldots, X^{(d)})$ such that each component $X^{(j)}$ has a continuous distribution function $F_j$ and satisfies condition $C_2(\gamma_j, \rho_j, A_j)$. We also let, for $1 \le j \le d$, $k_j = k_j(n) \to \infty$ be such that $k_j/n \to 0$. The CTM of order $a_j$ of the variable $X^{(j)}$ above its high quantile $U_j(n/k_j)$ is then $\mathrm{CTM}^{(j)}_{a_j}(U_j(n/k_j)) = E([X^{(j)}]^{a_j} \mid X^{(j)} > U_j(n/k_j))$. With the language of Definition 1, this quantity is exactly the expected $f_{a_j}$-shortfall of $X^{(j)}$ above the quantile $U_j(n/k_j)$, where $f_{a_j}(x) = x^{a_j}$ for $x > 0$. Following the ideas of Section 2, its empirical counterpart is $\widehat{\mathrm{CTM}}^{(j)}_{a_j}(U_j(n/k_j)) = \frac{1}{k_j} \sum_{i=1}^{k_j} [X^{(j)}_{n-i+1,n}]^{a_j}$, where $X^{(j)}_{1,n} \le \cdots \le X^{(j)}_{n,n}$ denote the order statistics of the $j$th marginal sample. This estimator is studied in El Methni et al. (2014) and El Methni and Stupfler (2017). The asymptotic results therein provide information on the joint convergence of estimators of several CTMs of a single variable $X$. Our next result below adopts the point of view of joint convergence of these estimators across marginals.
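As an illustration, the empirical CTM of order $a$ is simply the average of the $a$th powers of the $k$ largest observations of a marginal sample. The short Python sketch below computes it for one marginal and compares it with the true value in a Pareto model, where $U(n/k) = (n/k)^{\gamma}$ and the CTM of order $a$ above $U(n/k)$ equals $U(n/k)^a/(1 - a\gamma)$. The function name `empirical_ctm`, the Pareto model and the numerical values are assumptions of this sketch.

```python
import random


def empirical_ctm(sample, k, a):
    """Average of the a-th powers of the k largest observations in the sample."""
    if not 0 < k <= len(sample):
        raise ValueError("k must satisfy 0 < k <= len(sample)")
    top_k = sorted(sample)[-k:]
    return sum(x ** a for x in top_k) / k


# Hypothetical illustration on simulated Pareto data with tail index gamma.
rng = random.Random(1)
gamma, a, n, k = 0.25, 1.0, 100_000, 1_000  # note that 2 * a * gamma < 1
sample = [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]

ctm_hat = empirical_ctm(sample, k, a)
# True CTM of order a above U(n/k) = (n/k)**gamma in the Pareto model.
ctm_true = (n / k) ** (a * gamma) / (1.0 - a * gamma)
print(ctm_hat, ctm_true)
```

With $a = 1$ this quantity is the empirical Expected Shortfall above the intermediate quantile $U(n/k)$.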

Theorem 6
Work under the conditions of Theorem 4, with $\sqrt{k_j} A_j(n/k_j) \to \lambda_j \in \mathbb{R}$ replaced by the weaker assumption $\sqrt{k_j} A_j(n/k_j) = O(1)$. Let also $a_1, a_2, \ldots, a_d > 0$ be such that $2a_j\gamma_j < 1$ for any $j$. Then

A corollary of Theorem 6 is the following result on the joint convergence of several CTM estimators $\widehat{\mathrm{CTM}}_{a_j}(U(n/k)) = \frac{1}{k} \sum_{i=1}^{k} X^{a_j}_{n-i+1,n}$ of a single heavy-tailed random variable $X$. This corollary immediately follows from the fact that the random pair $(X, X)$ is asymptotically perfectly dependent.

Corollary 5 Suppose that $X$ satisfies condition $C_2(\gamma, \rho, A)$. Let $k = k(n) \to \infty$ be such that $k/n \to 0$ and $\sqrt{k} A(n/k) = O(1)$. Let also $a_1, a_2, \ldots, a_d > 0$ be such that $\gamma < 1/(2a_j)$ for any $j$. Then

This result complements Theorem 2 in El Methni and Stupfler (2017), for distortion functions all equal to the identity function. A related result, in the presence of a finite-dimensional covariate, is Theorem 1 in El Methni et al. (2014).
While Theorem 6 is informative regarding the structure of the dependence between CTM estimators at high levels, its scope may however be limited from the practical point of view, as the focus in applied setups is usually on risk measures at out-of-sample levels. To put it differently, one would generally want to estimate a CTM of the form $\mathrm{CTM}^{(j)}_{a_j}(U_j(1/p_n)) = E([X^{(j)}]^{a_j} \mid X^{(j)} > U_j(1/p_n))$ where $p_n \to 0$ is such that $np_n = O(1)$, a typical choice then being $p_n = 1/n$ (see e.g. recently Cai et al. (2015) and Gong et al. (2015)). In this context, following El Methni et al. (2014), the CTM at the extreme level is estimated by extrapolating the intermediate-level estimator with the help of a tail index estimator. Here $\hat\gamma_j(k_j)$ is the $j$th marginal Hill estimator introduced in Section 3.1. A combination of Theorems 4 and 6 makes it possible to obtain the following joint asymptotic normality result for these extrapolated estimators across marginals.

Corollary 6 Work under the conditions of Theorem 4. Let $a_1, a_2, \ldots, a_d > 0$ be such that $2a_j\gamma_j < 1$ for any $j$. Assume also that $\rho_j < 0$ for any $j$, and that $np_n \to C < \infty$ and $\sqrt{k_1}/\log(k_1/[np_n]) \to \infty$. Then

with the notation of Theorem 4.
In the case $a_1 = a_2 = \cdots = a_d$, such a result may then be used to test the equality of CTMs across marginals. For instance, if $d = 2$ and $a_1 = a_2 = 1$, one may test whether the two variables $X^{(1)}$ and $X^{(2)}$ have the same Expected Shortfall at high levels. The possibility of assessing the equality of financial tail risk within a multivariate context using such results is explored in Sections 3 and 4 in Hoga (2018); a related result is Proposition 3 in Hoga (2018), albeit for a different type of estimator of the Expected Shortfall, and in a time series context.
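The extrapolation step behind such estimators can be sketched as follows. For a heavy-tailed variable, regular variation suggests that the ratio of the CTMs of order $a$ at levels $U(1/p_n)$ and $U(n/k)$ is approximately $(k/[np_n])^{a\gamma}$. The Python sketch below therefore assumes a Weissman-type extrapolated estimator of the form $(k/[np_n])^{a\hat\gamma(k)}$ times the empirical CTM at level $U(n/k)$, with $\hat\gamma(k)$ the Hill estimator. This exact form, the function names `hill`, `empirical_ctm` and `extrapolated_ctm`, and the Pareto example are assumptions of the sketch rather than a transcription of the estimator analysed in Corollary 6.

```python
import math
import random


def hill(sample, k):
    """Hill estimator based on the k largest values of a positive sample."""
    xs = sorted(sample)
    n = len(xs)
    threshold = xs[n - k - 1]  # order statistic X_{n-k,n}
    return sum(math.log(x / threshold) for x in xs[n - k:]) / k


def empirical_ctm(sample, k, a):
    """Average of the a-th powers of the k largest observations."""
    top_k = sorted(sample)[-k:]
    return sum(x ** a for x in top_k) / k


def extrapolated_ctm(sample, k, a, p_n):
    """Assumed Weissman-type extrapolation of the empirical CTM to level 1/p_n."""
    n = len(sample)
    gamma_hat = hill(sample, k)
    return (k / (n * p_n)) ** (a * gamma_hat) * empirical_ctm(sample, k, a)


# Hypothetical illustration on Pareto data, at the out-of-sample level p_n = 1/n.
rng = random.Random(2)
gamma, a, n, k = 0.25, 1.0, 100_000, 1_000
sample = [(1.0 - rng.random()) ** (-gamma) for _ in range(n)]
p_n = 1.0 / n

ctm_extreme_hat = extrapolated_ctm(sample, k, a, p_n)
# True CTM of order a above U(1/p_n) = p_n**(-gamma) in the Pareto model.
ctm_extreme_true = (1.0 / p_n) ** (a * gamma) / (1.0 - a * gamma)
print(ctm_extreme_hat, ctm_extreme_true)
```

The extrapolation factor depends on the data only through the Hill estimator, so the variability of the extrapolated estimate is driven by that of $\hat\gamma(k)$ multiplied by $\log(k/[np_n])$; this is consistent with the appearance of $\sqrt{k_1}/\log(k_1/[np_n])$ in the conditions above.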

Conclusion and discussion
Writing empirical average $f$-excesses and empirical expected $f$-shortfalls in terms of their pseudo-estimator counterparts with non-random threshold, as in Theorems 1 and 2, makes it possible to obtain joint convergence theorems for certain extreme value estimators under weak assumptions, and with conceptually simple proofs. The results of this paper, however, are limited to independent and identically distributed random variables. As Hoga (2018) argues, assessing tail homogeneity of the marginals of a random vector is of particular interest in certain financial applications, where it is important to develop asymptotic theory for dependent but stationary sequences. Since the cornerstone of the proofs of Theorems 1 and 2 is a weighted Gaussian approximation of the univariate tail empirical process, it is reasonable to expect that analogues of these two results can also be shown in this kind of framework: for instance, Drees (2003) proves such an approximation in the case of $\beta$-mixing sequences under certain regularity conditions and upper bounds on the size of clusters of exceedances. This would therefore result in analogues of Theorems 1 and 2 and ultimately of Theorems 4 and 6. We note though that the expression of the asymptotic covariance matrices in such analogues of Theorems 4 and 6 would certainly be more complicated than in the present paper, which, among other things, makes the derivation of a testing procedure of tail homogeneity a difficult task. Hoga (2018) avoids this issue by resorting to a self-normalised test statistic in the spirit of Shao (2010), although the limiting distribution does not have a simple expression. An interesting open question is to construct, based on an analogue of Theorem 4 in a dependent and stationary setting, an asymptotically chi-squared test statistic of tail homogeneity and to contrast its finite-sample performance with that of the test statistic of Hoga (2018).
It would also be potentially very fruitful to extend the present results to the regression case, in order to allow a general theoretical analysis of conditional, randomly thresholded tail index estimators based on kernel smoothing. There has been a significant body of work in the area of conditional extreme value analysis over the last decade, among which we can refer to Daouia et al. (2011, 2013) and Goegebeur et al. (2014, 2015). However, the estimators introduced in the first two of these papers only take a fixed number of order statistics into account, and the theoretical results of the latter two only apply to a non-randomly thresholded conditional version of the Hill estimator. By contrast, the estimators of Stupfler (2013) are built on random thresholds and their asymptotic normality is studied, but they do not include the possibility of kernel smoothing. The explanation for this is that kernel smoothing adds dependent and non-stationary weightings to the log-spacings appearing in the randomly thresholded estimator, making its theoretical analysis a very difficult mathematical task. Developing an extension of the present theoretical results to the regression case would make it possible to bridge the gap between the aforementioned studies, and thus provide stronger theoretical ground for conditional extreme value analysis based on kernel smoothing.

Supplementary Material
A Supplementary Material document contains all necessary proofs of the results of this paper.