Estimation of extremes for Weibull-tail distributions in the presence of random censoring

Abstract

The Weibull-tail class of distributions is a sub-class of the Gumbel extreme domain of attraction, and it has caught the attention of a number of researchers in the last decade, particularly concerning the estimation of the so-called Weibull-tail coefficient. In this paper, we propose an estimator of this Weibull-tail coefficient when the Weibull-tail distribution of interest is censored from the right by another Weibull-tail distribution: to the best of our knowledge, this is the first one proposed in this context. A corresponding estimator of extreme quantiles is also proposed. In both mild censoring and heavy censoring (in the tail) settings, asymptotic normality of these estimators is proved, and their finite sample behavior is presented via some simulations.

References

  1. Brahimi, B., Meraghni, D., Necir, A.: Approximations to the tail index estimator of a heavy-tailed distribution under random censoring and application. Math. Methods Statist. 24, 266–279 (2015)

  2. Brahimi, B., Meraghni, D., Necir, A.: Nelson-Aalen tail product-limit process and extreme value index estimation under random censorship. Unpublished manuscript, available on the ArXiv archive: arXiv:1502.03955v2 (2016)

  3. Brahimi, B., Meraghni, D., Necir, A., Soltane, L.: Tail empirical process and a weighted extreme value index estimator for randomly right-censored data. Unpublished manuscript, available on the ArXiv archive: arXiv:1801.00572 (2018)

  4. Beirlant, J., Dierckx, G., Guillou, A., Fils-Villetard, A.: Estimation of the extreme value index and extreme quantiles under random censoring. Extremes 10, 151–174 (2007)

  5. Beirlant, J., Broniatowski, M., Teugels, J., Vynckier, P.: The mean residual life function at great age: applications to tail estimation. Journal of Statistical Planning and Inference 45, 21–48 (1995)

  6. Beirlant, J., Goegebeur, Y., Segers, J., Teugels, J.: Statistics of extremes: theory and applications. Wiley (2004)

  7. Beirlant, J., Guillou, A., Toulemonde, G.: Peaks-over-threshold modeling under random censoring. Communications in Statistics - Theory and Methods 39, 1158–1179 (2010)

  8. Beirlant, J., Bardoutsos, A., de Wet, T., Gijbels, I.: Bias reduced tail estimation for censored Pareto type distributions. Stat. Prob. Lett. 109, 78–88 (2016)

  9. Beirlant, J., Maribe, G., Verster, A.: Penalized bias reduction in extreme value estimation for censored Pareto-type data, and long-tailed insurance applications. Insurance Math. Econom. 78, 114–122 (2018)

  10. Beirlant, J., Worms, J., Worms, R.: Asymptotic distribution for an extreme value index estimator in a censorship framework. Journal of Statistical Planning and Inference 202, 31–56 (2019)

  11. Bingham, N.H., Goldie, C.M., Teugels, J.L.: Regular variation. Cambridge University Press, Cambridge (1987)

  12. Csorgo, S.: Universal Gaussian approximations under random censorship. Ann. Stat. 24(6), 2744–2778 (1996)

  13. de Haan, L., Ferreira, A.: Extreme value theory: an introduction. Springer Science+Business Media (2006)

  14. Diebolt, J., Gardes, L., Girard, S., Guillou, A.: Bias-reduced estimators of the Weibull tail-coefficient. Test 17, 311–331 (2008)

  15. Dierckx, G., Beirlant, J., De Waal, D., Guillou, A.: A new estimation method for Weibull-type tails based on the mean excess function. Journal of Statistical Planning and Inference 139, 1905–1920 (2009)

  16. Einmahl, J., Fils-Villetard, A., Guillou, A.: Statistics of Extremes under Random Censoring. Bernoulli 14, 207–227 (2008)

  17. Gardes, L., Girard, S.: Estimating extreme quantiles of Weibull-tail distributions. Communications in Statistics - Theory and Methods 34, 1065–1080 (2005)

  18. Girard, S.: A Hill type estimator of the Weibull-tail coefficient. Communications in Statistics - Theory and Methods 33(2), 205–234 (2004a)

  19. Girard, S.: A Hill type estimator of the Weibull-tail coefficient. HAL archive version: hal-00724602 (2004b)

  20. Goegebeur, Y., Guillou, A.: Goodness-of-fit testing for Weibull-type behavior. Journal of Statistical Planning and Inference 140, 1417–1436 (2010)

  21. Goegebeur, Y., Beirlant, J., de Wet, T.: Generalized kernel estimators for the Weibull-tail coefficient. Communications in Statistics - Theory and Methods 39, 3695–3716 (2010)

  22. Gomes, M.I., Neves, M.M.: Estimation of the extreme value index for randomly censored data. Biom. Lett. 48(1), 1–22 (2011)

  23. Klein, J.P., Moeschberger, M.L.: Survival analysis: techniques for censored and truncated data, 2nd edn. Springer (2005)

  24. Ndao, P., Diop, A., Dupuy, J.-F.: Nonparametric estimation of the conditional tail index and extreme quantiles under random censoring. Comput. Stat. Data Anal. 79, 63–79 (2014)

  25. Ndao, P., Diop, A., Dupuy, J.-F.: Nonparametric estimation of the conditional extreme-value index with random covariates and censoring. Journal of Statistical Planning and Inference 168, 20–37 (2016)

  26. Reiss, R.-D.: Approximate distributions of order statistics. Springer-Verlag (1989)

  27. Reynkens, T., Verbelen, R., Beirlant, J., Antonio, K.: Modelling censored losses using splicing: a global fit strategy with mixed Erlang and extreme value distributions. Insurance Math. Econom. 77, 65–77 (2017)

  28. Sayah, A., Yahia, D., Brahimi, B.: On robust tail index estimation under random censorship. Afrika Statistika 9, 671–683 (2014)

  29. Stupfler, G.: Estimating the conditional extreme-value index in presence of random right-censoring. J. Multivar. Anal. 144, 1–24 (2016)

  30. Stupfler, G.: On the study of extremes with dependent random right-censoring. Extremes 22, 97–129 (2019)

  31. Worms, J., Worms, R.: New estimators of the extreme value index under random right censoring, for heavy-tailed distributions. Extremes 17(2), 337–358 (2014)

  32. Worms, J., Worms, R.: Moment estimators of the extreme value index for randomly censored data in the Weibull domain of attraction. Unpublished manuscript, available on the ArXiv archive, arXiv:1506.03765 (2015)

  33. Worms, J., Worms, R.: Extreme value statistics for censored data with heavy tails under competing risks. Metrika 81(7), 849–889 (2018)

  34. Zhou, M.: Some properties of the Kaplan-Meier estimator for independent non identically distributed random variables. Ann. Statist. 19(4), 2266–2274 (1991)

Author information

Corresponding author

Correspondence to Rym Worms.


Appendices

Appendix

Let us first summarize the contents of the Appendix. It is composed of three main parts.

Part A contains the proof of Theorem 1: after showing that the statistic Δn (defined in formula (A.3)) is the main contributor to the behavior of \(\hat \theta _{X,k}\), three propositions are stated and proved. Two important lemmas are also stated in the proof of the first and main proposition (which describes the asymptotic distribution of Δn): the first one (Lemma 1) handles all the “remainder” terms, and the second one (Lemma 2) gives the asymptotic distribution of the proportion \(\hat p_{k}\) of uncensored observations in the tail, depending on the position of 𝜃X with respect to 𝜃C. These two lemmas are proved in parts C.2 and C.3 of the Appendix.

Part B is then devoted to the proof of Theorem 2.

Part C finally contains other lemmas which are repeatedly useful in the first two parts. In Appendix C.1, the important Lemmas 3 and 4 describe sharp second order properties of the different slowly varying functions handled in this work, and of the theoretical probability function p(⋅) of being uncensored in the tail. In Appendix C.4, the useful Lemmas 5, 6 and 7 are stated (they come from the literature, but are reproduced for ease of reference).

Appendix A: Proof of Theorem 1

Recall that

$$ \hat{\theta}_{X,k} = \frac{ \frac{1}{k} {\sum}_{j=1}^{k} \left( \log Z_{n-j+1,n} - \log Z_{n-k,n} \right)} { \frac{1}{k} {\sum}_{j=1}^{k} \left( \log \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - \log \hat{{\Lambda} }_{nF}(Z_{n-k,n}) \right)}. $$

Introducing \(E_{1}, \ldots , E_{n}\), \(n\) independent standard exponential random variables such that \(Z_{i}={\Lambda }^{-}_{H}(E_{i})\), we have, since \({\Lambda }^{-}_{H}(x)= x^{\theta _{Z}} l(x)\) and \({\Lambda }_{F} \circ {\Lambda }^{-}_{H} (x)= x^{a} \tilde {l}(x)\) with l and \(\tilde {l}\) slowly varying at infinity,

$$ \begin{array}{@{}rcl@{}} \log Z_{n-j+1,n} - \log Z_{n-k,n} & =& \theta_{Z} \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right) + \log \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} \right) \end{array} $$
(A.1)
$$ \begin{array}{@{}rcl@{}} \log {\Lambda}_{F}(Z_{n-j+1,n}) - \log {\Lambda}_{F}(Z_{n-k,n}) & =& a \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right) + \log \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} \right), \end{array} $$
(A.2)

Now, let

$$ M_{n}= \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right), $$

and

$$ {\Delta}_n= \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda} _F(Z_{n-j+1,n})} \frac{ {\Lambda} _F(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \right). $$
(A.3)
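To fix ideas, the quantities above can be computed directly on simulated censored data. The sketch below is purely illustrative (it is not the authors' simulation code): it takes X and C standard exponential, i.e. Weibull-tail with 𝜃X = 𝜃C = 1 and Λ_F(x) = x (the heavy-censoring regime with a = 1 and p = c_F/(c_F + c_G) = 1/2), and uses the estimator \(\hat{\Lambda}_{nF}\), whose jump at the i-th smallest observation is δ(i)/(n − i + 1), consistent with the increments δ_{n−i+1,n}/i appearing later in Appendix A.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative margins (an assumption for this sketch): X, C i.i.d. Exp(1),
# so theta_X = theta_C = 1, Lambda_F(x) = x, and p = 1/2 in the tail.
n, k = 100_000, 2_000
X = rng.exponential(size=n)
C = rng.exponential(size=n)
Z = np.minimum(X, C)
delta = (X <= C).astype(float)

# Nelson-Aalen-type estimator of Lambda_F at the order statistics of Z:
# jump delta_(i) / (n - i + 1) at the i-th smallest Z.
order = np.argsort(Z)
Z_sorted, d_sorted = Z[order], delta[order]
lam_hat = np.cumsum(d_sorted / (n - np.arange(n)))

# hat theta_{X,k}: ratio of the mean log-spacings of Z over the k upper order
# statistics to the mean log-spacings of hat Lambda_nF at the same points.
num = np.mean(np.log(Z_sorted[n - k:]) - np.log(Z_sorted[n - k - 1]))
den = np.mean(np.log(lam_hat[n - k:]) - np.log(lam_hat[n - k - 1]))
theta_hat = num / den

p_hat_k = d_sorted[n - k:].mean()   # proportion of uncensored points in the tail
print(theta_hat, p_hat_k)           # close to theta_X = 1 and p = 1/2
```

With exponential margins all slowly varying functions are exactly constant, so the estimate is nearly unbiased here; for general Weibull-tail margins the slowly decaying bias terms in \(L_{nk}\) described by Theorem 1 become visible.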

Since the denominator in the expression for \(\hat \theta _{X,k}\) above equals

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{k} \sum\limits_{j=1}^{k} \left( \log \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - \log \hat{{\Lambda} }_{nF}(Z_{n-k,n}) \right) \\&&= \frac{1}{k} \sum\limits_{j=1}^{k} \left( \log {\Lambda}_{F}(Z_{n-j+1,n}) - \log {\Lambda} _{F}(Z_{n-k,n}) \right) + {\Delta}_{n}, \end{array} $$

we obtain, using (A.1), (A.2) and relation 𝜃X = 𝜃Z/a,

$$ \begin{array}{@{}rcl@{}} \hat{\theta}_{X,k} -\theta_{X} & = & \displaystyle \frac{\theta_{Z} M_{n} + R_{n,l}}{a M_{n} + R_{n,\tilde{l}} +{\Delta}_{n} } -\theta_{X}\\ & =& \displaystyle \theta_{X} \frac{\theta^{-1}_{X} R_{n,l} - R_{n,\tilde{l}} - {\Delta}_{n}}{a M_{n} + R_{n,\tilde{l}} +{\Delta}_{n}}\\ &= & \displaystyle -\frac{\theta_{X}}{a} {\Delta}_{n} \left( M_{n} + a^{-1} R_{n,\tilde{l}} +a^{-1} {\Delta}_{n} \right)^{-1} + \frac{R_{n,l} - \theta_{X} R_{n,\tilde{l}}}{a M_{n} + R_{n,\tilde{l}} +{\Delta}_{n}}, \end{array} $$

where

$$ R_{n,l} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} \right) \quad \text{ and } \quad R_{n,\tilde{l}} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} \right). $$
(A.4)

We thus have the following representation, which shows that the behavior of the estimation error is essentially driven by the statistic Δn:

$$ \sqrt{k}L_{nk}^{-b}\left( \hat{\theta}_{X,k} -\theta_{X}\right) = \left( -\frac{\theta_{X}}{a}\right) \sqrt{k} L_{nk}^{1-b} {\Delta}_{n} D_{n}^{-1} + \left( \sqrt{k} L_{nk}^{1-b} R_{n,l} - \theta_{X} \sqrt{k} L_{nk}^{1-b} R_{n,\tilde{l}} \right) (aD_{n})^{-1} $$

where the denominator \(D_{n}=L_{nk} M_{n} + a^{-1}L_{nk} R_{n,\tilde {l}} + a^{-1}L_{nk} {\Delta }_{n}\) will turn out to converge to 1. The proof of Theorem 1 then follows by combining the following three propositions, the first one being the most important and the longest to establish. These propositions are proved in the next three subsections.

Proposition 1

Under the conditions of Theorem 1 we have, as n tends to infinity,

$$ \begin{array}{@{}rcl@{}} {\Delta}_n \overset{d}{=} \frac{1+o_{\mathbb{P}}(1)}{L_{nk}} \left( \left( L_{nk}^{1-a} \frac{\hat{p}_k}{\tilde{c}} -a \right) - a \left( \bar{E}_n -1 \right) \right) - k^{-1/2} L_{nk}^{b-1}\tilde{\alpha}\left( 1+\frac{1}{\rho}\right) (1+o_{\mathbb{P}}(1)) ~~~~ \end{array} $$
(A.5)

and

$$ \sqrt{k} L_{nk}^{1-b} {\Delta}_{n} \overset{d}{\longrightarrow} N\left( m_{\Delta},\frac{a}{\tilde{c}}\right), $$

where\(\bar E_{n}= \frac {1}{k} {\sum }_{i=1}^{k} E_{i}\) (sample mean of standard exponential variables), and

$$ \hat{p}_{k} := \frac{1}{k} \sum\limits_{j=1}^{k} \delta_{n-j+1,n} \quad \text{and}\quad m_{\Delta} = \left\{ \begin{array}{ll} \displaystyle - \tilde{\alpha} \left( 1+\frac{1}{\rho}\right) -\frac{\theta_{X}}{\theta_{C}} \frac{c_{G}}{{c_{F}^{d}}} \alpha^{\prime} & \text{ if } \theta_{X} < \theta_{C}, \\ 0 & \text{ if } \theta_{X} \geqslant \theta_{C} . \end{array} \right. $$

Please note that the exponential variables Ei appearing in the statement of Proposition 1 above are not the same as those introduced at the beginning of this Section.

Proposition 2

Under the conditions of Theorem 1 we have, as n tends to infinity,

$$ \sqrt{k} L_{nk}^{1-b} R_{n,l} \overset{\mathbb{P}}{\longrightarrow} \left\{ \begin{array}{ll} \alpha & \text{ if } \theta_{X} < \theta_{C} , \\ 0 & \text{ if } \theta_{X} \geqslant \theta_{C}, \end{array} \right. \quad {and} \quad \sqrt{k} L_{nk}^{1-b} R_{n,\tilde{l}} \overset{\mathbb{P}}{\longrightarrow} \left\{ \begin{array}{ll} \tilde{\alpha} & \text{ if } \theta_{X} < \theta_{C} , \\ 0 & \text{ if } \theta_{X} \geqslant \theta_{C} . \end{array} \right. $$

Proposition 3

Under condition H1, we have \( L_{nk} M_{n} \overset {\mathbb {P}}{\longrightarrow } 1 \), as n tends to infinity.

Remark 1

First, recall that a = 1 and \(\tilde c=1\) when 𝜃X < 𝜃C. Let us highlight that the convergence in distribution of \(\sqrt {k} L_{nk}^{1-b}{\Delta }_{n}\) stated in Proposition 1 comes from the competition between the two terms appearing in the representation (A.5) of Δn: the term in \(\hat {p}_{k}\) and the term involving the exponential sample mean. The convergence in distribution of the term involving \(\hat {p}_{k}\) is detailed in Lemma 2 in Subsection A.1; it is the leading term only when 𝜃X > 𝜃C (in this setting, the constant b is positive and thus the exponential term vanishes). When 𝜃X < 𝜃C, it only generates a possible bias, and when 𝜃X = 𝜃C it contributes to the asymptotic normality along with the exponential term.

The following corollary is then stated, concerning the statistic RLn defined in Eq. 9 and discussed thereafter. Note that this corollary probably holds under weaker conditions.

Corollary 1

Under the conditions of Theorem 1, as n → ∞, we have \(RL_{n} \overset {\mathbb {P}}{\longrightarrow } a\).

Its proof is short, so we provide it here. With the same notation as above, we readily have

$$ RL_{n} = \left( \frac{1}{k} \sum\limits_{j=1}^{k} \log\log(n/j) -\log\log(n/k) \right)^{-1} \frac 1{L_{nk}} (aL_{nk} M_{n} + L_{nk} R_{n,\tilde{l}} + L_{nk}{\Delta}_{n} ), $$

where the mean inside the large brackets is equivalent to 1/Lnk (see Girard (2004b), formula (15), for a proof). The proof of Corollary 1 thus follows from Propositions 1, 2 and 3.
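As a quick numerical sanity check of the equivalence used in this proof (the mean inside the large brackets being equivalent to 1/Lnk), one can compare the two quantities for a large sample size; the agreement is only logarithmic, so it improves very slowly:

```python
import numpy as np

# Check that (1/k) sum_{j=1}^k loglog(n/j) - loglog(n/k) is equivalent to
# 1/L_nk with L_nk = log(n/k); the equivalence is asymptotic and slow.
n, k = 10**7, 1_000
j = np.arange(1, k + 1)
mean_brackets = np.mean(np.log(np.log(n / j))) - np.log(np.log(n / k))
L_nk = np.log(n / k)
print(mean_brackets * L_nk)   # approaches 1 as n/k grows (near 0.9 here)
```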

A.1. Proof of Proposition 1

Starting from the definition of Δn in Eq. A.3, we introduce the first remainder term \( R_{1,k}^{({\Delta })}\) by writing

$$ \begin{array}{@{}rcl@{}} {\Delta}_{n} & = & \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \right)\\ & = & \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} - 1\right) + R_{1,k}^{({\Delta})}. \end{array} $$

Now, using the definition of \(\hat {{\Lambda } }_{nF}\) in Eq. 4, we obtain

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{k} \sum\limits_{j=1}^{k} \left( \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - \hat{{\Lambda} }_{nF}(Z_{n-k,n}) \right) \\&&\quad= \frac{1}{k} \sum\limits_{j=1}^{k} \sum\limits_{i=j}^{k} \frac{\delta_{n-j+1,n}}{j}= \frac{1}{k} \sum\limits_{j=1}^{k} \delta_{n-j+1,n} = \hat{p}_{k}. \end{array} $$

Hence, it can easily be checked that

$$ \begin{array}{@{}rcl@{}} \displaystyle \frac{ \hat{{\Lambda} }_{nF}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-k,n})} \left( {\Delta}_{n} - R_{1,k}^{({\Delta})} \right) = \displaystyle \frac{ \hat{p}_{k}}{{\Lambda}_{F}(Z_{n-k,n})} - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{{\Lambda}_{F}(Z_{n-j+1,n})}{{\Lambda}_{F}(Z_{n-k,n})} -1 \right) + R_{2,k}^{({\Delta})}, \end{array} $$

where

$$ R_{2,k}^{({\Delta})} = \frac{1}{{\Lambda}_{F}(Z_{n-k,n})} \frac{1}{k} \sum\limits_{j=1}^{k} \left( \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - {\Lambda}_{F}(Z_{n-j+1,n}) \right) \left( \frac{{\Lambda}_{F}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-j+1,n})} -1 \right). $$

Since, for every \(1 \leqslant j \leqslant k+1\), \({\Lambda }_{F}(Z_{n-j+1,n}) = ({\Lambda }_{F} \circ {\Lambda }^{-}_{H}) (E_{n-j+1,n}) = E_{n-j+1,n}^{a} \tilde {l}(E_{n-j+1,n})\), where \(\tilde {l}\) is slowly varying and tends to \(\tilde {c}\) at infinity (cf. Lemma 3 in Appendix C.1), then

$$ \begin{array}{@{}rcl@{}} \frac{{\Lambda}_{F}(Z_{n-j+1,n})}{{\Lambda}_{F}(Z_{n-k,n})} -1 &=& \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1 = \left( \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} -1 \right) \\&&+ \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1 \right), \end{array} $$

and, introducing \((\tilde {E}_{1}, \ldots , \tilde {E}_{k})\), k independent standard exponential random variables such that, according to Lemma 5, \((E_{n-j+1,n}-E_{n-k,n})_{1 \leqslant j \leqslant k} \overset {d}{=} (\tilde {E}_{k,k}, {\ldots } , \tilde {E}_{1,k})\), we can write

$$ \frac{ \hat{{\Lambda} }_{nF}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-k,n})} \left( {\Delta}_{n} - R_{1,k}^{({\Delta})} \right) \overset{d}{=} \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}^{a}} + R_{3,k}^{({\Delta})} - \frac{1}{k} \sum\limits_{j=1}^{k} \left( a \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right) + R_{4,k}^{({\Delta})} + R_{5,k}^{({\Delta})} + R_{2,k}^{({\Delta})}, $$

where

$$ \begin{array}{@{}rcl@{}} R_{3,k}^{({\Delta})} & = & \displaystyle \frac{ \hat{p}_{k}}{E_{n-k,n}^{a}} \left( \frac{1}{\tilde{l}(E_{n-k,n})} - \frac{1}{\tilde{c}} \right)\\ R_{4,k}^{({\Delta})} & = & \displaystyle - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1\right)\\ R_{5,k}^{({\Delta})} & = & \displaystyle - \frac{1}{k} \sum\limits_{j=1}^{k} \left\{ \left( \left( 1+ \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right)^{a}-1 \right) - a \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right\}. \end{array} $$

Let us summarize:

$$ {\Delta}_{n} \overset{d}{=} \frac{{\Lambda}_{F}(Z_{n-k,n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \left( \left( \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}^{a}} - \frac{a}{E_{n-k,n}} \frac{1}{k} \sum\limits_{j=1}^{k} \tilde{E}_{j} \right) + \sum\limits_{i=2}^{5} R_{i,k}^{({\Delta})} \right) + R_{1,k}^{({\Delta})}. $$

But

$$ \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}^{a}} - \frac{a}{E_{n-k,n}} \frac{1}{k} \sum\limits_{j=1}^{k} \tilde{E}_{j} = \frac{1}{E_{n-k,n}} \left( \left( L_{nk}^{1-a} \frac{\hat{p}_{k}}{\tilde{c}} -a \right) - a \left( \bar{E}_{n} -1 \right) \right) + R_{6,k}^{({\Delta})}, $$

where \( \bar {E}_{n} = \frac {1}{k} {\sum }_{j=1}^{k} \tilde {E}_{j}\) and

$$ R_{6,k}^{({\Delta})}= \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}} \left( E_{n-k,n}^{1-a} - L_{nk}^{1-a} \right). $$

Finally,

$$ \begin{array}{@{}rcl@{}} {\Delta}_n \overset{d}{=} \frac{{\Lambda} _F(Z_{n-k,n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \left( \frac{1}{E_{n-k,n}} \left( \left( L_{nk}^{1-a} \frac{\hat{p}_k}{\tilde{c}} -a \right) - a \left( \bar{E}_n -1 \right) \right) + \sum\limits_{i=2}^6 R_{i,k}^{({\Delta})} \right) + R_{1,k}^{({\Delta})}. \end{array} $$
(A.6)

The following lemma, proved in Appendix C.2, shows that \(\sqrt {k} L_{nk}^{1-b} {\sum }_{i=1}^{6} R_{i,k}^{({\Delta })}\) converges in probability to a constant.

Lemma 1

Under the assumptions of Theorem 1, as n tends to infinity,

$$ \sqrt{k} L_{nk}^{1-b} R_{i,k}^{({\Delta})} \overset{\mathbb{P}}{\longrightarrow} 0, \text{ for } i \in \{ 1,2,5,6 \} $$
$$ \sqrt{k} L_{nk}^{1-b} R_{3,k}^{({\Delta})} \overset{\mathbb{P}}{\longrightarrow} -\frac{\tilde{\alpha}}{\rho} \text{ if } \theta_{X} < \theta_{C} \text{ and } 0 \text{ if } \theta_{X} \geqslant \theta_{C}. $$
$$ \sqrt{k} L_{nk}^{1-b} R_{4,k}^{({\Delta})} \overset{\mathbb{P}}{\longrightarrow} -\tilde{\alpha} \text{ if } \theta_{X} < \theta_{C} \text{ and } 0 \text{ if } \theta_{X} \geqslant \theta_{C}. $$

Moreover, we have \(\sqrt {k} \left (\bar {E}_{n} -1 \right ) \overset {d}{\longrightarrow } N(0,1)\), and, according to Lemmas 6 and 7, both \(\frac {L_{nk}}{E_{n-k,n}} \) and \(\frac {{\Lambda }_{F}(Z_{n-k,n})}{\hat {{\Lambda } }_{nF}(Z_{n-k,n})} \) tend to 1 as n → +∞. Hence

$$ \begin{array}{@{}rcl@{}} \sqrt{k} L_{nk}^{1-b} {\Delta}_n \overset{d}{=} (1 + o_{\mathbb{P}}(1)) \left( D_n - a \sqrt{k} L_{nk}^{-b}\left( \bar{E}_n - 1 \right) \right) + (1 + o_{\mathbb{P}}(1)) \sqrt{k} L_{nk}^{1-b} \sum\limits_{i=1}^6 R_{i,k}^{({\Delta})}, \end{array} $$
(A.7)

where

$$ D_{n}= \sqrt{k}L_{nk}^{-b}\left( L_{nk}^{1-a} \frac{\hat{p}_{k}}{\tilde{c}} -a \right), \text{ with } b= (1-a)/2. $$

It remains to study the behavior of Dn, which is done in the following lemma, proved in Appendix C.3.

Lemma 2

Under the assumptions of Theorem 1, we have, as n → +∞:

  1. If 𝜃X < 𝜃C, then \(D_{n}=\sqrt {k} (\hat {p}_{k}-1) \displaystyle \overset {\mathbb {P}}{\longrightarrow } - \frac {\theta _{X}}{\theta _{C}} \frac {c_{G}}{{c_{F}^{d}}} \alpha ^{\prime }\).

  2. If 𝜃X = 𝜃C, then \(D_{n} = \displaystyle \sqrt {k} \left (\frac {\hat {p}_{k}}{p} -1 \right ) \overset {d}{\longrightarrow } N\left (0,\frac {1-p}{p}\right )\), where \(p= \displaystyle \frac {c_{F}}{c_{F}+c_{G}}\).

  3. If 𝜃X > 𝜃C (hence a < 1 and b ∈ ]0, 1/2[), then \(D_{n} \displaystyle \overset {d}{\longrightarrow } N\left (0,\frac {a}{\tilde {c}}\right )\).

Remark 2

Lemma 2 shows, in particular, that the proportion \(\hat {p}_{k}\) of non-censored data in the tail tends to p = 1 if 𝜃X < 𝜃C, to \(p=\frac {c_{F}}{c_{F}+c_{G}}\) if 𝜃X = 𝜃C (in this case, p equals \(\tilde {c}\)) and to p = 0 (with rate \(L_{nk}^{a-1}\)) if 𝜃X > 𝜃C. This should be compared with the result of Lemma 4 (see Appendix C.1) concerning the limit of the theoretical function \(p(x)=\mathbb {P}(\delta =1|Z=x)\) as x → ∞.
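The balanced case 𝜃X = 𝜃C of this remark can be checked directly in a toy example (exponential margins, an assumption made for illustration only): if Λ_F(x) = λ_F x and Λ_G(x) = λ_G x, then p = c_F/(c_F + c_G) = λ_F/(λ_F + λ_G), and the proportion \(\hat p_k\) of uncensored observations among the k largest Z's stabilizes around that value.

```python
import numpy as np

rng = np.random.default_rng(1)
lam_F, lam_G = 1.0, 2.0          # Lambda_F(x) = lam_F x, Lambda_G(x) = lam_G x
n, k = 200_000, 2_000
X = rng.exponential(1.0 / lam_F, size=n)
C = rng.exponential(1.0 / lam_G, size=n)
Z = np.minimum(X, C)
delta = X <= C
top_k = np.argsort(Z)[-k:]       # indices of the k largest observations of Z
p_hat_k = delta[top_k].mean()
print(p_hat_k)                   # close to p = lam_F / (lam_F + lam_G) = 1/3
```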

When 𝜃X < 𝜃C, Lemma 2 states that Dn converges to a constant: hence, via Lemma 1, the leading term in Eq. A.7 is \(\sqrt {k} L_{nk}^{-b}\left (\bar {E}_{n} -1 \right ) = \sqrt {k} \left (\bar {E}_{n} -1 \right ) \overset {d}{\longrightarrow } N(0,1)\), and we thus obtain, as desired, \( \sqrt {k} L_{nk}^{1-b} {\Delta }_{n} \overset {d}{\longrightarrow } N(m_{\Delta },1)\), where mΔ is defined in the statement of Proposition 1.

When 𝜃X = 𝜃C, the constant b is still equal to 0 and both Dn and \(\sqrt {k} \left (\bar {E}_{n} -1 \right )\) (which are independent) contribute to the asymptotic normality of Δn, with \(D_{n} - a \sqrt {k} \left (\bar {E}_{n} -1 \right ) \overset {d}{\longrightarrow } N(0,\sigma ^{2}_{\Delta })\) in relation (A.7), where \(\sigma ^{2}_{\Delta } = \frac {1-p}{p} +a^{2} = \frac {1}{\tilde {c}}\). Thus, we obtain \(\sqrt {k} L_{nk}^{1-b} {\Delta }_{n} \overset {d}{\longrightarrow } N(0, \frac {1}{\tilde {c}})\).

Finally, when 𝜃X > 𝜃C, \(\sqrt {k} L_{nk}^{-b}\left (\bar {E}_{n} -1 \right ) \) tends to 0 and Dn is thus the leading term: we obtain \(\sqrt {k} L_{nk}^{1-b} {\Delta }_{n} \overset {d}{\longrightarrow } N(0, \frac {a}{\tilde {c}})\), as desired.

This ends the proof of Proposition 1.

A.2. Proof of Proposition 2

Recall from Eq. A.4 that

$$ R_{n,l} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} \right) \text{ and } R_{n,\tilde{l}} = \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} \right). $$

Let A > 1. Under condition Rl(B, ρ), we have for all 𝜖 > 0 and t sufficiently large

$$ (1-\epsilon) B(t) K_{\rho} (x)\leqslant \frac{l(tx)}{l(t)} -1\leqslant (1+\epsilon) B(t) K_{\rho} (x) \hspace{0.3cm} (\forall 1 \leqslant x \leqslant A). $$

We only prove the result for Rn, l, the proof for \( R_{n,\tilde {l}}\) being very similar, using \(R_{\tilde {l}}(\tilde {B}, \tilde {\rho })\) instead of Rl(B, ρ). Note that

$$ R_{n,l} = \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( 1+ \xi_{j,n} \right), $$

where \( \xi _{j,n}= \frac {l(E_{n-j+1,n})}{l(E_{n-k,n})} -1\) tends to 0 uniformly in j, because l is slowly varying and \(\frac {E_{n-j+1,n}}{E_{n-k,n}}\) tends to 1 uniformly in j, according to Lemma 6 stated in Appendix C.4. Hence, using the following inequality,

$$ x- x^{2}/2 \leqslant \log(1+x) \leqslant x \quad (\forall x \geqslant -1/2) $$

and the fact that \(x_{j,n} := \frac {E_{n-j+1,n}}{E_{n-k,n}} \geqslant 1\) tends to 1 uniformly in j, we obtain that for all 𝜖 > 0 and n sufficiently large,

$$ R_{n,l} \leqslant \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{l(E_{n-j+1,n})}{l(E_{n-k,n})} -1 \right) \leqslant (1 + \epsilon) B(E_{n-k,n}) \frac{1}{k} \sum\limits_{j=1}^{k} K_{\rho}(x_{j,n}), $$

omitting the lower bound, which is treated similarly. Since Kρ(1 + x) ∼ x when x tends to 0, then \(K_{\rho }(x_{j,n}) \sim \frac {E_{n-j+1,n}-E_{n-k,n}}{E_{n-k,n}}\), uniformly in j. By Lemma 5 (also stated in Appendix C.4), \(\frac {E_{n-j+1,n}-E_{n-k,n}}{E_{n-k,n}} \overset {d}{=} \frac {\tilde {E}_{k-j+1,k}}{E_{n-k,n}} \). Hence, it is easy to prove that

$$ E_{n-k,n} \frac{1}{k} \sum\limits_{j=1}^{k} K_{\rho}(x_{j,n}) \overset{\mathbb{P}}{\longrightarrow} 1. $$

Since B is regularly varying and \(\frac {E_{n-k,n}}{L_{nk}} \rightarrow 1\), then \(\frac {B(E_{n-k,n})}{E_{n-k,n}} \sim \frac {B(L_{nk})}{L_{nk}}\) and consequently

$$ \begin{array}{@{}rcl@{}} \sqrt{k}L_{nk}^{-b}B(L_{nk}) (1+o_{\mathbb{P}}(1)) &\leqslant& \liminf \sqrt{k}L_{nk}^{1-b} R_{n,l} \leqslant \limsup \sqrt{k}L_{nk}^{1-b} R_{n,l} \\ &&\leqslant \sqrt{k}L_{nk}^{-b}B(L_{nk}) (1+o_{\mathbb{P}}(1)). \end{array} $$

We conclude using assumption Rl(B, ρ) and conditions H2(i), H3(i) or H4(ii), because |B| is regularly varying of order ρ, and we have \(\rho =\tilde \rho \) when 𝜃X𝜃C, and \(\rho \leqslant \tilde \rho \) when 𝜃X > 𝜃C (see Lemma 3 in Appendix C.1).

A.3. Proof of Proposition 3

Recall that

$$ M_{n}= \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right). $$

Since \(\frac {E_{n-j+1,n}}{\log (n/j)} \overset {\mathbb {P}}{\longrightarrow } 1 \) and \(\frac {L_{nk}}{\log (n/j)} \overset {\mathbb {P}}{\longrightarrow } 1 \), uniformly in j = 1,…, k (see Lemma 6), then \(\frac {E_{n-j+1,n}}{E_{n-k,n}}\overset {\mathbb {P}}{\longrightarrow } 1 \), uniformly in j = 1,…, k. By Lemma 5, \((E_{n-j+1,n}-E_{n-k,n})_{1 \leqslant j \leqslant k} \overset {d}{=} (\tilde {E}_{k,k}, {\ldots } , \tilde {E}_{1,k}) \). Therefore

$$ M_{n} \overset{d}{=} \frac{1}{k} \sum\limits_{j=1}^{k} \log \left( 1+ \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right) = (1+o_{\mathbb{P}}(1)) \frac{1}{E_{n-k,n}} \frac{1}{k} \sum\limits_{j=1}^{k} \tilde{E}_{j}, $$

with \( \frac {1}{k} {\sum }_{j=1}^{k} \tilde {E}_{j} \rightarrow 1\), a.s. Hence, LnkMn also tends to 1, in probability, as desired.
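The Rényi-type representation of Lemma 5, used repeatedly in the proofs above, can also be illustrated by Monte Carlo (a sketch, with Lemma 5 paraphrased as: the exceedances of the top k exponential order statistics above \(E_{n-k,n}\) are jointly distributed as the order statistics of k fresh standard exponential variables):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, reps = 500, 50, 2_000
exceedances = []
for _ in range(reps):
    E = np.sort(rng.exponential(size=n))
    # (E_{n-j+1,n} - E_{n-k,n})_{1 <= j <= k}: exceedances above E_{n-k,n}
    exceedances.append(E[n - k:] - E[n - k - 1])
pooled = np.concatenate(exceedances)
# By the distributional identity, each replication contributes k values whose
# empirical law is that of k i.i.d. standard exponentials, so the pooled
# sample mean and variance are both close to 1.
print(pooled.mean(), pooled.var())
```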

Appendix B: Proof of Theorem 2

Starting from \(x_{p_{n}} =\overline {F}^{-1}(p_{n}) \) and the definition of \(\hat {x}_{p_{n}}\) in Eq. 5, we obtain

$$ \begin{array}{@{}rcl@{}} \log(x_{p_{n}}) & = & \theta_{X} \log\log(1/p_{n}) + \log(\bar{l}_{F}(-\log(p_{n}))),\\ \log(\hat{x}_{p_{n}}) & =& \hat{\theta}_{X,k} \log\log(1/p_{n}) - \hat{\theta}_{X,k} \log(\hat{{\Lambda} }_{nF}(Z_{n-k,n})) + \log(Z_{n-k,n}). \end{array} $$

Hence

$$ \begin{array}{@{}rcl@{}} \log(\hat{x}_{p_{n}}/x_{p_{n}}) & = & (\hat{\theta}_{X,k}-\theta_{X}) \log\log(1/p_{n}) - \hat{\theta}_{X,k} \log \left( \frac{\hat{{\Lambda} }_{nF}}{{\Lambda}_{F}}(Z_{n-k,n}) \right) \\&&- (\hat{\theta}_{X,k}-\theta_{X}) \log({\Lambda}_{F}(Z_{n-k,n})) + \left\{ - \log(\bar{l}_{F}(\log(1/p_{n}))) \right.\\ &&\left.- \theta_{X} \log(l_{F}(Z_{n-k,n})) \right\}, \\&=:& Q_{1,n} + Q_{2,n} +Q_{3,n} +Q_{4,n} . \end{array} $$

First of all, the result of Theorem 1 implies that

$$ \frac{\sqrt{k}L_{nk}^{-b}}{\log\log(1/p_{n})} Q_{1,n} = \sqrt{k}L_{nk}^{-b} (\hat{\theta}_{X,k}-\theta_{X}) \overset{d}{\longrightarrow} N \left( m,\frac{{\theta_{X}^{2}}}{a \tilde{c}} \right). $$

Then, Lemma 7 (stated in Appendix C.4) implies that \((\hat {{\Lambda } }_{nF}/{\Lambda }_{F})(Z_{n-k,n}) -1 = O_{\mathbb {P}} \left (1/(\sqrt {k}{\Lambda }_{F} (Z_{n-k,n}))\right )\). Hence

$$ \frac{\sqrt{k}L_{nk}^{-b}}{\log\log(1/p_{n})} Q_{2,n} = O_{\mathbb{P}}(1) \frac{1}{L_{nk}^{b} \log\log(1/p_{n}) {\Lambda}_{F}(Z_{n-k,n})} \overset{\mathbb{P}}{\longrightarrow} 0. $$

Now, recall that \( {\Lambda }_{F}(Z_{n-k,n}) = {\Lambda }_{F} \circ {\Lambda }_{H}^{-} (E_{n-k,n}) = E_{n-k,n}^{a} \tilde {l}(E_{n-k,n})\). Hence, the asymptotic normality of \((\hat {\theta }_{X,k}-\theta _{X}) \) yields

$$ \frac{\sqrt{k}L_{nk}^{-b}}{\log\log(1/p_{n})} Q_{3,n} = O_{\mathbb{P}}(1) \frac{\log(L_{nk})}{\log\log(1/p_{n})} \left( a\frac{\log(E_{n-k,n})}{\log(L_{nk})} + \frac{\log(\tilde{l}(E_{n-k,n}))}{\log(L_{nk})} \right). $$

The additional condition \(H^{\prime }_{1}\) of Theorem 2, along with Lemma 6, imply that this term tends to 0 in probability.

Finally, Lemma 3 implies that

$$ Q_{4,n} =- \log\left( 1- \log(1/p_{n})^{\theta_{X} \rho_{F}} \bar{v}(\log(1/p_{n}))\right) - \theta_{X} \log\left( 1- Z_{n-k,n}^{\rho_{F}} v(Z_{n-k,n})\right), $$

where v and \(\bar {v}\) are slowly varying. Hence, \( \frac {\sqrt {k}L_{nk}^{-b}}{\log \log (1/p_{n})} Q_{4,n} \) tends to 0 as soon as there exists some 0 < δ < 1 such that \(\frac {\sqrt {k}L_{nk}^{-b}}{\log \log (1/p_{n})} (\log {1/p_{n}})^{\theta _{X} \rho _{F}+\delta } = O(1)\) and \(\frac {\sqrt {k}L_{nk}^{-b}}{\log \log (1/p_{n})} Z_{n-k,n}^{\rho _{F} + \delta } =O_{\mathbb {P}}(1)\). Recall that \(Z_{n-k,n} = E^{\theta _{Z}}_{n-k,n} l(E_{n-k,n})\). Hence, condition \(H^{\prime }_{1}\) guarantees that we only need to show that \(\sqrt {k}L_{nk}^{-b+ \theta _{X} \rho _{F}} = O(1)\) and \(\sqrt {k}L_{nk}^{-b+ \theta _{Z} \rho _{F}} = O(1)\). When 𝜃X = 𝜃Z < 𝜃C, this is due to the additional condition H2(iv). When 𝜃X = 𝜃Z = 𝜃C, it is due to condition H3(i). Finally, when 𝜃X > 𝜃Z = 𝜃C, it is due to H4(ii).
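The expression of \(\log(\hat x_{p_n})\) at the start of this section inverts to \(\hat{x}_{p_{n}} = Z_{n-k,n} \left( \log(1/p_{n}) / \hat{\Lambda}_{nF}(Z_{n-k,n}) \right)^{\hat{\theta}_{X,k}}\). The following sketch illustrates this extrapolation on simulated data (exponential margins chosen for illustration, not the paper's simulation design; with Exp(1) margins the exact quantile is \(x_{p} = \log(1/p)\)):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100_000, 2_000
X = rng.exponential(size=n)          # theta_X = 1, Lambda_F(x) = x
C = rng.exponential(size=n)          # theta_C = 1: heavy censoring in the tail
Z = np.minimum(X, C)
delta = (X <= C).astype(float)

order = np.argsort(Z)
Zs, ds = Z[order], delta[order]
lam_hat = np.cumsum(ds / (n - np.arange(n)))   # Nelson-Aalen-type hat Lambda_nF

theta_hat = (np.mean(np.log(Zs[n - k:]) - np.log(Zs[n - k - 1]))
             / np.mean(np.log(lam_hat[n - k:]) - np.log(lam_hat[n - k - 1])))

# Extreme quantile estimate, obtained by inverting log(hat x_{p_n}):
p_n = 1e-6                            # far beyond the sample range
x_hat = Zs[n - k - 1] * (np.log(1 / p_n) / lam_hat[n - k - 1]) ** theta_hat
x_true = np.log(1 / p_n)              # exact quantile of Exp(1)
print(x_hat, x_true)
```

Note how an error of size ε on \(\hat{\theta}_{X,k}\) is amplified multiplicatively by the factor \((\log(1/p_{n})/\hat{\Lambda}_{nF}(Z_{n-k,n}))^{\varepsilon}\), which is the reason for the \(\log\log(1/p_{n})\) normalization of Theorem 2.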

Appendix C: More technical aspects

C.1 Details on the second order properties

Recall that the starting assumption of this paper is relation (6),

$$ {\Lambda}_{F}(x) = x^{1/\theta_{X}} l_{F}(x) \ \text{ and } \ {\Lambda}_{G}(x) = x^{1/\theta_{C}} l_{G}(x), $$

where lF and lG are slowly varying. It is then easy to prove that

$$ \begin{array}{@{}rcl@{}} {\Lambda}_{F}^{-}(x) &=& x^{\theta_{X}} \bar{l}_{F}(x), \ {\Lambda}_{G}^{-}(x) = x^{\theta_{C}} \bar{l}_{G}(x), \ {\Lambda}_{H}(x) = x^{1/\theta_{Z}} l_{H}(x) , {\Lambda}_{H}^{-}(x) \\&=& x^{\theta_{Z}} l(x) \text{ and } {\Lambda}_{F} \circ {\Lambda}_{H}^{-}(x) = x^{a} \tilde{l} (x), \end{array} $$

where 𝜃Z = min(𝜃X, 𝜃C), a = 𝜃Z/𝜃X, and \(\bar {l}_{F}\), \(\bar {l}_{G}\), l and \(\tilde {l}\) are slowly varying.

More precisely, under the second order condition (7), we have the following lemma, which is invoked on several occasions in this paper.

Lemma 3

Under Assumptions (A1) and (A2), we have,

$$ \begin{array}{ccrcl} l_{F}(x) = c_{F}(1-x^{\rho_{F}}v(x)) & \text{ and } & l_{G}(x) & = & c_{G}(1-x^{\rho_{G}} v(x)),\\ \bar{l}_{F}(x) = c_{F}^{-\theta_{X}}(1-x^{\theta_{X} \rho_{F}} v(x)) & \text{ and } & \bar{l}_{G}(x) & = & c_{G}^{-\theta_{C}}(1-x^{\theta_{C} \rho_{G}} v(x)),\\ l_{H}(x) =c_{H}(1-x^{\rho_{H}} v(x)), \ l(x) = c_{H}^{-{\theta_{Z}}}(1-x^{\rho} v(x)) & \text{ and } & \tilde{l}(x) & = & \tilde{c}(1-x^{\tilde{\rho}} v(x)), \end{array} $$

for different slowly varying functions generically noted v, with

$$ c_{H}= \left\{ \begin{array}{ll} c_{F} & \text{ if } \theta_{X} < \theta_{C} \\ c_{F}+c_{G} & \text{ if } \theta_{X} = \theta_{C} \\ c_{G} & \text{ if } \theta_{X} > \theta_{C} \ \end{array} \right. , \ \ \ \tilde{c}= c_{H}^{-a} c_{F} = \left\{ \begin{array}{ll} 1 & \text{ if } \theta_{X} < \theta_{C} \\ c_{F}/(c_{F}+c_{G}) & \text{ if } \theta_{X} = \theta_{C} \\ c^{-a}_{G} c_{F} & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. , $$
$$ \rho_{H}=\left\{ \begin{array}{ll} \max(\rho_{F}, 1/\theta_{C}-1/\theta_{X}) & \text{ if } \theta_{X} < \theta_{C} \\ \max(\rho_{F}, \rho_{G}) & \text{ if } \theta_{X} = \theta_{C} \\ \max(\rho_{G}, 1/\theta_{X}-1/\theta_{C}) & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. , \ \ \rho= \theta_{Z} \rho_{H} = \left\{ \begin{array}{ll} \max(\theta_{X} \rho_{F}, d-1) & \text{ if } \theta_{X} < \theta_{C} \\ \max(\theta_{X} \rho_{F}, \theta_{X} \rho_{G}) & \text{ if } \theta_{X} = \theta_{C} \\ \max(\theta_{C} \rho_{G}, a-1) & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. , $$

and

$$ \tilde{\rho}=\left\{ \begin{array}{ll} \rho & \text{ if } \theta_{X} \leqslant \theta_{C} \\ \max(\theta_{C} \rho_{G}, \theta_{C} \rho_{F}, a-1) & \text{ if } \theta_{X} > \theta_{C} \end{array} \right. . $$

The proof of this Lemma is based on Theorem B.2.2 in de Haan and Ferreira (2006) as well as the concept of de Bruyn conjugate (see Proposition 2.5 in Beirlant et al. 2004). Details are omitted for brevity.
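To see the first of these inversions at work numerically, one can pick a concrete cumulative hazard Λ(x) = x1/𝜃 l(x) with an explicit slowly varying factor (the choice l(x) = 1 + 1/log x below is purely illustrative), invert it by bisection, and check that Λ(y)/y𝜃 varies slowly, as regular variation of Λ with index 𝜃 predicts. A minimal sketch:

```python
import math

theta = 2.0

def Lam(x):
    # toy Weibull-tail cumulative hazard: Lambda(x) = x**(1/theta) * l(x),
    # with slowly varying l(x) = 1 + 1/log(x) (illustrative choice, x > e)
    return x ** (1.0 / theta) * (1.0 + 1.0 / math.log(x))

def Lam_inv(y, lo=3.0, hi=1e30):
    # generalized inverse of the increasing function Lam, by geometric bisection
    for _ in range(200):
        mid = math.sqrt(lo * hi)      # bisect on a log scale, spans many magnitudes
        if Lam(mid) < y:
            lo = mid
        else:
            hi = mid
    return lo

# Lambda^-(y) = y**theta * l_bar(y) with l_bar slowly varying, so the ratio
# Lambda^-(y) / y**theta should vary slowly (here it drifts upwards towards 1)
ratios = [Lam_inv(y) / y ** theta for y in (1e2, 1e4, 1e6)]
print(ratios)
```

The slow drift of the ratio reflects the logarithmic slowly varying correction, which is exactly the kind of second order behavior quantified by Lemma 3.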

Remark 3

It is clear that all the aforementioned slowly varying functions satisfy the second order condition SR2 with the corresponding second order parameters defined in the previous Lemma. In particular, the rate functions B and \(\tilde {B}\) associated, respectively, with l and \(\tilde {l}\) satisfy \(x^{\tilde {\rho }}v(x)/\tilde {B}(x) \rightarrow -1/\tilde {\rho }\) and xρv(x)/B(x) →− 1/ρ, as x → +∞, with v the appropriate slowly varying function (see again Theorem B.2.2 in de Haan and Ferreira 2006).

Let us introduce, as in Einmahl et al. (2008), the function p(⋅) defined by

$$ p(x) = \mathbb{P}(\delta=1 | Z=x). $$

The following lemma provides useful developments of the functions p and \(p \circ {\Lambda }_{H}^{-}\). In particular, it provides details about the rate of convergence of p(x), as x → +∞ (to a limit which was denoted by p in the statement of Lemma 2, as the limit of the sequence \(\hat p_{k}\)). Its proof is based on the fact that

$$p(x)=\frac{\bar{G}(x) f(x)}{\bar{G}(x) f(x) +\bar{F}(x) g(x)},$$

where f and g are respectively the derivatives of F and G, as well as on the results of Lemma 3. It is omitted for brevity.
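As a quick sanity check of this expression, one can take X and C both exponential (a Weibull-tail case with 𝜃X = 𝜃C = 1 and constant lF, lG): the formula then reduces to the constant λX/(λX + λC). A minimal simulation sketch, where the rates λX = 1 and λC = 2 are illustrative choices:

```python
import random

random.seed(0)
lam_x, lam_c = 1.0, 2.0   # illustrative rates: Lambda_F(x) = lam_x*x, Lambda_G(x) = lam_c*x
n = 200_000

# censored observations: Z = min(X, C), delta = 1{X <= C}
unc = 0
for _ in range(n):
    x = random.expovariate(lam_x)
    c = random.expovariate(lam_c)
    if x <= c:
        unc += 1

# for exponentials, p(x) = Gbar(x) f(x) / (Gbar(x) f(x) + Fbar(x) g(x))
# simplifies to the constant lam_x / (lam_x + lam_c)
p_hat = unc / n
p_theory = lam_x / (lam_x + lam_c)
print(p_hat, p_theory)
```

This constant-p(x) situation corresponds to the boundary case 𝜃X = 𝜃C of Lemma 4 below.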

Lemma 4

Under assumptions (A1) and (A2), we have

$$ \frac{1}{p(x)} = 1 + \frac{\theta_{X}}{\theta_{C}} x^{\frac{1}{\theta_{C}}-\frac{1}{\theta_{X}}} \frac{l_{G}(x)}{l_{F}(x)} (1+o(1)). $$

In particular, as x → +∞,

$$ p(x) \rightarrow p:=\left\{ \begin{array}{ll} 1 & \text{ if } \theta_{X} < \theta_{C}, \\ \tilde{c}= c_{F}/(c_{F}+c_{G}) & \text{ if } \theta_{X} = \theta_{C}, \\ 0 & \text{ if } \theta_{X} > \theta_{C} . \end{array} \right. $$

Moreover, we have

$$ \begin{array}{@{}rcl@{}} &&\text{ if } \theta_{X} < \theta_{C}, 1/ (p \circ {\Lambda}_{H}^{-})(x) = \ 1 + d \frac{c_{G}}{{c^{d}_{F}}} x^{d-1} (1-x^{-\beta} v(x)),\\ &&\text{ if } \theta_{X} = \theta_{C}, (p \circ {\Lambda}_{H}^{-})(x) = \ \tilde{c} (1-x^{\rho} v(x)),\\ &&\text{ if } \theta_{X} > \theta_{C}, 1/ (p \circ {\Lambda}_{H}^{-})(x) = \ 1 + \frac{1}{a \tilde{c}} x^{1-a} (1-x^{\tilde{\rho}} v(x)), \end{array} $$

where d = 𝜃X/𝜃C, v is a generic notation for a slowly varying function, and

$$ -\beta = \max(\theta_{X} \rho_{F}, \theta_{X} \rho_{G}, d-1). $$
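The limits in Lemma 4 can be observed empirically. Below is a small sketch of the mild censoring case 𝜃X < 𝜃C, with the illustrative choices ΛF(x) = x (so 𝜃X = 1, X exponential) and ΛG(x) = x1/2 (so 𝜃C = 2, C with the heavier tail): since p(x) → 1, the proportion of uncensored observations among the top order statistics should exceed the overall uncensored proportion and drift towards 1.

```python
import random

random.seed(1)
n, k = 100_000, 500

# Weibull-tail toy model: Lambda_F(x) = x (theta_X = 1) and
# Lambda_G(x) = x**0.5 (theta_C = 2), so C = E'**2 for E' standard exponential
data = []
for _ in range(n):
    x = random.expovariate(1.0)
    c = random.expovariate(1.0) ** 2
    data.append((min(x, c), x <= c))

data.sort()                         # sort by the observed Z
top = data[-k:]                     # top k order statistics
p_top = sum(d for _, d in top) / k
p_all = sum(d for _, d in data) / n
print(p_all, p_top)                 # p(x) -> 1 here, so p_top exceeds p_all
```

The convergence of p_top to 1 is slow, in accordance with the polynomial-in-log rate xd−1 appearing in the lemma.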

C.2. Proof of Lemma 1

  • Recall that

    $$ \begin{array}{@{}rcl@{}} R_{1,k}^{({\Delta})} & = & \displaystyle {\Delta}_{n} - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} - 1\right)\\ & = & \displaystyle \frac{1}{k} \sum\limits_{j=1}^{k} \left( \log(1+\xi_{j,n}) - \xi_{j,n} \right), \end{array} $$

    where

    $$ \xi_{j,n} = \displaystyle \frac{\hat{{\Lambda} }_{nF}(Z_{n-j+1,n})}{ {\Lambda}_{F}(Z_{n-j+1,n})} \frac{ {\Lambda}_{F}(Z_{n-k,n}) }{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} - 1 $$

    Introducing \( {\Delta }_{j} = \hat {{\Lambda } }_{nF}(Z_{n-j+1,n}) - {\Lambda }_{F}(Z_{n-j+1,n})\), for j = 1,…, k + 1 (not to be confused with the Δn defined earlier in relation (A.3)), we readily have

    $$ \xi_{j,n} = \displaystyle \frac{{\Lambda}_{F}(Z_{n-k,n})}{\hat{{\Lambda} }_{nF}(Z_{n-k,n})} \left( {\Delta}_{j} \frac{{\Lambda}_{F}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-j+1,n})} - {\Delta}_{k+1} \right) \frac{1}{{\Lambda}_{F}(Z_{n-k,n})}. $$

    Lemma 7 (in Appendix C.4) implies that \( |{\Delta }_{j}| = O_{\mathbb {P}} (1/\sqrt {j-1})\) for all j = 2,…, k + 1, \(|{\Delta }_{1}| = O_{\mathbb {P}} (1)\) and \( \frac {{\Lambda }_{F}(Z_{n-k,n})}{\hat {{\Lambda } }_{nF}(Z_{n-k,n})}\) tends to 1, in probability.

    Now let E1,…, En be n independent standard exponential random variables such that \( \frac {1}{{\Lambda }_{F}(Z_{n-k,n})} = \frac {E_{n-k,n}^{-a}}{\tilde {l}(E_{n-k,n})}\), where \(\tilde {l}\) tends to \(\tilde {c}\) at infinity. Moreover, \(\frac {{\Lambda }_{F}(Z_{n-k,n})}{{\Lambda }_{F}(Z_{n-j+1,n})} \leqslant 1\) and \(\frac {E_{n-k,n}}{L_{nk}}\) tends to 1 (see Lemma 6). Thus, we obtain \(|\xi _{1,n} | \leqslant (1 + o_{\mathbb {P}}(1)) \left (O_{\mathbb {P}} (1) + O_{\mathbb {P}} (1/\sqrt {k}) \right ) L_{nk}^{-a} (1/\tilde {c} + o_{\mathbb {P}}(1))\) and

    $$ |\xi_{j,n} | \leqslant (1 + o_{\mathbb{P}}(1)) \left( O_{\mathbb{P}} (1/\sqrt{j-1}) + O_{\mathbb{P}} (1/\sqrt{k}) \right) L_{nk}^{-a} (1/\tilde{c} + o_{\mathbb{P}}(1)), \text{ for } j=2, \ldots, k . $$

    Therefore \(\xi _{1,n}^{2} \leqslant O_{\mathbb {P}}(1) L_{nk}^{-2a}\) and

    $$ \xi_{j,n}^{2} \leqslant O_{\mathbb{P}}(1) \frac{L_{nk}^{-2a}}{j-1} \text{ for } j=2, \ldots, k. $$

    Consequently, since a > 0, \(\sup _{1\leqslant j \leqslant k} |\xi _{j,n}|\) tends to 0 in probability, and thus, using the inequality \(0 \leqslant x - \log (1+x) \leqslant x^{2}\) (\(\forall x \geqslant -1/2\)), we obtain,

    $$ 0 \leqslant -R_{1,k}^{({\Delta})} \leqslant \frac{1}{k} \sum\limits_{j=1}^{k} \xi_{j,n}^{2}. $$

    But \( \frac {1}{k} {\sum }_{j=1}^{k} 1/j \sim \frac {\log k}{k}\). Hence

    $$ 0 \leqslant -\sqrt{k} L_{nk}^{1-b} R_{1,k}^{({\Delta})} \leqslant O_{\mathbb{P}}(1) \frac{\log k}{\sqrt{k}} L_{nk}^{1-b-2a}. $$

    Let 𝜖 > 0. We have 1 − b − 2a = 3b − 1, and so we want

    $$ \sqrt{k}(\log k)^{-1} L_{nk}^{1-3b} = (k^{\epsilon}/\log k) \left( \sqrt{k} L_{nk}^{(1-3b)/(1-2\epsilon)}\right)^{1-2\epsilon} $$

    to go to +∞. This is automatic when 0 ≤ b ≤ 1/3. If b > 1/3 (i.e. when 𝜃X > 3𝜃C), we can write (1 − 3b)/(1 − 2𝜖) = 1 − 3b − δ for some positive δ and small enough 𝜖, and we have \(\sqrt {k} L_{nk}^{1-3b-\delta } = \sqrt {k}L_{nk}^{-b} \times L_{nk}^{1-2b-\delta }\): the first factor goes to infinity (it is the CLT rate, assumption H4(i)), and the second factor does as well for δ (i.e. 𝜖) small enough, because b is always smaller than 1/2.

  • Recall that

    $$ R_{2,k}^{({\Delta})} = \frac{1}{{\Lambda}_{F}(Z_{n-k,n})} \frac{1}{k} \sum\limits_{j=1}^{k} \left( \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - {\Lambda}_{F}(Z_{n-j+1,n}) \right) \left( \frac{{\Lambda}_{F}(Z_{n-k,n})}{{\Lambda}_{F}(Z_{n-j+1,n})} -1 \right) $$

    and that \( \frac {{\Lambda }_{F}(Z_{n-k,n})}{{\Lambda }_{F}(Z_{n-j+1,n})} = x_{j,n}^{-a} \frac {\tilde {l}(E_{n-k,n})}{\tilde {l}(E_{n-j+1,n})}\), where \(x_{j,n} = \frac {E_{n-k,n}}{E_{n-j+1,n}} \rightarrow 1\), uniformly in j (see Lemma 6). Hence, using the fact that \(\sup _{1\leqslant j \leqslant k} |\hat {{\Lambda } }_{nF}(Z_{n-j+1,n}) - {\Lambda }_{F}(Z_{n-j+1,n}) | = O_{\mathbb {P}}(1)\) (see Lemma 7), we obtain

    $$ | R_{2,k}^{({\Delta})} | \leqslant O_{\mathbb{P}}(1) \frac{E_{n-k,n}^{-a}}{\tilde{l}(E_{n-k,n})} \left( \frac{1}{k} \sum\limits_{j=1}^{k} |x_{j,n}^{-a} -1| + \frac{1}{k} \sum\limits_{j=1}^{k} x_{j,n}^{-a} \left| \frac{\tilde{l}(E_{n-k,n})}{\tilde{l}(E_{n-j+1,n})} - 1 \right| \right). $$

    Introducing, once again, \(\tilde {E_{1}}, \ldots , \tilde {E_{k}}\), k independent standard exponential random variables such that \(\frac {E_{n-j+1,n}-E_{n-k,n}}{E_{n-k,n}} \overset {d}{=} \frac {\tilde {E}_{k-j+1,k}}{E_{n-k,n}} \) (see Lemma 5), and using a Taylor expansion, we have

    $$ | R_{2,k}^{({\Delta})}| \leqslant O_{\mathbb{P}}(1) E_{n-k,n}^{-a} \left( \frac{1}{k} \sum\limits_{j=1}^{k} \frac{\tilde{E}_{k-j,k}}{E_{n-k,n}} + \frac{1}{k} \sum\limits_{j=1}^{k} \left| \frac{\tilde{l}(E_{n-k,n})}{\tilde{l}(E_{n-j+1,n})}-1 \right| \right). $$

    Since \(\bar {E}_{n} = \frac {1}{k} {\sum }_{j=1}^{k} \tilde {E}_{j}\) and \(\frac {E_{n-k,n}}{L_{nk}}\) tend to 1, in probability, the first term of the right hand side multiplied by \(\sqrt {k} L_{nk}^{1-b} \) tends to 0, by the fact that \(\sqrt {k} L_{nk}^{-a-b} \) tends to 0 under condition H2(iii), H3(ii) or H4(iv). For the second term of the right hand side, we proceed as for \(R_{n,\tilde {l}}\) (see the proof of Proposition 2), by using the fact that condition \(R_{\tilde {l}}(\tilde {B}, \tilde {\rho })\) implies \(R_{1/\tilde {l}}(-\tilde {B}, \tilde {\rho })\) and again that \(\sqrt {k} L_{nk}^{-a-b} \) tends to 0.

  • Recall that

    $$ R_{3,k}^{({\Delta})}= \frac{ \hat{p}_{k}}{E_{n-k,n}^{a}} \left( \frac{1}{\tilde{l}(E_{n-k,n})} - \frac{1}{\tilde{c}} \right), $$

    where, according to Lemma 3, we have \(1-\frac {\tilde {l}(x)}{\tilde {c}} = x^{\tilde {\rho }} v(x) \), with v slowly varying. Hence,

    $$ R_{3,k}^{({\Delta})}= (1+o_{\mathbb{P}}(1)) E_{n-k,n}^{-a} \frac{ \hat{p}_{k}}{\tilde{c}} E_{n-k,n}^{\tilde{\rho}} v(E_{n-k,n}). $$

    We prove, in Lemma 2 (in Appendix A.1), that \(L_{nk}^{1-a} \frac {\hat {p}_{k}}{\tilde {c}}\) tends to a. Moreover, since v is slowly varying and \(\frac {E_{n-k,n}}{L_{nk}}\) tends to 1 (see Lemma 6), we obtain

    $$ \sqrt{k} L_{nk}^{1-b} R_{3,k}^{({\Delta})} = a (1+o_{\mathbb{P}}(1)) \sqrt{k} L_{nk}^{-b+ \tilde{\rho}} v(L_{nk}). $$

    This term tends to 0 in the case \(\theta _{X} \geqslant \theta _{C}\), under condition H3(i) or H4(ii). In the case 𝜃X < 𝜃C, we use the fact that \(\frac {x^{\tilde {\rho }} v(x)}{\tilde {B}(x)} \rightarrow -\frac {1}{\tilde {\rho }}\) (see Remark 3 in Appendix C.1). Therefore,

    $$ \sqrt{k} L_{nk}^{1-b} R_{3,k}^{({\Delta})} = -\frac{1}{\tilde{\rho}} (1+o_{\mathbb{P}}(1)) \sqrt{k} L_{nk}^{-b} \tilde{B}(L_{nk}), $$

    which tends to \(-\frac { \tilde {\alpha }}{\rho }\) under condition H2(ii), since \(\rho =\tilde {\rho }\), in this case.

  • Recall that

    $$ R_{4,k}^{({\Delta})} = - \frac{1}{k} \sum\limits_{j=1}^{k} \left( \frac{E_{n-j+1,n}}{E_{n-k,n}} \right)^{a} \left( \frac{\tilde{l}(E_{n-j+1,n})}{\tilde{l}(E_{n-k,n})} -1 \right). $$

    The treatment of this term is very similar to that of \(R_{n,\tilde {l}}\) (see the proof of Proposition 2). It relies on condition \(R_{\tilde {l}}(\tilde {B}, \tilde {\rho })\), as well as H2(ii), H3(i) or H4(ii). It is thus omitted.

  • Recall that

    $$ R_{5,k}^{({\Delta})} = - \frac{1}{k} \sum\limits_{j=1}^{k} \left\{ \left( \left( 1+ \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right)^{a}-1 \right) - a \frac{\tilde{E}_{k-j+1,k}}{E_{n-k,n}} \right\}. $$

    This term is 0 in the case \(\theta_{X} \leqslant \theta_{C}\) (a = 1). So, we only consider the case 𝜃X > 𝜃C (where 0 < a < 1). It is clear (see Lemmas 5 and 6) that \(\xi _{j,n} = \frac {\tilde {E}_{k-j+1,k}}{E_{n-k,n}} \overset {d}{=} \frac {E_{n-j+1,n}}{E_{n-k,n}} -1\) tends to 0, uniformly in j. Hence, by a Taylor expansion, we obtain

    $$ \begin{array}{@{}rcl@{}} R_{5,k}^{({\Delta})} & = & - (1+ o_{\mathbb{P}}(1)) \frac{1}{k} {\sum}_{j=1}^{k} \frac{a(a-1)}{2} \xi_{j,n}^{2}\\ & \overset{d}{=} & (1+ o_{\mathbb{P}}(1)) \frac{a(1-a)}{2} \frac{1}{E_{n-k,n}^{2}} \frac{1}{k} {\sum}_{j=1}^{k} \tilde{E_{j}}^{2} \sim a(1-a) L_{nk}^{-2}, \text{ (in probability)}, \end{array} $$

    and we conclude using H4(iv).

  • Finally, recall that

    $$ R_{6,k}^{({\Delta})}= \frac{ \hat{p}_{k}}{\tilde{c} E_{n-k,n}} \left( E_{n-k,n}^{1-a} - L_{nk}^{1-a} \right). $$

    This term is 0 in the case \(\theta_{X} \leqslant \theta_{C}\) (a = 1). So, we only consider the case 𝜃X > 𝜃C, where 0 < a < 1 and \( \hat {p}_{k}\) tends to 0 (see Lemma 2 in Appendix A.1). By the mean value theorem,

    $$ E_{n-k,n}^{1-a} - L_{nk}^{1-a} = (1-a) L_{nk}^{-a} \left( \frac{\widetilde{L}_{nk}}{L_{nk}} \right)^{-a}(E_{n-k,n} - L_{nk}), $$

    where \(\widetilde {L}_{nk}\) is between Lnk and Enk, n. Hence \(\frac {\widetilde {L}_{nk}}{L_{nk}}\) tends to 1 and, since \(\sqrt {k}(E_{n-k,n} - L_{nk}) \overset {d}{\longrightarrow } N(0,1)\) (see Lemma 6), we have

    $$ \sqrt{k} L_{nk}^{1-b} |R_{6,k}^{({\Delta})}| \leqslant o_{\mathbb{P}}(1)L_{nk}^{-b-a}=o_{\mathbb{P}}(1). $$

C.3. Proof of Lemma 2

The function p(⋅) below has already been defined in Appendix C.1,

$$ p(x)= \mathbb{P}(\delta=1 | Z=x). $$

Proceeding as in Einmahl et al. (2008), we carry out the proof by now considering that δi is related to Zi by

$$ \delta_{i}=\mathbb{I}_{U_{i}\leqslant p(Z_{i})}, $$

where (Ui)in denotes an independent sequence of standard uniform variables, independent of the sequence (Zi)in. We denote by U[1, n],…, U[n, n] the (unordered) values of the uniform sample pertaining to the order statistics Z1, n ≤… ≤ Zn, n of the observed sample Z1,…, Zn.

Recall that \(Z_{i}= {\Lambda }^{-}_{H}(E_{i})\), where E1,…, En are independent standard exponential random variables. We introduce, for every 1 ≤ in, the standard uniform random variables Vi = 1 − exp(−Ei) such that \(Z_{i}= {\Lambda }^{-}_{H}(-\log (1-V_{i}))\), and define the function

$$ r(t):=(p \circ {\Lambda}_{H}^{-})(-\log t). $$

Lemma 4 (in Appendix C.1) provides valuable information about the behavior of r(⋅) at infinity. We now write

$$ \begin{array}{@{}rcl@{}} D_{n}= \sqrt{k} L_{nk}^{-b} \left( L_{nk}^{1-a} \frac{\hat{p}_{k}}{\tilde{c}} -a \right) &= & \displaystyle \frac{L_{nk}^{-b}}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \frac{L_{nk}^{1-a}} {\tilde{c}} \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(1-V_{n-j+1,n})}-a \right)\\ & = & \displaystyle\frac{L_{nk}^{b}}{\tilde{c} \sqrt{k}} \sum\limits_{j=1}^{k} \left( \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(1-V_{n-j+1,n})} - \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(j/n)} \right)\\ & &\displaystyle + \frac{L_{nk}^{-b}}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \frac{L_{nk}^{1-a}} {\tilde{c}} \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(j/n)}-a \right)\\ & =: & T_{1,k} + T_{2,k}. \end{array} $$

Whatever the position of 𝜃X versus 𝜃C, we will prove below that the term T1, k above converges to 0 in probability. It turns out that this amounts to proving that, for some positive sequence vn = o(1/n) (to be chosen later) and some constant c > 0,

$$ \begin{array}{@{}rcl@{}} \!\sqrt{k} L_{nk}^b S_{n,k} \overset{n\rightarrow\infty}{\longrightarrow} 0\! \text{ where\! } S_{n,k} \!:= \sup \left\{ |r(s) - r(t)| ; \frac 1 n \leqslant t \leqslant \frac k n \ , \ |s - t| \leqslant c\sqrt{k}/n \ ,\! \ s\geqslant v_n \right\}. \end{array} $$
(C.1)

As a matter of fact, if we introduce the events

$$ \textstyle A_{n,c} = \left\{ \sup_{1\leqslant j\leqslant k} |(1-V_{n-j+1,n}) - j/n | \leqslant c\sqrt{k}/n \right\} \ \text{ and } \ B_{n} = \left\{ 1-V_{n,n} \geqslant v_{n} \right\}, $$

then, since \(|\mathbb {I}_{U\leqslant a}-\mathbb {I}_{U\leqslant b}| \overset {d}{=} \mathbb {I}_{U\leqslant |a-b|}\) for any standard uniform U and constants a, b in [0, 1], it follows that

$$ \begin{array}{@{}rcl@{}} \mathbb{P}(|T_{1,k}| > \delta ) & \leqslant & \mathbb{P} \left( \frac{1}{k} \sum\limits_{j=1}^{k} \mathbb{I}_{U_{j} \leqslant | r(1-V_{n-j+1,n}) - r(j/n) | } > \tilde c \delta / (\sqrt{k}L_{nk}^{b}) \right) \\ & \leqslant & \mathbb{P} \left( \sqrt{k} L_{nk}^{b} S_{n,k} > \eta \right) + \mathbb{P} \left( \frac{1}{k} \sum\limits_{j=1}^{k} \mathbb{I}_{U_{j} \leqslant \eta/(\sqrt{k}L_{nk}^{b})} > \tilde c \delta / (\sqrt{k}L_{nk}^{b}) \right) \\ &&+ \mathbb{P}({B_{n}^{c}}) + \mathbb{P}(A_{n,c}^{c}) \end{array} $$

for any given δ > 0 and η > 0. The second term on the right-hand side is (by Markov’s inequality) smaller than \(\tilde c \delta / \eta \) (which is arbitrarily small), the third term is equal to nvn(1 + o(1)) = o(1), and the fourth term is arbitrarily small (for c large enough) by the weak convergence of the uniform tail quantile process. Therefore, we are left to prove that \(\sqrt {k}L_{nk}^{b} S_{n,k}=o(1)\) (i.e. relation (C.1)), so that \(T_{1,k}=o_{\mathbb {P}}(1)\) will follow. This is done in the different cases distinguished below, along with the treatment of the main term T2, k.

The whole proof heavily relies on the first and second order developments stated in Lemma 4 of Appendix C.1, concerning the function \(p\circ {\Lambda }_{H}^{-}\).

1. Case 𝜃X < 𝜃C

In this situation, we have a = 1, b = 0, \(\tilde {c}=1\) and \(p=\lim _{z \rightarrow + \infty } p(z) = \lim _{t\searrow 0} r(t)=1\) via Lemma 4. Hence

$$ \begin{array}{@{}rcl@{}} T_{2,k} & = & \frac{1}{\sqrt{k}} {\sum}_{j=1}^{k} \left( \mathbb{I}_{U_{[n-j+1,n]} \leqslant r(j/n)}- 1 \right)\\ & \overset{d}{=} & - \frac{1}{\sqrt{k}} {\sum}_{j=1}^{k} \left( \mathbb{I}_{U_{j}>r(j/n)} -(1-r(j/n)) \right) - \frac{1}{\sqrt{k}} {\sum}_{j=1}^{k} (1-r(j/n))\\ & =: & - T^{\prime}_{2,k} - T^{\prime\prime}_{2,k}, \end{array} $$

where \(T^{\prime }_{2,k}\) turns out to be a sum of centered independent random variables. Let us now prove that \(T^{\prime }_{2,k}=o_{\mathbb {P}}(1)\), that \(T^{\prime \prime }_{2,k}\) tends to Aα′ (here \(A=\frac {\theta _{X}}{\theta _{C}} \frac {c_{G}}{{c_{F}^{d}}}\), where α′ is defined in condition H2(iii)), and that \(\sqrt {k}S_{n,k}\to 0\) (hence, as explained above, \(T_{1,k}=o_{\mathbb {P}}(1)\)).

Concerning \(T^{\prime }_{2,k}\), by definition of r(⋅) and thanks to Lemma 4, we have

$$ 1-r(x)= A (-\log x)^{d-1} (1 + o(1)) \ \text{ where } \ d=\theta_{X}/\theta_{C} \in ]0,1[ . $$

Therefore, since log(n/j)/Lnk tends to 1 uniformly in j under condition H1 (Lemma 6), we obtain

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{1}{k} \sum\limits_{j=1}^{k} r(j/n) (1-r(j/n)) \leqslant \frac{1}{k}\sum\limits_{j=1}^{k} (1-r(j/n)) \leqslant L_{nk}^{d-1} A (1 + o(1)), $$

which implies that \( \mathbb {V}(T^{\prime }_{2,k})\) tends to 0, since d < 1.

Concerning \(T^{\prime \prime }_{2,k}\), we have similarly, using now assumption H2(iii) and Lemma 6 (\(\log(n/j) \sim L_{nk}\)),

$$ T^{\prime\prime}_{2,k}= A (1+o(1)) \sqrt{k} (L_{nk})^{d-1} \overset{n\rightarrow\infty}{\longrightarrow} A \alpha^{\prime}. $$

Let us now deal with \(\sqrt {k}S_{n,k}\). From now on, let cst denote some generic positive constant. Since r(t) converges to 1 as t ↘ 0, and thanks to Lemma 4, we have, for s and t small,

$$ \begin{array}{@{}rcl@{}} | r(s) - r(t) | & = & \left| \frac 1{r(s)} - \frac 1 {r(t)} \right| r(s)r(t) \\ & \leqslant & cst \left\{ |(-\log t)^{d-1} - (-\log s)^{d-1}| + |(-\log t)^{d-1-\beta}v(-\log t)\right. \\&&\left.- (-\log s)^{d-1-\beta}v(-\log s)| \right\} \end{array} $$

Introducing the set \(Z_{n}=\{ (s,t) ; 1/n \leqslant t \leqslant k/n , |t-s|\leqslant c\sqrt {k}/n , s\geqslant v_{n} \}\) and recalling that vn = o(1/n) (an appropriate sequence will be chosen in a few lines), it can be checked that applying the mean value theorem to the function h(t) = (− log t)d− 1, which has positive derivative h′(t) = (1 − d)t− 1(− log t)d− 2, yields for large n (below, u = u(s, t) denotes some appropriate value between s and t)

$$ \textstyle \sqrt{k} \sup_{(s,t)\in Z_{n}} |h(t)-h(s)| \leqslant \sup_{(s,t)\in Z_{n}} |h^{\prime}(u)|.|t-s| \leqslant cst \sqrt{k} \frac 1{v_{n}} L_{nk}^{d-2} c\sqrt{k}/n = cst \frac{k}{nv_{n}} L_{nk}^{d-2}. $$

This is the first step towards the proof of \(\sqrt {k}S_{n,k}=o(1)\). The second step requires doing the same job with the function \(\tilde h(t)=(-\log t)^{d-1-\beta }v(-\log t)\), where v(⋅) is slowly varying at infinity. It is known (cf. Bingham et al. 1987, page 15) that xv′(x)/v(x) → 0 and \(x^{-\beta}v(x) \rightarrow 0\) as x → ∞, so that

$$ | \tilde h^{\prime}(t) | = |1-d+\beta| \frac 1 t (-\log t)^{d-2} \left| 1- cst \frac {xv^{\prime}(x)}{v(x)} \right| x^{-\beta} |v(x)| \leqslant cst |h^{\prime}(t)| $$

where x denotes (− log t), which is large when t is close to 0. Therefore, taking into account all the previous findings, and considering the choice \(v_{n} = k^{-\epsilon}/n = o(1/n)\), we have proved that for n large

$$ \textstyle \sqrt{k} S_{n,k} \leqslant cst \frac{k}{nv_{n}} L_{nk}^{d-2} = cst . k^{1+\epsilon} L_{nk}^{d-2} = cst \left( \sqrt{k} L_{nk}^{(d-2)/2 + \delta}\right)^{2(1+\epsilon)} $$

which turns out to be o(1) as soon as 0 < δ < d/2 thanks to assumption H2(iii). This ends the proof of Lemma 2 in the mild censoring case 𝜃X < 𝜃C.

2. Case 𝜃X = 𝜃C

In this case, we also have a = 1, b = 0 but now \(\tilde {c}=\frac {c_{F}}{c_{F}+c_{G}} =p=\lim _{z \rightarrow \infty } p(z) = \lim _{t\searrow 0} r(t)\) via Lemma 4. It is then clear that

$$ \begin{array}{@{}rcl@{}} T_{2,k} & \overset{d}{=} & \displaystyle \frac 1 p \frac{1}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \mathbb{I}_{U_{j} \leqslant r(j/n)}- r(j/n) \right) + \frac 1 p \frac{1}{\sqrt{k}} \sum\limits_{j=1}^{k} (r(j/n)-p)\\ & =: & T^{\prime}_{2,k} + T^{\prime\prime}_{2,k} \end{array} $$

Let us prove that \(T^{\prime }_{2,k} \overset {d}{\longrightarrow } N(0,\frac {1-p}{p})\), while \(T^{\prime \prime }_{2,k}\) and \(\sqrt {k} S_{n,k}\) are both o(1).

Concerning \(T^{\prime }_{2,k}\), we have

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{1}{p^{2}} \frac{1}{k}\sum\limits_{j=1}^{k} r(j/n) (1-r(j/n)), $$

which tends to \(\frac {1-p}{p}\), since r(j/n) tends to p, uniformly in j (see Lemma 4). We conclude, for this term, using Lyapunov’s theorem (details are omitted, here r(j/n) ≤ 1).

Concerning \(T^{\prime \prime }_{2,k}\), since Lemma 4 yields r(t) = p (1 − (− log t)ρv(− log t)), we have (for some δ > 0)

$$ T^{\prime\prime}_{2,k} = -\frac {1}{\sqrt{k}} \sum\limits_{j=1}^{k} (\log(n/j))^{\rho} v(\log(n/j)) = -\sqrt{k} (L_{nk})^{\rho+\delta} L_{nk}^{-\delta}v(L_{nk}) \frac{1}{k} \sum\limits_{j=1}^{k} u_{n,j}^{\rho} $$

where we set un,j = log(n/j)/Lnk, which tends to 1 uniformly in j thanks to condition H1, and we used the fact that v(log(n/j)) ∼ v(Lnk) because v ∈ RV0. The Riemann sum on the right-hand side converges to 1, so for a choice of δ satisfying assumption H3(i), we have proved that \(T^{\prime \prime }_{2,k}=o(1)\).

Concerning now \(\sqrt {k}S_{n,k}\), we proceed similarly as in the first case. Introducing \(\tilde h(t)=(-\log t)^{\rho }v(-\log t)\), where v(⋅) is slowly varying at infinity, we have as previously \(|\tilde h^{\prime }(t)|=\frac 1 t (-\log t)^{\rho -1+\epsilon }o(1)\) for t ↘ 0 and any small 𝜖 > 0. Therefore, Lemma 4, the definitions of Sn, k and of the set Zn, along with the mean value theorem, yield

$$ \sqrt{k}S_{n,k} = \tilde c \sqrt{k} \sup_{(s,t)\in Z_{n}} |\tilde h(t)-\tilde h(s)| \leqslant cst \sqrt{k} \sup_{(s,t)\in Z_{n}} \{ |\tilde h^{\prime}(u)|.|t-s| \} \leqslant cst \sqrt{k} \frac 1{v_{n}} L_{nk}^{\rho-1+\epsilon} \frac{\sqrt{k}}{n}. $$

Choosing, in the definition of Sn, k, the sequence \(v_{n} = k^{-\epsilon}/n = o(1/n)\) for some small 𝜖 > 0, we have

$$ \textstyle \sqrt{k}S_{n,k} = cst \left( \sqrt{k} L_{nk}^{(\rho-1+\epsilon)/(2(1+\epsilon))} \right)^{2(1+\epsilon)} = cst \left( \sqrt{k} L_{nk}^{(\rho-1)/2+\delta} \right)^{2(1+\epsilon)} $$

which turns out to be o(1) according to assumption H3(i) (if \(\rho \geqslant 1\)) or H3(ii) (if ρ < 0), as soon as δ is sufficiently small. This ends the proof of Lemma 2 in the semi-strong censoring case 𝜃X = 𝜃C.

3. Case 𝜃X > 𝜃C

Now we are in the situation where a < 1, b = (1 − a)/2 ∈]0, 1/2[, and \(\tilde {c}=\frac {c_{F}}{{c_{G}^{a}}}\) is different from \(p=\lim _{z \rightarrow \infty } p(z) = \lim _{t\searrow 0} r(t)=0\). Since 1 − a − b = b, we readily have

$$ \begin{array}{@{}rcl@{}} T_{2,k} & \overset{d}{=} & \displaystyle \frac{L_{nk}^{b}}{\tilde{c}} \frac{1}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \mathbb{I}_{U_{j} \leqslant r(j/n)}- r(j/n) \right) + \frac{a L_{nk}^{-b}}{\sqrt{k}} \sum\limits_{j=1}^{k} \left( \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n)-1 \right)\\ & =: & T^{\prime}_{2,k} + T^{\prime\prime}_{2,k} \end{array} $$

Let us prove that \(T^{\prime }_{2,k} \overset {d}{\longrightarrow } N(0,\frac {a}{\tilde {c}})\), while \(T^{\prime \prime }_{2,k}\) and \(\sqrt {k} L_{nk}^{b} S_{n,k}\) are both o(1) (the latter will guarantee that \(T_{1,k}=o_{\mathbb {P}}(1)\)).

Concerning \(T^{\prime }_{2,k}\), we have

$$ \mathbb{V}(T^{\prime}_{2,k}) = \frac{L_{nk}^{2b}}{\tilde{c}^{2}} \frac{1}{k} \sum\limits_{j=1}^{k} r(j/n) (1-r(j/n)) $$

Lemma 4 yields the following first order development, as t ↘ 0,

$$ r(t)= a \tilde{c} (-\log t)^{a-1} (1 + o(1)) = a \tilde{c} (-\log t)^{-2b} (1 + o(1)). $$
(A.2)

Since un, j = log(n/j)/Lnk tends to 1 uniformly in j, under condition H1 (see Lemma 6), it is then easy to see that \(\mathbb {V}(T^{\prime }_{2,k})\) tends to \(\frac {a}{\tilde {c}}\). We conclude concerning \(T^{\prime }_{2,k}\) using Lyapunov’s theorem (again, details are easy and omitted).

Concerning \(T^{\prime \prime }_{2,k}\), we write

$$ \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n)-1 = \left( \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n) - \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} \right) + \left( \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} -1 \right) $$

and treat these two terms separately. Using the second order formula stated in Lemma 4, we have

$$ \begin{array}{@{}rcl@{}} \frac{1}{r(t)} = 1+ \frac{(-\log t)^{1-a}}{a \tilde{c} } \left( 1- (-\log t)^{\tilde{\rho}} v(-\log t) \right). \end{array} $$
(A.3)

and consequently, for some small δ > 0,

$$ \begin{array}{@{}rcl@{}} \frac{a\tilde c}{L_{nk}^{1-a}r(j/n)} & = & \left( \frac{\log(n/j)}{L_{nk}}\right)^{1-a} \left( 1 - (\log(n/j))^{\tilde \rho} v(\log(n/j)) + a\tilde c (\log(n/j))^{a-1} \right) \\ & = & \left( \frac{\log(n/j)}{L_{nk}}\right)^{1-a} \left( 1 - L_{nk}^{\tilde\rho+\delta} o(1) + a\tilde c L_{nk}^{a-1} (1+o(1))\right) \\ \end{array} $$

where we used condition H1 and the slow variation of v, which guarantees that v(log(n/j)) ∼ v(Lnk) and \(x^{-\delta}v(x) \rightarrow 0\) as x → ∞. Now, since \( \tilde {\rho }= \max (\theta _{Z} \rho _{F}, \theta _{Z} \rho _{G},a-1) \geqslant a-1\), it follows that

$$ \frac{L_{nk}^{1-a}}{a \tilde{c}} r(j/n) - \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} = (1+o(1)) L_{nk}^{\tilde\rho + \delta} o(1) $$

and therefore the first term of \(T^{\prime \prime }_{2,k}\) is equal to \(a \sqrt {k} L_{nk}^{-b+\tilde {\rho }+\delta } o(1)\), which tends to 0 under condition H4(ii). The second term of \(T^{\prime \prime }_{2,k}\) is

$$ a \sqrt{k} L_{nk}^{-b} \frac{1}{k} \sum\limits_{j=1}^{k} \left( \left( \frac{L_{nk}}{\log(n/j)} \right)^{1-a} -1 \right). $$

But \( \left (\frac {L_{nk}}{\log (n/j)} \right )^{1-a} -1= (a-1) \frac {\log (k/j)}{L_{nk}} (1+o(1))\) with \( \frac {1}{k} {\sum }_{j=1}^{k} \log (k/j)\) tending to 1. So the second term of \(T^{\prime \prime }_{2,k}\) is equal to

$$ a(a-1) \sqrt{k} L_{nk}^{-1-b} (1+o(1)), $$

and this quantity tends to 0 under condition H4(iv).

Concerning now \(\sqrt {k}L_{nk}^{b} S_{n,k}\), we have

$$ S_{n,k} = \sup\limits_{(s,t)\in Z_{n}} |r(t)-r(s)| \leqslant \sup\limits_{(s,t)\in Z_{n}} \left| \frac 1{r(t)} - \frac 1 {r(s)} \right| \sup_{(s,t)\in Z_{n}} \{ r(t)r(s) \}. $$

Thanks to the first order relation (A.2), the second supremum on the right-hand side is smaller than a constant times \(L_{nk}^{2(a-1)}\). The first supremum will be handled with the more precise second order development (A.3), which yields

$$ \sup\limits_{(s,t)\in Z_{n}} \left| \frac 1{r(t)} - \frac 1 {r(s)} \right| \leqslant cst \left\{ \sup\limits_{(s,t)\in Z_{n}} |h(t)-h(s)| + \sup\limits_{(s,t)\in Z_{n}} |\tilde h(t)-\tilde h(s)| \right\} $$

where we define h(t) = (− log t)1−a and \(\tilde h(t)=(-\log t)^{1-a+\tilde {\rho }}v(-\log t)\). Contrary to the functions arising in case 1, the functions h and \(\tilde h\) tend to infinity instead of vanishing to 0 when t ↘ 0: this will be counterbalanced by the second supremum. Studying the derivatives of h and \(\tilde h\), and again using a first order Taylor expansion, we obtain via similar computations as in the previous cases, for n large and any 𝜖 > 0 (with the choice \(v_{n} = k^{-\epsilon}/n\)),

$$ \sup\limits_{(s,t)\in Z_{n}} \left| \frac 1{r(t)} - \frac 1 {r(s)} \right| \leqslant cst. k^{1/2+\epsilon} L_{nk}^{-a}. $$

Therefore, gathering the two suprema, we have (for some small value of δ > 0 depending on 𝜖)

$$ \sqrt{k}L_{nk}^{b} S_{n,k} \leqslant cst.k^{1+\epsilon} L_{nk}^{b-a}L_{nk}^{2(a-1)} = cst. k^{1+\epsilon} L_{nk}^{-1-b} = cst \left( \sqrt{k} L_{nk}^{-(1+b)/2+\delta} \right)^{2(1+\epsilon)} $$

which, by assumption H4(iii), converges to 0 as n → ∞.

C.4. Additional useful lemmas

Let E1,…, En be n iid standard exponential random variables.

Lemma 5

According to Lemma 1.4.3 in Reiss (1989), we have

$$ (E_{n-j+1,n} -E_{n-k,n})_{1\leqslant j \leqslant k} \overset{d}{=} (\tilde{E}_{k-j+1,k})_{1\leqslant j \leqslant k}, $$

where \(\tilde {E}_{1}, \ldots , \tilde {E}_{k}\) are k independent standard exponential random variables.
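This classical Rényi-type representation of exponential exceedances is easy to check by simulation. In the sketch below, the mean of the largest exceedance En,n − Enk,n is compared with the mean of the maximum of k standard exponentials, namely the harmonic number Hk (the sample sizes are illustrative choices):

```python
import random

random.seed(2)
n, k, reps = 60, 10, 20_000

# mean of the largest exceedance E_{n,n} - E_{n-k,n} over many replications
total = 0.0
for _ in range(reps):
    e = sorted(random.expovariate(1.0) for _ in range(n))
    total += e[-1] - e[-1 - k]      # e[-1-k] is the (n-k)-th order statistic
mean_exc = total / reps

# by the lemma, this exceedance is distributed as the max of k standard
# exponentials, whose mean is the harmonic number H_k = 1 + 1/2 + ... + 1/k
h_k = sum(1.0 / j for j in range(1, k + 1))
print(mean_exc, h_k)
```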

Lemma 6

Under condition H1, we have, as n → +∞,

$$ \frac{E_{n-k,n}}{L_{nk}} \overset{\mathbb{P}}{\longrightarrow} 1, \ \frac{E_{n-j+1,n}}{\log(n/j)} \overset{\mathbb{P}}{\longrightarrow} 1, \text{ uniformly in } j=1, \ldots, k, \ \text{ and } \ \sqrt{k} (E_{n-k,n} - L_{nk}) \overset{d}{\longrightarrow} N(0,1). $$

We refer to Girard (2004b) for the proof of this Lemma.
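A small simulation illustrates the first of these convergences, assuming Lnk = log(n/k) (the usual normalization for intermediate sequences in this framework; the sample sizes below are illustrative):

```python
import math
import random

random.seed(3)

def ratio(n, k):
    # (n-k)-th order statistic of n standard exponentials, over L_nk = log(n/k)
    e = sorted(random.expovariate(1.0) for _ in range(n))
    return e[n - k - 1] / math.log(n / k)

# the ratio should approach 1 as n grows, with k intermediate (k -> oo, k/n -> 0)
ratios = [ratio(10 ** p, 10 ** (p // 2)) for p in (3, 4, 5)]
print([round(r, 3) for r in ratios])
```

The fluctuations around 1 shrink at the rate 1/(√k Lnk), in agreement with the CLT stated in the last part of the lemma.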

Lemma 7

If we consider the classical random censoring model (1) with continuous distribution functions F and G of the variables X and C, then the following results hold in probability:

$$ \begin{array}{@{}rcl@{}} &&\left| \hat{{\Lambda} }_{nF}(Z_{n-j+1,n}) - {\Lambda}_{F}(Z_{n-j+1,n}) \right| =O_{\mathbb{P}} (1/\sqrt{j-1}), \text{ for } j=2, \ldots, k+1,\\ &&\left| \hat{{\Lambda} }_{nF}(Z_{n,n}) - {\Lambda}_{F}(Z_{n,n}) \right| =O_{\mathbb{P}} (1). \end{array} $$

The first statement is a part of Theorem 1 in Csorgo (1996). For the second statement, one has to examine carefully Theorem 2.1 in Zhou (1991), in a narrower context, since the samples (Xi) and (Ci) we consider are i.i.d., whereas Zhou considers possibly non-identically distributed censoring variables Ci. On pages 2269–2270 of that paper, one can see that the maximum observed value (named Tn) does not have to be excluded from the probability bound (2.3): it can indeed be proved, by following the steps of the proof of (2.3), that for every n,

$$ \forall \epsilon>0, \hspace{0.5cm} \textstyle{ \mathbb{P} \left[ \sup_{t\leqslant Z_{n,n}} \left| \hat{{\Lambda} }_{nF}(t) - {\Lambda}_{F}(t) \right| > \epsilon \right] } \leqslant 6 \epsilon^{-2/3}. $$

So the second statement of Lemma 7 follows.
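Lemma 7 concerns the estimator \(\hat{\Lambda}_{nF}\) of the cumulative hazard ΛF; the references invoked (Csorgo 1996, Zhou 1991) deal with the Nelson–Aalen estimator, which the following sketch implements on simulated censored data. The exponential laws for X and C are illustrative choices, so that ΛF(t) = t:

```python
import random

random.seed(4)
n = 50_000

# censored sample: Z = min(X, C), delta = 1{X <= C}, with X ~ Exp(1), C ~ Exp(0.5)
sample = []
for _ in range(n):
    x = random.expovariate(1.0)
    c = random.expovariate(0.5)
    sample.append((min(x, c), x <= c))
sample.sort()

def nelson_aalen(t):
    # Lambda_hat(t) = sum over uncensored Z_i <= t of 1 / (# at risk at Z_i)
    s = 0.0
    for i, (z, d) in enumerate(sample):
        if z > t:
            break
        if d:
            s += 1.0 / (n - i)    # n - i individuals still at risk at the i-th Z
    return s

# X ~ Exp(1) has true cumulative hazard Lambda_F(t) = t
val = nelson_aalen(1.0)
print(val)    # should be close to 1.0
```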

About this article

Cite this article

Worms, J., Worms, R. Estimation of extremes for Weibull-tail distributions in the presence of random censoring. Extremes 22, 667–704 (2019). https://doi.org/10.1007/s10687-019-00354-2

Keywords

  • Weibull-tail
  • Tail inference
  • Random censoring
  • Asymptotic representation

Mathematics Subject Classification (2010)

  • Primary 62G32
  • Secondary 62N02