Abstract
The study of extremes in missing data frameworks is a recent and developing field. In particular, the randomly right-censored case has received a fair amount of attention in the last decade. All studies on this topic, however, essentially work under the usual assumption that the variable of interest and the censoring variable are independent. Furthermore, a frequent characteristic of the estimation procedures developed so far is their crucial reliance on particular properties of the asymptotic behaviour of the response variable Z (that is, the minimum of time-to-event and time-to-censoring) and of the probability of censoring in the right tail of Z. In this paper, we focus instead on elucidating this asymptotic behaviour in the dependent censoring case and, more precisely, when the structure of the dependent censoring mechanism is given by an extreme value copula. We then draw several consequences of our results for the asymptotic behaviour, in this dependent context, of a number of estimators of the extreme value index of the random variable of interest that were introduced in the literature under the assumption of independent censoring. We also discuss more generally the implications of our results for inference on the extremes of this variable.
References
Aban, I.B., Meerschaert, M.M., Panorska, A.K.: Parameter estimation for the truncated Pareto distribution. J. Am. Stat. Assoc. 101, 270–277 (2006)
Beirlant, J., Guillou, A., Dierckx, G., Fils-Villetard, A.: Estimation of the extreme value index and extreme quantiles under random censoring. Extremes 10, 151–174 (2007)
Beirlant, J., Guillou, A., Toulemonde, G.: Peaks-Over-Threshold modeling under random censoring. Commun. Stat. – Theory Methods 39, 1158–1179 (2010)
Beirlant, J., Bardoutsos, A., de Wet, T., Gijbels, I.: Bias reduced tail estimation for censored Pareto type distributions. Stat. Probab. Lett. 109, 78–88 (2016)
Bingham, N.H., Goldie, C.M., Teugels, J.L.: Regular Variation. Cambridge University Press, Cambridge (1987)
Brahimi, B., Meraghni, D., Necir, A.: Gaussian approximation to the extreme value index estimator of a heavy-tailed distribution under random censoring. Math. Methods Stat. 24, 266–279 (2015)
Capéràa, P., Fougères, A.-L., Genest, C.: Bivariate distributions with given extreme value attractor. J. Multivar. Anal. 72, 30–49 (2000)
Carrière, J.F.: Removing cancer when it is correlated with other causes of death. Biom. J. 37, 339–350 (1995)
de Haan, L., Ferreira, A.: Extreme Value Theory: an Introduction. Springer, New York (2006)
Deheuvels, P.: Probabilistic aspects of multivariate extremes. In: Tiago de Oliveira, J. (ed.) Statistical extremes and applications, pp 117–130 (1984)
Dekkers, A.L.M., Einmahl, J.H.J., de Haan, L.: A moment estimator for the index of an extreme-value distribution. Ann. Stat. 17, 1833–1855 (1989)
Ding, A.A.: Identifiability conditions for covariate effects model on survival times under informative censoring. Stat. Probab. Lett. 80, 911–915 (2010)
Ding, A.A.: Copula identifiability conditions for dependent truncated data model. Lifetime Data Anal. 18, 397–407 (2012)
Ebrahimi, N., Molefe, D.: Survival function estimation when lifetime and censoring time are dependent. J. Multivar. Anal. 87, 101–132 (2003)
Ebrahimi, N., Molefe, D., Ying, Z.: Identifiability and censored data. Biometrika 90, 724–727 (2003)
Einmahl, J.H.J., Fils-Villetard, A., Guillou, A.: Statistics of extremes under random censoring. Bernoulli 14, 207–227 (2008)
Fisher, L., Kanarek, P.: Presenting censored survival data when censoring and survival times may not be independent. In: Proschan, F., Serfling, R. (eds.) Reliability and Biometry. Statistical Analysis of Life Length, pp 303–326. Society for Industrial and Applied Mathematics (1974)
Galambos, J.: The Asymptotic Theory of Extreme Order Statistics, Wiley Series in Probability and Mathematical Statistics. Wiley, New York-Chichester-Brisbane (1978)
Gardes, L., Stupfler, G.: Estimating extreme quantiles under random truncation. TEST 24, 207–227 (2015)
Genest, C., Rivest, L.-P.: A characterization of Gumbel’s family of extreme value distributions. Stat. Probab. Lett. 8, 207–211 (1989)
Goegebeur, Y., de Wet, T.: Estimation of the third-order parameter in extreme value statistics. TEST 21, 330–354 (2012)
Gomes, M.I., Neves, M.M.: Estimation of the extreme value index for randomly censored data. Biom. Lett. 48, 1–22 (2011)
Gudendorf, G., Segers, J.: Extreme-value copulas. In: Jaworski, P., Durante, F., Härdle, W.K., Rychlik, T. (eds.) Copula Theory and Its Applications. Lecture Notes in Statistics, vol. 198, pp 127–145 (2010)
Gumbel, E.J.: Bivariate exponential distributions. J. Am. Stat. Assoc. 55, 698–707 (1960)
Hsu, C.-H., Taylor, J.M.G.: A robust weighted Kaplan-Meier approach for data with dependent censoring using linear combinations of prognostic covariates. Stat. Med. 29, 2215–2223 (2010)
Huang, X., Zhang, N.: Regression survival analysis with an assumed copula for dependent censoring: a sensitivity analysis approach. Biometrics 64, 1090–1099 (2008)
Jackson, D., White, I.R., Seaman, S., Evans, H., Baisley, K., Carpenter, J.: Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation. Stat. Med. 33, 4681–4694 (2014)
Joe, H.: Multivariate Models and Dependence Concepts. Chapman and Hall/CRC, Boca Raton (1997)
Kaplan, E.L., Meier, P.: Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 53, 457–481 (1958)
Klein, J.P., Moeschberger, M.L.: Independent or dependent competing risks: does it make a difference? Commun. Stat. Simul. Comput. 16, 507–533 (1987)
Lagakos, S.W.: General right censoring and its impact on the analysis of survival data. Biometrics 35, 139–156 (1979)
Ledford, A.W., Tawn, J.A.: Statistics for near independence in multivariate extreme values. Biometrika 83, 169–187 (1996)
Ledford, A.W., Tawn, J.A.: Modelling dependence within joint tail regions. J. R. Stat. Soc. Ser. B 59, 475–499 (1997)
Li, Y., Tiwari, R.C., Guha, S.: Mixture cure survival models with dependent censoring. J. R. Stat. Soc. Ser. B 69, 285–306 (2007)
Lo, S.M.S., Wilke, R.A.: A copula model for dependent competing risks. J. R. Stat. Soc. Ser. C 59, 359–376 (2010)
Modarres, M., Kaminskiy, M.P., Krivtsov, V.: Reliability Engineering and Risk Analysis: a Practical Guide, 2nd edn. CRC Press, Boca Raton (2009)
Ndao, P., Diop, A., Dupuy, J.-F.: Nonparametric estimation of the conditional tail index and extreme quantiles under random censoring. Comput. Stat. Data Anal. 79, 63–79 (2014)
Ndao, P., Diop, A., Dupuy, J.-F.: Nonparametric estimation of the conditional extreme-value index with random covariates and censoring. J. Stat. Plan. Infer. 168, 20–37 (2016)
Reynkens, T., Verbelen, R., Beirlant, J., Antonio, K.: Modelling censored losses using splicing: a global fit strategy with mixed Erlang and extreme value distributions. Insur.: Math. Econ. 77, 65–77 (2017)
Rivest, L.-P., Wells, M.T.: A martingale approach to the Copula-Graphic estimator for the survival function under dependent censoring. J. Multivar. Anal. 79, 138–155 (2001)
Sayah, A., Yahia, D., Brahimi, B.: On robust tail index estimation under random censorship. Afr. Stat. 9, 671–683 (2014)
Sklar, A.: Fonctions de répartition à n dimensions et leurs marges, vol. 8. Publications de l’Institut de Statistique de l’Université de Paris, pp. 229–231 (1959)
Stupfler, G.: Estimating the conditional extreme-value index under random right-censoring. J. Multivar. Anal. 144, 1–24 (2016)
Tawn, J.A.: Bivariate extreme value theory: Models and estimation. Biometrika 75, 397–415 (1988)
Tsiatis, A.: A non-identifiability aspect of the problem of competing risks. Proc. Natl. Acad. Sci. 72, 20–22 (1975)
Worms, J., Worms, R.: New estimators of the extreme value index under random right censoring, for heavy-tailed distributions. Extremes 17, 337–358 (2014)
Zeng, D.: Estimating marginal survival function by adjusting for dependent censoring using many covariates. Ann. Stat. 32, 1533–1555 (2004)
Zheng, M., Klein, J.P.: Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika 82, 127–138 (1995)
Acknowledgements
The author acknowledges an anonymous Associate Editor and two anonymous reviewers for their helpful comments that led to an improved version of this paper.
Appendices
Appendix A: Proofs of the main results
Proof of Proposition 1
Use Eq. 1 to get
Set \(v_{0}=\displaystyle \lim _{z\uparrow \tau _{Y}} \overline {F}_{T}(z) >0\) and apply Lemma 2 to get
This is a positive constant, by Lemma 1. Thus
which completes the proof. □
Proof of Theorem 1
Apply Lemma 3 to get:
(i) If γY > γT ≥ 0, then τ = ∞ and \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to 0\) as z → ∞.
(ii) If γT > γY ≥ 0, then τ = ∞ and \(\overline {F}_{T}(z)/\overline {F}_{Y}(z) = [\overline {F}_{Y}(z)/\overline {F}_{T}(z)]^{-1}\to \infty \) as z → ∞.
(iii) If 0 ≥ γY > γT, then τ < ∞ and \(\overline {F}_{T}(z)/\overline {F}_{Y}(z) = [\overline {F}_{Y}(z)/\overline {F}_{T}(z)]^{-1}\to \infty \) as z ↑ τ.
(iv) If 0 ≥ γT > γY, then τ < ∞ and \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to 0\) as z ↑ τ.
In each of these four cases, the result is then a direct corollary of Lemma 5(i) and (ii). In the case γY = γT, the result is a consequence of Lemma 5(iii). □
Proof of Theorem 2
Recall that
To prove (i), remark that if γYγT ≥ 0 and |γY| > |γT|, then \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to 0\) as z ↑ τ by Lemma 3. In that case, Lemma 5(i) entails together with Lemma 6(i) that
The first part of the integral in the numerator is controlled by noting that
To control the second one, we first write
When γY > 0, and therefore τ = ∞, we apply Lemma 3 to obtain that there is ε > 0 such that
Now, by Theorem 1.2.1 in de Haan and Ferreira (2006), the function \(\overline {F}_{Y}\) is regularly varying with index −1/γY. In other words, we can write this function as
where LY is a slowly varying function at infinity. Since log(LY(t))/ log t → 0 as t → ∞ (see Proposition 1.3.6 p.16 in Bingham et al. 1987) we get \(\log (\overline {F}_{Y}(t)) t^{-\varepsilon /2} \to 0 \ \text { as } \ t\to \infty \). In particular:
Therefore
which concludes the proof in this case. When γY < 0 and τ < ∞, we apply Lemma 3 again to obtain that there is ε > 0 such that
Use the change of variables \(T=(\tau -t)^{-1}\) to obtain
Since by Theorem 1.2.1 in de Haan and Ferreira (2006) the function \(T\mapsto \overline {F}_{Y}(\tau -T^{-1})\) is regularly varying with index 1/γY < 0, we can exactly mimic the proof of the case γY > 0 to obtain
The proof of (i) is then complete.
To prove (ii), we note that in this case \(\overline {F}_{T}(z)/\overline {F}_{Y}(z) = [\overline {F}_{Y}(z)/\overline {F}_{T}(z)]^{-1}\to \infty \) as z ↑ τ by Lemma 3. Apply then Lemma 5(ii) together with Lemma 6(ii) to get
Since A′(1) > 0 by Lemma 1(ii), the numerator is clearly equivalent to \(A^{\prime }(1)\overline {F}_{Y}(z)\), which entails that P(z) → 1 as z → τ and concludes the proof of (ii).
Finally, to show (iii), apply Lemma 5(iii) together with Lemma 6(iii) to obtain
If the constant \(1- \left [ A\left (\frac {c}{c + 1} \right ) - \frac {c}{c + 1} A^{\prime }\left (\frac {c}{c + 1} \right ) \right ]\) is zero then
as required. Otherwise we clearly have
and this completes the proof. □
Proof of Proposition 2
Recall that \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to 0\) as z ↑ τ by Lemma 3. Then, use Lemma 7 together with the equality
to obtain the statement about \(\overline {F}_{Z}(z)\). The result on P(z) is a reformulation of Eq. 5 in the proof of Theorem 2. □
Appendix B: Auxiliary results and proofs
In this Appendix, A denotes a Pickands dependence function, C is the corresponding extreme value copula
which we assume not to be equal to the independence copula, and φ is the function defined by:
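For concreteness, the two objects just introduced can be sketched as follows. This is written under an assumed parametrisation of the Pickands representation, in which the argument of A is log v / log(uv), and under the assumption that φ is the joint survival probability associated with C. Both choices are assumptions made here for illustration, although they are consistent with the constants −A′(0), A′(1) and (c + 1)[1 − A(c/(c + 1))] appearing in Lemmas 4 to 6.

```latex
% Assumed Pickands representation of the extreme value copula C
C(u,v) = \exp\left( \log(uv)\, A\!\left( \frac{\log v}{\log(uv)} \right) \right),
\qquad (u,v)\in (0,1]^{2}\setminus \{(1,1)\},

% Assumed form of phi: for (U,V) with copula C,
% phi(u,v) = P(U > 1-u, V > 1-v)
\varphi(u,v) = u + v - 1 + C(1-u,\,1-v).
```

With this convention, the independence copula corresponds to A being the constant function 1, for which φ(u, v) = uv.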
The first lemma gathers a few results about Pickands dependence functions.
Lemma 1
Let A be a Pickands dependence function, and assume that the related copula C is not the independent copula. Then:
(i) It holds that A(t) < 1 for all t ∈ (0, 1).
If moreover A is continuously differentiable on [0, 1], then:
(ii) We have A′(0) < 0 and A′(1) > 0.
(iii) We have
$$\forall x\in [0,1], \ 0\leq A(x)-xA^{\prime}(x)\leq 1. $$
These inequalities are all strict on (0, 1) should A be strictly convex.
Proof of Lemma 1
To show (i), assume that there is t0 ∈ (0, 1) such that A(t0) = 1. Recall the increasing slopes inequality:
Apply this first with x = t ∈ (0, t0), y = t0 and z = 1 to get
so that A(t) ≥ 1 and therefore A(t) = 1 for all t ∈ (0, t0). Apply now the above set of inequalities with x = 0, y = t0 and z = t ∈ (t0, 1) to obtain
so that again A(t) = 1 for all t ∈ (t0, 1). Consequently, A is the constant function 1, which is impossible since C is not the independent copula.
We turn to the proof of (ii). Since A is continuously differentiable on [0, 1] and convex, its derivative A′ is nondecreasing. Consequently, if A′(0) were nonnegative, then so would be A′(x) for any x ∈ (0, 1], and thus A would be nondecreasing on [0, 1]. Since A(0) = A(1) = 1, this would entail that A is the constant function 1, which is a contradiction. Similarly, A′(1) cannot be nonpositive since if it were, then A would be nonincreasing on [0, 1], which is a contradiction in virtue of the equality A(0) = A(1) = 1.
We now show (iii). Notice that since A(x) ≥ x, we have
Assume that there is x0 ∈ (0, 1) such that A′(x0) > 1. Since A is convex, the function A′ is nondecreasing and therefore A′(x) > 1 on [x0, 1]. Integrating yields
which entails A(x0) < x0 and a contradiction. As a consequence, A′(x) ≤ 1 for all x ∈ (0, 1), and this should also be true on the closed interval [0, 1] by continuity of A′. Finally, A(x) − xA′(x) ≥ x(1 − A′(x)) ≥ 0 for all x ∈ [0, 1]. Use now the increasing slopes inequality to write
Let t↓x to get
which entails A(x) − xA′(x) ≤ 1 for all x ∈ (0, 1). Using the continuity of A and A′, this inequality is of course also true for x ∈ {0, 1}, which concludes the proof of the first set of desired inequalities if A is convex.
If A is moreover strictly convex, assume that there is x0 ∈ (0, 1) such that A′(x0) ≥ 1. Then A′(x) ≥ 1 on [x0, 1] and integrating yields
which entails A(t) ≤ t and therefore A(t) = t on an interval with nonempty interior, which is a contradiction by the strict convexity of A. This entails A(x) − xA′(x) ≥ x(1 − A′(x)) > 0 for all x ∈ (0, 1). Finally, to prove that A(x) − xA′(x) < 1 for all x ∈ (0, 1), we apply again the increasing slopes inequality: for all x, t ∈ (0, 1),
where all inequality signs are strict because of the strict convexity of A. Let t↓x to get
which entails A(x) − xA′(x) < 1 for all x ∈ (0, 1) as required. □
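The three statements of Lemma 1 can be checked numerically on a concrete example. The sketch below uses the symmetric mixed model A(t) = 1 − θt + θt², which is an illustrative choice and not a model singled out in the text; the parameter value `THETA` and the function names are hypothetical. For this model A(x) − xA′(x) = 1 − θx², so the bounds in (iii) can also be read off by hand.

```python
import math

THETA = 0.5  # hypothetical dependence parameter, must lie in (0, 1]


def pickands(t, theta=THETA):
    """Mixed-model Pickands function A(t) = 1 - theta*t + theta*t^2 (illustrative)."""
    return 1.0 - theta * t + theta * t * t


def pickands_prime(t, theta=THETA):
    """Derivative A'(t) = -theta + 2*theta*t of the mixed model."""
    return -theta + 2.0 * theta * t


def lemma1_checks(theta=THETA, n=1000):
    """Evaluate statements (i)-(iii) of Lemma 1 on a regular grid of [0, 1]."""
    grid = [k / n for k in range(n + 1)]
    interior = grid[1:-1]
    # B(x) = A(x) - x A'(x), the quantity bounded in statement (iii)
    b_values = [pickands(x, theta) - x * pickands_prime(x, theta) for x in grid]
    tol = 1e-12  # guards against floating-point rounding at the boundary
    return {
        "i": all(pickands(t, theta) < 1.0 for t in interior),
        "ii": pickands_prime(0.0, theta) < 0.0 < pickands_prime(1.0, theta),
        "iii": all(-tol <= b <= 1.0 + tol for b in b_values),
    }


if __name__ == "__main__":
    print(lemma1_checks())
```

A grid check of this kind is only a sanity test for one model; it does not replace the convexity argument of the proof.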
The second lemma contains an equivalent of φ(u, v) as u → 0 and v → v0 > 0.
Lemma 2
Assume that A is continuously differentiable on [0, 1]. Then, if u → 0 and v → v0 > 0, we have that:
Proof of Lemma 2
Note that, since u → 0 and v → v0 > 0,
Besides, we get, as u → 0 and v → v0 > 0:
In particular:
Since as u → 0 and v → v0 > 0 we have:
we get
Because this quantity converges to 0 and
the conclusion readily follows by using a Taylor expansion of the exponential function in a neighbourhood of 0. □
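One way to carry out a computation of this type is sketched below. It assumes the representation φ(u, v) = u + v − 1 + C(1 − u, 1 − v) with C(u, v) = exp(log(uv) A(log v / log(uv))), and v0 ∈ (0, 1); these conventions and the resulting constant are derived here for illustration and are not quoted from the statement of Lemma 2.

```latex
% Setting t = log(1-v) / log((1-u)(1-v)), we have t -> 1 and
1 - t = \frac{\log(1-u)}{\log((1-u)(1-v))} = -\frac{u}{\log(1-v_{0})}(1+\operatorname{o}(1)),

% First-order expansion of A at 1, using A(1) = 1
A(t) = 1 - A^{\prime}(1)(1-t) + \operatorname{o}(1-t),

% Hence
\log((1-u)(1-v))\, A(t) = \log(1-v) + (1 - A^{\prime}(1)) \log(1-u) + \operatorname{o}(u),

C(1-u,1-v) = (1-v)\left[ 1 - (1 - A^{\prime}(1))\, u + \operatorname{o}(u) \right],

% and finally
\varphi(u,v) = \left[ 1 - (1-v_{0})(1 - A^{\prime}(1)) \right] u\, (1+\operatorname{o}(1)).
```

Under these assumptions the limiting constant is at least v0, since A′(1) ≤ 1, and is therefore positive.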
Lemma 3 is a very useful result about the asymptotic interactions between the survival functions of Y and T. We state it for the sake of clarity: its proof essentially consists in applying Theorem 1.2.6 in de Haan and Ferreira (2006) repeatedly.
Lemma 3
Assume that condition \((\mathcal {H})\) holds and that the distributions of Y and T satisfy conditions \(\mathcal {C}_{1}(\gamma _{Y})\) and \(\mathcal {C}_{1}(\gamma _{T})\) respectively. If in addition γYγT ≥ 0 and |γY| > |γT|, then when τ = ∞ we have
When otherwise τ < ∞, then
The next result contains an equivalent of φ(u, v) as u, v → 0.
Lemma 4
Assume that A is continuously differentiable on [0, 1]. Then, if u, v → 0, we have that:
- φ(u, v) = −A′(0)v(1 + o(1)) if v/u → 0;
- φ(u, v) = A′(1)u(1 + o(1)) if v/u → ∞;
- \(\varphi (u,v)=(c + 1)\left [ 1-A\left (\frac {c}{c + 1} \right ) \right ] u (1+\operatorname {o}(1))\) if v/u → c ∈ (0, ∞).
Proof of Lemma 4
Start by writing, as u, v → 0,
Since u, v → 0, we get
In particular:
Meanwhile
so that
Since this quantity converges to 0 and
the conclusion readily follows by using a Taylor expansion of the exponential function in a neighbourhood of 0. □
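The three regimes of Lemma 4 can be illustrated numerically. The sketch below assumes that φ(u, v) = u + v − 1 + C(1 − u, 1 − v) with C(u, v) = exp(log(uv) A(log v / log(uv))), and uses the symmetric mixed model A(t) = 1 − θt + θt² as an illustrative Pickands function, for which A′(0) = −θ and A′(1) = θ. The definition of φ, the model, the value of `THETA` and the test points are all assumptions made for this example.

```python
import math

THETA = 0.5  # hypothetical dependence parameter in (0, 1]


def pickands(t, theta=THETA):
    """Mixed-model Pickands function A(t) = 1 - theta*t + theta*t^2 (illustrative)."""
    return 1.0 - theta * t + theta * t * t


def phi(u, v, theta=THETA):
    """Assumed form phi(u, v) = u + v - 1 + C(1 - u, 1 - v)."""
    log_u = math.log1p(-u)  # log(1 - u), accurate for small u
    log_v = math.log1p(-v)  # log(1 - v)
    total = log_u + log_v
    t = log_v / total       # argument of A in the assumed Pickands representation
    # expm1 computes C(1-u, 1-v) - 1 without catastrophic cancellation
    return u + v + math.expm1(total * pickands(t, theta))


def lemma4_ratios(theta=THETA, c=2.0):
    """Ratios of phi to the three equivalents of Lemma 4; each should be close to 1."""
    a_prime_0, a_prime_1 = -theta, theta  # A'(0) and A'(1) for the mixed model
    tc = c / (c + 1.0)
    return {
        "v/u -> 0": phi(1e-3, 1e-6, theta) / (-a_prime_0 * 1e-6),
        "v/u -> inf": phi(1e-6, 1e-3, theta) / (a_prime_1 * 1e-6),
        "v/u -> c": phi(1e-4, c * 1e-4, theta)
        / ((c + 1.0) * (1.0 - pickands(tc, theta)) * 1e-4),
    }


if __name__ == "__main__":
    for regime, ratio in lemma4_ratios().items():
        print(regime, ratio)
```

The use of `log1p` and `expm1` matters here because φ is of the order of min(u, v) while the terms being combined are of order 1.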
Lemma 5, which provides an asymptotic equivalent of \(\overline {F}_{Z}\), is a direct corollary of Lemma 4. Note that it gives indeed a true asymptotic equivalent of \(\overline {F}_{Z}\), since A′(0) < 0, A′(1) > 0 and A(t) < 1 for any t ∈ (0, 1), see Lemma 1(i) and (ii).
Lemma 5
Assume that condition \((\mathcal {H})\) holds and that A is continuously differentiable on [0, 1]. Then, as z ↑ τ, we have that:
(i) \(\overline {F}_{Z}(z)=-A^{\prime }(0) \overline {F}_{T}(z)(1+\operatorname {o}(1))\) if \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to 0\) as z ↑ τ;
(ii) \(\overline {F}_{Z}(z)=A^{\prime }(1) \overline {F}_{Y}(z)(1+\operatorname {o}(1))\) if \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to \infty \) as z ↑ τ;
(iii) \(\overline {F}_{Z}(z)=(c + 1)\left [ 1-A\left (\frac {c}{c + 1} \right ) \right ] \overline {F}_{Y}(z) (1+\operatorname {o}(1))\) if \(\overline {F}_{T}(z)/\overline {F}_{Y}(z)\to c\in (0,\infty )\).
Proof of Lemma 5
Just use Eq. 1 together with Lemma 4. □
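As an illustration of case (i) of Lemma 5, the sketch below takes strict Pareto margins with γY > γT > 0, so that the survival function of T is negligible with respect to that of Y. It assumes that the survival function of Z is obtained as φ evaluated at the two marginal survival functions, with φ(u, v) = u + v − 1 + C(1 − u, 1 − v), and again uses the mixed model A(t) = 1 − θt + θt², for which −A′(0) = θ. The margins, the model, the reading of Eq. 1 and all names and parameter values are assumptions made for this example.

```python
import math

THETA = 0.5     # hypothetical dependence parameter of the mixed model
GAMMA_Y = 1.0   # hypothetical extreme value index of Y
GAMMA_T = 0.5   # hypothetical extreme value index of T, with GAMMA_Y > GAMMA_T > 0


def pickands(t, theta=THETA):
    """Mixed-model Pickands function A(t) = 1 - theta*t + theta*t^2 (illustrative)."""
    return 1.0 - theta * t + theta * t * t


def phi(u, v, theta=THETA):
    """Assumed form phi(u, v) = u + v - 1 + C(1 - u, 1 - v)."""
    log_u, log_v = math.log1p(-u), math.log1p(-v)
    total = log_u + log_v
    return u + v + math.expm1(total * pickands(log_v / total, theta))


def surv_pareto(z, gamma):
    """Survival function z^(-1/gamma) of a strict Pareto distribution on (1, inf)."""
    return z ** (-1.0 / gamma)


def surv_z(z):
    """Survival function of Z = min(Y, T) under the assumed copula model."""
    return phi(surv_pareto(z, GAMMA_Y), surv_pareto(z, GAMMA_T))


def lemma5_i_ratio(z):
    """Ratio of the survival function of Z to -A'(0) times that of T."""
    return surv_z(z) / (THETA * surv_pareto(z, GAMMA_T))


if __name__ == "__main__":
    for z in (1e2, 1e3, 1e4):
        print(z, lemma5_i_ratio(z))
```

In this configuration the tail of Z inherits the index of T, the lighter of the two tails, up to the multiplicative constant −A′(0).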
The next lemma contains an equivalent of the quantity 1 − ∂1C(1 − u, 1 − v) as u, v → 0, and is fundamental to prove Theorem 2.
Lemma 6
Assume that A is twice continuously differentiable on [0, 1]. Then, if u, v → 0, we have that:
(i) \(1-\partial _{1} C(1-u,1-v) = (1+A^{\prime }(0)) v + \frac {1}{2}A^{\prime \prime }(0) (v/u)^{2} + \operatorname {o}(v) + \operatorname {o}((v/u)^{2})\) if v/u → 0;
(ii) 1 − ∂1C(1 − u, 1 − v) → A′(1) if v/u → ∞;
(iii) \(1-\partial _{1} C(1-u,1-v) \to 1- \left [ A\left (\frac {c}{c + 1} \right ) - \frac {c}{c + 1} A^{\prime }\left (\frac {c}{c + 1} \right ) \right ]\) if v/u → c ∈ (0, ∞).
Proof of Lemma 6
Recall the identity
where B(x) = A(x) − xA′(x). We first prove (i). It is straightforward to show that
Since in this case we assume v/u → 0, we get
Therefore
It remains to compute an asymptotic expansion of C(1 − u, 1 − v)/(1 − u). To this end, we rewrite this term as
Note now that
Plugging this into the Taylor expansion
and rearranging yields
Now
so that
and therefore
We combine this with a Taylor expansion of the exponential function in a neighbourhood of 0 to get
Combining Eqs. 6 and 8 and rearranging terms concludes the proof of (i).
We now turn to the proof of (ii) and (iii): as u and v → 0, we have
Besides,
The argument of B here converges to 1 if v/u →∞, and to c/(c + 1) if v/u → c ∈ (0,∞). By continuity of B, it follows that 1 − ∂1C(1 − u, 1 − v) → 1 − B(1) = A′(1) if v/u → ∞, and
if v/u → c ∈ (0,∞). This completes the proof. □
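The three statements of Lemma 6 can be checked numerically in the same spirit. The sketch below assumes the representation C(u, v) = exp(log(uv) A(log v / log(uv))), for which differentiating in the first argument gives ∂1C(u, v) = (C(u, v)/u) B(log v / log(uv)) with B(x) = A(x) − xA′(x). It uses the mixed model A(t) = 1 − θt + θt², so that A′(0) = −θ, A′(1) = θ and A″(0) = 2θ. The model, the value of `THETA` and the test points are assumptions made for this example.

```python
import math

THETA = 0.5  # hypothetical dependence parameter in (0, 1]


def pickands(t, theta=THETA):
    """Mixed-model Pickands function A(t) = 1 - theta*t + theta*t^2 (illustrative)."""
    return 1.0 - theta * t + theta * t * t


def pickands_prime(t, theta=THETA):
    """Derivative A'(t) = -theta + 2*theta*t."""
    return -theta + 2.0 * theta * t


def b_function(x, theta=THETA):
    """B(x) = A(x) - x A'(x), as in the proof of Lemma 6."""
    return pickands(x, theta) - x * pickands_prime(x, theta)


def one_minus_d1c(u, v, theta=THETA):
    """1 - d1C(1-u, 1-v), using d1C(x, y) = C(x, y)/x * B(log y / log(xy))."""
    log_x, log_y = math.log1p(-u), math.log1p(-v)
    total = log_x + log_y
    t = log_y / total
    # C(x, y) / x = exp(log(xy) * A(t) - log x)
    ratio = math.exp(total * pickands(t, theta) - log_x)
    return 1.0 - ratio * b_function(t, theta)


def lemma6_errors(theta=THETA, c=2.0):
    """Discrepancies between 1 - d1C and the three statements of Lemma 6."""
    u, v = 1e-2, 1e-5  # regime v/u -> 0
    expansion = (1.0 - theta) * v + 0.5 * (2.0 * theta) * (v / u) ** 2
    tc = c / (c + 1.0)
    return {
        "i (relative)": abs(one_minus_d1c(u, v, theta) / expansion - 1.0),
        "ii (absolute)": abs(one_minus_d1c(1e-6, 1e-3, theta) - theta),
        "iii (absolute)": abs(
            one_minus_d1c(1e-4, c * 1e-4, theta) - (1.0 - b_function(tc, theta))
        ),
    }


if __name__ == "__main__":
    for case, err in lemma6_errors().items():
        print(case, err)
```

For the mixed model B(x) = 1 − θx², so the limits in (ii) and (iii) reduce to θ and θc²/(c + 1)² respectively, which gives an independent check of the numerical output.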
The final lemma contains, together with Lemma 6, the core result to show Proposition 2.
Lemma 7
Assume that A is twice continuously differentiable on [0, 1]. Then, if u, v → 0 with v/u → 0, we have that
Proof of Lemma 7
Recall from the proof of Lemma 4 that
Now, by Eq. 7 in the proof of Lemma 6,
Consequently, using a Taylor expansion,
Multiplying this expansion by (1 − u)(1 − v) and expanding again concludes the proof. □
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Stupfler, G. On the study of extremes with dependent random right-censoring. Extremes 22, 97–129 (2019). https://doi.org/10.1007/s10687-018-0328-6
DOI: https://doi.org/10.1007/s10687-018-0328-6
Keywords
- Random right-censoring
- Dependent censoring
- Extreme value copula
- Extreme value index
- Tail identifiability
- Tail censoring probability