On the study of extremes with dependent random right-censoring

The study of extremes in missing data frameworks is a recently developing field. In particular, the randomly right-censored case has received a fair amount of attention in the last decade. All studies on this topic, however, essentially work under the usual assumption that the variable of interest and the censoring variable are independent. Furthermore, a frequent characteristic of the estimation procedures developed so far is their crucial reliance on particular properties of the asymptotic behaviour of the response variable Z (that is, the minimum of time-to-event and time-to-censoring) and of the probability of censoring in the right tail of Z. In this paper, we focus instead on elucidating this asymptotic behaviour in the dependent censoring case, and, more precisely, when the structure of the dependent censoring mechanism is given by an extreme value copula. We then draw a number of consequences of our results for the asymptotic behaviour, in this dependent context, of several estimators of the extreme value index of the random variable of interest that were introduced in the literature under the assumption of independent censoring. We also discuss more generally the implications of our results for inference on the extremes of this variable.


Introduction
The problem of missing data, and in particular censoring, is frequently encountered in certain fields of statistical applications. The archetypal example of censoring is arguably the study of the survival times of patients to a given chronic disease in a medical follow-up study lasting up to a fixed time t. If a patient is diagnosed with the disease at time s, then his/her survival time will be known if and only if he/she dies before time t. If this is not the case, then the only information available is that his/her survival time is not less than the censoring time t − s. In mathematical terms, the information available to the practitioner is the pair (Z, δ), where Z is the minimum between the survival time and censoring time, and δ is the 0-1 variable equal to 1 if and only if the survival time is actually observed. This situation is one of the most frequent examples of random right-censoring, which shall be our framework in this paper.
Random right-censoring is also found in actuarial science: in non-life insurance, Reynkens et al. (2017) use random right-censoring to model unsettled claims, about which the insurer only knows what has been paid up to date rather than the ultimate claim amount, i.e. the sum of all payments for the claim, which is only known when the claim is fully settled. In life insurance, meanwhile, any study that monitors policyholders in a given time period contains right-censored data points, since many of the subjects still live at the end of the monitoring period. Another example is reliability data analysis: if a car company collects failure data during the warranty period, then a failure could happen not only because of a failure in the mechanics of the car, but also because of an accident or driver error. In the latter case, time-to-accident or time-to-driver-error should be treated as the censoring time, see Modarres et al. (2009). Random right-censoring should not be confused with other types of missing data mechanisms such as right-truncation, where no information is available at all when the random variable of interest is not actually observed. In a heavy-tailed context and when the right-truncation point is unknown but non-random, this problem is considered by Aban et al. (2006), while the earliest reference tackling the random right-truncation problem in the extreme value context is Gardes and Stupfler (2015).
In a random right-censoring framework, a stimulating problem is the estimation of extreme parameters of the underlying distribution of the variable of interest, a prime example of which is its extreme value index. In the aforementioned examples, this would ultimately lead to the analysis of survival times of exceptionally strong/weak patients to a given disease, extreme losses/payouts in insurance, or failure times for highly resistant/unreliable devices. This subfield of extreme value statistics has received a good amount of attention in recent years: we refer to Beirlant et al. (2007, 2010, 2016), Einmahl et al. (2008), Gomes and Neves (2011), Ndao et al. (2014, 2016), Sayah et al. (2014), Worms and Worms (2014), Brahimi et al. (2015) and Stupfler (2016). All of these papers work under the hypothesis that the variable of interest Y and the censoring variable T are independent random variables; in the case of Ndao et al. (2014, 2016) and Stupfler (2016), the condition is actually conditional independence given a suitable, in practice low-dimensional, covariate. Among other things, the independence assumption allows for a very convenient expression of the cumulative distribution function of the observed variable Z and, as a result, for a simple discussion of its extreme value properties. The asymptotic behaviour of Z, and that of δ|Z for high Z, can then be fruitfully exploited to construct a class of simple estimators of the extreme value index of Y (see, for instance, Beirlant et al. 2007 and Einmahl et al. 2008). It should be said that since the pioneering paper of Kaplan and Meier (1958) on the product-limit estimator for the survival function, the assumption of independent censoring is arguably the standard assumption in the context of random right-censoring.
And yet, cases in which there are strong suspicions of dependence between the variable of interest and the censoring time have been reported several times over the last decades. An early reference is Lagakos (1979). In medical studies especially, a common cause of the probable violation of the independence hypothesis is a sizeable number of patient dropouts (Huang and Zhang 2008; Jackson et al. 2014). Crucially, using traditional estimators such as the Kaplan-Meier estimator when there is dependence may lead to invalid inferences, see Fisher and Kanarek (1974), Klein and Moeschberger (1987) and the introduction of . Moreover, there is the additional issue of identifiability, in the sense that if the dependence structure is completely unspecified then the distribution of (Y, T) cannot be recovered from that of the pair (Z, δ), see Tsiatis (1975). A number of authors have suggested partial solutions to tackle the problem of dependence: some recent efforts include fitting specific types of known copulas (Li et al. 2007; Huang and Zhang 2008) or making weaker assumptions than independence on the pair (X, Y ) . The integration of valuable, preferably high-dimensional covariate information may also be helpful if conditional independence given the covariate is reasonable (see Zeng 2004, Li et al. 2007 and Hsu and Taylor 2010). Let us point out that the studies by Ndao et al. (2014, 2016) and Stupfler (2016) did consider incorporating a low-dimensional covariate X, but the common idea underpinning these papers is to estimate the conditional extreme value index of Y given X = x by adapting the procedure of Einmahl et al. (2008), developed for independent censoring, using kernel-type techniques. The introduction of covariate information is therefore not motivated by a reduction of the dependence between Y and T; in fact, these papers ignore altogether the issue of dependence and its consequences for inference about the extremes of the variable of interest.
Given the importance of the knowledge of the asymptotic behaviour of (Z, δ) for Z large in the construction of extreme value estimators in the independent censoring case, it is natural to think of the consequences that dependent censoring may have on this asymptotic behaviour. As noted above, there are numerous ways to specify dependence; in this paper, we assume that the dependence structure of the pair (Y, T ) is given by an extreme value copula, which is equivalent to assuming that (Y, T ) has a bivariate extreme value distribution in the sense of Tawn (1988). The construction and early development of extreme value copulas date back to Galambos (1978) and Deheuvels (1984), and a recent account is provided by Gudendorf and Segers (2010). This type of copula is particularly adapted to the description of joint extreme events, i.e. of situations when both Y and T are extreme, which constitute precisely the kind of events one has to consider in order to understand the extremes of the observed variable Z = min(Y, T ), in an effort to then get back to the extremes of Y (that would be the goal of the statistician in this context). The main results of this paper focus on, assuming standard extreme value conditions on the distribution of Y and T together with an extreme value copula dependence model, the analysis of the extreme value properties of Z first and then of the behaviour of δ given that Z is large, the latter variable indicating how much censoring there is in the extremes of the sample.
The basic assumption of a purely extreme value copula model may appear restrictive at first. It is indeed similar in spirit to assuming that, in the univariate case, the underlying distribution is a Generalised Extreme Value distribution (see de Haan and Ferreira 2006). This assumption shall nevertheless prove very useful in identifying several problems, such as the inconsistency of certain estimators, that may arise when there is dependence in the censoring mechanism. The idea is that any estimator (of, say, the extreme value index of Y ) which would be inconsistent in the present context cannot be expected to be consistent in general in a wider class of models (such as, for instance, the Archimax copula model of Capéràa et al. 2000), as any such wider class would contain the extreme value copula model in which the estimator is inconsistent. Let us also point out that the very popular bivariate model of Tawn (1996, 1997) would not be appropriate here, because this model assumes that Y and T are unit Fréchet distributed. To use this model in practice, one therefore has to transform Y and T to a unit Fréchet distribution, which implies that the distributions of Y and T are known or have at least been accurately estimated beforehand. This would be an issue in the statistical analysis of extremes with censoring, since the mindset is rather that nothing is known about the distributions of Y and T and the problem is to recover the extreme value behaviour of Y .
Let us highlight the main contributions of this paper. We start by, in an extreme value copula dependent censoring model, investigating the extreme value properties of Z as well as the convergence of the proportion of censored observations in the right tail of the variable Z. After that, we shall explain how this investigation shows that a number of estimators of the extreme value index of Y , introduced in the independent censoring case, become inconsistent in the dependent censoring framework we consider, whenever T has a lighter tail than Y has. More generally, we will argue that when the dependence structure of (Y, T ) is given by a non-independent extreme value copula, and if T has a lighter tail than Y has, then the identifiability of the extreme value index of Y , based on the information provided jointly by Z and δ|Z for Z large, is unclear. This is in stark contrast with the independent censoring case, in which we know from Einmahl et al. (2008) that the problem of inferring this parameter can indeed be solved in a simple way based on the behaviour of the pair (Z, δ) for large Z alone. This discussion will be formalised using our introduced and dedicated concept of tail identifiability, adapted to the censoring framework, and refined based on additional asymptotic considerations on the distribution of (Z, δ) for high Z. We shall then explain why, based on the full information provided by the distribution of (Z, δ) and if the extreme value copula describing the censoring mechanism is known, then the extreme value index of Y becomes clearly identifiable, and we shall outline a couple of possible strategies that may lead to consistent estimators of this extreme value index.
The outline of our paper is as follows. Section 2 gives further details about our assumptions and especially about our dependence framework. Section 3 gives our main results, first about the extremes of the observed variable Z in Section 3.1 and then about the tail censoring probability in Section 3.2. Section 4 gathers statistical considerations deduced from our results about the estimation of the extreme value index of Y , relative to the inconsistency of certain estimators in Section 4.1 and then more generally to the identifiability of this parameter based on tail information in Section 4.2. Section 5 concludes by briefly discussing possible ways to design inference techniques and providing ideas for further work. Proofs of the main results are deferred to Appendix A, and auxiliary results and their proofs are relegated to Appendix B.

Framework
We assume throughout that the variable of interest Y is partially unobserved, due to the existence of a right-censoring random variable T. In other words, we only observe the pair (Z, δ), where Z = min(Y, T) and $\delta = \mathbb{I}_{\{Y \leq T\}}$. Contrary to the standard setup, we also assume that Y and T are not independent. We describe here the dependence structure of the pair (Y, T) by means of a copula function. The key result in order to do so is Sklar's theorem (Sklar 1959), which says that there exists a copula C with
$$P(Y \leq y, T \leq t) = C(F_Y(y), F_T(t)) \quad \text{for all } y, t \in \mathbb{R},$$
in which $F_Y$ and $F_T$ denote the respective cumulative distribution functions of Y and T. A copula C is simply, in our case, a bivariate distribution function of a pair of standard uniform random variables. We assumed here that Y and T are not independent, so that the copula C cannot be the independent copula (u, v) → uv.
Since the ultimate goal of the statistician would be to recover the extremes of Y, we should work in a relevant extreme value framework. The condition we shall introduce, on an arbitrary distribution function F, is one of the many equivalent versions of the classical extreme value condition saying that the distribution should belong to the domain of attraction of some extreme value distribution. As we are in a randomly right-censored situation, it is convenient to write our extreme value condition on a distribution function F in terms of the survival function 1 − F, which leads us to the following formulation (see Theorem 1.1.6 in de Haan and Ferreira 2006): There is γ ∈ R, called the extreme value index of F, and a positive function a such that the distribution function F satisfies:
$$C_1(\gamma): \quad \lim_{t \uparrow \tau} \frac{1 - F(t + x\, a(t))}{1 - F(t)} = (1 + \gamma x)^{-1/\gamma} \quad \text{for all } x \text{ such that } 1 + \gamma x > 0,$$
where τ denotes the right endpoint of F and the right-hand side is read as $e^{-x}$ when γ = 0. Since the observed variable is Z = min(Y, T), it makes sense to assume that the distributions of Y and T can both be included within this extreme value framework, and our main results will then be stated under the assumption that the distributions of Y and T satisfy conditions $C_1(\gamma_Y)$ and $C_1(\gamma_T)$ respectively. Because P(Z > z) = P(min(Y, T) > z) = P(Y > z, T > z), it follows that the study of the extremes of Z, which is intuitively a sensible way to get information about the extremes of Y, will require a study of the situation when Y and T are jointly extreme. A very convenient assumption on the copula C in this context is then to suppose that C is an extreme value copula (see e.g. Gudendorf and Segers 2010):
$$C(u, v) = \exp\left( \log(uv) \, A\left( \frac{\log v}{\log(uv)} \right) \right), \quad u, v \in (0, 1),$$
where A is the so-called Pickands dependence function related to C, i.e. it is a function that maps [0, 1] into [1/2, 1], is convex and satisfies the inequalities max(t, 1 − t) ≤ A(t) ≤ 1 for all t ∈ [0, 1]. The function A characterises the copula C: in particular, the case A(t) = 1 corresponds to the independent copula C(u, v) = uv (which we therefore exclude), and the case A(t) = max(t, 1 − t) is that of the perfect dependence copula C(u, v) = min(u, v).
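As a purely illustrative aside, the link between a Pickands dependence function and its copula can be sketched numerically. The sketch below assumes the parametrisation $C(u, v) = \exp(\log(uv)\, A(\log v / \log(uv)))$ used by Gudendorf and Segers (2010); the function names (`ev_copula`, `pickands_independence`, `pickands_perfect_dependence`, `pickands_gumbel`) are hypothetical and introduced only for this illustration.

```python
import math

def ev_copula(u, v, pickands):
    """Evaluate an extreme value copula from its Pickands dependence function.

    Assumed parametrisation: C(u, v) = exp(log(uv) * A(log(v) / log(uv))).
    """
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u == 1.0:          # boundary cases: C(1, v) = v and C(u, 1) = u
        return v
    if v == 1.0:
        return u
    log_uv = math.log(u) + math.log(v)
    w = math.log(v) / log_uv
    return math.exp(log_uv * pickands(w))

def pickands_independence(t):
    # A(t) = 1 gives the independent copula C(u, v) = uv
    return 1.0

def pickands_perfect_dependence(t):
    # A(t) = max(t, 1 - t) gives the perfect dependence copula min(u, v)
    return max(t, 1.0 - t)

def pickands_gumbel(theta):
    # Pickands function of the Gumbel-Hougaard copula with parameter theta >= 1
    return lambda t: (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)
```

For instance, `ev_copula(0.3, 0.7, pickands_independence)` reproduces the product 0.3 × 0.7 up to rounding, while `pickands_perfect_dependence` reproduces min(0.3, 0.7).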
In theoretical terms, these copulae arise naturally as limiting copulae of suitably normalised sequences of componentwise maxima of independent and identically distributed bivariate pairs (Joe 1997). In this sense, assuming that (Y, T ) follows a bivariate extreme value distribution is analogous to, in a univariate context, assuming that the random variable of interest has a Generalised Extreme Value distribution. Working in such a context, which is a simplified version of the general case, can help to identify statistical issues, especially regarding the inconsistency of certain estimators, that would arise in a more general context as well; we will highlight such issues in Section 4.
Our first step is to quantify, relative to the independent censoring case, the influence that the dependence structure induced by C has on the distribution of the random variable Z. We do this by using a simple identity linking the survival function $\overline{F}_Z(z) = P(Z > z)$ of Z to the survival functions $\overline{F}_Y$ and $\overline{F}_T$ of Y and T and to the copula C. Since
$$\overline{F}_Z(z) = P(Y > z, T > z) = 1 - F_Y(z) - F_T(z) + C(F_Y(z), F_T(z)),$$
it follows, after straightforward calculations, that we can write
$$\overline{F}_Z(z) = \overline{F}_Y(z)\, \overline{F}_T(z) + \varphi(\overline{F}_Y(z), \overline{F}_T(z)), \quad \text{with } \varphi(u, v) = C(1 - u, 1 - v) - (1 - u)(1 - v). \qquad (1)$$
In other words, we can write the survival function of Z as what it would be under independence of Y and T, plus the term $\varphi(\overline{F}_Y(z), \overline{F}_T(z))$ measuring the effect that the dependence structure in C has on Z. Since the behaviour of $\overline{F}_Y(z)\, \overline{F}_T(z)$ for large z (i.e. when z converges to the right endpoint of Z) is easy to analyse, Eq. 1 suggests that, to analyse the extremes of Z, it is enough to understand the behaviour of $\varphi(\overline{F}_Y(z), \overline{F}_T(z))$ for large z.
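As a quick sanity check of this decomposition, the two boundary cases of the dependence structure can be worked out explicitly. The computation below assumes that the dependence term is $\varphi(u, v) = C(1 - u, 1 - v) - (1 - u)(1 - v)$, which is our reading of the identity described above; u and v stand for the survival probabilities $\overline{F}_Y(z)$ and $\overline{F}_T(z)$.

```latex
% Independent copula C(u,v) = uv: the dependence term vanishes.
\varphi(u, v) = (1 - u)(1 - v) - (1 - u)(1 - v) = 0
\quad \Longrightarrow \quad
\overline{F}_Z(z) = \overline{F}_Y(z)\, \overline{F}_T(z).

% Perfect dependence copula C(u,v) = \min(u,v):
% C(1-u, 1-v) = 1 - \max(u, v), and u + v - \max(u, v) = \min(u, v).
\varphi(u, v) = 1 - \max(u, v) - (1 - u - v + uv) = \min(u, v) - uv
\quad \Longrightarrow \quad
\overline{F}_Z(z) = \min\bigl( \overline{F}_Y(z), \overline{F}_T(z) \bigr).
```

In the second case the survival function of Z is exactly that of the variable with the smaller survival function, which is the extreme form of the behaviour studied in the sequel.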
Let now τ Y and τ T be the right endpoints of Y and T , respectively. Since we will focus on the large values of Z, namely, near its right endpoint τ = min(τ Y , τ T ), we can anticipate that the relative positions of τ Y and τ T will play a major role in our context. In the case when τ T < τ Y , then the extremes of Y cannot be recovered because the distribution of Y is not identifiable past the point τ T . In the case τ Y < τ T , we should expect the extremes of Y to be those of Z, meaning that they can be recovered by standard techniques. The following result makes this statement precise.
Proposition 1 Assume that $\tau_Y < \tau_T$ and C is an extreme value copula whose Pickands dependence function is continuously differentiable on [0, 1]. Then $\overline{F}_Z(z)/\overline{F}_Y(z)$ has a positive and finite limit as $z \uparrow \tau_Z = \tau_Y$, the right endpoint of Z.
We therefore assume in what follows that Y and T have a common right endpoint τ = τ Y = τ T . Note that distributions with a positive extreme value index have an infinite right endpoint, while distributions with a negative extreme value index have a finite right endpoint, see Theorem 1.2.1 in de Haan and Ferreira (2006). It therefore follows from our basic assumption τ Y = τ T that the extreme value indices γ Y and γ T should have the same sign.
We may now summarise our hypotheses about the joint behaviour of Y and T in the following condition:
Condition (H) Y and T have a common right endpoint $\tau = \tau_Y = \tau_T$, satisfy conditions $C_1(\gamma_Y)$ and $C_1(\gamma_T)$ respectively, and their joint distribution function is given by
$$P(Y \leq y, T \leq t) = C(F_Y(y), F_T(t)),$$
where C is an extreme value copula whose Pickands dependence function A is twice continuously differentiable and not equal to the constant function 1.
In their pioneering paper, Einmahl et al. (2008) use the right-tail behaviour of the observed variable Z, together with an analysis of the censoring probability given that Z is large, to show that if Y and T are independent then, under classical extreme value assumptions, the extreme value index of Y can be recovered in a very simple manner exclusively from the behaviour of (Z, δ) given that Z is large. A similar idea, albeit implemented differently, is used by Beirlant et al. (2007). This line of thought was then followed by Gomes and Neves (2011), Ndao et al. (2014, 2016), Brahimi et al. (2015), Beirlant et al. (2016) and Stupfler (2016) in their respective contexts. Our aim in Section 3 below is to carry out an analogous study in our dependent censoring context and see what influence the introduction of dependence in the censoring mechanism, via an extreme value copula, has on the distribution of (Z, δ) given that Z is large. Let us finally highlight that the double differentiability assumption on A is technically convenient and allows for a unified presentation of our results in Sections 3 and 4; the results in Section 3 below can be obtained by simply assuming that A is continuously differentiable (at the expense of extra technical details).

The extremes of the response variable
Our first step is to analyse the extreme value behaviour of Z, in terms of the extreme value indices γ Y and γ T of Y and T . This is straightforward in the independent censoring case, because then the survival function of Z is the product of those of Y and T . Our aim here is to state a corresponding result in the dependence context (H); recall that this condition entails that the extreme value indices of Y and T have the same sign.
Theorem 1 Assume that condition (H) holds; if $\gamma_Y = \gamma_T$, assume further that the ratio $\overline{F}_T(z)/\overline{F}_Y(z)$ has a finite and positive limit as z ↑ τ. We have that: The right-tail behaviour of Z is therefore essentially that of the variable with the lightest tail in the pair (Y, T), i.e. that of Y in the case when Y has a lighter tail than (or a similar tail to) that of T, and that of T when T has a lighter tail than Y has. Let us consider a simple illustrative example.
Example 1 Suppose that Y and T have the Pareto distributions
$$\overline{F}_Y(z) = z^{-1} \quad \text{and} \quad \overline{F}_T(z) = z^{-1/\gamma_T}, \quad z \geq 1.$$
In particular, Y (resp. T) has extreme value index 1 (resp. $\gamma_T$). Assume that the dependence structure of the pair (Y, T) is described by a Gumbel-Hougaard copula (see Gumbel 1960):
$$C(u, v) = \exp\left( -\left[ (-\log u)^{\theta} + (-\log v)^{\theta} \right]^{1/\theta} \right),$$
where θ ≥ 1 is a constant; here we choose θ > 1, in order to ensure that Y and T are not independent. This copula is an extreme value copula, whose Pickands dependence function is
$$A(t) = \left( t^{\theta} + (1 - t)^{\theta} \right)^{1/\theta}.$$
Theorem 1 predicts that $\overline{F}_Z$ should be regularly varying with index $-1/\min(1, \gamma_T)$. We check this by analysing the asymptotic behaviour, as z → ∞, of $\varphi(\overline{F}_Y(z), \overline{F}_T(z))$ (see Eq. 1). If $\gamma_T > 1$ then Together with the identity which is a consequence of Eq. 1, this entails It follows from this computation that $\overline{F}_Z$ is indeed regularly varying with index $-1/\min(1, \gamma_T)$.
Let us now highlight a couple of consequences of Theorem 1 about the tail behaviour of the observed variable Z in our setup. For ease of exposition, we assume until the end of this section that $\gamma_Y \gamma_T > 0$, i.e. Y and T both belong to the same max-domain of attraction, that can be either the Fréchet or Weibull domain of attraction. It is known (see Einmahl et al. 2008) that in the independent censoring case, Z then belongs to the common max-domain of attraction of Y and T, with extreme value index
$$\gamma_Z = \frac{\gamma_Y \gamma_T}{\gamma_Y + \gamma_T}.$$
In the independent case, the absolute value of the extreme value index of the observed variable Z is therefore strictly lower than what it is in the dependent case we consider here. This means that in the dependent case, the right tail of Z is heavier than it is in the independent case. Qualitatively, this is a consequence of the positive tail dependence between Y and T, due to the dependence structure being described by an extreme value copula (see Gudendorf and Segers 2010). It is interesting to note that the expression of the dependence function itself does not affect the extreme value index of Z at all. Theorem 1 also has an important corollary relative to the relationship between the extremes of Y and those of the observed variable Z: whereas the extremes of Z always contain information about those of Y and those of T in the independent case, they are driven either solely by those of Y or those of T in the dependent case considered here, no matter how close to independence the dependence structure is. It should be especially emphasised that in the case when $\gamma_Y \gamma_T > 0$ and $|\gamma_T| < |\gamma_Y|$, corresponding to T having a lighter tail than Y has, the extreme value index of Z is exactly that of T. The only cases when this type of behaviour is observed in the independent censoring situation are when either the right tail of T is much lighter than the right tail of Y (e.g. T is light-tailed while Y is heavy-tailed) or when $\tau_T < \tau_Y \leq \infty$, the latter being a case when the problem of recovering the extremes of Y has no solution since the distribution of Y is then not identifiable past the point $\tau_T$.
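The regular variation claimed in Example 1 can also be checked numerically. The sketch below is illustrative only: it assumes the Pareto survival functions $z^{-1}$ for Y and $z^{-1/\gamma_T}$ for T (consistent with the extreme value indices 1 and $\gamma_T$ stated in the example) together with the closed form of the Gumbel-Hougaard copula, and it approximates the index of regular variation by a log-log slope between two large thresholds. The names `survival_Z` and `local_rv_index` and the threshold values are hypothetical choices.

```python
import math

def survival_Z(z, gamma_T, theta):
    """P(Z > z) = P(Y > z) + P(T > z) - 1 + C(F_Y(z), F_T(z)) for z > 1."""
    s_y = z ** -1.0                  # assumed Pareto survival function of Y
    s_t = z ** (-1.0 / gamma_T)      # assumed Pareto survival function of T
    x = -math.log1p(-s_y)            # -log F_Y(z)
    y = -math.log1p(-s_t)            # -log F_T(z)
    ell = (x ** theta + y ** theta) ** (1.0 / theta)
    # expm1 limits cancellation error when computing C - 1 = exp(-ell) - 1
    return s_y + s_t + math.expm1(-ell)

def local_rv_index(gamma_T, theta, z_low=1e4, z_high=1e6):
    """Log-log slope of the survival function of Z between two thresholds."""
    rise = math.log(survival_Z(z_high, gamma_T, theta)) \
        - math.log(survival_Z(z_low, gamma_T, theta))
    return rise / (math.log(z_high) - math.log(z_low))

for gamma_T in (0.5, 1.0, 2.0):
    print(gamma_T, local_rv_index(gamma_T, theta=2.0), -1.0 / min(1.0, gamma_T))
```

For each value of $\gamma_T$, the computed slope should be close to $-1/\min(1, \gamma_T)$, in line with the prediction of Theorem 1; the agreement is only approximate because the slope is evaluated at finite thresholds.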

Tail censoring probability
The second main part of our study focuses on the information available in the censoring indicator δ given that the observed variable Z is large. In other words, we consider the behaviour of the probability
$$P(z) = P(\delta = 1 \mid Z > z).$$
When z is close to τ, the probability 1 − P(z) gives an idea of the probability of censoring in the extremes of Z. In particular, if P(z) converges to a limit P(τ) as z ↑ τ, the probability 1 − P(τ) shall be called the tail censoring probability.
It should be noted that Einmahl et al. (2008) achieve the study of censoring in the right tail by slightly different means, as they assume that Y and T have continuous distributions and they consider
$$p(z) = P(\delta = 1 \mid Z = z).$$
They mention (without proof) that under independent censoring and suitable extreme value conditions, this function has a limit as z ↑ τ; it is actually straightforward to show that this is the limit of P as well, essentially by l'Hôpital's rule. Their statistical arguments, however, use the quantity P(z) instead: in particular, they develop an estimator of the extreme value index of Y using the sample counterpart of P(z) for large z. The limit of the function P as z ↑ τ can be expected to play an important role in the context of extreme value analysis with right-censored data, just as the classical censoring probability P(Y > T) does for the estimation of central quantities. In the classical, central setup, a positive (and less than 1) censoring probability means that the problem has a solution and that traditional estimators have to be corrected in some way in order to retain consistency. A censoring probability equal to 0 happens in totally uncensored cases, in which standard, uncorrected techniques will still be consistent. Finally, a censoring probability equal to 1 gives rise to a totally censored case, which is a situation when the estimation problem does not have a solution.
In classical, central censoring problems, the condition that the censoring probability belongs to the interval (0, 1) is thus crucial for an estimation problem to be both nontrivial and workable. Our purpose here is to show that the tail censoring probability exists indeed in our dependent censoring setting, and the next section will draw conclusions from its value that are related to inference about the extreme value index of Y . Our first step to prove that the limiting tail censoring probability exists is to obtain a convenient representation of the function P . In all what follows, ∂ 1 C denotes the partial derivative of C with respect to its first argument. It is easy to show that (2) Besides, if Y has a continuous distribution with probability density function f Y then clearly It follows that if T has a continuous distribution as well, then we can write Formula (2) and representation (3) are key to our analysis of the convergence of P , which is the focus of our second main result.
Theorem 2 Assume that condition (H) holds and suppose moreover that Y and T have continuous distributions. Then, as z ↑ τ , we have that: If moreover A is strictly convex, then P c ∈ (0, 1).
It follows from this result that, in our dependent censoring context, the tail censoring probability $1 - P(\tau) = 1 - \lim_{z \uparrow \tau} P(z)$ exists and:
• When T has a lighter tail than Y has, then the tail censoring probability is 1;
• When Y has a lighter tail than T has, then the tail censoring probability is 0;
• The only nondegenerate case arises when A is strictly convex and Y and T have proportional tails, the tail censoring probability then being between 0 and 1 depending on the function A.
By comparison, in the independent case, provided Y and T have continuous distributions that satisfy conditions $C_1(\gamma_Y)$ and $C_1(\gamma_T)$ respectively, with $\gamma_Y \gamma_T > 0$ and equal right endpoints, the tail censoring probability is
$$1 - P(\tau) = \frac{\gamma_Y}{\gamma_Y + \gamma_T},$$
see Einmahl et al. (2008). We illustrate these conclusions further in the example below.
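Example 2 Suppose, as in Example 1, that Y and T have the Pareto distributions
$$\overline{F}_Y(z) = z^{-1} \quad \text{and} \quad \overline{F}_T(z) = z^{-1/\gamma_T}, \quad z \geq 1.$$
In the independent case, the tail censoring probability would be decreasing smoothly from 1 to 0 as the tail of T gets heavier. By contrast, in the dependent case with Pickands dependence function A(t) = 1 − t(1 − t), the tail censoring probability is
$$1 - P(\tau) = \begin{cases} 1 & \text{if } \gamma_T < 1, \\ 1/2 & \text{if } \gamma_T = 1, \\ 0 & \text{if } \gamma_T > 1, \end{cases}$$
which, although still a decreasing function of $\gamma_T$, is discontinuous at 1 and piecewise constant.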
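The three regimes of Example 2 can be illustrated by computing P(z) numerically at a large but finite threshold. The sketch below is not the derivation used in the paper: it assumes Pareto survival functions $z^{-1}$ for Y and $z^{-1/\gamma_T}$ for T, the Pickands dependence function A(t) = 1 − t(1 − t), and the standard conditional-distribution identity $P(T \leq t \mid Y = y) = \partial_1 C(F_Y(y), F_T(t))$, which for an extreme value copula gives $\partial_1 C(u, v) = C(u, v)\,[A(w) - w A'(w)]/u$ with $w = \log v / \log(uv)$. The integral over Y > z is approximated by a midpoint rule in the variable $s = \overline{F}_Y(y)$; all function names are hypothetical.

```python
import math

def pickands(t):
    return 1.0 - t * (1.0 - t)       # A(t) of Example 2

def pickands_prime(t):
    return 2.0 * t - 1.0             # A'(t)

def _logs_and_weight(s_y, s_t):
    log_u = math.log1p(-s_y)         # log F_Y, from the survival probability of Y
    log_v = math.log1p(-s_t)         # log F_T, from the survival probability of T
    return log_u, log_v, log_v / (log_u + log_v)

def survival_Z(s_y, s_t):
    """P(Y > z, T > z) = s_y + s_t - 1 + C(1 - s_y, 1 - s_t)."""
    log_u, log_v, w = _logs_and_weight(s_y, s_t)
    return s_y + s_t + math.expm1((log_u + log_v) * pickands(w))

def prob_T_exceeds_given_Y(s_y, s_t):
    """P(T >= y | Y = y) = 1 - d/du C(u, v) at u = F_Y(y), v = F_T(y)."""
    log_u, log_v, w = _logs_and_weight(s_y, s_t)
    log_C = (log_u + log_v) * pickands(w)
    d1C = math.exp(log_C - log_u) * (pickands(w) - w * pickands_prime(w))
    return 1.0 - d1C

def uncensored_tail_probability(z, gamma_T, n_grid=2000):
    """Midpoint-rule approximation of P(z) = P(delta = 1 | Z > z)."""
    s_z = 1.0 / z                    # survival probability of Y at z
    h = s_z / n_grid
    numerator = 0.0
    for i in range(n_grid):
        s = (i + 0.5) * h            # s = P(Y > y) for some y > z
        numerator += prob_T_exceeds_given_Y(s, s ** (1.0 / gamma_T)) * h
    return numerator / survival_Z(s_z, s_z ** (1.0 / gamma_T))

for gamma_T in (0.5, 1.0, 2.0):
    print(gamma_T, 1.0 - uncensored_tail_probability(1e6, gamma_T))
```

At z = 10^6 the printed censoring probabilities should be close to 1, 1/2 and 0 respectively; at any finite threshold the degenerate regimes are only approached, not attained.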
Due to the nature of the limit P(τ) in our dependent case, the above example is very much representative of the different situations that can occur, the only degrees of freedom here being the position of the discontinuity (i.e. the value of $\gamma_Y$, taken to be equal to 1 in the above example) and the value of P(τ) at this discontinuity (which depends on the function A). Again, it should be emphasised that the Pickands dependence function plays a largely insignificant role in the results, in that it does not affect the value of P(τ) except at the discontinuity. In particular, the coefficient of upper tail dependence
$$\lambda = 2\,(1 - A(1/2))$$
(see Gudendorf and Segers 2010) only appears when Y and T have exactly equivalent tails, and in that case One reason behind this is the following: recall that the upper tail dependence coefficient is
$$\lambda = \lim_{u \uparrow 1} P(U > u \mid V > u),$$
where (U, V) is a random pair with distribution function C (see again Gudendorf and Segers 2010). In other words, the upper tail dependence coefficient measures extremal dependence in the direction of the 45-degree line. Now $(U, V) := (F_Y(Y), F_T(T))$ is such a random pair and
$$\overline{F}_Z(z) = P(Y > z, T > z) = P(U > F_Y(z), V > F_T(z)).$$
We should therefore only expect the upper tail dependence coefficient to appear in this problem if $\overline{F}_Y(z)$ and $\overline{F}_T(z)$ are equivalent for large z. More generally though, the fact that the actual expression of the function A has no influence in our results outside of the very specific case of similar tails is worthy of note.
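The relation between the upper tail dependence coefficient and the value A(1/2) can be illustrated numerically. The sketch below uses the fact that an extreme value copula satisfies $C(u, u) = u^{2A(1/2)}$ on the diagonal, so that $P(U > u \mid V > u) = (1 - 2u + C(u, u))/(1 - u)$; the function names and the choice of threshold are hypothetical.

```python
import math

def upper_tail_dependence(pickands_at_half):
    """Closed form: lambda = 2 * (1 - A(1/2))."""
    return 2.0 * (1.0 - pickands_at_half)

def conditional_exceedance(s, pickands_at_half):
    """P(U > 1 - s | V > 1 - s) for an extreme value copula.

    Uses C(u, u) = u ** (2 * A(1/2)) with u = 1 - s, written with
    log1p/expm1 so that the subtraction stays accurate for small s.
    """
    diagonal_minus_one = math.expm1(2.0 * pickands_at_half * math.log1p(-s))
    return (2.0 * s + diagonal_minus_one) / s

# A(t) = 1 - t(1 - t) has A(1/2) = 3/4, hence lambda = 1/2
print(upper_tail_dependence(0.75), conditional_exceedance(1e-6, 0.75))

# Gumbel-Hougaard with theta = 2 has A(1/2) = 2 ** (1/2 - 1)
a_half = 2.0 ** (0.5 - 1.0)
print(upper_tail_dependence(a_half), conditional_exceedance(1e-6, a_half))
```

In both cases the conditional exceedance probability at a threshold close to 1 is close to the closed-form value; independence corresponds to A(1/2) = 1 and λ = 0.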
We conclude this section by underlining what is arguably the essential difference, as far as the tail censoring probability is concerned, between the independent and our dependent censoring case. For independent censoring, when the right tails of Y and T are of the same type, the tail censoring probability is
$$1 - P(\tau) = \frac{\gamma_Y}{\gamma_Y + \gamma_T} \in (0, 1).$$
In particular, on average, a positive proportion of the highest values of Z are known to come from the variable of interest Y, meaning that the extremes of Y are indeed observed in practice and thus can be recovered by an adapted technique. By contrast, in our dependent censoring case, the tail censoring probability 1 − P(τ) is 1 as soon as T has a lighter tail than Y has; in other words, a vanishingly small proportion of the highest observations from Z will come from Y in this case. This suggests in particular that, without any further information on the tails of Y and T, the problem of inferring the extreme value index of Y from (Z, δ) given that Z is large has no clear solution. We now seek to clarify these ideas in the next section.

Statistical consequences
In this section, we first explain what impact our main results can be expected to have on a class of estimators of the extreme value index of Y in our dependent framework. We then draw general conclusions regarding the identifiability of γ Y from the information provided by the distribution of (Z, δ) for large Z.

On the large-sample behaviour of a class of estimators of γ Y
Our results imply that a dependent censoring mechanism entails consequences, in terms of consistency of estimators, that can be as serious in the study of extremes of the variable of interest as they are in the study of its central parameters. To illustrate this point, we focus on certain estimators that have been developed up to now in the literature. For ease of exposition, we work here under the conditions of Theorem 2 and we further assume that $\gamma_Y \gamma_T > 0$. The estimators of Beirlant et al. (2007, 2016), Einmahl et al. (2008), Gomes and Neves (2011), Ndao et al. (2014, 2016), Brahimi et al. (2015) and Stupfler (2016) are all based on the fact that in the independent censoring case,
$$\frac{\gamma_Z}{P(\tau)} = \gamma_Y.$$
The common idea these authors use is then to plug a consistent estimator of $\gamma_Z$ and a consistent estimator of P(τ) in the left-hand side above in order to obtain a consistent estimator of $\gamma_Y$. One representative example of such an estimator is obtained as follows: suppose that the available data consists of independent pairs $(Z_i, \delta_i)$, 1 ≤ i ≤ n. Denote by $Z_{1,n} \leq \cdots \leq Z_{n,n}$ the order statistics of $(Z_1, \ldots, Z_n)$ and by $\delta_{[n-i+1,n]}$ the δ-indicator corresponding to $Z_{n-i+1,n}$. An empirical estimator of P(τ) is then
$$\widehat{P}_k(\tau) = \frac{1}{k} \sum_{i=1}^{k} \delta_{[n-i+1,n]},$$
where k = k(n) → ∞ and k/n → 0, see Einmahl et al. (2008). Combining this with e.g. the moment estimator $\widehat{\gamma}_{Z,k}$ of $\gamma_Z$ (see Dekkers et al. 1989) results in an estimator $\widehat{\gamma}_Y = \widehat{\gamma}_{Z,k} / \widehat{P}_k(\tau)$ of $\gamma_Y$ which is consistent under independent censoring if certain general conditions are satisfied. In the dependent case we consider, however, the equality $\gamma_Z / P(\tau) = \gamma_Y$ is only true when the tail of Y is strictly lighter than that of T, in which case we actually have $\gamma_Z = \gamma_Y$; the left-hand side is not even defined for $|\gamma_Y| > |\gamma_T|$ since P(τ) is then equal to 0.
This leads to the inconsistency of the aforementioned estimators when T has a lighter tail than Y has, and more generally results in a considerable restriction on their applicability in our dependent censoring setup. Let us also point out that the alternative estimators of Beirlant et al. (2010), Sayah et al. (2014) and Worms and Worms (2014) do not directly use the ratio of γ Z and P (τ ) in order to design a consistent estimator. They are however based on a Kaplan-Meier estimate of F Y in its right tail, so that they would also be seriously affected by a violation of the independent censoring assumption, as dependence can cause the Kaplan-Meier estimator to become inconsistent (see Fisher and Kanarek 1974, Klein and Moeschberger 1987). An adaptation of these methods therefore requires in particular the construction of an adapted estimator of F Y , which necessitates more information than that brought by the pair (Z, δ) for Z large: we shall return to this in Section 5.
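The plug-in idea discussed in this subsection can be illustrated by a minimal sketch in Python of the ratio of the moment estimator of γ Z to the empirical tail proportion of uncensored observations. The function names, the simulated Pareto sample and the choice of k are illustrative assumptions and are not taken from the cited papers; the sketch also assumes a positive sample without ties and at least one uncensored observation among the k largest.

```python
import math
import random


def moment_estimator(z, k):
    """Moment estimator (Dekkers et al. 1989) of the extreme value index,
    based on the k largest values of a positive sample z, with k < len(z)."""
    zs = sorted(z)
    n = len(zs)
    log_threshold = math.log(zs[n - k - 1])  # log Z_{n-k,n}
    # Log-excesses log Z_{n-i+1,n} - log Z_{n-k,n}, i = 1, ..., k
    excesses = [math.log(zs[n - i]) - log_threshold for i in range(1, k + 1)]
    m1 = sum(excesses) / k
    m2 = sum(e * e for e in excesses) / k
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)


def tail_uncensored_proportion(z, delta, k):
    """Proportion of uncensored observations among the k largest Z values."""
    pairs = sorted(zip(z, delta))
    return sum(d for _, d in pairs[-k:]) / k


def ratio_estimator(z, delta, k):
    """Plug-in ratio: moment estimate of gamma_Z over the tail proportion."""
    return moment_estimator(z, k) / tail_uncensored_proportion(z, delta, k)


def simulate_independent_pareto(n, gamma_y, gamma_t, seed=0):
    """Hypothetical test data: independent Pareto Y and T (independent censoring)."""
    rng = random.Random(seed)
    z, delta = [], []
    for _ in range(n):
        y = (1.0 - rng.random()) ** (-gamma_y)  # P(Y > y) = y^(-1/gamma_y)
        t = (1.0 - rng.random()) ** (-gamma_t)  # P(T > t) = t^(-1/gamma_t)
        z.append(min(y, t))
        delta.append(1 if y <= t else 0)
    return z, delta
```

In this simulated independent setting with γ Y = 1/2 and γ T = 1, one has γ Z = 1/3 and P (τ ) = 2/3, so the ratio targets 1/2. Under the dependent censoring mechanisms studied here the same ratio would in general not target γ Y .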

Tail identifiability
In order to draw some consequences of our results on the problem of inferring the extreme value index of Y from the behaviour of (Z, δ) conditional on Z being large, we start by introducing a dedicated concept of tail identifiability. Recall that a statistical model {P θ , θ ∈ Θ}, where P θ denotes a distribution described by a parameter θ belonging to a parameter space Θ, is called identifiable if the mapping θ → P θ specifying the model is one-to-one, i.e. $P_{\theta_1} = P_{\theta_2} \Rightarrow \theta_1 = \theta_2$ for all θ 1 , θ 2 ∈ Θ.
In the statistical literature, the phrases "identifiable statistical model" and "identifiable parameter" are very often used interchangeably: a parameter (of a model) is then said to be identifiable if two different values of the parameter yield two different models. Here, our interest lies primarily in the extreme value index γ Y of Y . A complication is that γ Y does not completely specify the distribution of Y in general; to put it differently, two distributions can have the same extreme value index and still not be identical (such as a Pareto distribution and a nontrivial mixture of this same Pareto distribution with a uniform distribution). However, the extreme value index precisely describes the type of tail a distribution has, by the following simple observation: if U and V satisfy conditions C 1 (γ U ) and C 1 (γ V ) respectively, and U and V have an equal right endpoint τ , then This assertion is essentially a consequence of Theorem 1.2.6 in de Haan and Ferreira (2006). This simple remark is the basis for our concept of tail identifiability in the random censoring framework, which we define below. Note that this definition bears no relationship to the concept of tail identifiability of a family of copulas introduced in Ding (2010, 2012).
Definition 1 Let K τ be the class of joint distributions of (Y, T ) such that τ Y = τ T = τ and the distributions of Y and T satisfy conditions C 1 (γ Y ) and C 1 (γ T ) respectively. Let M τ ⊂ K τ . We say that γ Y is tail identifiable in the random censoring model M τ if, whenever the distributions of (Y 1 , T 1 ) and (Y 2 , T 2 ) both belong to M τ then, denoting by Z i = min(Y i , T i ) and The above definition thus essentially says that γ Y is tail identifiable if its value can be inferred from the first meaningful asymptotics in the distribution of (Z, δ) given that Z is large; in practice, γ Y will be tail identifiable if its value can be computed from the extreme value index of Z and the tail censoring probability. Since the extreme value condition C 1 (γ ) itself only gives information about the first-order asymptotics of a survival function near its right endpoint, Definition 1 appears to be a reasonable definition for our purpose. Finally, we note that, as we should expect, γ Y is tail identifiable in the submodel of K τ restricted to independent censoring and γ Y γ T > 0: this is because in that particular case, γ Y = γ Z /P (τ ).
We shall now summarise Theorems 1 and 2 with a corollary in terms of tail identifiability of γ Y in our random censoring model.

Corollary 1 For a fixed τ , let M τ be the class of joint distributions of (Y, T ) satisfying condition (H), and such that Y and T have continuous distributions. Let further
has a finite positive limit as z ↑ τ . Then:

Corollary 1 has essentially three consequences. The first one is that in general, if the dependence structure is described by an extreme value copula distinct from the independence copula, then the parameter γ Y is not tail identifiable. In other words, the information contained in the first-order asymptotics of the distribution of the pair (Z, δ), for large Z, is in general not sufficient to identify the value of γ Y . The second one is that even if it is known that Y has a heavier tail than T has, the parameter γ Y is still not tail identifiable. The third and final consequence is that, if by contrast it is known that Y does not have a heavier tail than that of T , then γ Y is indeed tail identifiable; this is a corollary of Theorem 1, stating that in this case γ Y is actually the extreme value index of Z.
Statistically speaking, it follows from Corollary 1 that, when there is dependent censoring induced by an extreme value copula and contrary to the independent censoring case, there is no "obvious" way to estimate γ Y consistently based on the highest values of the observed variable in a sample together with their censoring indicators, if no further information is provided. This result does not, however, consider what can be obtained by looking at the asymptotics of the functions z → F Z (z) = P(Z > z) and z → P (z) = P(δ = 1|Z > z) in greater detail. For instance, recalling that, from Eq. 1: it can be thought that, although the influence of γ Y (or equivalently F Y (z)) is not necessarily visible in the first-order asymptotics of the survival function of Z, it may well affect its second-order asymptotics. Similarly, tail identifiability is only linked to the limit of the function z → P (z) at the endpoint of Z, but it does not consider the rate of convergence of this function to its limit. In the problematic, non-tail-identifiable case |γ Y | > |γ T |, when γ Z = γ T and P (z) → 0, the equality suggests that the rate of convergence of P (z) to 0 should contain information about γ Y . Our final result, which is a refinement of Theorems 1 and 2, sheds some light on this by giving asymptotic expansions of z → F Z (z) and z → P (z) in the case |γ Y | > |γ T |.

Proposition 2 Assume that condition (H) holds and suppose moreover that Y and T have continuous distributions. If |γ
and P (z) = − It follows from Proposition 2 that the extreme value behaviour of Y does indeed theoretically have an influence on the higher-order asymptotic properties of z → F Z (z) and z → P (z). However, this information is still not sufficient in practice to identify γ Y , as the next example shows.
Example 3 Consider the following two models:
• Model 1: Y 1 has a Pareto distribution with parameter γ Y 1 = 3, and T 1 = T is an equally weighted mixture of two Pareto distributions with respective parameters 1 and 6/7,
• Model 2: Y 2 has a Pareto distribution with parameter γ Y 2 = 3/2, and T 2 = T is an equally weighted mixture of two Pareto distributions with respective parameters 1 and 6/7,
with a common dependence structure given by the Pickands dependence function Then for any i ∈ {1, 2}, In particular and it follows from Proposition 2 that, for any i ∈ {1, 2}, we have The functions z → F Z i (z) (resp. z → P i (z)) therefore have the same second-order expansion (resp. asymptotic equivalent) in both models, although γ Y 1 = 3 ≠ 3/2 = γ Y 2 .
The idea behind the construction of the above example is that, even if γ Y should theoretically have an influence on the higher-order asymptotic properties of z → F Z (z), this influence can be completely masked by the asymptotic behaviour of the survival function of T . For instance, if γ Y > γ T > 0, and where a Y , a T > 0, b Y , b T ≠ 0 and ρ Y , ρ T < 0, then by Proposition 2 In other words, the extreme value index γ Y may not appear at all in the second-order asymptotics of F Z if the second-order parameter ρ T of T is sufficiently close to 0. This discussion carries over to higher-order asymptotics; if the third-order parameter of T (in the sense of Goegebeur and de Wet 2012) is small as well, then F Z and F T will (up to the third asymptotic order) have the same asymptotic behaviour and in particular, γ Y will not feature in the asymptotic properties of F Z up to that order. The same kind of phenomenon happens when considering the function P . Calculations similar to those that lead to Proposition 2 show that, when A is three times differentiable and γ Y > γ T > 0, then one can push further the asymptotic expansion of P given in Proposition 2 to get The function P is then generally regularly varying with index − min(γ −1 is not sufficient in general to know γ Y . Furthermore, the second-order (and higher-order) terms in F Y and F T can once again hide the influence of γ Y on the asymptotic behaviour of the function P if they converge to 0 sufficiently slowly.
An example, at the second order, is obtained by altering Example 3 as follows: consider the models
• Model 1: Y 1 has a survival function such that $\overline{F}_{Y_1}(y) = y^{-1/3} \left( 1 + \frac{1}{5} y^{-1/12} + o(y^{-1/12}) \right)$ as y → ∞ (one such example is $\overline{F}_{Y_1}(y) = y^{-1/3} (1 + y^{-1/12})^{1/5}$ for y large enough) and T 1 = T is an equally weighted mixture of two Pareto distributions with respective parameters 1 and 6/7,
• Model 2: Y 2 has a survival function such that $\overline{F}_{Y_2}(y) = y^{-2/3} \left( 1 - \frac{2}{7} y^{-1/12} + o(y^{-1/12}) \right)$ as y → ∞ (one such example is $\overline{F}_{Y_2}(y) = y^{-2/3} (1 - y^{-1/12})^{2/7}$ for y large enough) and T 2 = T is an equally weighted mixture of two Pareto distributions with respective parameters 1 and 6/7,
with a common dependence structure given by the Pickands dependence function It is then readily checked that, in both models, The functions z → F Z i (z) and z → P i (z) therefore have the same second-order expansion in both models, although the extreme value indices of Y 1 and Y 2 are different. As we argued above, our arguments can be extended to construct further pairs of models in which higher-order (such as third-order) expansions of F Z and P coincide, although the extreme value indices of Y are different. It is therefore not clear that the information provided jointly by Z and δ|Z for Z large is generally sufficient to recover the extreme value index γ Y .
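As a short check of the two explicit survival functions proposed in the models above, the following worked expansion uses only the first-order binomial expansion with x = y^{-1/12}. The remainder order O(y^{-1/6}) is a by-product of this calculation rather than a statement from the text.

```latex
\begin{align*}
(1+x)^{a} &= 1 + a x + O(x^{2}) \quad \text{as } x \to 0, \\
\overline{F}_{Y_1}(y) = y^{-1/3}\,(1 + y^{-1/12})^{1/5}
  &= y^{-1/3}\left(1 + \tfrac{1}{5}\, y^{-1/12} + O(y^{-1/6})\right), \\
\overline{F}_{Y_2}(y) = y^{-2/3}\,(1 - y^{-1/12})^{2/7}
  &= y^{-2/3}\left(1 - \tfrac{2}{7}\, y^{-1/12} + O(y^{-1/6})\right).
\end{align*}
```

Both functions are regularly varying, with indices −1/3 and −2/3 respectively, so the corresponding extreme value indices are 3 and 3/2, as in Example 3.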

Discussion and ideas for further work
Our arguments in Section 4.2 show that, when the censoring mechanism is described by a non-independent extreme value copula, then the information contained in the extremes of Z, and in the distribution of δ given that Z is high, does not appear to allow one to identify the extreme value index of Y . Let us stress that our results do not show that the extreme value index of Y cannot be recovered; they suggest however that, if one wants to estimate the extreme value index of Y , then one should use more information than what is provided by the distribution of (Z, δ) for Z large. This stands in contrast with the independent censoring situation, in which the extreme value index of Y is generally the ratio of the extreme value index γ Z and of the tail proportion of uncensored observations, both quantities being solely determined by the information provided by Z and δ|Z for Z large. One possible way of incorporating more information into the model is to integrate covariate information that makes the assumption of conditional independence between censoring time and variable of interest plausible. While Ndao et al. (2014, 2016) and Stupfler (2016) consider a conditional extreme value model, they do not consider the question of dependence in the censoring mechanism. The focus of Ndao et al. (2014) and Stupfler (2016) is rather the analysis of extreme survival times to AIDS given age at diagnosis. It is unlikely that such a covariate would be enough to eliminate all sources of dependence in the censoring mechanism if it is believed to be present; more relevant information would include prognostic covariates, which are typically high-dimensional (see Hsu and Taylor 2010). It can also happen that the covariate information which could be used by the investigator to alleviate the dependence is simply not available from the data at hand.
Here, staying within the context of an extreme value copula dependence structure, we show how assuming that the copula C is fully known could lead to estimators of γ Y . One possible idea for that is to come back to the interpretation of γ Y as a shape parameter for the tail of the survival function F Y . For instance, if γ Y > 0, then γ Y can be obtained as a limiting average log-excess: $\gamma_Y = \lim_{t \to \infty} E(\log Y - \log t \mid Y > t)$. In other words (see equation (3.2.1) in de Haan and Ferreira 2006): $\gamma_Y = \lim_{t \to \infty} \frac{1}{\overline{F}_Y(t)} \int_t^{\infty} \overline{F}_Y(x) \, \frac{dx}{x}$. In general, we have the relationship This is the convergence behind the well-known moment estimator of Dekkers et al. (1989). One consequence of this relationship is that we have an explicit, relatively simple limit relationship linking γ Y to F Y . Besides, in the context of model (H), the partial derivatives of C exist and are continuous, being given by These nice expressions and regularity properties guarantee that, if C (or equivalently the Pickands dependence function A) is known, then F Y is identifiable (see Zheng and Klein 1995 and Carrière 1995) and can be consistently estimated by solving iteratively a system of nonlinear differential equations (Carrière 1995), or by the so-called Copula-Graphic estimator of Zheng and Klein (1995); see also Lo and Wilke (2010). The identifiability of F Y then guarantees that of γ Y , by Formula (4), and this formula also suggests a potential plug-in strategy for the construction of an estimator of γ Y if F Y has been accurately estimated beforehand, although the particular structure of the aforementioned estimators of F Y may make this task computationally challenging. Let us also note that, while the Copula-Graphic estimator of Zheng and Klein (1995) has in general only an implicit form, it admits a simple closed form if the copula C is moreover Archimedean, namely where φ : (0, 1] → [0, ∞] is a decreasing convex function satisfying φ(x) → ∞ as x → 0 and φ(1) = 0. This result is due to Rivest and Wells (2001; see Formula (4) therein).
It happens that the Gumbel-Hougaard copula, introduced in Example 1, is the only non-independent extreme value copula that is also Archimedean (see Genest and Rivest 1989). In the independent case, the Copula-Graphic estimator is actually the classical Kaplan-Meier estimator (see again Rivest and Wells 2001). The idea we have just developed was essentially that, since it is not clear that γ Y can in general be identified from the information provided by Z and δ|Z for large Z only, a good idea is to look at what further information the central part of the distribution of Z (and the associated conditional distribution δ|Z) can bring. If the copula C is moreover known, then F Y is identifiable from (Z, δ) and γ Y can then at least conceptually be recovered as a (tail) shape parameter for the distribution of Y via, for instance, Formula (4). The assumption of a known copula is not as costly as it may seem; in general, the overarching aim of extreme value analysis is to estimate extreme quantiles of Y , or more generally get some understanding of the distribution of Y in its right tail which is more comprehensive than the knowledge of its extreme value index. Extreme quantile estimation, in particular, is nothing but the estimation of the generalised inverse of F Y at specific levels, and this inference therefore translates into inference about F Y on part of its range. It is then appropriate, in this context, to make an assumption adapted to the estimation of F Y . The approaches we have just outlined could, however, result in potentially implicit and/or computationally challenging estimators of γ Y , whose asymptotic properties are not yet known and outside the scope of this paper.
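To make the Archimedean closed form mentioned above more concrete, here is a minimal sketch in Python. It assumes the estimator of the survival function of Y can be written as φ⁻¹ applied to the sum, over uncensored Z i ≤ t, of φ(π(Z i ) − 1/n) − φ(π(Z i )), where π(z) is the proportion of observations at or above z. This form is an assumption of the sketch and should be checked against Formula (4) of Rivest and Wells (2001) before any use. The generator φ(x) = (−log x)^θ is the usual Gumbel-Hougaard generator, the function names are hypothetical, and ties among the Z i are assumed absent.

```python
import math


def gumbel_generator(theta):
    """Generator phi(x) = (-log x)^theta of the Gumbel-Hougaard copula and its
    inverse; theta = 1 corresponds to the independence copula."""
    def phi(x):
        return math.inf if x <= 0.0 else (-math.log(x)) ** theta

    def phi_inv(s):
        return math.exp(-(s ** (1.0 / theta)))

    return phi, phi_inv


def copula_graphic_survival(z, delta, phi, phi_inv):
    """Assumed closed form of the Copula-Graphic estimator for an Archimedean
    copula with generator phi. Returns (time, estimate of P(Y > time)) at each
    uncensored observation, in increasing order of time. No ties assumed."""
    n = len(z)
    pairs = sorted(zip(z, delta))
    total = 0.0
    curve = []
    for rank, (zi, di) in enumerate(pairs):
        if di == 1:
            at_risk = (n - rank) / n      # proportion of observations >= zi
            after = (n - rank - 1) / n    # same proportion minus 1/n
            total += phi(after) - phi(at_risk)
            curve.append((zi, phi_inv(total)))
    return curve


def kaplan_meier(z, delta):
    """Classical Kaplan-Meier estimate at uncensored times (no ties assumed)."""
    n = len(z)
    pairs = sorted(zip(z, delta))
    surv = 1.0
    curve = []
    for rank, (zi, di) in enumerate(pairs):
        if di == 1:
            surv *= 1.0 - 1.0 / (n - rank)
            curve.append((zi, surv))
    return curve
```

With θ = 1 the generator is −log and the sketch reduces to the Kaplan-Meier product, in line with the remark above on the independent case. Passing `gumbel_generator(theta)` with θ > 1 gives the corresponding estimate under a known Gumbel-Hougaard copula.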
Proof of Proposition 1 Use Eq. 1 to get This is a positive constant, by Lemma 1. Thus which completes the proof.
Proof of Theorem 1 Apply Lemma 3 to get: In each of these four cases, the result is then a direct corollary of Lemma 5(i) and (ii). In the case γ Y = γ T , the result is a consequence of Lemma 5(iii).
(5) The first part of the integral in the numerator is controlled by noting that To control the second one, we first write When γ Y > 0, and therefore τ = ∞, we apply Lemma 3 to obtain that there is ε > 0 such that Now, by Theorem 1.2.1 in de Haan and Ferreira (2006), the function F Y is regularly varying with index −1/γ Y . In other words, we can write this function as where L Y is a slowly varying function at infinity. Since log(L Y (t))/ log t → 0 as t → ∞ (see Proposition 1.3.6 p.16 in Bingham et al. 1987) we get log(F Y (t))t −ε/2 → 0 as t → ∞. In particular: as z → ∞ which concludes the proof in this case. When γ Y < 0 and τ < ∞, we apply Lemma 3 again to obtain that there is ε > 0 such that Since by Theorem 1.2.1 in de Haan and Ferreira (2006) the function T → F Y (τ − T −1 ) is regularly varying with index 1/γ Y < 0, we can exactly mimic the proof of the case γ Y > 0 to obtain The proof of (i) is then complete.

If the constant 1 −
as required. Otherwise we clearly have and this completes the proof.
Proof of Proposition 2 Recall that F T (z)/F Y (z) → 0 as z ↑ τ by Lemma 3. Then, use Lemma 7 together with the equality to obtain the statement about F Z (z). The result on P (z) is a reformulation of Eq. 5 in the proof of Theorem 2.
Apply this first with x = t ∈ (0, t 0 ), y = t 0 and z = 1 to get so that A(t) ≥ 1 and therefore A(t) = 1 for all t ∈ (0, t 0 ). Apply now the above set of inequalities with x = 0, y = t 0 and z = t ∈ (t 0 , 1) to obtain 0 ≤ A(t) − 1 t so that again A(t) = 1 for all t ∈ (t 0 , 1). Consequently, A is the constant function 1, which is impossible since C is not the independent copula.
We turn to the proof of (ii). Since A is continuously differentiable on [0, 1] and convex, its derivative A′ is nondecreasing. Consequently, if A′(0) were nonnegative, then so would be A′(x) for any x ∈ (0, 1], and thus A would be nondecreasing on [0, 1]. Since A(0) = A(1) = 1, this would entail that A is the constant function 1, which is a contradiction. Similarly, A′(1) cannot be nonpositive since if it were, then A would be nonincreasing on [0, 1], which is a contradiction in virtue of the equality A(0) = A(1) = 1.

We now show (iii). Notice that since A(x) ≥ x, we have Assume that there is x 0 ∈ (0, 1) such that A′(x 0 ) > 1. Since A is convex, the function A′ is nondecreasing and therefore A′(x) > 1 on [x 0 , 1]. Integrating yields $1 - A(x_0) = \int_{x_0}^{1} A'(x) \, dx > 1 - x_0$, which entails A(x 0 ) < x 0 and a contradiction. As a consequence, A′(x) ≤ 1 for all x ∈ (0, 1), and this should also be true on the closed interval [0, 1] by continuity of A′. Finally, A(x) − xA′(x) ≥ x(1 − A′(x)) ≥ 0 for all x ∈ [0, 1]. Use now the increasing slopes inequality to write which entails A(x) − xA′(x) ≤ 1 for all x ∈ (0, 1). Using the continuity of A and A′, this inequality is of course also true for x ∈ {0, 1}, which concludes the proof of the first set of desired inequalities if A is convex. If A is moreover strictly convex, assume that there is x 0 ∈ (0, 1) such that A′(x 0 ) ≥ 1. Then A′(x) ≥ 1 on [x 0 , 1] and integrating yields $1 - A(t) = \int_{t}^{1} A'(x) \, dx \ge 1 - t$ for all t ∈ [x 0 , 1], which entails A(t) ≤ t and therefore A(t) = t on an interval with nonempty interior, which is a contradiction by the strict convexity of A. This entails A(x) − xA′(x) ≥ x(1 − A′(x)) > 0 for all x ∈ (0, 1). Finally, to prove that A(x) − xA′(x) < 1 for all x ∈ (0, 1), we apply again the increasing slopes inequality: for all x, t ∈ (0, 1), where all inequality signs are strict because of the strict convexity of A. Let t ↓ x to get ∀x ∈ (0, 1), which entails A(x) − xA′(x) < 1 for all x ∈ (0, 1) as required.
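The inequalities established in this proof can be checked numerically on a concrete strictly convex Pickands dependence function. The sketch below, in Python, assumes the standard Gumbel-Hougaard (logistic) form A(t) = (t^θ + (1 − t)^θ)^{1/θ} with θ > 1, which may be parametrised differently from Example 1; the function names and the grid are illustrative choices.

```python
def pickands_gumbel(t, theta):
    """Assumed Gumbel-Hougaard Pickands function A(t), theta >= 1."""
    return (t ** theta + (1.0 - t) ** theta) ** (1.0 / theta)


def pickands_gumbel_derivative(t, theta):
    """Derivative A'(t) of the function above, obtained by the chain rule."""
    s = t ** theta + (1.0 - t) ** theta
    return s ** (1.0 / theta - 1.0) * (t ** (theta - 1.0) - (1.0 - t) ** (theta - 1.0))


def check_lemma1_bounds(theta, grid_size=99):
    """Check A(x) < 1, A'(x) < 1 and 0 < A(x) - x A'(x) < 1 on an interior grid."""
    for j in range(1, grid_size + 1):
        x = j / (grid_size + 1)
        a = pickands_gumbel(x, theta)
        da = pickands_gumbel_derivative(x, theta)
        b = a - x * da
        if not (a < 1.0 and da < 1.0 and 0.0 < b < 1.0):
            return False
    return True
```

For θ = 2 one finds A′(0) = −1 < 0 and A′(1) = 1 > 0, consistent with part (ii), and the grid check passes for the strictly convex cases θ = 2 and θ = 1.5. A finite grid is of course only a sanity check and not a proof.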

Lemma 2 Assume that
Proof of Lemma 2 Note that, since u → 0 and v → v 0 > 0, Besides, we get, as u → 0 and v → v 0 > 0: In particular: Since as u → 0 and v → v 0 > 0 we have: Because this quantity converges to 0 and − 1 −1 the conclusion readily follows by using a Taylor expansion of the exponential function in a neighbourhood of 0.
Lemma 3 is a very useful result about the asymptotic interactions between the survival functions of Y and T . We state it for the sake of clarity: its proof essentially consists in applying Theorem 1.2.6 in de Haan and Ferreira (2006) repeatedly.
Lemma 3 Assume that condition (H) holds and that the distributions of Y and T satisfy conditions C 1 (γ Y ) and C 1 (γ T ) respectively. If in addition γ Y γ T ≥ 0 and |γ Y | > |γ T |, then when τ = ∞ we have When otherwise τ < ∞, then The next result contains an equivalent of ϕ(u, v) as u, v → 0.

Lemma 4 Assume that A is continuously differentiable on
we have that: Proof of Lemma 4 Start by writing, as u, v → 0, (1)).
Since this quantity converges to 0 and ϕ(u, v) = (1+o (1) − 1 − 1 the conclusion readily follows by using a Taylor expansion of the exponential function in a neighbourhood of 0.
Lemma 5, which provides an asymptotic equivalent of F Z , is a direct corollary of Lemma 4. Note that it gives indeed a true asymptotic equivalent of F Z , since A (0) < 0, A (1) > 0 and A(t) < 1 for any t ∈ (0, 1), see Lemma 1(i) and (ii).

Lemma 5 Assume that condition (H) holds and that
A is continuously differentiable on [0, 1]. Then, as z ↑ τ , we have that: Proof of Lemma 5. Just use Eq. 1 together with Lemma 4.
The next lemma contains an equivalent of the quantity 1 − ∂ 1 C(1 − u, 1 − v) as u, v → 0, and is fundamental to prove Theorem 2.

Lemma 6
Assume that A is twice continuously differentiable on [0, 1]. Then, if u, v → 0, we have that: Proof of Lemma 6 Recall the identity where B(x) = A(x) − xA (x). We first prove (i). It is straightforward to show that Since in this case we assume v/u → 0, we get (1)). Therefore It remains to compute an asymptotic expansion of C(1 − u, 1 − v)/(1 − u). To this end, we rewrite this term as Note now that Plugging this into the Taylor expansion and rearranging yields and therefore We combine this with a Taylor expansion of the exponential function in a neighbourhood of 0 to get Combining Eqs. 6 and 8 and rearranging terms concludes the proof of (i).
We now turn to the proof of (ii) and (iii): as u and v → 0, we have (1 + o(1)).
The final lemma contains, together with Lemma 6, the core result to show Proposition 2. Now, by Eq. 7 in the proof of Lemma 6,

Lemma 7 Assume that
Consequently, using a Taylor expansion, Multiplying this expansion by (1 − u)(1 − v) and expanding again concludes the proof.