## Abstract

In this paper, a copula-graphic estimator is proposed for censored survival data. It is assumed that there is some dependent censoring acting on the variable of interest that may come from an existing competing risk. Furthermore, the full process is independently censored by some administrative censoring time. The dependent censoring is modeled through an Archimedean copula function, which is supposed to be known. An asymptotic representation of the estimator as a sum of independent and identically distributed random variables is obtained, and, consequently, a central limit theorem is established. We investigate the finite sample performance of the estimator through simulations. A real data illustration is included.
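To make the idea of a copula-graphic estimator concrete, the following is a minimal Python sketch of the classical version (Zheng and Klein 1995), in the closed form given by Rivest and Wells (2001) for Archimedean copulas. It is not the generalized estimator proposed in this paper: it ignores the additional independent administrative censoring and assumes no tied observations. The function names, the choice of the Clayton generator, and the toy data are all illustrative assumptions.

```python
import numpy as np

def independence_generator():
    """Generator phi(u) = -log(u); yields the Kaplan-Meier estimator."""
    def phi(u):
        with np.errstate(divide="ignore"):
            return -np.log(np.asarray(u, dtype=float))  # phi(0) = +inf
    def phi_inv(s):
        return np.exp(-np.asarray(s, dtype=float))       # phi_inv(inf) = 0
    return phi, phi_inv

def clayton_generator(theta):
    """Clayton generator phi(u) = (u**-theta - 1) / theta, for theta > 0."""
    def phi(u):
        with np.errstate(divide="ignore"):
            return (np.asarray(u, dtype=float) ** (-theta) - 1.0) / theta
    def phi_inv(s):
        return (1.0 + theta * np.asarray(s, dtype=float)) ** (-1.0 / theta)
    return phi, phi_inv

def copula_graphic(times, events, phi, phi_inv):
    """Classical copula-graphic survival estimate at the sorted observed times.

    times  : observed times min(survival time, dependent censoring time)
    events : 1 if the survival time was observed, 0 if censored
    Assumes no ties and a known Archimedean generator phi.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    t, d = times[order], events[order]
    n = len(t)
    at_risk = (n - np.arange(n)) / n        # fraction with observed time >= t_(i)
    after = (n - np.arange(n) - 1) / n      # the same fraction minus 1/n
    # Only uncensored observations contribute a jump on the phi scale.
    jumps = np.where(d == 1, phi(after) - phi(at_risk), 0.0)
    return t, phi_inv(np.cumsum(jumps))

# Toy data (hypothetical): five subjects, alternating events and censorings.
t, s_km = copula_graphic([1, 2, 3, 4, 5], [1, 0, 1, 0, 1], *independence_generator())
_, s_cl = copula_graphic([1, 2, 3, 4, 5], [1, 0, 1, 0, 1], *clayton_generator(2.0))
```

With the independence generator the sketch reduces to the Kaplan–Meier product-limit estimator. With the Clayton generator, which encodes positive dependence between the survival and censoring times, the estimated survival on these toy data is lower than the Kaplan–Meier value after the first censored observation.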

## References

Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York

Földes A, Rejtő L (1981) A LIL type result for the product limit estimator. Z Wahrscheinlichkeitstheor Verw Geb 56:75–86

Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York

Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481

Lakhal L, Rivest LP, Abdous B (2008) Estimating survival and association in a semicompeting risks model. Biometrics 64:180–188

Lo SH, Singh K (1986) The product-limit estimator and the bootstrap: some asymptotic representations. Probab Theory Relat Fields 71:455–465

Major P, Rejtő L (1988) Strong embedding of the estimator of the distribution function under random censorship. Ann Stat 16:1113–1132

Nelsen RB (2006) An introduction to copulas. Springer, New York

Rivest LP, Wells MT (2001) A martingale approach to the copula-graphic estimator for the survival function under dependent censoring. J Multivar Anal 79:138–155

Said M, Ghazzali N, Rivest LP (2009) Score tests for independence in semiparametric competing risks models. Lifetime Data Anal 15:413–444

Sánchez-Sellero C, González-Manteiga W, Van Keilegom I (2005) Uniform representation of product-limit integrals with applications. Scand J Stat 32:563–581

Schäfer H (1986) Local convergence of empirical measures in the random censorship situation with application to density and rate estimators. Ann Stat 14:1240–1245

Stute W (1993) Consistent estimation under random censorship when covariables are present. J Multivar Anal 45:89–103

Stute W (1995) The central limit theorem under random censorship. Ann Stat 23:422–439

Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23:461–471

Tsiatis A (1975) A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci 72:20–22

Van Keilegom I, Veraverbeke N (1997) Estimation and bootstrap with censored data in fixed design nonparametric regression. Ann Inst Stat Math 49:467–491

Zheng M, Klein JP (1995) Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika 82:127–138

## Acknowledgements

Work supported by the Grant MTM2008-03129 of the Spanish Ministry of Science and Innovation. The first author acknowledges support from the projects MTM2011-23204 of the Spanish Ministry of Science and Innovation (FEDER support included) and 10PXIB300068PR of the Xunta de Galicia. The second author also acknowledges the IAP Research Network P6/03 of the Belgian State (Belgian Science Policy). Noël Veraverbeke is extraordinary professor at the North-West University, Potchefstroom, South Africa.

## Appendix 1: Technical lemmas

In this section, we give the technical lemmas needed in the proof of Theorem 1.

### Lemma 1

*Under the conditions in Theorem 1, we have*

### Proof

It is easy to see that, with probability 1,

The first term is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s., cf. Földes and Rejtő (1981). The same order bound is proved to hold for the second term in Lemma 4 below, and the proof is complete. □

### Lemma 2

*Under the conditions in Theorem 1, we have*

### Proof

With probability 1 we have

Then, the assertion of the lemma follows from a result of Földes and Rejtő (1981). □

### Lemma 3

*Under the conditions in Theorem 1, we have*

### Proof

Divide \([0,T]\) into \(k_{n}=O(n^{1/2}(\log n)^{1/2})\) subintervals \([t_{i},t_{i+1}]\) of length \(O(n^{-1/2}(\log n)^{1/2})\). Then, as in the proof of Lo and Singh (1986), we have

For *I*, we have by Taylor expansion and the fact that \(\sup_{0\leq t\leq T}\vert \overline{H}_{n}(t)-\overline{H}(t)\vert =O(n^{-1/2}(\log \log n)^{1/2})\) a.s. (Földes and Rejtő 1981):

Now further subdivide each interval \([t_{i},t_{i+1}]\) into \(a_{n}=O(n^{1/4}(\log n)^{-1/4})\) subintervals of length \(O(n^{-3/4}(\log n)^{3/4})\). By using Bernstein’s inequality we can show that this term is bounded a.s. by

for some constant \(c>0\). By the modulus of continuity result for the Kaplan–Meier estimator (see Schäfer 1986) we obtain that \(I=O(n^{-3/4}(\log n)^{3/4})\) a.s. The term \(II\) is treated similarly and leads to the same order. It requires the almost sure behavior of the modulus of continuity of the estimator \(H_{n}^{1}\), which follows from Lemma 5 below. In that lemma, take \(a_{n}=n^{-1/2}(\log n)^{1/2}\). □
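The bookkeeping behind the two partitions used in the proof of Lemma 3 can be checked directly. The short calculation below is a sketch added for the reader, not part of the original argument; it only uses the interval lengths stated in the proof.

```latex
% First partition: pieces of length of order n^{-1/2}(\log n)^{1/2} covering [0,T]
\[
  \frac{T}{n^{-1/2}(\log n)^{1/2}} = T\, n^{1/2}(\log n)^{-1/2},
\]
% which is within the stated bound k_n = O(n^{1/2}(\log n)^{1/2}).

% Second partition: each piece is cut into a_n = O(n^{1/4}(\log n)^{-1/4}) parts
\[
  \frac{n^{-1/2}(\log n)^{1/2}}{n^{1/4}(\log n)^{-1/4}}
  = n^{-3/4}(\log n)^{3/4}.
\]
```

The second quotient is exactly the length of the finer subintervals and coincides with the almost sure rate obtained for the term \(I\).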

Lemmas 4 and 5 below are needed for the proofs of Lemmas 1 and 3, respectively. They have some independent interest since they provide the almost sure rate of convergence and the almost sure behavior of the modulus of continuity for the estimator of the cumulative incidence function of *Z* subject to *δ*=1 (\(H_{n}^{1}\)).

### Lemma 4

*For* \(T<\min (T_{F},T_{G},T_{\widetilde{G}})\), *we have*

### Proof

Define the following empirical estimators for the distribution function \(\widetilde{H}(t)=P(U\leq t)\) and for the subdistribution functions \(\widetilde{H}^{0}(t)=P(U\leq t,\rho =0)\) and \(\widetilde{H}^{11}(t)=P(U\leq t,\rho =1,\delta =1)\):
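A natural choice, stated here as an assumption rather than quoted from the paper, is the usual empirical counterparts, with \((U_{i},\rho_{i},\delta_{i})\), \(i=1,\ldots,n\), denoting the observed sample and \(I(\cdot)\) the indicator function:

```latex
\[
  \widetilde{H}_{n}(t) = \frac{1}{n}\sum_{i=1}^{n} I(U_{i}\leq t), \qquad
  \widetilde{H}_{n}^{0}(t) = \frac{1}{n}\sum_{i=1}^{n} I(U_{i}\leq t,\ \rho_{i}=0),
\]
\[
  \widetilde{H}_{n}^{11}(t) = \frac{1}{n}\sum_{i=1}^{n} I(U_{i}\leq t,\ \rho_{i}=1,\ \delta_{i}=1).
\]
```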

Then, \(H^{1}(t)\) can be expressed in terms of \(\widetilde{H}\), \(\widetilde{H}^{0}\), and \(\widetilde{H}^{11}\), and \(H_{n}^{1}(t)\) can be expressed in terms of the corresponding empirical versions. Similarly to Stute (1995), we obtain

and

It follows that \(\sup_{0\leq t\leq T}\vert H_{n}^{1}(t)-H^{1}(t)\vert \) is smaller than

The second term in (4) is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s. For the first term in (4), we use (with obvious abbreviations) that

with *θ* between 0 and *a*−*b*. Note that exp(*b*) is uniformly bounded in [0,*T*]. Looking at (*a*−*b*), we have

The second term in (5) is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s. For the first term in (5), we use that, for \(x\geq 0\),

It follows that the first term in (5) is bounded above by

This is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s. since \(\sup_{0\leq z\leq T}\vert \widetilde{H}_{n}(z)-\widetilde{H}(z)\vert \) has the same order and since \(\widetilde{H}(T)<1\). □
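The expansion used for the first term in (4) in the proof above is presumably the mean value form of a difference of two exponentials. The display below is a reconstruction under that assumption, with \(a\) and \(b\) standing for the abbreviated exponents of the proof:

```latex
\[
  e^{a}-e^{b} = e^{b}\bigl(e^{a-b}-1\bigr) = e^{b}\,(a-b)\,e^{\theta},
  \qquad \theta \text{ between } 0 \text{ and } a-b .
\]
```

Under this reading, a uniform bound on \(\vert a-b\vert \) carries over to \(\vert e^{a}-e^{b}\vert \) as soon as \(e^{b}\) and \(e^{\theta}\) are bounded, which is why the proof notes that \(\exp(b)\) is uniformly bounded on \([0,T]\) and then turns to \(a-b\).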

### Lemma 5

*Suppose that* \(T<\min (T_{F},T_{G},T_{\widetilde{G}})\). *Suppose that* \(H(t)=P(Z\leq t)\) *and* \(H^{1}(t)=P(Z\leq t,\delta =1)\) *have bounded first derivatives in* \([0,T]\). *Let* \(\{a_{n}\}\) *be a sequence of positive constants tending to zero with* \(a_{n}n(\log n)^{-5}>\Delta >0\) *for all* \(n\) *sufficiently large. Then*

### Proof

We make the same partition of the interval \([0,T]\) as in Lemma A.5 of Van Keilegom and Veraverbeke (1997). Exploiting the monotonicity of \(H^{1}(t)\) and \(H_{n}^{1}(t)\) and also the Lipschitz continuity of \(H^{1}(t)\), we obtain that it suffices to prove that

where \(\{t_{ij}\}\), \(i=1,\ldots ,m\), \(j=-b_{n},\ldots ,b_{n}\), is a grid of points with \(m= [ \frac{T}{a_{n}} ] \) (\([\cdot ]\) denoting the integer part) and \(b_{n}\sim a_{n}^{1/2}n^{1/2}(\log n)^{-1/2}\). At this point, we use the almost sure asymptotic representation for \(H_{n}^{1}(t)\), which can be derived as a special case of the more general result of Sánchez-Sellero et al. (2005):

where

with

the function \(\widetilde{C}(t)\) being that in the remark of Sect. 2; and \(\sup_{0\leq t\leq T}\vert R_{n}(t)\vert =O(n^{-1}(\log n)^{3})\) a.s. It follows that it suffices to show that

To achieve this, we use Bernstein’s inequality as in Van Keilegom and Veraverbeke (1997). The random variables \(\widetilde{\widetilde{\psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij})\) are bounded, and \(\operatorname {Var}(\widetilde{\widetilde{\psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij}))\) is bounded by a constant times \(a_{n}\). The latter fact is shown by checking six appropriate groups of terms in

For example, by direct calculation,

for some constant *c*>0 by the Lipschitz continuity of *H*. The other groups of terms are treated similarly. □
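For reference, one standard form of Bernstein's inequality shows why the proof of Lemma 5 needs both the boundedness of the summands and a variance bound of order \(a_{n}\). It is quoted here from general knowledge, not from the paper; the symbols \(X_{r}\), \(M\), and \(x\) are notation introduced only for this statement. For independent, centered random variables \(X_{1},\ldots ,X_{n}\) with \(\vert X_{r}\vert \leq M\):

```latex
\[
  P\Bigl(\Bigl\vert \sum_{r=1}^{n} X_{r}\Bigr\vert > x\Bigr)
  \leq 2\exp\Biggl(-\frac{x^{2}/2}{\sum_{r=1}^{n}\operatorname{Var}(X_{r}) + Mx/3}\Biggr),
  \qquad x>0 .
\]
```

Applied to the (centered) differences \(\widetilde{\widetilde{\psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij})\), the variance sum is at most a constant times \(na_{n}\), so the exponent is driven by the small increment length \(a_{n}\) and not by the sample size alone.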

## About this article

### Cite this article

de Uña-Álvarez, J., Veraverbeke, N. Generalized copula-graphic estimator. *TEST* **22**, 343–360 (2013). https://doi.org/10.1007/s11749-012-0314-2

### Keywords

- Almost sure representation
- Archimedean copula
- Censored data
- Informative censoring
- Survival analysis

### Mathematics Subject Classification

- 62G05
- 62G20
- 62N02