Generalized copula-graphic estimator

Abstract

In this paper, a copula-graphic estimator is proposed for censored survival data. It is assumed that there is some dependent censoring acting on the variable of interest that may come from an existing competing risk. Furthermore, the full process is independently censored by some administrative censoring time. The dependent censoring is modeled through an Archimedean copula function, which is supposed to be known. An asymptotic representation of the estimator as a sum of independent and identically distributed random variables is obtained, and, consequently, a central limit theorem is established. We investigate the finite sample performance of the estimator through simulations. A real data illustration is included.

References

  1. Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
  2. Földes A, Rejtő L (1981) A LIL type result for the product limit estimator. Z Wahrscheinlichkeitstheor Verw Geb 56:75–86
  3. Kalbfleisch JD, Prentice RL (1980) The statistical analysis of failure time data. Wiley, New York
  4. Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481
  5. Lakhal L, Rivest LP, Abdous B (2008) Estimating survival and association in a semicompeting risks model. Biometrics 64:180–188
  6. Lo SH, Singh K (1986) The product-limit estimator and the bootstrap: some asymptotic representations. Probab Theory Relat Fields 71:455–465
  7. Major P, Rejtő L (1988) Strong embedding of the estimator of the distribution function under random censorship. Ann Stat 16:1113–1132
  8. Nelsen RB (2006) An introduction to copulas. Springer, New York
  9. Rivest LP, Wells MT (2001) A martingale approach to the copula-graphic estimator for the survival function under dependent censoring. J Multivar Anal 79:138–155
  10. Said M, Ghazzali N, Rivest LP (2009) Score tests for independence in semiparametric competing risks models. Lifetime Data Anal 15:413–444
  11. Sánchez-Sellero C, González-Manteiga W, Van Keilegom I (2005) Uniform representation of product-limit integrals with applications. Scand J Stat 32:563–581
  12. Schäfer H (1986) Local convergence of empirical measures in the random censorship situation with application to density and rate estimators. Ann Stat 14:1240–1245
  13. Stute W (1993) Consistent estimation under random censorship when covariables are present. J Multivar Anal 45:89–103
  14. Stute W (1995) The central limit theorem under random censorship. Ann Stat 23:422–439
  15. Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23:461–471
  16. Tsiatis A (1975) A nonidentifiability aspect of the problem of competing risks. Proc Natl Acad Sci 72:20–22
  17. Van Keilegom I, Veraverbeke N (1997) Estimation and bootstrap with censored data in fixed design nonparametric regression. Ann Inst Stat Math 49:467–491
  18. Zheng M, Klein JP (1995) Estimates of marginal survival for dependent competing risks based on an assumed copula. Biometrika 82:127–138

Acknowledgements

Work supported by the Grant MTM2008-03129 of the Spanish Ministry of Science and Innovation. The first author acknowledges support from the projects MTM2011-23204 of the Spanish Ministry of Science and Innovation (FEDER support included) and 10PXIB300068PR of the Xunta de Galicia. The second author also acknowledges the IAP Research Network P6/03 of the Belgian State (Belgian Science Policy). Noël Veraverbeke is extraordinary professor at the North-West University, Potchefstroom, South Africa.

Author information

Corresponding author

Correspondence to Jacobo de Uña-Álvarez.

Appendix 1: Technical lemmas

In this section, we give the technical lemmas needed in the proof of Theorem 1.

Lemma 1

Under the conditions in Theorem 1, we have

$$ \sup_{0\leq t\leq T}\bigl \vert R_{n1}(t)\bigr \vert =O \bigl(n^{-1}\log \log n\bigr)\quad \text{\textit{a.s.}} $$

Proof

It is easy to see that, with probability 1,

The first term is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s.; cf. Földes and Rejtő (1981). The same order bound is proved for the second term in Lemma 4 below, which completes the proof. □

Lemma 2

Under the conditions in Theorem 1, we have

$$ \sup_{0\leq t\leq T}\bigl \vert R_{n2}(t)\bigr \vert =O \bigl(n^{-1}\log \log n\bigr)\quad \text{\textit{a.s.}} $$

Proof

With probability 1 we have

Then, the assertion of the lemma follows from a result of Földes and Rejtő (1981). □

Lemma 3

Under the conditions in Theorem 1, we have

$$ \sup_{0\leq t\leq T}\bigl \vert R_{n3}(t)\bigr \vert =O \bigl(n^{-3/4}(\log n)^{3/4}\bigr)\quad \text{\textit{a.s.}} $$

Proof

Divide \([0,T]\) into \(k_{n}=O(n^{1/2}(\log n)^{1/2})\) subintervals \([t_{i},t_{i+1}]\) of length \(O(n^{-1/2}(\log n)^{1/2})\). Then, as in the proof of Lo and Singh (1986), we have

For I, we have by Taylor expansion and the fact that \(\sup_{0\leq t\leq T}\vert \overline{H}_{n}(t)-\overline{H}(t)\vert =O(n^{-1/2}(\log \log n)^{1/2})\) a.s. (Földes and Rejtő 1981):

Now further subdivide each interval \([t_{i},t_{i+1}]\) into \(a_{n}=O(n^{1/4}(\log n)^{-1/4})\) subintervals of length \(O(n^{-3/4}(\log n)^{3/4})\). By using Bernstein’s inequality, we can show that this term is bounded a.s. by

$$ c\max_{1\leq i\leq k_{n}}\max_{0\leq j\leq a_{n}-1}\bigl \vert H_{n}(t_{i,j+1})-H(t_{i,j+1})-H_{n}(t_{i})+H(t_{i}) \bigr \vert +O\bigl(n^{-3/4}(\log n)^{3/4}\bigr) $$

for some constant \(c>0\). By the modulus of continuity result for the Kaplan–Meier estimator (see Schäfer 1986), we obtain that \(I=O(n^{-3/4}(\log n)^{3/4})\) a.s. The term II is treated similarly and leads to the same order. It requires the almost sure behavior of the modulus of continuity of the estimator \(H_{n}^{1}\), which follows from Lemma 5 below on taking \(a_{n}=n^{-1/2}(\log n)^{1/2}\) there. □
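
As a check on the orders above, the rate in Lemma 3 can be recovered from Lemma 5 by direct substitution; nothing beyond the stated quantities is assumed. With \(a_{n}=n^{-1/2}(\log n)^{1/2}\) in Lemma 5, the bound there becomes

$$ a_{n}^{1/2}n^{-1/2}(\log n)^{1/2}=n^{-1/4}(\log n)^{1/4}\cdot n^{-1/2}(\log n)^{1/2}=n^{-3/4}(\log n)^{3/4}, $$

and the growth condition of Lemma 5 holds for this choice because

$$ a_{n}n(\log n)^{-5}=n^{1/2}(\log n)^{-9/2}\rightarrow \infty . $$

Similarly, splitting an interval of length of order \(n^{-1/2}(\log n)^{1/2}\) into a number of pieces of order \(n^{1/4}(\log n)^{-1/4}\) gives pieces of length of order \(n^{-3/4}(\log n)^{3/4}\), in agreement with the second subdivision in the proof.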

Lemmas 4 and 5 below are needed for the proofs of Lemmas 1 and 3, respectively. They have some independent interest since they provide the almost sure rate of convergence and the almost sure behavior of the modulus of continuity for the estimator of the cumulative incidence function of Z subject to δ=1 (\(H_{n}^{1}\)).

Lemma 4

For \(T<\min (T_{F},T_{G},T_{\widetilde{G}})\), we have

$$ \sup_{0\leq t\leq T}\bigl \vert H_{n}^{1}(t)-H^{1}(t) \bigr \vert =O\bigl(n^{-1/2}(\log \log n)^{1/2}\bigr)\quad \text{\textit{a.s.}} $$

Proof

Define the following empirical estimators for the distribution function \(\widetilde{H}(t)=P(U\leq t)\) and for the subdistribution functions \(\widetilde{H}^{0}(t)=P(U\leq t,\rho =0)\) and \(\widetilde{H}^{11}(t)=P(U\leq t,\rho =1,\delta =1)\):

Then, \(H^{1}(t)\) can be expressed in terms of \(\widetilde{H}\), \(\widetilde{H}^{0}\), and \(\widetilde{H}^{11}\), and \(H_{n}^{1}(t)\) can be expressed in terms of the corresponding empiricals. Similarly to Stute (1995), we obtain

$$ H_{n}^{1}(t)=\int_{0}^{t}\exp \biggl\{ n\int_{0}^{u}\log \biggl(1+ \frac{1}{n(1-\widetilde{H}_{n}(z))}\biggr)\,d\widetilde{H}_{n}^{0}(z) \biggr\} \,d \widetilde{H}_{n}^{11}(u) $$

and

$$ H^{1}(t)=\int_{0}^{t}\exp \biggl\{ \int _{0}^{u}\frac{d\widetilde{H}^{0}(z)}{1-\widetilde{H}(z)} \biggr\} \,d \widetilde{H}^{11}(u). $$

It follows that \(\sup_{0\leq t\leq T}\vert H_{n}^{1}(t)-H^{1}(t)\vert \) is smaller than

(4)

The second term in (4) is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s. For the first term in (4), we use (with obvious abbreviations) that

with \(\theta \) between 0 and \(a-b\). Note that \(\exp (b)\) is uniformly bounded in \([0,T]\). Looking at \((a-b)\), we have

(5)

The second term in (5) is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s. For the first term in (5), we use that, for \(x\geq 0\),

$$ x-\frac{1}{2}x^{2}\leq \log (1+x)\leq x. $$
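
To make the next step explicit, the inequality can be applied with \(x=1/(n(1-\widetilde{H}_{n}(z)))\), which is nonnegative whenever \(\widetilde{H}_{n}(z)<1\), and then multiplied by n. This is only a worked intermediate step using the symbols already introduced:

$$ \frac{1}{1-\widetilde{H}_{n}(z)}-\frac{1}{2n(1-\widetilde{H}_{n}(z))^{2}}\leq n\log \biggl(1+\frac{1}{n(1-\widetilde{H}_{n}(z))}\biggr)\leq \frac{1}{1-\widetilde{H}_{n}(z)}. $$

Hence the integrand in the exponent of \(H_{n}^{1}\) differs from \(1/(1-\widetilde{H}_{n}(z))\) by at most \((2n)^{-1}(1-\widetilde{H}_{n}(z))^{-2}\), and the triangle inequality compares \(1/(1-\widetilde{H}_{n}(z))\) with \(1/(1-\widetilde{H}(z))\).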

It follows that the first term in (5) is bounded above by

$$ \sup_{0\leq z\leq T}\biggl \vert \frac{1}{1-\widetilde{H}_{n}(z)}-\frac{1}{1-\widetilde{H}(z)}\biggr \vert +\frac{1}{2n}\sup_{0\leq z\leq T}\frac{1}{(1-\widetilde{H}_{n}(z))^{2}}. $$

This is \(O(n^{-1/2}(\log \log n)^{1/2})\) a.s. since \(\sup_{0\leq z\leq T}\vert \widetilde{H}_{n}(z)-\widetilde{H}(z)\vert \) has the same order and since \(\widetilde{H}(T)<1\). □
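
The exponential representation of \(H_{n}^{1}\) used in this proof can be checked numerically. The following Python sketch is an illustration only, under assumptions that are not stated in the text: there are no ties among the observed times, the inner integral is taken over \([0,u)\), and \(\widetilde{H}_{n}\) equals \(j/n\) at the j-th order statistic. The function names and the toy simulation design are hypothetical and are not the simulation setup of the paper. The sketch computes the jumps of \(H_{n}^{1}\) from the exp/log form and compares them with the equivalent Kaplan–Meier-type product weights.

```python
import math
import random


def jumps_exp_form(u, rho, delta):
    """Jumps of H_n^1 at each observation via the exp/log representation."""
    n = len(u)
    order = sorted(range(n), key=lambda i: u[i])  # assumes no ties in u
    jumps = [0.0] * n
    # n * integral over [0, U_i) of log(1 + 1/(n(1 - H_n(z)))) dH_n^0(z)
    log_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if rho[i] == 1 and delta[i] == 1:
            # mass 1/n of the empirical H_n^11, reweighted by the exponential
            jumps[i] = math.exp(log_sum) / n
        elif rho[i] == 0 and rank < n:
            # 1 - H_n(U_(rank)) = (n - rank) / n at the rank-th order statistic
            log_sum += math.log(1.0 + 1.0 / (n - rank))
    return jumps


def jumps_product_form(u, rho, delta):
    """The same jumps written as Kaplan-Meier-type product weights."""
    n = len(u)
    order = sorted(range(n), key=lambda i: u[i])
    jumps = [0.0] * n
    surv = 1.0  # product of (n - j) / (n - j + 1) over earlier ranks with rho = 1
    for rank, i in enumerate(order, start=1):
        if rho[i] == 1 and delta[i] == 1:
            jumps[i] = surv / (n - rank + 1)
        if rho[i] == 1:
            surv *= (n - rank) / (n - rank + 1)
    return jumps


def h1_n(t, u, jumps):
    """Evaluate H_n^1(t) as the sum of the jumps at observations U_i <= t."""
    return sum(w for ui, w in zip(u, jumps) if ui <= t)


def simulate(n, seed=0):
    """Toy data (assumed design): exponential times, independent censoring."""
    rng = random.Random(seed)
    u, rho, delta = [], [], []
    for _ in range(n):
        z = rng.expovariate(1.0)   # time subject to administrative censoring
        c = rng.expovariate(0.5)   # administrative censoring time
        u.append(min(z, c))
        rho.append(1 if z <= c else 0)
        delta.append(1 if rng.random() < 0.6 else 0)
    return u, rho, delta


if __name__ == "__main__":
    u, rho, delta = simulate(500, seed=1)
    a = jumps_exp_form(u, rho, delta)
    b = jumps_product_form(u, rho, delta)
    print("max difference between the two forms:",
          max(abs(x - y) for x, y in zip(a, b)))
    print("H_n^1(1.0) =", h1_n(1.0, u, a))
```

Without administrative censoring (all \(\rho =1\)) every jump reduces to \(1/n\), so the sketch returns the ordinary empirical subdistribution function.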

Lemma 5

Suppose that \(T<\min (T_{F},T_{G},T_{\widetilde{G}})\). Suppose that \(H(t)=P(Z\leq t)\) and \(H^{1}(t)=P(Z\leq t,\delta =1)\) have bounded first derivatives in \([0,T]\). Let \(\{a_{n}\}\) be a sequence of positive constants tending to zero with \(a_{n}n(\log n)^{-5}>\Delta >0\) for all n sufficiently large. Then

$$ \sup_{0\leq t,s\leq T,\vert t-s\vert \leq a_{n}}\bigl \vert H_{n}^{1}(t)-H_{n}^{1}(s)-H^{1}(t)+H^{1}(s) \bigr \vert =O\bigl(a_{n}^{1/2}n^{-1/2}(\log n)^{1/2}\bigr)\quad \text{\textit{a.s.}} $$

Proof

We make the same partition of the interval \([0,T]\) as in Lemma A.5 of Van Keilegom and Veraverbeke (1997). Exploiting the monotonicity of \(H^{1}(t)\) and \(H_{n}^{1}(t)\) and also the Lipschitz continuity of \(H^{1}(t)\), we obtain that it suffices to prove that

where \(\{t_{ij}\}\), \(i=1,\ldots ,m\), \(j=-b_{n},\ldots ,b_{n}\), is a grid of points with \(m= [ \frac{T}{a_{n}} ] \) (\([\cdot ]\) denoting the integer part) and \(b_{n}\sim a_{n}^{1/2}n^{1/2}(\log n)^{-1/2}\). At this point, we use the almost sure asymptotic representation for \(H_{n}^{1}(t)\), which can be derived as a special case of the more general result of Sánchez-Sellero et al. (2005):

$$ H_{n}^{1}(t)-H^{1}(t)=\frac{1}{n}\sum _{i=1}^{n}\widetilde{\widetilde{\psi }}_{i}(t)+R_{n}(t), $$

where

with

the function \(\widetilde{C}(t)\) being that in the remark of Sect. 2; and \(\sup_{0\leq t\leq T}\vert R_{n}(t)\vert =O(n^{-1}(\log n)^{3})\) a.s. It follows that it suffices to show that

$$ \max_{1\leq i\leq m-1}\max_{-b_{n}<j,k<b_{n}}\Biggl \vert \frac{1}{n}\sum_{r=1}^{n}\bigl(\widetilde{\widetilde{ \psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij})\bigr)\Biggr \vert =O\bigl(a_{n}^{1/2}n^{-1/2}( \log n)^{1/2}\bigr). $$

To achieve this, we use Bernstein’s inequality as in Van Keilegom and Veraverbeke (1997). The random variables \(\widetilde{\widetilde{\psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij})\) are bounded, and \(\operatorname {Var}(\widetilde{\widetilde{\psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij}))\) is bounded by a constant times \(a_{n}\). The latter fact is shown by checking six appropriate groups of terms in

For example, by direct calculation,

for some constant c>0 by the Lipschitz continuity of H. The other groups of terms are treated similarly. □
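
The Bernstein step can be sketched as follows. This is one standard argument and not necessarily the exact one in Van Keilegom and Veraverbeke (1997); the shorthand \(X_{r}\) and the constants M, c, and K are introduced here for illustration, and the summands are assumed to be centered. Write \(X_{r}=\widetilde{\widetilde{\psi }}_{r}(t_{ik})-\widetilde{\widetilde{\psi }}_{r}(t_{ij})\), with \(\vert X_{r}\vert \leq M\) and \(\operatorname {Var}(X_{r})\leq ca_{n}\). Bernstein's inequality for independent, bounded, centered variables gives, for every \(\varepsilon >0\),

$$ P\Biggl(\Biggl \vert \frac{1}{n}\sum_{r=1}^{n}X_{r}\Biggr \vert >\varepsilon \Biggr)\leq 2\exp \biggl\{ -\frac{n\varepsilon ^{2}}{2ca_{n}+\frac{2}{3}M\varepsilon }\biggr\} . $$

Take \(\varepsilon _{n}=Ka_{n}^{1/2}n^{-1/2}(\log n)^{1/2}\). Then \(n\varepsilon _{n}^{2}=K^{2}a_{n}\log n\) and \(M\varepsilon _{n}/a_{n}=MK(a_{n}n/\log n)^{-1/2}\rightarrow 0\) by the condition on \(a_{n}\), so that, for n large,

$$ 2\exp \biggl\{ -\frac{n\varepsilon _{n}^{2}}{2ca_{n}+\frac{2}{3}M\varepsilon _{n}}\biggr\} \leq 2\exp \biggl\{ -\frac{K^{2}\log n}{3c}\biggr\} =2n^{-K^{2}/(3c)}. $$

The number of triples \((i,j,k)\) is of order \(mb_{n}^{2}=O(n(\log n)^{-1})\), so a union bound gives a probability of order \(n^{1-K^{2}/(3c)}(\log n)^{-1}\). This is summable in n when \(K^{2}>6c\), and the Borel–Cantelli lemma then yields the stated almost sure rate.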

About this article

Cite this article

de Uña-Álvarez, J., Veraverbeke, N. Generalized copula-graphic estimator. TEST 22, 343–360 (2013). https://doi.org/10.1007/s11749-012-0314-2

Keywords

  • Almost sure representation
  • Archimedean copula
  • Censored data
  • Informative censoring
  • Survival analysis

Mathematics Subject Classification

  • 62G05
  • 62G20
  • 62N02