Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data

Statistics in Biosciences

Abstract

Clustered interval-censored failure time data occur when the failure times of interest are clustered into small groups and known only to lie in certain intervals. A number of methods have been proposed for regression analysis of clustered failure time data, but most of them apply only to clustered right-censored data. In this paper, a sieve estimation procedure is proposed for fitting a Cox frailty model to clustered interval-censored failure time data. In particular, a two-step algorithm for parameter estimation is developed and the asymptotic properties of the resulting sieve maximum likelihood estimators are established. The finite sample properties of the proposed estimators are investigated through a simulation study and the method is illustrated by the data arising from a lymphatic filariasis study.
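
For orientation, a brief sketch of the model (our reconstruction from equation (A.1) in the Appendix, not the paper's own display): a cluster-level gamma frailty $b_i$ with mean 1 and variance $1/\eta$ multiplies the conditional Cox hazard, and marginalizing over $b_i$ gives

$$\lambda_{ij}(t\mid Z_{ij},b_i)=b_i\,\lambda_0(t)\exp \bigl(Z_{ij}^{T}\beta \bigr),\qquad S(t\mid Z_{ij})= \biggl\{1+\frac{1}{\eta}\varLambda_0(t)\exp \bigl(Z_{ij}^{T}\beta \bigr) \biggr\}^{-\eta}, $$

where $\varLambda_0(t)=\int_0^t\lambda_0(s)\,ds$; the marginal survival function follows from the Laplace transform of the Gamma$(\eta,\eta)$ frailty distribution.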


References

  1. Bellamy SL, Li Y, Ryan LM, Lipsitz S, Canner MJ, Wright R (2004) Analysis of clustered and interval censored data from a community-based study in asthma. Stat Med 23:3607–3621

  2. Cai J, Prentice RL (1995) Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika 82:151–164

  3. Cai J, Prentice RL (1997) Regression estimation using multivariate failure time data and a common baseline hazard function model. Lifetime Data Anal 3:197–213

  4. Cai J, Sen P, Zhou H (1999) A random effects model for multivariate failure time data from multicenter clinical trials. Biometrics 55:182–189

  5. Cai T, Wei L, Wilcox M (2000) Semi-parametric regression analysis for clustered failure time data. Biometrika 87:867–878

  6. Chen K, Tong X (2010) Varying coefficient transformation models with censored data. Biometrika 97:969–976

  7. Clayton DG, Cuzick J (1985) Multivariate generalizations of the proportional hazards model. J R Stat Soc Ser A 148:82–117

  8. Dudley RM (1984) A course on empirical processes. École d'Été de Probabilités de Saint-Flour XII–1982. Lecture notes in mathematics, vol 1097. Springer, New York

  9. Guo G, Rodriguez G (1992) Estimating a multivariate proportional hazards model for clustered data using the EM algorithm, with an application to child survival in Guatemala. J Am Stat Assoc 87:969–976

  10. Hougaard P (2000) Analysis of multivariate survival data: statistics for biology and health. Springer, New York

  11. Huang J, Rossini A (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92:960–967

  12. Oakes D (1989) Bivariate survival models induced by frailties. J Am Stat Assoc 84:487–493

  13. Rossini A, Moore D (1999) Modeling clustered, discrete, or grouped time survival data with covariates. Biometrics 55:813–819

  14. Sun J (2006) The statistical analysis of interval censored failure time data. Springer, New York

  15. van der Vaart A, Wellner J (1996) Weak convergence and empirical processes: with applications to statistics. Springer, New York

  16. Williamson JM, Kim HY, Manatunga A, Addiss DG (2008) Modeling survival data with informative cluster size. Stat Med 27:543–555

  17. Zeng D, Lin D, Yin GS (2005) Maximum likelihood estimation for the proportional odds model with random effects. J Am Stat Assoc 100:470–482

  18. Zhang X, Sun J (2010) Regression analysis of clustered interval-censored failure time data with informative cluster size. Comput Stat Data Anal 54:1817–1823

Acknowledgements

The authors wish to thank the Editor, Dr. Xihong Lin, the Associate Editor and two referees for their time and many helpful comments and suggestions. This work was partially supported by an NSF grant and NIH grant 5 R01 CA152035 to the third author. The work was also partially supported by NSF China Zhongdian Project 11131002, NSFC (No. 10971015) and the Fundamental Research Funds for the Central Universities to the second author.

Author information

Correspondence to Junlong Li.

Appendix A: Proofs of Theorems

In this appendix, we use the notation defined above and sketch the proofs of the three theorems given in Sect. 3. For this, we need the following regularity conditions.

  1. (C1) There exists a positive constant $a$ such that $P(V_{ij}-U_{ij}\geq a)=1$.

  2. (C2) Define the parameter space $\varTheta=\mathcal{B}\times[\eta_{l},\eta_{u}]\times\mathcal{H}$, where $\mathcal{B}$ is a bounded open subset of $R^{p}$, the true regression parameter $\beta_{0}$ is an interior point of $\mathcal{B}$, $\varLambda_{0}(0)=0$, $\varLambda_{0}(\tau)\leq M$ for some large constant $M$, and $\mathcal{H}$ is the collection of all bounded, continuous and nondecreasing functions on $[0,\tau]$.

  3. (C3) $\varLambda_{0}\in\mathcal{H}_{r}\subset\mathcal{H}$ for $r=1$ or $2$, where $\mathcal{H}_{r}$ denotes the subclass of functions in $\mathcal{H}$ whose $r$th derivative $\varLambda^{(r)}$ exists and is bounded on $[0,\tau]$.

  4. (C4) The covariates $Z_{ij}$ are uniformly bounded. If there exist a vector $\beta\in R^{p}$ and a constant $c$ such that $Z_{ij}^{T}\beta=c$ a.s., then $\beta=0$ and $c=0$.

  5. (C5) The joint distribution of $(U_{ij},V_{ij})$ does not depend on $b_{i}$, and the density function $g(u,v\mid z)$ of $(U_{ij},V_{ij})$ conditional on $Z_{ij}=z$ has uniformly bounded partial derivatives with respect to $u$ and $v$. In addition, $g(u,v\mid z)>0$ for $u,v\in[0,\tau]$.

  6. (C6) The cluster sizes $n_{i}$ are uniformly bounded, with $P(n_{i}=1)>0$ and $P(n_{i}\geq2)>0$.

Proof of Theorem 1

Since $\{\hat{\varLambda}_{n}(\cdot), n=1,2,\ldots\}$ is a sequence of bounded nondecreasing functions and $\{(\hat{\beta}_{n},\hat{\eta}_{n}), n=1,2,\ldots\}$ is a sequence of bounded vectors, it follows from Helly's selection theorem that $\hat{\theta}_{n}$ has a convergent subsequence. Let $\theta_{*}=(\beta_{*},\eta_{*},\varLambda_{*})$ denote the limit of this subsequence and, for simplicity, assume that the subsequence is $\hat{\theta}_{n}$ itself. To prove the theorem, it then suffices to show that $\beta_{*}=\beta_{0}$, $\eta_{*}=\eta_{0}$ and $\varLambda_{*}=\varLambda_{0}$.

Since \(\hat{\theta}_{n}\) is the maximum likelihood estimator, we obtain

$$l_n(\hat{\beta}_n,\hat{\eta}_n,\hat{\varLambda}_n)\geq l_n(\beta_0, \eta_0,\varLambda_{0n}), $$

where $\varLambda_{0n}$ denotes the projection of the true cumulative hazard function $\varLambda_{0}\in\mathcal{H}$ onto $\mathcal{H}_{n}$. Note that $\varLambda_{0n}(t)\to\varLambda_{0}(t)$ for every $t$ as $n\to\infty$, since the sieve dimension $q$ is an integer increasing with $n$ at the rate $O(n^{k})$ with $0<k<1/2$. Letting $n\to\infty$, the Glivenko–Cantelli theorem yields

$$E \bigl[l(\beta_*,\eta_*,\varLambda_*) \bigr]\geq E \bigl[l( \beta_0,\eta_0,\varLambda_0) \bigr]. $$

Using the Kullback–Leibler divergence, one obtains that, with probability one, $l(\theta_{*};D_{i})=l(\theta_{0};D_{i})$ for every cluster $i$. Setting $V_{ij}=t$ and $U_{ij}=0$, and taking $n_{i}=1$ or $2$, this equality reduces to the statement that for any $t\in[0,\tau]$, with probability one,

$$ \biggl\{1+\frac{1}{\eta_*}\varLambda_*(t)\exp \bigl(Z_{ij}^{T} \beta_* \bigr) \biggr\}^{-\eta_*} = \biggl\{1+\frac{1}{\eta_0} \varLambda_0(t)\exp \bigl(Z_{ij}^{T} \beta_0 \bigr) \biggr\}^{-\eta_0} $$
(A.1)

and

$$ \biggl\{1+\frac{2}{\eta_*}\varLambda_*(t)\exp \bigl(Z_{ij}^{T} \beta_* \bigr) \biggr\}^{-\eta_*} = \biggl\{1+\frac{2}{\eta_0} \varLambda_0(t)\exp \bigl(Z_{ij}^{T} \beta_0 \bigr) \biggr\}^{-\eta_0}. $$
(A.2)

By differentiating both sides of (A.1) with respect to $t$ and letting $t\to0$, we have $Z_{ij}^{T}(\beta_{*}-\beta_{0})=\log\{\varLambda_{0}'(0)/\varLambda_{*}'(0)\}$, and thus $\beta_{*}=\beta_{0}$ by (C4).
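
To make this step explicit (our own verification, using only (A.1) and $\varLambda_{*}(0)=\varLambda_{0}(0)=0$): differentiating (A.1) in $t$ and noting that both outer factors $\{1+\varLambda(t)\exp(Z_{ij}^{T}\beta)/\eta\}^{-\eta-1}$ tend to 1 as $t\to0$ gives

$$\varLambda_{*}'(0)\exp \bigl(Z_{ij}^{T}\beta_{*} \bigr)=\varLambda_{0}'(0)\exp \bigl(Z_{ij}^{T}\beta_{0} \bigr), $$

and taking logarithms yields $Z_{ij}^{T}(\beta_{*}-\beta_{0})=\log\{\varLambda_{0}'(0)/\varLambda_{*}'(0)\}$; the right-hand side is free of $Z_{ij}$, so (C4) applies.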

Next we prove that $\eta_{*}=\eta_{0}$ and $\varLambda_{*}=\varLambda_{0}$ when $\beta_{0}\neq0$. Suppose not; without loss of generality, assume that $\eta_{0}>\eta_{*}$. Again from (A.1), one obtains that

$$ \varLambda_*(t)=\eta_* \biggl[ \biggl\{1+\frac{1}{\eta_0} \varLambda_0(t)\exp \bigl(Z_{i1}^{T} \beta_0 \bigr) \biggr\}^{\eta_0/\eta_*}-1 \biggr]\exp \bigl(-Z_{i1}^{T}\beta_0 \bigr) . $$
(A.3)

Now fix $t>0$, so that the left-hand side of (A.3) is a constant. Equation (A.3) then says that every value of $\exp(Z_{i1}^{T}\beta_{0})$ is a solution of the equation $g(x)=c$, where $g(x)=\{(1+bx)^{\eta_{0}/\eta_{*}}-1\}/x$, $b=\varLambda_{0}(t)/\eta_{0}>0$ and $c=\varLambda_{*}(t)/\eta_{*}$. Note that $g$ is strictly increasing for $x>0$ when $\alpha=\eta_{0}/\eta_{*}>1$ (a verification is sketched below), so the equation $g(x)=c$ has at most one solution. But when $\beta_{0}\neq0$, condition (C4) implies that $\exp(Z_{i1}^{T}\beta_{0})$ takes more than one value, which is a contradiction. Hence $\eta_{0}/\eta_{*}=1$ and, again using (A.3), $\varLambda_{*}(t)=\varLambda_{0}(t)$.
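
The monotonicity claim admits a short verification (our own computation): writing $g(x)=\{(1+bx)^{\alpha}-1\}/x$ with $\alpha>1$ and $b>0$,

$$g'(x)=\frac{h(x)}{x^{2}},\qquad h(x)=\alpha bx(1+bx)^{\alpha-1}- \bigl\{(1+bx)^{\alpha}-1 \bigr\}, $$

and since $h(0)=0$ and $h'(x)=\alpha(\alpha-1)b^{2}x(1+bx)^{\alpha-2}>0$ for $x>0$, we have $h(x)>0$ and hence $g'(x)>0$ on $(0,\infty)$.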

Finally, we show that $\eta_{*}=\eta_{0}$ and $\varLambda_{*}=\varLambda_{0}$ when $\beta_{0}=0$. Suppose not; without loss of generality, assume that $\eta_{0}>\eta_{*}$. When $\beta_{0}=0$, (A.3) reduces to

$$\varLambda_*(t)=\eta_* \biggl[ \biggl\{1+\frac{1}{\eta_0} \varLambda_0(t) \biggr\}^{\eta_0/\eta_*}-1 \biggr] $$

and (A.2) reduces to

$$\varLambda_*(t)=\frac{\eta_*}{2} \biggl[ \biggl\{1+\frac{2}{\eta_0}\varLambda_0(t) \biggr\}^{\eta_0/\eta_*}-1 \biggr]. $$

For fixed $t>0$, these two equations imply that the equation $g(x)=c$ has two distinct solutions, $x=1$ and $x=2$, where $g(x)=\{(1+bx)^{\eta_{0}/\eta_{*}}-1\}/x$ as before. This again contradicts the strict monotonicity of $g$. Therefore, $\eta_{*}=\eta_{0}$ and $\varLambda_{*}=\varLambda_{0}$. This proves Theorem 1. □

Proof of Theorem 2

Let $G(u,v)$ denote the joint distribution of $(U_{ij},V_{ij})$ as before and let $\theta_{0n}=(\beta_{0},\eta_{0},\varLambda_{0n})$ be the projection of the true parameter $\theta_{0}\in\varTheta$ onto $\varTheta_{n}$. Also let $\epsilon$ be a small, fixed positive number. Define the function class

$$\varPsi(\epsilon)= \bigl\{l(d; \theta): \rho(\theta,\theta_{0n})\leq \epsilon, \theta\in\varTheta_n \bigr\} , $$

where $d$ is a realization of $D$. Although complicated, the log-likelihood function has a closed form and, by conditions (C1), (C5), (C6) and the definition of $\varTheta_{n}$, satisfies a Lipschitz condition in its arguments. Then, for all $0<\xi<\epsilon$,

$$ \log N_{[]}(\xi,\varPsi,\rho)\leq A_1\epsilon/ \xi+A_1 q_n\log(\epsilon/\xi), $$
(A.4)

where $N_{[]}(\xi,\varPsi,\rho)$ is the bracketing number of $\varPsi(\epsilon)$ under the norm $\rho$ and $A_{1}$ is a constant not depending on $n$; see [15]. Let $J_{[]}(\epsilon,\varPsi,\rho)=\int_{0}^{\epsilon}\{1+\log N_{[]}(\xi,\varPsi,\rho)\}^{1/2}\,d\xi$ be the bracketing entropy integral. It follows from (A.4) that

$$ J_{[]}(\epsilon,\varPsi,\rho)\leq A_2 q_n^{1/2} \epsilon, $$
(A.5)

where $A_{2}>0$ is a constant depending only on $A_{1}$. Then (A.5) and Lemma 3.4.2 in [15, p. 324] imply, for sufficiently large $n$, a maximal inequality for the empirical process indexed by $\varPsi(\epsilon)$, with constants not depending on $n$. One can then verify that the conditions of Lemma 3.4.1 in [15, p. 322] are all satisfied, and it follows that $\rho(\hat{\theta}_{n},\theta_{0n})=O_{p}\{(n/q_{n})^{-1/2}\}=O_{p}(n^{-(1-k)/2})$.

Since \(\varLambda_{0n}\in\mathcal{H}_{n}\), condition (C3) yields

$$\int_0^\tau\bigl|\varLambda_0(t)-\varLambda_{0n}(t)\bigr|^2\,dt=O \bigl(n^{-2rk} \bigr) \quad\mbox{and}\quad E \bigl\{l_{n}(\theta_0;D)-l_{n}(\theta_{0n};D) \bigr\}=O \bigl(n^{-2rk} \bigr), $$

which, together with (C5), immediately yields $\rho(\theta_{0},\theta_{0n})=O_{p}(n^{-rk})$. By the triangle inequality, one obtains $\rho(\hat{\theta}_{n},\theta_{0})=O_{p}(n^{-rk}+n^{-(1-k)/2})$. The proof is complete. □
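
As an aside (a standard balancing computation, not stated in the paper's text): the two terms in this rate are equalized by choosing $k$ so that $rk=(1-k)/2$, that is,

$$k=\frac{1}{1+2r},\qquad \rho(\hat{\theta}_{n},\theta_{0})=O_{p} \bigl(n^{-r/(1+2r)} \bigr), $$

giving $O_{p}(n^{-1/3})$ for $r=1$ and $O_{p}(n^{-2/5})$ for $r=2$; both choices satisfy the constraints $k<0.5$ and $rk>0.25$ used in the proof of Theorem 3.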

Proof of Theorem 3

We prove this theorem by verifying the four conditions in Theorem 3.3.1 of [15]. To this end, we first define the random mappings $\varPsi$ and $\varPsi_{n}$. Denote the log-likelihood function of $\theta$ by $l(\theta;D)$; for $i=1,\ldots,n$, it has a closed-form expression involving $W_{i}=\sum_{j=1}^{n_{i}}\varLambda(U_{ij})\exp(Z_{ij}^{T}\beta)$. For $g_{\varLambda}\in\mathcal{H}_{r}$, let $g_{n\varLambda}\in\mathcal{H}_{n}$ be the projection of $g_{\varLambda}$. For $\theta=(\beta,\eta,\varLambda)\in\varTheta$, denote its projection by $\theta_{n}=(\beta,\eta,\varLambda_{n})\in\varTheta_{n}$; then $\rho(\theta,\theta_{n})=O_{p}(n^{-rk})$.

Define the random maps $\varPsi$ as follows: for any function $g_{\varLambda}\in\mathcal{H}_{r}$, any $g_{\beta}\in R^{p}$ with $\|g_{\beta}\|\leq1$ and any scalar $g_{\eta}$ with $|g_{\eta}|\leq1$,

$$\varPsi(\theta)[g_\beta,g_\eta,g_\varLambda]=\frac{\partial}{\partial a}\, l(\beta+ag_\beta,\eta+ag_\eta,\varLambda+ag_\varLambda;D)\bigg|_{a=0}. $$

That is, $\varPsi(\theta)[g_{\beta},g_{\eta},g_{\varLambda}]$ is the score function along the path $(\beta+ag_{\beta},\eta+ag_{\eta},\varLambda+ag_{\varLambda})$. Define the corresponding empirical maps

$$\varPsi_n(\theta)[g_\beta,g_\eta,g_\varLambda] = \mathcal{P}_n \varPsi(\theta_n)[g_\beta,g_\eta,g_{n\varLambda}], $$

where \(\mathcal{P}_{n}\) is the empirical measure of the first n observations.

First, we prove that the first condition holds, which is

(A.6)

for k<0.5 and rk>0.25. By arguments analogous to those in Theorem 1, one can show

(A.7)

for k<0.5 and r>0.5. It follows from the Taylor expansion that

(A.8)

under the assumptions rk>0.25 and k<0.5. Observe that

(A.9)

The second equality of (A.9) follows from the uniform asymptotic equicontinuity of empirical processes indexed by a Donsker class of functions [8]. Then (A.6) follows from (A.7)–(A.9), and thus the first condition is satisfied.
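
For reference, the equicontinuity property invoked here can be paraphrased as follows (our wording; see [8] and [15]): if $\mathcal{F}$ is a $P$-Donsker class and $\hat{f}_{n},f_{0}\in\mathcal{F}$ with $\rho(\hat{f}_{n},f_{0})\to0$ in probability, then

$$\sqrt{n}(\mathcal{P}_n-P)(\hat{f}_n-f_0)=o_p(1), $$

which is what allows the empirical process terms evaluated at $\hat{\theta}_{n}$ and at $\theta_{0}$ to be interchanged up to $o_{p}(n^{-1/2})$ remainders.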

Now we verify the second condition. Applying Theorem 2.10.6 of [15, p. 192], one can show that the function class

$$\bigl\{ \varPsi(\theta)[g_\beta,g_\eta,g_\varLambda]: \|g_\beta\|\leq1, |g_\eta|\leq1, \mbox{$g_{\varLambda}\in \mathcal{H}_{r}$, $\rho(\theta,\theta_{0})<\xi$} \bigr\} $$

is P-Donsker for some small ξ>0. Therefore,

$$n^{1/2} \bigl\{\mathcal{P}_n\varPsi(\theta_0)[g_\beta,g_\eta,g_\varLambda] -E\varPsi (\theta_0)[g_\beta,g_\eta,g_\varLambda] \bigr\}\to N \bigl(0,\sigma^{-2} \bigr) $$

in distribution with

$$ \sigma^2= 1/E \bigl[ \bigl\{\varPsi (\theta_0)[g_\beta,g_\eta,g_\varLambda] \bigr\}^2 \bigr], $$
(A.10)

which implies that the second condition holds.

To prove the third condition, that $E\dot{\varPsi}_{\theta_{0}}$ is continuously invertible, following the proof of Theorem 2 in [17] it is enough to show that if $\varPsi(\theta_{0})[g_{\beta},g_{\eta},g_{\varLambda}]=0$ almost surely, then $g_{\beta}=0$, $g_{\eta}=0$ and $g_{\varLambda}=0$. For fixed $t\in[0,\tau]$, letting $n_{i}=1$, $U_{i1}\to0$ and $V_{i1}\to t$, after some calculation one obtains that, with probability one,

Again letting $n_{i}=2$, $U_{i1}\to0$, $U_{i2}\to0$, $V_{i1}\to t$, $V_{i2}\to t$ and $Z_{i1}=Z_{i2}$, after some calculation one obtains that, with probability one,

Combining the above two equations, one obtains that $g_{\eta}=0$ and $Z_{i1}^{T}g_{\beta}+g_{\varLambda}(t)=0$, which, together with condition (C4), yields $g_{\beta}=0$, $g_{\eta}=0$ and $g_{\varLambda}=0$.

Observe that $E\{\varPsi(\theta_{0})[g_{\beta},g_{\eta},g_{\varLambda}]\}=\varPsi_{n}(\hat{\theta}_{n})[g_{\beta},g_{\eta},g_{n\varLambda}]=0$, which implies that the fourth condition is satisfied. The proof is complete. □

Consistency of the proposed estimator of $\sigma^{2}$ follows similarly from the proof of Theorem 3 in [17]. This completes the proof.

About this article

Cite this article

Li, J., Tong, X. & Sun, J. Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data. Stat Biosci 6, 55–72 (2014). https://doi.org/10.1007/s12561-012-9078-1