Abstract
Clustered interval-censored failure time data occur when the failure times of interest are clustered into small groups and known only to lie in certain intervals. A number of methods have been proposed for regression analysis of clustered failure time data, but most of them apply only to clustered right-censored data. In this paper, a sieve estimation procedure is proposed for fitting a Cox frailty model to clustered interval-censored failure time data. In particular, a two-step algorithm for parameter estimation is developed and the asymptotic properties of the resulting sieve maximum likelihood estimators are established. The finite sample properties of the proposed estimators are investigated through a simulation study and the method is illustrated by the data arising from a lymphatic filariasis study.
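Concretely, under a gamma frailty with mean 1 and variance \(\eta\), each cluster contributes the probability that every member's failure time falls in its observed interval, averaged over the frailty. The sketch below is illustrative only (the function and argument names are ours, not the paper's): it evaluates one cluster's log-likelihood contribution by numerical integration over the frailty, rather than the closed form the paper works with.

```python
import numpy as np
from scipy import integrate, stats

def cluster_loglik(beta, eta, Lam, U, V, Z):
    """Log-likelihood contribution of one cluster under a gamma-frailty
    Cox model with interval-censored data (illustrative sketch).

    beta : (p,) regression coefficients
    eta  : frailty variance (> 0)
    Lam  : baseline cumulative hazard, a nondecreasing function with Lam(0)=0
    U, V : examination times with U_ij < T_ij <= V_ij
    Z    : (n_i, p) covariate matrix for the cluster
    """
    # frailty b ~ Gamma(shape=1/eta, scale=eta): mean 1, variance eta
    fr = stats.gamma(a=1.0 / eta, scale=eta)
    lin = np.exp(np.asarray(Z) @ np.asarray(beta))   # exp(Z_ij^T beta)
    lu = np.array([Lam(u) for u in U]) * lin         # Lambda(U_ij) exp(Z_ij^T beta)
    lv = np.array([Lam(v) for v in V]) * lin

    def integrand(b):
        # prod_j P(U_ij < T_ij <= V_ij | b) = prod_j {exp(-b lu_j) - exp(-b lv_j)}
        return np.prod(np.exp(-b * lu) - np.exp(-b * lv)) * fr.pdf(b)

    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return np.log(val)
```

For a single subject with \(U=0\), the integral reduces to \(1-\{1+\eta\varLambda(V)e^{Z^{T}\beta}\}^{-1/\eta}\) via the gamma Laplace transform, which gives a quick closed-form check of the numerics.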
References
Bellamy SL, Li Y, Ryan LM, Lipsitz S, Canner MJ, Wright R (2004) Analysis of clustered and interval censored data from a community-based study in asthma. Stat Med 23:3607–3621
Cai J, Prentice RL (1995) Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika 82:151–164
Cai J, Prentice RL (1997) Regression estimation using multivariate failure time data and a common baseline hazard function model. Lifetime Data Anal 3:197–213
Cai J, Sen P, Zhou H (1999) A random effects model for multivariate failure time data from multicenter clinical trials. Biometrics 55:182–189
Cai T, Wei L, Wilcox M (2000) Semi-parametric regression analysis for clustered failure time data. Biometrika 87:867–878
Chen K, Tong X (2010) Varying coefficient transformation models with censored data. Biometrika 97:969–976
Clayton DG, Cuzick J (1985) Multivariate generalizations of the proportional hazards model. J R Stat Soc Ser A 148:82–117
Dudley RM (1984) A course on empirical processes. École d'Été de Probabilités de Saint-Flour. Lecture notes in mathematics, vol 1097. Springer, New York
Guo G, Rodriguez G (1992) Estimating a multivariate proportional hazards model for clustered data using the EM algorithm, with an application to child survival in Guatemala. J Am Stat Assoc 87:969–976
Hougaard P (2000) Analysis of multivariate survival data: statistics for biology and health. Springer, New York
Huang J, Rossini A (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92:960–967
Oakes D (1989) Bivariate survival models induced by frailties. J Am Stat Assoc 84:487–493
Rossini A, Moore D (1999) Modeling clustered, discrete, or grouped time survival data with covariates. Biometrics 55:813–819
Sun J (2006) The statistical analysis of interval-censored failure time data. Springer, New York
van der Vaart A, Wellner J (1996) Weak convergence and empirical processes with applications to statistics. Springer, New York
Williamson JM, Kim HY, Manatunga A, Addiss DG (2008) Modeling survival data with informative cluster size. Stat Med 27:543–555
Zeng D, Lin D, Yin GS (2005) Maximum likelihood estimation for the proportional odds model with random effects. J Am Stat Assoc 100:470–482
Zhang X, Sun J (2010) Regression analysis of clustered interval-censored failure time data with informative cluster size. Comput Stat Data Anal 54:1817–1823
Acknowledgements
The authors wish to thank the Editor, Dr. Xihong Lin, the Associate Editor and two referees for their time and many helpful comments and suggestions. This work was partially supported by an NSF grant and NIH grant 5 R01 CA152035 to the third author. The work was also partially supported by NSF China Zhongdian Project 11131002, NSFC (No. 10971015) and the Fundamental Research Funds for the Central Universities to the second author.
Appendix A: Proofs of Theorems
In this appendix, we use the notation defined above and sketch the proofs of the three theorems given in Sect. 3. For this, we need the following regularity conditions.
- (C1) There exists a positive constant \(a\) such that \(P(V_{ij}-U_{ij}\ge a)=1\).
- (C2) The parameter space is \(\varTheta=\mathcal{B}\times[\eta_{l},\eta_{u}]\times\mathcal{H}\), where \(\mathcal{B}\) is a bounded open subset of \(R^{p}\), the true regression parameter \(\beta_{0}\) is an interior point of \(\mathcal{B}\), \(\varLambda_{0}(0)=0\), \(\varLambda_{0}(\tau)\le M\) for some large \(M\), and \(\mathcal{H}\) is the collection of all bounded, continuous and nondecreasing functions on \([0,\tau]\).
- (C3) \(\varLambda_{0}\in\mathcal{H}_{r}\subset\mathcal{H}\) for \(r=1\) or \(2\); define
where \(\varLambda^{(r)}\) stands for the \(r\)th derivative of \(\varLambda\).
- (C4) The covariates \(Z_{ij}\) are uniformly bounded. If there exist a vector \(\beta\in R^{p}\) and a constant \(c\) such that \(Z_{ij}^{T}\beta=c\) a.s., then \(\beta=0\) and \(c=0\).
- (C5) The joint distribution of \((U_{ij},V_{ij})\) does not depend on \(b_{i}\), and the conditional density \(g(u,v\mid z)\) of \((U_{ij},V_{ij})\) given \(Z_{ij}=z\) has uniformly bounded partial derivatives with respect to \(u\) and \(v\). In addition, \(g(u,v\mid z)>0\) for \(u,v\in[0,\tau]\).
- (C6) The cluster sizes \(n_{i}\) are uniformly bounded with \(P(n_{i}=1)>0\) and \(P(n_{i}\ge2)>0\).
Proof of Theorem 1
Since \(\{\hat{\varLambda}_{n}(\cdot), n=1,2,\ldots\}\) is a sequence of bounded nondecreasing functions and \(\{(\hat{\beta}_{n},\hat{\eta}_{n}), n=1,2,\ldots\}\) is a sequence of bounded vectors, it follows from Helly's selection theorem that \(\hat{\theta}_{n}\) has a convergent subsequence. Let \(\theta_{*}=(\beta_{*},\eta_{*},\varLambda_{*})\) denote the limit of this subsequence; for simplicity, we take the subsequence to be \(\hat{\theta}_{n}\) itself. To prove the theorem, it then suffices to show that \(\beta_{*}=\beta_{0}\), \(\eta_{*}=\eta_{0}\) and \(\varLambda_{*}=\varLambda_{0}\).
Since \(\hat{\theta}_{n}\) is the maximum likelihood estimator, we obtain
where \(\varLambda_{0n}\) denotes the projection of the true cumulative hazard function \(\varLambda_{0}(t)\in\mathcal{H}\) onto \(\mathcal{H}_{n}\). Note that \(\varLambda_{0n}(t)\to\varLambda_{0}(t)\) for every \(t\) as \(n\to\infty\), since \(q\) is an integer increasing with \(n\) at the rate \(O(n^{k})\) with \(0<k<1/2\). Letting \(n\to\infty\), the Glivenko–Cantelli theorem yields
Using the Kullback–Leibler divergence, one obtains that, with probability one, \(l(\theta_{*};D_{i})=l(\theta_{0};D_{i})\), which implies that for any cluster \(i\),
Letting \(V_{ij}=t\) and \(U_{ij}=0\), the above equality reduces to
Letting \(n_{i}=1\) or \(2\), one obtains that for any \(t\in[0,\tau]\), with probability one,
and
Differentiating both sides of the above with respect to \(t\) and letting \(t\to0\), we have \(Z_{ij}^{T}(\beta_{*}-\beta_{0})=\log\{\varLambda_{0}'(0)/\varLambda_{*}'(0)\}\), and thus \(\beta_{*}=\beta_{0}\) by (C4).
Next we prove that \(\eta_{*}=\eta_{0}\) and \(\varLambda_{*}=\varLambda_{0}\) when \(\beta_{0}\ne0\). Suppose, to the contrary and without loss of generality, that \(\eta_{0}>\eta_{*}\). Again from (A.1), one obtains that
Now fix \(t>0\); the left-hand side of the above equality is then a constant. The two equations above yield that the equation \(g(x)=c\) has solutions \(\exp(Z_{i1}^{T}\beta_{0})\), where \(g(x)=\{(1+bx)^{\eta_{0}/\eta_{*}}-1\}/x\), \(c=\varLambda_{*}(t)/\eta_{*}\) and \(b=\varLambda_{0}(t)/\eta_{0}>0\). Note that \(g(x)=\{(1+bx)^{\alpha}-1\}/x\) is strictly increasing in \(x>0\) when \(\alpha>1\), so the equation \(g(x)=c\) has at most one solution. But when \(\beta_{0}\ne0\), condition (C4) implies that \(\exp(Z_{i1}^{T}\beta_{0})\) can take multiple values, which leads to a contradiction. Hence \(\eta_{0}/\eta_{*}=1\). Again using (A.3), one obtains that \(\varLambda_{*}(t)=\varLambda_{0}(t)\).
Finally, we show that \(\eta_{*}=\eta_{0}\) and \(\varLambda_{*}=\varLambda_{0}\) when \(\beta_{0}=0\). Suppose, to the contrary and without loss of generality, that \(\eta_{0}>\eta_{*}\). When \(\beta_{0}=0\), (A.3) reduces to
and (A.2) reduces to
With \(t>0\) fixed, the two equations above yield that the equation \(g(x)=c\) has two solutions, \(1\) and \(2\), where \(g(x)=\{(1+bx)^{\eta_{0}/\eta_{*}}-1\}/x\). As before, this contradicts the strict monotonicity of \(g\). Therefore, \(\eta_{*}=\eta_{0}\) and \(\varLambda_{*}=\varLambda_{0}\). This proves Theorem 1. □
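The monotonicity property invoked twice above is easy to check numerically. The following sketch, with illustrative values of \(b\) and \(\alpha\) (it is not part of the proof), confirms that \(g(x)=\{(1+bx)^{\alpha}-1\}/x\) is strictly increasing on a grid for \(\alpha>1\), and decreasing for \(\alpha<1\) since the chord slope of a concave function falls:

```python
import numpy as np

def g(x, b=1.5, alpha=2.0):
    """g(x) = {(1 + b x)^alpha - 1} / x, the function from the proof,
    with b > 0 and alpha = eta_0 / eta_* (illustrative default values)."""
    return ((1.0 + b * x) ** alpha - 1.0) / x

x = np.linspace(0.1, 10.0, 200)

# strictly increasing when alpha > 1, so g(x) = c has at most one root
assert np.all(np.diff(g(x)) > 0)

# for alpha < 1 the chord slope of the concave map x -> (1+bx)^alpha - 1 decreases
assert np.all(np.diff(g(x, alpha=0.5)) < 0)
```

For \(\alpha=2\) the claim is transparent: \(g(x)=\{(1+bx)^{2}-1\}/x=2b+b^{2}x\), a strictly increasing linear function.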
Proof of Theorem 2
Let \(G(u,v)\) denote the joint distribution of \((U_{ij},V_{ij})\) as before and let \(\theta_{0n}=(\beta_{0},\eta_{0},\varLambda_{0n})\) denote the projection of the true parameter \(\theta_{0}\in\varTheta\) onto \(\varTheta_{n}\). Also let \(\epsilon\) be a small, fixed positive number. Define the function class
where \(d\) is a realization of \(D\). The log-likelihood function, although complicated, has a closed form, and by conditions (C1), (C5), (C6) and the definition of \(\varTheta_{n}\) it satisfies Lipschitz conditions in its arguments. Then, for all \(0<\xi<\epsilon\),
where \(N_{[]}(\xi,\varPsi,\rho)\) is the bracketing number under the norm \(\rho\) and \(A_{1}\) is a constant not depending on \(n\); see [15]. Let \(J_{[]}(\epsilon,\varPsi,\rho)=\int_{0}^{\epsilon}\{1+\log N_{[]}(\xi,\varPsi,\rho)\}^{1/2}\,d\xi\) denote the entropy integral. It follows from (A.4) that
where \(A_{2}>0\) is a constant depending only on \(A_{1}\). Then (A.5) and Lemma 3.4.2 in [15, p. 324] imply that for sufficiently large \(n\),
where \(A_{3}\) and \(A_{4}\) are constants not depending on \(n\). One can verify that the conditions of Lemma 3.4.1 in [15, p. 322] are all satisfied. It then follows that \(\rho(\hat{\theta}_{n},\theta_{0n})=O_{p}\{(n/q_{n})^{-1/2}\}=O_{p}(n^{-(1-k)/2})\).
Since \(\varLambda_{0n}\in\mathcal{H}_{n}\), condition (C3) yields
which, together with (C5), immediately yields \(\rho(\theta_{0},\theta_{0n})=O_{p}(n^{-rk})\). By the triangle inequality, \(\rho(\hat{\theta}_{n},\theta_{0})=O_{p}(n^{-rk}+n^{-(1-k)/2})\). The proof is complete. □
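As a quick sanity check on this rate (ours, not part of the paper's argument), the two error terms balance when \(rk=(1-k)/2\), i.e. \(k=1/(1+2r)\), which gives the overall rate \(n^{-r/(1+2r)}\):

```python
from fractions import Fraction

def optimal_k(r):
    """Balance the two error terms n^{-r k} and n^{-(1-k)/2} from
    Theorem 2: setting r k = (1-k)/2 gives k = 1/(1+2r)."""
    k = Fraction(1, 1 + 2 * r)
    rate = r * k  # resulting rate exponent, i.e. n^{-r/(1+2r)}
    assert rate == (1 - k) / 2  # the two exponents indeed coincide
    return k, rate

# r = 1 gives k = 1/3 and rate n^{-1/3}; r = 2 gives k = 1/5 and rate n^{-2/5}
print(optimal_k(1), optimal_k(2))
```

Note that these balanced choices also satisfy the constraint \(rk>1/4\) that appears in the proof of Theorem 3.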
Proof of Theorem 3
We prove this theorem by verifying the four conditions of Theorem 3.3.1 in [15]. To this end, we first define the random mappings \(\varPsi\) and \(\varPsi_{n}\). Denote by \(l(\theta;D)\) the log-likelihood function of \(\theta\). For \(i=1,\ldots,n\), we have
where \(W_{i}=\sum_{j=1}^{n_{i}}\varLambda(U_{ij})\exp(Z_{ij}^{T}\beta)\). For \(g_{\varLambda}\in\mathcal{H}_{r}\), let \(g_{n\varLambda}\in\mathcal{H}_{n}\) be the projection of \(g_{\varLambda}\). For \(\theta=(\beta,\eta,\varLambda)\in\varTheta\), denote its projection by \(\theta_{n}=(\beta,\eta,\varLambda_{n})\in\varTheta_{n}\); then \(\rho(\theta,\theta_{n})=O_{p}(n^{-rk})\).
Define the random map \(\varPsi\) as follows: for any function \(g_{\varLambda}\in\mathcal{H}_{r}\), any \(g_{\beta}\in R^{p}\) with \(\|g_{\beta}\|\le1\) and any scalar \(g_{\eta}\) with \(|g_{\eta}|\le1\),
That is, \(\varPsi(\theta)[g_{\beta},g_{\eta},g_{\varLambda}]\) is the score function along the path \((\beta+ag_{\beta},\eta+ag_{\eta},\varLambda+ag_{\varLambda})\). For \(i=1,\ldots,n\), define the random maps
where \(\mathcal{P}_{n}\) is the empirical measure of the first n observations.
First, we verify that the first condition holds, namely
for \(k<0.5\) and \(rk>0.25\). By arguments analogous to those in the proof of Theorem 1, one can show
for \(k<0.5\) and \(r>0.5\). It follows from a Taylor expansion that
under the assumptions rk>0.25 and k<0.5. Observe that
The second equality in (A.9) follows from the uniform asymptotic equicontinuity of empirical processes indexed by a Donsker class of functions [8]. Then (A.6) follows from (A.7)–(A.9), and thus the first condition is satisfied.
Now we verify the second condition. Applying Theorem 2.10.6 of [15, p. 192], one can show that the function class
is P-Donsker for some small \(\xi>0\). Therefore,
in distribution with
which implies that the second condition holds.
To prove the third condition, namely that \(E\dot{\varPsi}_{\theta_{0}}\) is continuously invertible, it suffices, following the proof of Theorem 2 in [17], to show that if \(\varPsi(\theta_{0})[g_{\beta},g_{\eta},g_{\varLambda}]=0\) almost surely, then \(g_{\beta}=0\), \(g_{\eta}=0\) and \(g_{\varLambda}=0\). For fixed \(t\in[0,\tau]\), letting \(n_{i}=1\), \(U_{i1}\to0\) and \(V_{i1}\to t\), one obtains after some calculation that, with probability one,
Letting instead \(n_{i}=2\), \(U_{i1}\to0\), \(U_{i2}\to0\), \(V_{i1}\to t\), \(V_{i2}\to t\) and \(Z_{i1}=Z_{i2}\), one obtains after some calculation that, with probability one,
Combining the above two equations gives \(g_{\eta}=0\) and \(Z_{i1}^{T}g_{\beta}+g_{\varLambda}(t)=0\), which, together with condition (C4), yields \(g_{\beta}=0\), \(g_{\eta}=0\) and \(g_{\varLambda}=0\).
Observe that \(E\{\varPsi(\theta_{0})[g_{\beta},g_{\eta},g_{\varLambda}]\}=\varPsi_{n}(\hat{\theta}_{n})[g_{\beta},g_{\eta},g_{n\varLambda}]=0\), which implies that the fourth condition is satisfied. The proof is complete. □
The consistency of the proposed estimator of \(\sigma^{2}\) follows along the same lines as the proof of Theorem 3 in [17]. This completes the proof.
Li, J., Tong, X. & Sun, J. Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data. Stat Biosci 6, 55–72 (2014). https://doi.org/10.1007/s12561-012-9078-1