Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data

Li, Junlong; Tong, Xingwei; Sun, Jianguo

doi:10.1007/s12561-012-9078-1

Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data

Published: 18 December 2012

Volume 6, pages 55–72, (2014)
Cite this article

Statistics in Biosciences Aims and scope Submit manuscript

Junlong Li¹,
Xingwei Tong² &
Jianguo Sun³

316 Accesses
3 Citations
Explore all metrics

Abstract

Clustered interval-censored failure time data occur when the failure times of interest are clustered into small groups and known only to lie in certain intervals. A number of methods have been proposed for regression analysis of clustered failure time data, but most of them apply only to clustered right-censored data. In this paper, a sieve estimation procedure is proposed for fitting a Cox frailty model to clustered interval-censored failure time data. In particular, a two-step algorithm for parameter estimation is developed and the asymptotic properties of the resulting sieve maximum likelihood estimators are established. The finite sample properties of the proposed estimators are investigated through a simulation study and the method is illustrated by the data arising from a lymphatic filariasis study.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A new method for regression analysis of interval-censored data with the additive hazards model

Article 26 February 2020

Regression analysis of doubly censored failure time data with ancillary information

Article 20 April 2024

Regression Analysis of Interval-Censored Data with Informative Observation Times Under the Accelerated Failure Time Model

Article 08 August 2021

References

Bellamy SL, Li Y, Ryan LM, Lipsitz S, Canner MJ, Wright R (2004) Analysis of clustered and interval censored data from a community-based study in asthma. Stat Med 23:3607–3621
Article Google Scholar
Cai J, Prentice RL (1995) Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika 82:151–164
Article MATH MathSciNet Google Scholar
Cai J, Prentice RL (1997) Regression estimation using multivariate failure time data and a common baseline hazard function model. Lifetime Data Anal 3:197–213
Article MATH Google Scholar
Cai J, Sen P, Zhou H (1999) A random effects model for multivariate failure time data from multicenter clinical trials. Biometrics 55:182–189
Article MATH Google Scholar
Cai T, Wei L, Wilcox M (2000) Semi-parametric regression analysis for clustered failure time data. Biometrika 87:867–878
Article MATH MathSciNet Google Scholar
Chen K, Tong X (2010) Varying coefficient transformation models with censored data. Biometrika 97:969–976
Article MATH MathSciNet Google Scholar
Clayton DG, Cuzick J (1985) Multivariate generalizations of the proportional hazards model. J R Stat Soc 148:82–117
MATH MathSciNet Google Scholar
Dudley RM (1984) A course on empirical processes, ècole d’ ètè de probabilitès de st flour. Lecture notes in mathematics, vol 1097. Springer, New York
Google Scholar
Guo G, Rodriguez G (1992) Estimating a multivariate proportional hazards model for clustered data using the em algorithm, with an application to child survival in Guatemala. J Am Stat Assoc 97:969–976
Article Google Scholar
Hougaard P (2000) Analysis of multivariate survival data: statistics for biology and health. Springer, New York
Book Google Scholar
Huang J, Rossini A (1997) Sieve estimation for the proportional-odds failure-time regression model with interval censoring. J Am Stat Assoc 92:960–967
Article MATH MathSciNet Google Scholar
Oakes D (1989) Bivariate survival models induced by frailties. J Am Stat Assoc 84:487–493
Article MATH MathSciNet Google Scholar
Rossini A, Moore D (1999) Modeling clustered, discrete, or grouped time survival data with covariates. Biometrics 55:813–819
Article Google Scholar
Sun J (2006) The statistical analysis of interval censored failure time data. Springer, New York
MATH Google Scholar
van der Vaart A, Wellner J (1996) Weak convergence and empirical processes with applications to statistics. Springer, New York
Book MATH Google Scholar
Williamson JM, Kim HY, Manatunga A, Addiss DG (2008) Modeling survival data with informative cluster size. Stat Med 27:543–555
Article MathSciNet Google Scholar
Zeng D, Lin D, Yin GS (2005) Maximum likelihood estimation for the proportional odds model with random effects. J Am Stat Assoc 100:470–482
Article MATH MathSciNet Google Scholar
Zhang X, Sun J (2010) Regression analysis of clustered interval-censored data failure time data with informative cluster size. Comput Stat Data Anal 54:1817–1823
Article MATH Google Scholar

Download references

Acknowledgements

The authors wish to thank the Editor, Dr. Xinhong Lin, the Associate Editor and two referees for their time and many helpful comments and suggestions. This work was partially supported by a NSF grant and NIH grant 5 R01 CA152035 to the third author. The work was also partially supported by NSF China Zhongdian Project 11131002, NSFC (NO. 10971015) and the Fundamental Research Funds for the Central Universities to the second author.

Author information

Authors and Affiliations

Department of Biostatistics, Harvard University, Boston, MA, 02115, USA
Junlong Li
School of Mathematical Sciences, Beijing Normal University, Beijing, 100875, China
Xingwei Tong
Department of Statistics, University of Missouri, Columbia, MO, 65211, USA
Jianguo Sun

Authors

Junlong Li
View author publications
You can also search for this author in PubMed Google Scholar
Xingwei Tong
View author publications
You can also search for this author in PubMed Google Scholar
Jianguo Sun
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Junlong Li.

Appendix A: Proofs of Theorems

In this appendix, we will use the same notation defined above and provide the sketch of the proofs of the three theorems given in Sect. 3. For this, we need the following regularity conditions.

(C1)
There exists a positive constant a such that P(V _ij−U _ij≥a)=1.
(C2)
Define the parameter space $\varTheta=\mathcal{B}\times[\eta_{l},\eta_{u}] \times\mathcal{H}$ with $\mathcal{B}$ supposed to be a bounded open subset of R ^p, the true regression parameter β ₀ an interior point of $\mathcal{B}$, Λ ₀(0)=0, Λ ₀(τ)≤M for some large M, and $\mathcal{H}$ is the collection of all bounded continuous and nondecreasing functions on [0,τ].
(C3)
$\varLambda_{0}\in\mathcal{H}_{r} \subset\mathcal{H}$ for r=1 or 2, define
where Λ ^(r) stands for the rth derivative function of Λ.
(C4)
The covariates Z _ij’s are uniformly bounded. If there exist a vector β∈R ^p and a constant c such that $Z_{ij}^{T}\beta=c, \ \mbox{a.s.}$, then β=0 and c=0.
(C5)
The joint distribution of (U _ij,V _ij) does not depend on b _i and the density function g(u,v∣z) of (U _ij,V _ij) conditional on Z _ij=z has uniformly bounded partial derivatives with respect to u and v. In addition, g(u,v∣z)>0 over u,v∈[0,τ].
(C6)
The cluster sizes n _i’s are uniformly bounded with P(n _i=1)>0 and P(n _i≥2)>0.

Proof of Theorem 1

Since $\{\hat{\varLambda}_{n}(\cdot), n=1,2,\ldots\}$ is a sequence of bounded nondecreasing functions and $\{(\hat{\beta}_{n},\hat{\eta}_{n}), n=1,2,\ldots\} $ is also a sequence of bounded vectors, it follows from Helly’s selection theorem that there exists a convergent subsequence of $\hat{\theta}_{n}$. Let θ _∗=(β _∗,η _∗,Λ _∗) denote the limit of this convergent subsequence and for simplicity, we assume that the subsequence is simply $\hat{\theta}_{n}$. Then to prove the theorem, it is sufficient to show that β _∗=β ₀, η _∗=η ₀ and Λ _∗=Λ ₀.

Since $\hat{\theta}_{n}$ is the maximum likelihood estimator, we obtain

$$l_n(\hat{\beta}_n,\hat{\eta}_n,\hat{\varLambda}_n)\geq l_n(\beta_0, \eta_0,\varLambda_{0n}), $$

where Λ _0n denotes the projection of the true cumulative hazard function $\varLambda(t) \in\mathcal{H}$ into $\mathcal{H}_{n}$. Note that as n→∞, Λ _0n(t)→Λ ₀(t) for every t, since q is an increasing integer along with n at the rate O(n ^k) with 0<k<1/2. Letting n→∞, Glivenko–Cantelli theorem yields

$$E \bigl[l(\beta_*,\eta_*,\varLambda_*) \bigr]\geq E \bigl[l( \beta_0,\eta_0,\varLambda_0) \bigr]. $$

Using the Kullback–Leibler divergence, one obtains with probability one, l(θ _∗;D _i)=l(θ ₀;D _i) which indicates for any cluster i

Letting V _ij=t and U _ij=0. Then the above equality reduces to

Letting n _i=1 or 2, one obtains that for any t∈[0,τ], with probability one,

$$ \biggl\{1+\frac{1}{\eta_*}\varLambda_*(t)\exp \bigl(Z_{ij}^{T} \beta_* \bigr) \biggr\}^{-\eta_*} = \biggl\{1+\frac{1}{\eta_0} \varLambda_0(t)\exp \bigl(Z_{ij}^{T} \beta_0 \bigr) \biggr\}^{-\eta_0} $$

(A.1)

and

$$ \biggl\{1+\frac{2}{\eta_*}\varLambda_*(t)\exp \bigl(Z_{ij}^{T} \beta_* \bigr) \biggr\}^{-\eta_*} = \biggl\{1+\frac{2}{\eta_0} \varLambda_0(t)\exp \bigl(Z_{ij}^{T} \beta_0 \bigr) \biggr\}^{-\eta_0}. $$

(A.2)

By differentiating the above both sides with respect to t and letting t→0, we have $Z_{ij}^{T}(\beta_{*}-\beta_{0})=\log\{\varLambda_{0}(0)/\varLambda_{*}(0)\}$ and thus β _∗=β ₀ from (C4).

Now we prove that η _∗=η ₀ and Λ _∗=Λ ₀ for β ₀≠0. Otherwise, without loss of generality, we assume that η ₀>η _∗. Again from (A.1), one obtains that

$$ \varLambda_*(t)=\eta_* \biggl[ \biggl\{1+\frac{1}{\eta_0} \varLambda_0(t)\exp \bigl(Z_{i1}^{T} \beta_0 \bigr) \biggr\}^{\eta_0/\eta_*}-1 \biggr]\exp \bigl(-Z_{i1}^{T}\beta_0 \bigr) . $$

(A.3)

Now we fix t>0, the left hand of the above equality is a constant. The above two equations yield the result that the equation g(x)=c has solutions $\exp(Z_{i1}^{T}\beta_{0})$, where $g(x)=\frac{1}{x}\{(1+xb)^{\eta_{0}/\eta_{*}}-1\}$, c=Λ _∗(t)/η _∗ with b=Λ ₀(t)/η ₀>0. Note that the function g(x)={(1+bx)^α−1}/x is a strictly increasing function for x>0 when α>1. This means that the equation g(x)=c has at most one solution. But when β ₀≠0, condition (C4) yields the result that $\exp(Z_{i1}^{T}\beta_{0})$ can take multiple values, which leads to a contradiction. This yields η ₀/η _∗=1. Again using (A.3), one obtains that Λ _∗(t)=Λ ₀(t).

Finally we show that η _∗=η ₀ and Λ _∗=Λ ₀ when β ₀=0. Otherwise, without loss of generality, we assume that η ₀>η _∗. When β ₀=0, (A.3) reduces to

$$\varLambda_*(t)=\eta_* \biggl[ \biggl\{1+\frac{1}{\eta_0} \varLambda_0(t) \biggr\}^{\eta_0/\eta_*}-1 \biggr] $$

and (A.2) reduces to

$$\varLambda_*(t)=\frac{\eta_*}{2} \biggl[ \biggl\{ \biggl(1+\frac{2}{ \eta_0}\varLambda_0(t) \biggr) \biggr\}^{\eta_0/\eta_*}-1 \biggr], $$

with fixed t>0, the two equations above yield the result that the equation g(x)=c has two solutions: 1 and 2, where $g(x)=\{(1+xb)^{\eta_{0}/\eta_{*}}-1\}/x$. Similarly this contradicts the monotonicity property of the function g. Therefore, η _∗=η ₀ and Λ _∗=Λ ₀. These prove Theorem 1. □

Proof of Theorem 2

Let G(u,v) denote the joint distribution of (U _ij,V _ij) as before and θ _0n=(β ₀,η ₀,Λ _0n) be the projection of the true parameter θ ₀∈Θ into Θ _n. Also let ϵ be a small and fixed positive number. Define a function class

$$\varPsi(\epsilon)= \bigl\{l(d; \theta): \rho(\theta,\theta_{0n})\leq \epsilon, \theta\in\varTheta_n \bigr\} , $$

where d is the realization of D. It is easy to see that the log-likelihood function has closed form although it is complicated and it satisfies the Lipschitz conditions with respect to its arguments from conditions (C1), (C5), (C6) and the definition of Θ _n. Then, for all 0<ξ<ϵ,

$$ \log N_{[]}(\xi,\varPsi,\rho)\leq A_1\epsilon/ \xi+A_1 q_n\log(\epsilon/\xi), $$

(A.4)

where N _[](ξ,Ψ,ρ) is the bracketing number under the norm ρ and A ₁ is a constant not depending on n; see [15]. Let $J_{[]}(\epsilon,\varPsi,\rho)=\int_{0}^{\epsilon}\{1+ \log N_{[]}(\xi,\varGamma,\rho)\}^{1/2}\,d\xi$ be the integral entropy. It follows from (A.4) that

$$ J_{[]}(\epsilon,\varPsi,\rho)\leq A_2 q_n^{1/2} \epsilon, $$

(A.5)

where A ₂>0 is a constant depending only on A ₁. Then, (A.5) and Lemma 3.4.2 in [15, p. 324] implies that for sufficiently large n,

where A ₃ and A ₄ are all constants not depending on n. One can verify that the conditions of Lemma 3.4.1 in [15, p. 322] are all satisfied. It then follows that $\rho(\hat{\theta}_{n},\theta_{0n})=O_{p}\{(n/q_{n})^{-1/2}\}=O_{p}(n^{-(1-k)/2})$.

Since $\varLambda_{0n}\in\mathcal{H}_{n}$, condition (C3) yields

$$\int_0^\tau\big|\varLambda_0(t)- \varLambda_{0n}(t)\big|^2\,dt=O \bigl(n^{-2rk} \bigr) \!\quad \mbox{and}\!\quad E \bigl\{l_{n}(\theta_0;D)-l_{n}( \theta_{0n};D) \bigr\}=O \bigl(n^{-2rk} \bigr), $$

which, together with (C5), immediately yields ρ(θ ₀,θ _0n)=O _p(n ^−rk). By triangle inequality, one obtains that $\rho(\hat{\theta}_{n},\theta_{0})=O_{p}(n^{-rk}+n^{-(1-k)/2})$. The proof is complete. □

Proof of Theorem 3

We show this theorem by verifying the four conditions in Theorem 3.3.1 of [15]. To this end, we first define the random mappings Ψ and Ψ _n. Denote the log-likelihood function of θ as l(θ;D). For i=1,…,n, we have

where $W_{i}=\sum_{j=1}^{n_{i}}\varLambda(U_{ij})\exp(Z_{ij}^{T}\beta)$. For $g_{\varLambda}\in\mathcal{H}_{r}$, let $g_{n\varLambda}\in\mathcal{H}_{n}$ be the projection of g _Λ. For θ=(β,η,Λ)∈Θ, denote its projection by θ _n=(β,η,Λ _n), where θ _n∈Θ _n, then ρ(θ,θ _n)=O _p(n ^−rk).

Define the random maps Ψ as follows: for any function $g_{\varLambda}\in\mathcal{H}_{r}$, g _β∈R ^p with ∥g _β∥≤1 and a scalar g _η with |g _η|≤1,

To be specific, Ψ(θ)[g _β,g _η,g _Λ] is the score function along the path (β+ag _β,η+ag _η,Λ+ag _Λ). For i=1,…,n, define random maps

$$\varPsi_n(\theta)[g_\beta,g_\eta,g_\varLambda] = \mathcal{P}_n \varPsi(\theta_n)[g_\beta,g_\eta,g_{n\varLambda}], $$

where $\mathcal{P}_{n}$ is the empirical measure of the first n observations.

First, we prove that the first condition holds, which is

(A.6)

for k<0.5 and rk>0.25. By arguments analogous to those in Theorem 1, one can show

(A.7)

for k<0.5 and r>0.5. It follows from the Taylor expansion that

(A.8)

under the assumptions rk>0.25 and k<0.5. Observe that

(A.9)

The second equality of (A.9) follows from the uniform asymptotic equicontinuity of empirical processes indexed by Donsker class of functions [8]. Then (A.6) follows from (A.7)–(A.9) and thus the first condition is satisfied.

Now we verify the second condition. Applying Theorem 2.10.6 of [15, p. 192], one can show that the function class

$$\bigl\{ \varPsi(\theta)[g_\beta,g_\eta,g_\varLambda]: \|g_\beta\|\leq1, |g_\eta|\leq1, \mbox{$g_{\varLambda}\in \mathcal{H}_{r}$, $\rho(\theta,\theta_{0})<\xi$} \bigr\} $$

is P-Donsker for some small ξ>0. Therefore,

$$n^{1/2} \bigl\{\mathcal{P}_n\varPsi(\theta_0)[g_\beta,g_\eta,g_\varLambda] -E\varPsi (\theta_0)[g_\beta,g_\eta,g_\varLambda] \bigr\}\to N \bigl(0,\sigma^{-2} \bigr) $$

in distribution with

$$ \sigma^2= 1/E \bigl[ \bigl\{\varPsi (\theta_0)[g_\beta,g_\eta,g_\varLambda] \bigr\}^2 \bigr], $$

(A.10)

which implies that the second condition holds.

To prove the third condition that $E\dot{\varPsi}_{\theta_{0}}$ is continuously invertible, following the proof of Theorem 2 in [17], it is enough to prove that, if Ψ(θ ₀)[g _β,g _η,g _Λ]=0 almost surely, then g _β=0,g _η=0,g _Λ=0. For fixed t∈[0,τ], by letting n _i=1, U _i1→0, V _i1→t, after some calculation one obtains that, with probability one,

Again letting n _i=2,U _i1→0,U _i2→0,V _i1→t,V _i2→t and Z _i1=Z _i2, after some calculation one obtains that, with probability one,

Combining the above two equations, one obtains that g _η=0 and $Z_{i1}^{T}g_{\beta}+ g_{\varLambda}(t)=0$, which, together with condition (C4), yield g _β=0, g _η=0 and g _Λ=0.

Observe that $E\{\varPsi(\theta_{0})[g_{\beta},g_{\eta},g_{\varLambda}]\}= \varPsi_{n}(\hat{\theta}_{n})[g_{\beta},g_{\eta},g_{n\varLambda}]=0$, which implies that the forth condition is satisfied. The proof is complete. □

The proof of the consistent estimation of σ ² follows similarly from the proof of Theorem 3 in [17]. This completes the proof.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Li, J., Tong, X. & Sun, J. Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data. Stat Biosci 6, 55–72 (2014). https://doi.org/10.1007/s12561-012-9078-1

Download citation

Received: 10 June 2012
Accepted: 29 November 2012
Published: 18 December 2012
Issue Date: May 2014
DOI: https://doi.org/10.1007/s12561-012-9078-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Sieve Estimation for the Cox Model with Clustered Interval-Censored Failure Time Data

Abstract

Access this article

Similar content being viewed by others

A new method for regression analysis of interval-censored data with the additive hazards model

Regression analysis of doubly censored failure time data with ancillary information

Regression Analysis of Interval-Censored Data with Informative Observation Times Under the Accelerated Failure Time Model

References

Acknowledgements