Abstract
This paper discusses regression analysis of clustered failure time data, which arise when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to the cluster sizes, that is, the cluster size is informative. For inference, we present two estimation procedures for the case where the correlated failure times arise from a class of additive transformation models: a weighted estimating equation-based method and a within-cluster resampling-based method. The former uses the inverses of the cluster sizes as weights in the estimating equations, while the latter can be easily implemented with existing software packages for right-censored failure time data. An extensive simulation study indicates that the proposed approaches work well both with and without informative cluster size. The methods are applied to the dental study that motivated this work.
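Both procedures depend on the cluster labels at the data-preparation stage. The code below is a minimal sketch in Python with NumPy, and the function names are hypothetical rather than taken from the paper. It computes the weights \(w_{ij}=1/n_i\) used by the weighted estimating equation. It also draws one within-cluster resample: a single member chosen uniformly at random from each cluster, which could then be passed to standard right-censored survival software. The sketch assumes each observation carries an integer cluster label.

```python
import numpy as np


def inverse_cluster_size_weights(cluster_ids):
    """Return w_ij = 1 / n_i for every observation j in cluster i."""
    cluster_ids = np.asarray(cluster_ids)
    _, inverse, counts = np.unique(cluster_ids, return_inverse=True, return_counts=True)
    return 1.0 / counts[inverse]


def draw_within_cluster_resample(cluster_ids, rng):
    """Pick one observation index uniformly at random from each cluster."""
    cluster_ids = np.asarray(cluster_ids)
    picks = []
    for label in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == label)  # row indices of this cluster
        picks.append(rng.choice(members))
    return np.array(picks)


# Example: three clusters of sizes 3, 1 and 2.
clusters = np.array([0, 0, 0, 1, 2, 2])
w = inverse_cluster_size_weights(clusters)  # [1/3, 1/3, 1/3, 1, 1/2, 1/2]
idx = draw_within_cluster_resample(clusters, np.random.default_rng(0))
print(w, idx)
```

Each resample contains exactly one member per cluster, so its observations are independent across rows. This is why ordinary software for independent right-censored data can be applied to it.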
References
Cai J, Prentice RL (1995) Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika 82:151–164
Cong X, Yin G, Shen Y (2007) Marginal analysis of correlated failure time data with informative cluster sizes. Biometrics 63:663–672
Cox DR (1972) Regression models and life tables (with discussion). J R Stat Soc B 34:187–220
Dunson DB, Chen Z, Harry J (2003) Bayesian joint models of cluster size and subunit-specific outcomes. Biometrics 59:521–530
Hoffman EB, Sen PK, Weinberg CR (2001) Within cluster resampling. Biometrika 88:1121–1134
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Lin DY, Ying Z (1994) Semiparametric analysis of the additive risk model. Biometrika 81:61–71
McGuire MK, Nunn ME (1996) Prognosis versus actual outcome III: the effectiveness of clinical parameters in accurately predicting tooth survival. J Periodontol 67:666–674
O’Neill TJ (1986) Inconsistency of the misspecified proportional hazards model. Stat Probab Lett 4:219–222
Williamson J, Kim HY, Manatunga A, Addiss DG (2008) Modeling survival data with informative cluster size. Stat Med 27:543–555
Xue L, Wang L, Qu A (2010) Incorporating correlation for multivariate failure time data when cluster size is large. Biometrics 66:393–404
Yin G, Cai J (2004) Additive hazards model with multivariate failure time data. Biometrika 91:801–818
Zeng D, Cai J (2010) Additive transformation models for clustered failure time data. Lifetime Data Anal 16:333–352
Acknowledgments
The authors wish to thank the Associate Editor and two reviewers for their many helpful comments and suggestions, which greatly improved the paper. They are also grateful to Dr. Martha Nunn for generously providing the dental data and to Prof. Donglin Zeng for very helpful discussions. This research was partly supported by NSFC Grant No. 11471252 to the second author and by an NIH grant and an NSF grant to the third author.
Appendix: Proofs of the asymptotic properties of the estimators
In this Appendix, we first describe the regularity conditions needed for the asymptotic properties. We then state these properties more precisely in two theorems and sketch their proofs. First, it follows from the definition of the matrix \(A_{wee}\) that \(A_{wee}\) converges in probability to a nonsingular deterministic matrix, denoted by \(\mathcal A_{wee}.\) In what follows, we assume that \(n^{-1}\sum _{i=1}^n\sum _{j=1}^{n_i}n_i^{-1}Y_{ij}(t)\varvec{Z}_{ij}(t),\ n^{-1}\sum _{i=1}^n\sum _{j=1}^{n_i}n_i^{-1}Y_{ij}(t),\ n^{-1}\sum _{i=1}^nY_i^q(t)\varvec{Z}_i^q(t)\) and \(n^{-1}\sum _{i=1}^nY_i^q(t)\) converge uniformly to \(\kappa (t),\ \pi (t),\) \(\tilde{\kappa }(t)\) and \(\tilde{\pi }(t),\) respectively. For \(i=1,\ldots ,n;\, j=1,\ldots ,n_i\) and some constant \(\tau \), we assume the following regularity conditions: \(P\{Y_{ij}(t)=1, 0\le t\le \tau \}>0\); \(\int _0^\infty \lambda _0(t)dt<\infty \); \(\varvec{Z}_{ij}(t)\) is bounded; \(\mathcal A_{wee}\) is positive definite; and the cluster sizes are finite.
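The ratio \(\kappa(t)/\pi(t)\) is the limit of the inverse-cluster-size-weighted covariate mean over the risk set at time t. The sketch below illustrates this weighting under several assumptions made for illustration only. It uses the additive hazards form \(\lambda_{ij}(t)=\lambda_0(t)+\varvec{\beta}'\varvec{Z}_{ij}\) of Lin and Ying (1994) in place of the general additive transformation model. It also assumes time-invariant covariates and no tied observed times. Under these assumptions the weighted estimating equation is linear in \(\varvec{\beta}\), so its root has a closed form. This is not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np


def weighted_lin_ying(time, event, Z, weights):
    """Closed-form root of a weighted Lin-Ying additive hazards estimating equation.

    time:    (N,) observed times min(T, C), assumed free of ties
    event:   (N,) failure indicators (1 = failure, 0 = censored)
    Z:       (N, p) time-invariant covariates
    weights: (N,) observation weights, e.g. 1 / n_i for a member of cluster i
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    Z = np.asarray(Z, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(time)
    t, d, z, w = time[order], event[order], Z[order], weights[order]
    # Suffix sums over sorted rows k..N-1, i.e. over the risk set at t[k]:
    # S0 = sum w*Y, S1 = sum w*Y*Z, S2 = sum w*Y*Z Z'  (empirical n*pi, n*kappa, ...).
    S0 = np.cumsum(w[::-1])[::-1]
    S1 = np.cumsum((w[:, None] * z)[::-1], axis=0)[::-1]
    S2 = np.cumsum((w[:, None, None] * z[:, :, None] * z[:, None, :])[::-1], axis=0)[::-1]
    zbar = S1 / S0[:, None]  # weighted risk-set covariate mean at t[k]
    # Weighted sum of (Z - zbar)(Z - zbar)' over the risk set at t[k].
    centered = S2 - S0[:, None, None] * zbar[:, :, None] * zbar[:, None, :]
    # The risk set is constant on (t[k-1], t[k]], so the dt-integral is a weighted sum.
    dt = np.diff(np.concatenate(([0.0], t)))
    A = np.einsum("k,kij->ij", dt, centered)
    # Score part: sum over failures of w * (Z - zbar at the failure time).
    b = ((d * w)[:, None] * (z - zbar)).sum(axis=0)
    return np.linalg.solve(A, b)


# Simulated check: 2000 clusters of size 1-5, hazard 1 + 0.5 * Z (true beta = 0.5),
# uniform censoring on (0, 3). Cluster size is non-informative in this toy example.
rng = np.random.default_rng(2017)
sizes = rng.integers(1, 6, size=2000)
cluster = np.repeat(np.arange(2000), sizes)
Z = rng.uniform(0.0, 2.0, size=(cluster.size, 1))
T = rng.exponential(1.0 / (1.0 + 0.5 * Z[:, 0]))  # scale = 1 / hazard
C = rng.uniform(0.0, 3.0, size=cluster.size)
est = weighted_lin_ying(np.minimum(T, C), (T <= C).astype(float), Z, 1.0 / sizes[cluster])
print(est)  # should be close to 0.5
```

Using suffix sums over the time-sorted data computes every risk-set average in a single pass. The baseline hazard cancels because the weighted residuals \(\varvec{Z}_{ij}-\bar{\varvec{Z}}(t)\) sum to zero over each risk set.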
To give a precise description of the properties and the proof, define the marginal filtration by
and
Obviously, \(M_{ij}(t)\) is a local square-integrable martingale with respect to \(\mathcal {F}_{ij}(t).\) Now we are ready to describe the properties and sketch the proofs.
1.1 Asymptotic properties of \(\hat{\varvec{\beta }}_{wee}\) and the proof
Theorem 1
Under the regularity conditions given above, as \(n\rightarrow \infty ,\ \sqrt{n}(\hat{\varvec{\beta }}_{wee}-\varvec{\beta }_0)\) converges in distribution to a zero-mean normal random vector, and the covariance matrix can be consistently estimated by \(\hat{\varvec{\Sigma }}_{wee}=A_{wee}^{-1}B_{wee}A_{wee}^{-1}.\)
Proof
Some simple algebraic manipulation yields
Applying the same arguments as in Yin and Cai (2004), one can show that, under the regularity conditions, the above quantity is asymptotically equivalent to
Note that \( A_{wee}\) converges in probability to \(\mathcal A_{wee},\) and that \(\Theta _i(\varvec{\beta }_0)\), \(i=1,\ldots ,n\), are independent zero-mean random vectors with bounded variance. By the multivariate central limit theorem and Slutsky’s theorem, \(\sqrt{n}(\hat{\varvec{\beta }}_{wee}-\varvec{\beta }_0)\) converges in distribution to a zero-mean normal random vector whose covariance matrix can be consistently estimated by
So Theorem 1 holds. \(\square \)
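The estimator \(\hat{\varvec{\Sigma}}_{wee}=A_{wee}^{-1}B_{wee}A_{wee}^{-1}\) in Theorem 1 is a sandwich. \(B_{wee}\) is not displayed in this excerpt, so the sketch below assumes it is the empirical second moment \(n^{-1}\sum_i\hat\Theta_i\hat\Theta_i'\) of the estimated per-cluster contributions \(\hat\Theta_i=\Theta_i(\hat{\varvec{\beta}}_{wee})\). These are supplied as the rows of an \(n\times p\) array. Computing \(A_{wee}\) and the \(\hat\Theta_i\) is left to the caller, and the function name is hypothetical.

```python
import numpy as np


def sandwich_covariance(A, theta):
    """Sigma_hat = A^{-1} B A^{-1} with B = n^{-1} sum_i theta_i theta_i'.

    A:     (p, p) nonsingular matrix (A_wee)
    theta: (n, p) per-cluster contributions, one row per cluster
    Returns the covariance estimate for sqrt(n) * (beta_hat - beta_0).
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.shape[0]
    B = theta.T @ theta / n
    A_inv = np.linalg.inv(A)
    # A_inv.T equals A_inv when A is symmetric, matching A^{-1} B A^{-1} in the text.
    return A_inv @ B @ A_inv.T


A = np.array([[2.0, 0.0], [0.0, 1.0]])
theta = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
Sigma = sandwich_covariance(A, theta)
se = np.sqrt(np.diag(Sigma) / theta.shape[0])  # standard errors of beta_hat itself
```

Dividing by n converts the covariance of \(\sqrt{n}(\hat{\varvec{\beta}}_{wee}-\varvec{\beta}_0)\) into standard errors for \(\hat{\varvec{\beta}}_{wee}\).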
1.2 Asymptotic properties of \(\hat{\varvec{\beta }}_{wcr}\) and the proof
Theorem 2
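Under the regularity conditions given above, as \(n\rightarrow \infty ,\ \sqrt{n}(\hat{\varvec{\beta }}_{wcr}-\varvec{\beta }_0)\) converges in distribution to a zero-mean normal random vector. Its covariance matrix can be consistently estimated by \(n\hat{\varvec{\Sigma }}_{wcr},\) where \(\hat{\varvec{\Sigma }}_{wcr}\) is the estimated variance-covariance matrix of \(\hat{\varvec{\beta }}_{wcr}\) constructed in the proof.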
Proof
Since \(\hat{\varvec{\beta }}^q\) solves the estimating equation \(U^q(\varvec{\beta })=0\), a Taylor expansion of \(U^q\) around \(\varvec{\beta }_0\) gives
where \(\hat{\varvec{\beta }}^*\) lies on the line segment between \(\hat{\varvec{\beta }}^q\) and \(\varvec{\beta }_0.\) Rewriting (6) yields
Note that
which is negative definite, and hence \(A_q\) is positive definite. Since the Q resamples are identically distributed, \(A_q\) converges in probability to a deterministic positive definite matrix, denoted by \(\mathcal A_{wcr}.\)
Averaging over the resamples \(q=1,\ldots ,Q\) yields
It therefore suffices to show that \(n^{-1/2}Q^{-1}\sum _{q=1}^QU^q(\varvec{\beta }_0)\) converges in distribution to a normal random vector as \(n\rightarrow \infty .\) Changing the order of summation yields
where \({\mathcal U_i(\varvec{\beta }_0)},\ i=1,\ldots ,n\), are independent with zero mean and finite variance. By the multivariate central limit theorem, \(n^{-1/2}Q^{-1}\sum _{q=1}^QU^q(\varvec{\beta }_0)\) is asymptotically normal with zero mean and a positive definite covariance matrix. Combining this with Slutsky’s theorem, \(\sqrt{n}(\hat{\varvec{\beta }}_{wcr}-\varvec{\beta }_0)\) converges in distribution to a zero-mean normal random vector whose covariance matrix can be consistently estimated by \(n\hat{\varvec{\Sigma }}_{wcr}\).
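The averaging step can be sketched generically, assuming (as in Hoffman et al. 2001) that \(\hat{\varvec{\beta}}_{wcr}\) is the mean of the Q resample estimates. Here `fit` is a hypothetical user-supplied function. It receives the row indices of one resample (one member per cluster) and returns \((\hat{\varvec{\beta}}^q, \hat{\varvec{\Sigma}}_q)\). In practice it would wrap an existing routine for independent right-censored data. The toy check uses an outcome that is constant within clusters, so every resample gives the same answer.

```python
import numpy as np


def wcr_fit(cluster_ids, fit, Q, rng):
    """Within-cluster resampling: fit Q one-member-per-cluster data sets.

    Returns the average estimate together with the per-resample estimates
    and covariance matrices, which the variance estimator needs.
    """
    cluster_ids = np.asarray(cluster_ids)
    groups = [np.flatnonzero(cluster_ids == c) for c in np.unique(cluster_ids)]
    betas, covs = [], []
    for _ in range(Q):
        idx = np.array([rng.choice(g) for g in groups])  # one member per cluster
        beta_q, cov_q = fit(idx)
        betas.append(np.atleast_1d(beta_q))
        covs.append(np.atleast_2d(cov_q))
    betas = np.array(betas)
    return betas.mean(axis=0), betas, np.array(covs)


# Toy check with a hypothetical "fit" returning a sample mean and its variance.
y = np.array([1.0, 1.0, 5.0, 3.0, 3.0, 3.0])
clusters = np.array([0, 0, 1, 2, 2, 2])


def mean_fit(idx):
    sample = y[idx]
    return sample.mean(), sample.var(ddof=1) / sample.size


beta_wcr, betas, covs = wcr_fit(clusters, mean_fit, Q=50, rng=np.random.default_rng(0))
```

Keeping the per-resample estimates and covariances, and not just their average, is what makes the variance decomposition of Hoffman et al. (2001) computable afterwards.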
To construct a consistent estimator of this covariance matrix, we follow Hoffman et al. (2001) and first write
where the expectations on the right-hand side are taken over the resampling distribution of \(\hat{\varvec{\beta }}_q\) given the data. Since \(E(\hat{\varvec{\beta }}_q|\text {data})=\hat{\varvec{\beta }}_{wcr},\) it follows that
For each resampled data set, \(\text {var}(\hat{\varvec{\beta }}_q)\) can be consistently estimated by \(\tilde{\varvec{\Sigma }}_q = \hat{\varvec{\Sigma }}_q/n\). Their average over the Q resamples, \(Q^{-1}\sum _{q=1}^Q\tilde{\varvec{\Sigma }}_q\), is therefore also consistent. For the second term on the right-hand side of (7), since
it can be estimated by the sample covariance matrix of the Q resampled estimators \(\hat{\varvec{\beta }}_q\), \(q = 1,\ldots ,Q\), that is,
Here, as in Cong et al. (2007), let \(\Omega \) denote the estimator of \(\text {var}(\hat{\varvec{\beta }}_q|\text {data}).\) The estimated variance-covariance matrix of \(\hat{\varvec{\beta }}_{wcr}\) is then
To show the consistency of \(\hat{\varvec{\Sigma }}_{wcr},\) it suffices to show that \(\Omega -E(\tilde{\Omega })=(\Omega -\tilde{\Omega })+ (\tilde{\Omega }-E(\tilde{\Omega }))\rightarrow 0\) in probability as \(n\rightarrow \infty .\) It is easy to see that \(\Omega -\tilde{\Omega }\rightarrow 0\) in probability as \(n\rightarrow \infty .\) Additionally, by applying the same arguments as those in the proof of Cong et al. (2007), it can be shown that \(\tilde{\Omega }-E(\tilde{\Omega })\rightarrow 0\) in probability as \(n\rightarrow \infty .\) This completes the proof of Theorem 2. \(\square \)
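The variance estimator combines an average within-resample variance with a between-resample correction. The sketch below assumes \(\Omega\) is the usual sample covariance of the \(\hat{\varvec{\beta}}_q\) with divisor \(Q-1\); the exact divisor is not shown in this excerpt. It also assumes the \(\hat{\varvec{\Sigma}}_q\) are covariance estimates on the \(\sqrt{n}\) scale, so that \(\tilde{\varvec{\Sigma}}_q=\hat{\varvec{\Sigma}}_q/n\). The function name is hypothetical.

```python
import numpy as np


def wcr_variance(betas, sigma_hats, n):
    """Combine Q within-cluster resamples into an estimate of var(beta_wcr).

    betas:      (Q, p) resample estimates beta_hat_q
    sigma_hats: (Q, p, p) estimated covariances of sqrt(n)(beta_hat_q - beta_0)
    n:          number of clusters
    """
    betas = np.asarray(betas, dtype=float)
    Q = betas.shape[0]
    beta_wcr = betas.mean(axis=0)
    # Average within-resample variance: Q^{-1} sum_q Sigma_tilde_q, Sigma_tilde_q = Sigma_hat_q / n.
    mean_within = np.mean(np.asarray(sigma_hats, dtype=float) / n, axis=0)
    # Omega: between-resample covariance of the beta_hat_q (divisor Q - 1 assumed).
    dev = betas - beta_wcr
    omega = dev.T @ dev / (Q - 1)
    return beta_wcr, mean_within - omega


beta_wcr, Sigma_wcr = wcr_variance(np.array([[1.0], [3.0]]), np.array([[[4.0]], [[8.0]]]), n=2)
```

The result is a difference of two positive semi-definite matrices, so it can fail to be positive definite when Q or n is small. This is worth checking before reporting standard errors.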
Cite this article
Chen, L., Feng, Y. & Sun, J. Regression analysis of clustered failure time data with informative cluster size under the additive transformation models. Lifetime Data Anal 23, 651–670 (2017). https://doi.org/10.1007/s10985-016-9384-x