Regression analysis of clustered failure time data with informative cluster size under the additive transformation models

Chen, Ling; Feng, Yanqin; Sun, Jianguo

doi:10.1007/s10985-016-9384-x

Published: 19 October 2016

Lifetime Data Analysis, Volume 23, pages 651–670 (2017)

Abstract

This paper discusses regression analysis of clustered failure time data, which arise when the failure times of interest are collected from clusters. In particular, we consider the situation where the correlated failure times of interest may be related to cluster sizes. For inference, we present two estimation procedures, a weighted estimating equation-based method and a within-cluster resampling-based method, for the case where the correlated failure times of interest arise from a class of additive transformation models. The former uses the inverse of the cluster sizes as weights in the estimating equations, while the latter can be easily implemented with existing software packages for right-censored failure time data. An extensive simulation study indicates that the proposed approaches work well both with and without informative cluster size. The methods are applied to the dental study that motivated this work.


References

Cai J, Prentice RL (1995) Estimating equations for hazard ratio parameters based on correlated failure time data. Biometrika 82:151–164
Cong X, Yin G, Shen Y (2007) Marginal analysis of correlated failure time data with informative cluster sizes. Biometrics 63:663–672
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc B 34:187–220
Dunson DB, Chen Z, Harry J (2003) Bayesian joint models of cluster size and subunit-specific outcomes. Biometrics 63:663–672
Hoffman EB, Sen PK, Weinberg CR (2001) Within-cluster resampling. Biometrika 88:1121–1134
Kalbfleisch JD, Prentice RL (2002) The statistical analysis of failure time data, 2nd edn. Wiley, New York
Lin DY, Ying Z (1994) Semiparametric analysis of the additive risk model. Biometrika 81:61–71
McGuire MK, Nunn ME (1996) Prognosis versus actual outcome. III. The effectiveness of clinical parameters in accurately predicting tooth survival. J Periodontol 67:666–674
O’Neill TJ (1986) Inconsistency of the misspecified proportional hazards model. Stat Probab Lett 4:219–222
Williamson JM, Kim HY, Manatunga A, Addiss DG (2008) Modeling survival data with informative cluster size. Stat Med 27:543–555
Xue L, Wang L, Qu A (2010) Incorporating correlation for multivariate failure time data when cluster size is large. Biometrics 66:393–404
Yin G, Cai J (2004) Additive hazards model with multivariate failure time data. Biometrika 91:801–818
Zeng D, Cai J (2010) Additive transformation models for clustered failure time data. Lifetime Data Anal 16:333–352


Acknowledgments

The authors wish to thank the Associate Editor and two reviewers for their many helpful comments and suggestions that greatly improved the paper, and are also grateful to Dr. Martha Nunn for generously providing the dental data and to Prof. Donglin Zeng for very helpful discussions. This research was partly supported by NSFC Grant No. 11471252 to the second author and by an NIH grant and an NSF grant to the third author.

Author information

Authors and Affiliations

Division of Biostatistics, Washington University School of Medicine, Campus Box 8067, 660 S. Euclid Ave, St. Louis, MO, 63110, USA
Ling Chen
School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
Yanqin Feng
Computational Science Hubei Key Laboratory, Wuhan University, Wuhan, 430072, China
Yanqin Feng
Department of Statistics, University of Missouri, 146 Middlebush Hall, Columbia, MO, 65211, USA
Jianguo Sun


Corresponding author

Correspondence to Ling Chen.

Appendix: Proofs of the asymptotic properties of the estimators

In this Appendix, we first state the regularity conditions needed for the asymptotic results, then give precise statements of the two theorems, and finally sketch their proofs. First, it follows from the definition of the matrix $A_{wee}$ that $A_{wee}$ converges in probability to a nonsingular deterministic matrix, denoted by $\mathcal A_{wee}.$ In what follows, we assume that $n^{-1}\sum _{i=1}^n\sum _{j=1}^{n_i}n_i^{-1}Y_{ij}(t)\varvec{Z}_{ij}(t)$, $n^{-1}\sum _{i=1}^n\sum _{j=1}^{n_i}n_i^{-1}Y_{ij}(t)$, $n^{-1}\sum _{i=1}^nY_i^q(t)\varvec{Z}_i^q(t)$ and $n^{-1}\sum _{i=1}^nY_i^q(t)$ converge uniformly to $\kappa (t)$, $\pi (t)$, $\tilde{\kappa }(t)$ and $\tilde{\pi }(t)$, respectively. For $i=1,\ldots ,n$, $j=1,\ldots ,n_i$ and some constant $\tau $, we assume the following regularity conditions: $P\{Y_{ij}(t)=1, 0\le t\le \tau \}>0$; $\int _0^\infty \lambda _0(t)dt<\infty $; $\varvec{Z}_{ij}(t)$ is bounded; $\mathcal A_{wee}$ is positive definite; and the cluster sizes are finite.

To give a precise description of the properties and the proof, define the marginal filtration by

$$\begin{aligned} \mathcal {F}_{ij}(t)=\sigma \{N_{ij}(u),Y_{ij}(u+),\varvec{Z}_{ij}(u+),\ 0\le u\le t\}, \end{aligned}$$

and

$$\begin{aligned} M_{ij}(t)=N_{ij}(t)-\int _0^tY_{ij}(s)\lambda _0(s)ds- \int _0^tY_{ij}(s)\varvec{\beta }_0^{\prime }\varvec{Z}_{ij}(s)ds. \end{aligned}$$

Under the model, $M_{ij}(t)$ is a locally square-integrable martingale with respect to $\mathcal {F}_{ij}(t).$ We are now ready to state the properties and sketch the proofs.
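As a quick numerical sanity check of the martingale property, the Python sketch below simulates independent subjects from an additive hazards model with hazard $\lambda_0+\beta Z$ and confirms that the sample mean of $M(\tau)$ is close to zero. The constant baseline hazard, the Uniform(0, 1) covariate, the Uniform(0, 2) censoring and the function name `simulate_martingale_at_tau` are illustrative assumptions, not part of the paper; clustering is ignored because the property concerns the marginal filtration $\mathcal F_{ij}(t)$.

```python
import numpy as np

def simulate_martingale_at_tau(n, lam0, beta, tau, seed=0):
    """Return n Monte Carlo draws of M(tau) = N(tau) - int_0^tau Y(s)(lam0 + beta*Z) ds
    for independent subjects under a constant-baseline additive hazards model
    (an illustrative assumption, not the paper's simulation design)."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, size=n)          # bounded covariate Z ~ Uniform(0, 1)
    rate = lam0 + beta * z                     # hazard lam0 + beta*Z, positive here
    t = rng.exponential(1.0 / rate)            # failure time T with constant hazard `rate`
    c = rng.uniform(0.0, 2.0, size=n)          # independent censoring time C
    x = np.minimum(t, c)                       # observed time X = min(T, C)
    delta = t <= c                             # failure indicator
    n_tau = (delta & (x <= tau)).astype(float) # counting process N(tau)
    # Y(s) = 1{X >= s}, so the compensator integral equals rate * min(X, tau).
    compensator = rate * np.minimum(x, tau)
    return n_tau - compensator

m = simulate_martingale_at_tau(n=200_000, lam0=0.5, beta=1.0, tau=1.5)
print(m.mean())  # should be close to 0
```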

1.1 Asymptotic properties of $\hat{\varvec{\beta }}_{wee}$ and the proof

Theorem 1

Under the regularity conditions given above, as $n\rightarrow \infty ,\ \sqrt{n}(\hat{\varvec{\beta }}_{wee}-\varvec{\beta }_0)$ converges in distribution to a zero-mean normal random vector, and the covariance matrix can be consistently estimated by $\hat{\varvec{\Sigma }}_{wee}=A_{wee}^{-1}B_{wee}A_{wee}^{-1}.$

Proof

Simple algebra yields

$$\begin{aligned}&\sqrt{n}(\hat{\varvec{\beta }}_{wee}-\varvec{\beta }_0)\\&\quad =\sqrt{n}(nA_{wee})^{-1} \left( \sum _{i=1}^n\frac{1}{n_i}\sum _{j=1}^{n_i}\int _0^\infty ( \varvec{Z}_{ij}(t)-\bar{\varvec{Z}}(t))dN_{ij} (t)-nA_{wee}\varvec{\beta }_0\right) \\&\quad =\frac{1}{\sqrt{n}}A_{wee}^{-1}\sum _{i=1}^n \frac{1}{n_i}\sum _{j=1}^{n_i}\left( \int _0^\infty ( \varvec{Z}_{ij}(t)-\bar{\varvec{Z}}(t))dN_{ij} (t)\right. \\&\qquad \left. -\int _0^\infty Y_{ij}(t)(\varvec{Z}_{ij} (t)-\bar{\varvec{Z}}(t))^{\otimes 2}\varvec{\beta }_0dt\right) \\&\quad =\frac{1}{\sqrt{n}}A_{wee}^{-1}\sum _{i=1}^n\frac{1}{n_i} \sum _{j=1}^{n_i}\int _0^\infty (\varvec{Z}_{ij} (t)-\bar{\varvec{Z}}(t))\left( dN_{ij}(t)- Y_{ij}(t) \varvec{\beta }_0^{\prime }\varvec{Z}_{ij}(t)dt\right) \\&\quad =\frac{1}{\sqrt{n}}A_{wee}^{-1}\sum _{i=1}^n \frac{1}{n_i}\sum _{j=1}^{n_i}\int _0^\infty (\varvec{Z}_{ij} (t)-\bar{\varvec{Z}}(t))dM_{ij}(t). \end{aligned}$$

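Here the third and fourth equalities use the identity $\sum _{i=1}^n n_i^{-1}\sum _{j=1}^{n_i}Y_{ij}(t)\{\varvec{Z}_{ij}(t)-\bar{\varvec{Z}}(t)\}=0$, which holds because $\bar{\varvec{Z}}(t)$ is the $n_i^{-1}$-weighted average of the covariates of the subjects at risk at time $t$; it removes the terms involving $\bar{\varvec{Z}}(t)^{\prime }\varvec{\beta }_0$ and $\lambda _0(t)$. Applying the same arguments as in Yin and Cai (2004), under the regularity conditions, the last expression can be shown to be asymptotically equivalent to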

$$\begin{aligned} \frac{1}{\sqrt{n}}A_{wee}^{-1}\sum _{i=1}^n\left( \frac{1}{n_i} \sum _{j=1}^{n_i}\int _0^\infty \left( \varvec{Z}_{ij}(t) -\frac{\kappa (t)}{\pi (t)}\right) dM_{ij}(t)\right) :=\frac{1}{\sqrt{n}}A_{wee}^{-1}\sum _{i=1}^n \Theta _i(\varvec{\beta }_0). \end{aligned}$$

Note that $A_{wee}$ converges in probability to $\mathcal A_{wee}$, and that $\Theta _i(\varvec{\beta }_0)$, $i=1,\ldots ,n$, are independent random vectors with zero mean and bounded variance. By the multivariate central limit theorem and Slutsky’s theorem, $\sqrt{n}(\hat{\varvec{\beta }}_{wee}-\varvec{\beta }_0)$ converges in distribution to a zero-mean normal random vector whose covariance matrix can be consistently estimated by

$$\begin{aligned} \hat{\varvec{\Sigma }}_{wee}=A_{wee}^{-1}\left( \frac{1}{n} \sum _{i=1}^n\hat{\Theta }_i(\hat{\varvec{\beta }}_{wee}) \hat{\Theta }_i^{\prime }(\hat{\varvec{\beta }}_{wee})\right) A_{wee}^{-1}=A_{wee}^{-1}B_{wee}A_{wee}^{-1}. \end{aligned}$$

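Here $\hat{\Theta }_i(\hat{\varvec{\beta }}_{wee})$ denotes the empirical counterpart of $\Theta _i(\varvec{\beta }_0)$, obtained by replacing the unknown quantities with their estimates. This completes the proof of Theorem 1. $\square $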
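To make the weighted estimator concrete, here is a minimal Python sketch of $\hat{\varvec{\beta }}_{wee}$ and its sandwich variance $A_{wee}^{-1}B_{wee}A_{wee}^{-1}/n$ in the additive hazards form used in the proof above. It assumes time-independent covariates and builds $\hat{\Theta }_i$ from an $n_i^{-1}$-weighted Breslow-type estimate of the baseline cumulative hazard. That choice of $\hat{\Theta }_i$, the function name `wee_additive_hazards` and the simulated data in the usage lines are our assumptions, not details taken from the paper.

```python
import numpy as np

def wee_additive_hazards(time, event, Z, cluster):
    """Sketch: cluster-size-weighted (1/n_i) additive hazards estimator.

    Returns (beta_hat, var_hat), where var_hat = A^{-1} B A^{-1} / n estimates
    var(beta_hat). Assumes time-independent covariates.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    Z = np.asarray(Z, dtype=float).reshape(len(time), -1)
    _, cl, size = np.unique(cluster, return_inverse=True, return_counts=True)
    w = 1.0 / size[cl]                         # weight 1/n_i for every member of cluster i
    n, p = len(size), Z.shape[1]               # number of clusters, number of covariates

    grid = np.unique(time)                     # distinct observed times t_1 < ... < t_K
    dt = np.diff(grid, prepend=0.0)            # lengths of the intervals (t_{k-1}, t_k]
    W = (time[:, None] >= grid[None, :]) * w[:, None]  # w_ij * Y_ij(t) on each interval
    S = W.sum(axis=0)                          # weighted number at risk
    Zbar = (W.T @ Z) / S[:, None]              # weighted mean covariate Zbar(t), K x p
    k_of = np.searchsorted(grid, time)         # grid index of each observed time
    ev = np.flatnonzero(event)

    # n*A_wee = sum_k dt_k sum_ij w_ij Y_ij (Z_ij - Zbar_k)^{(x)2}
    A = np.zeros((p, p))
    for k in range(len(grid)):
        D = Z - Zbar[k]
        A += dt[k] * (D * W[:, [k]]).T @ D
    # sum_ij w_ij int (Z_ij - Zbar(t)) dN_ij(t)
    b = (w[ev][:, None] * (Z[ev] - Zbar[k_of[ev]])).sum(axis=0)
    beta = np.linalg.solve(A, b)

    # Estimated martingale terms: dM = dN - Y (dLambda0_hat + beta'Z dt), with a
    # weighted Breslow-type Lambda0_hat (jump J_k, slope -beta'Zbar_k).
    J = np.bincount(k_of[ev], weights=w[ev], minlength=len(grid)) / S
    U = np.zeros_like(Z)
    U[ev] = Z[ev] - Zbar[k_of[ev]]
    for k in range(len(grid)):
        risk = time >= grid[k]
        D = Z[risk] - Zbar[k]
        U[risk] -= dt[k] * D * (D @ beta)[:, None] + J[k] * D
    U *= w[:, None]
    Theta = np.zeros((n, p))
    np.add.at(Theta, cl, U)                    # Theta_i: sum of member terms in cluster i

    A_inv = np.linalg.inv(A / n)
    B = Theta.T @ Theta / n
    return beta, A_inv @ B @ A_inv / n

# Toy data: hazard 0.5 + 1.0*z; cluster sizes 1..4 drawn independently of the
# failure times, so cluster size is non-informative in this example.
rng = np.random.default_rng(1)
sizes = rng.integers(1, 5, size=400)
cluster = np.repeat(np.arange(400), sizes)
z = rng.uniform(0, 1, size=cluster.size)
t = rng.exponential(1.0 / (0.5 + 1.0 * z))
c = rng.uniform(0, 2, size=cluster.size)
beta_hat, var_hat = wee_additive_hazards(np.minimum(t, c), t <= c, z, cluster)
```

Because $\bar{\varvec{Z}}(t)$ is the weighted mean over the risk set, the cluster terms $\hat{\Theta }_i$ in this sketch sum to zero, mirroring the identity used in the proof.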

1.2 Asymptotic properties of $\hat{\varvec{\beta }}_{wcr}$ and the proof

Theorem 2

Under the regularity conditions given above, as $n\rightarrow \infty ,\ \sqrt{n}(\hat{\varvec{\beta }}_{wcr}-\varvec{\beta }_0)$ converges in distribution to a zero-mean normal random vector, and its covariance matrix can be consistently estimated by $n\hat{\varvec{\Sigma }}_{wcr}.$

Proof

Since $\hat{\varvec{\beta }}^q$ is the solution of the estimating equation $U^q(\varvec{\beta })=0$, a Taylor expansion gives

$$\begin{aligned} -U^q(\varvec{\beta }_0)=U^q(\hat{\varvec{\beta }}^q) -U^q(\varvec{\beta }_0)=\frac{\partial U^q(\varvec{\beta }^*)}{\partial {\varvec{\beta }}^*} (\hat{\varvec{\beta }}^q-\varvec{\beta }_0), \end{aligned}\tag{6}$$

where $\varvec{\beta }^*$ lies on the line segment between $\hat{\varvec{\beta }}^q$ and $\varvec{\beta }_0.$ Rearranging (6) yields

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\beta }}^q-\varvec{\beta }_0)= \left( -\frac{1}{n}\frac{\partial U^q(\varvec{\beta }^*)}{\partial {\varvec{\beta }}^*}\right) ^{-1} \left( \frac{1}{\sqrt{n}}U^q(\varvec{\beta }_0)\right) . \end{aligned}$$

Note that

$$\begin{aligned} \frac{1}{n}\frac{\partial U^q(\varvec{\beta })}{\partial {\varvec{\beta }}} =-\frac{1}{n}\sum _{i=1}^n\int _0^\infty Y_i^q(t)\left( \varvec{Z}_i^q(t)-\bar{\varvec{Z}}^q(t)\right) ^{\otimes 2}dt=-A_q, \end{aligned}$$

which is negative definite; hence $A_q$ is positive definite. Since the $Q$ resamples are identically distributed, each $A_q$ converges in probability to the same deterministic positive definite matrix, denoted by $\mathcal A_{wcr}.$

Averaging over the $Q$ resamples gives

$$\begin{aligned} \sqrt{n}(\hat{\varvec{\beta }}_{wcr}-\varvec{\beta }_0)= & {} \frac{1}{Q}\sum _{q=1}^Q\sqrt{n}(\hat{\varvec{\beta }}^q-\varvec{\beta }_0) =\frac{1}{Q}\sum _{q=1}^QA_q^{-1}\frac{1}{\sqrt{n}}U^q(\varvec{\beta }_0)\\= & {} {\mathcal A_{wcr}}^{-1}\frac{1}{\sqrt{n}Q} \sum _{q=1}^QU^q(\varvec{\beta }_0)+o_p(1). \end{aligned}$$

It therefore suffices to show that $n^{-1/2}Q^{-1}\sum _{q=1}^QU^q(\varvec{\beta }_0)$ converges to a normal distribution as $n\rightarrow \infty .$ Changing the order of summation yields

$$\begin{aligned} \frac{1}{\sqrt{n}Q}\sum _{q=1}^QU^q(\varvec{\beta }_0)= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{1}{Q}\sum _{q=1}^Q \int _0^\infty \left( \varvec{Z}_i^q(t)-\bar{\varvec{Z}}^q(t)\right) dM_i^q(t)\\= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{1}{Q}\sum _{q=1}^Q \int _0^\infty \left( \varvec{Z}_i^q(t)-\frac{\tilde{\kappa }(t)}{\tilde{\pi }(t)}\right) dM_i^q(t)+o_p(1)\\= & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n{\mathcal U_i(\varvec{\beta }_0)}+o_p(1), \end{aligned}$$

where ${\mathcal U_i(\varvec{\beta }_0)}$, $i=1,\ldots ,n$, are independent random vectors with zero mean and finite variance. By the multivariate central limit theorem, $n^{-1/2}Q^{-1}\sum _{q=1}^QU^q(\varvec{\beta }_0)$ is asymptotically normal with zero mean and some positive definite covariance matrix. Combined with Slutsky’s theorem, this shows that $\sqrt{n}(\hat{\varvec{\beta }}_{wcr}-\varvec{\beta }_0)$ converges in distribution to a zero-mean normal random vector. It remains to show that its covariance matrix can be consistently estimated by $n\hat{\varvec{\Sigma }}_{wcr}$.

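To construct a consistent estimator of this covariance matrix, we follow Hoffman et al. (2001). Writing $\hat{\varvec{\beta }}_q$ for the estimator $\hat{\varvec{\beta }}^q$ obtained from the $q$th resample, we first decompose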

$$\begin{aligned} \text {var}(\hat{\varvec{\beta }}_q)=E\left( \text {var}(\hat{\varvec{\beta }}_q|\text {data})\right) +\text {var}\left( {E}(\hat{\varvec{\beta }}_q|\text {data})\right) , \end{aligned}$$

where the inner expectation and variance on the right-hand side are taken over the resampling distribution of $\hat{\varvec{\beta }}_q$ given the data. Since $E(\hat{\varvec{\beta }}_q|\text {data})=\hat{\varvec{\beta }}_{wcr},$ we obtain

$$\begin{aligned} \text {var}(\hat{\varvec{\beta }}_{wcr})=\text {var}( \hat{\varvec{\beta }}_q)-E(\text {var} (\hat{\varvec{\beta }}_q|\text {data})). \end{aligned}\tag{7}$$

For each resampled data set, $\text {var}(\hat{\varvec{\beta }}_q)$ can be consistently estimated by $\tilde{\varvec{\Sigma }}_q = \hat{\varvec{\Sigma }}_q/n$, and so can its average over the $Q$ resamples, $Q^{-1}\sum _{q=1}^Q\tilde{\varvec{\Sigma }}_q$. For the second term on the right-hand side of (7), since

$$\begin{aligned} E(\text {var}(\hat{\varvec{\beta }}_q|\text {data}))= E\left( \frac{1}{Q}\sum _{q=1}^Q(\hat{\varvec{\beta }}_q- \hat{\varvec{\beta }}_{wcr}) (\hat{\varvec{\beta }}_q-\hat{\varvec{\beta }}_{wcr})^{\prime }\right) , \end{aligned}$$

it can be estimated by the sample covariance matrix of the $Q$ resampled estimators $\hat{\varvec{\beta }}_q$, $q = 1,\ldots ,Q$, with divisor $Q$ or $Q-1$, that is,

$$\begin{aligned} \tilde{\Omega }=\frac{1}{Q}\sum _{q=1}^Q(\hat{\varvec{\beta }}_q -\hat{\varvec{\beta }}_{wcr})(\hat{\varvec{\beta }}_q -\hat{\varvec{\beta }}_{wcr})^{\prime }\ \ \ \text {or}\ \ \ \Omega =\frac{1}{Q-1}\sum _{q=1}^Q(\hat{\varvec{\beta }}_q -\hat{\varvec{\beta }}_{wcr})(\hat{\varvec{\beta }}_q -\hat{\varvec{\beta }}_{wcr})^{\prime }\,. \end{aligned}$$

Following Cong et al. (2007), we use $\Omega $ as the estimator of $\text {var}(\hat{\varvec{\beta }}_q|\text {data}).$ Thus the estimated variance-covariance matrix of $\hat{\varvec{\beta }}_{wcr}$ is

$$\begin{aligned} \hat{\varvec{\Sigma }}_{wcr}=\frac{1}{Q}\sum _{q=1}^Q \tilde{\varvec{\Sigma }}_q-\frac{1}{Q-1}\sum _{q=1}^Q (\hat{\varvec{\beta }}_q-\hat{\varvec{\beta }}_{wcr}) (\hat{\varvec{\beta }}_q-\hat{\varvec{\beta }}_{wcr})^{\prime }. \end{aligned}$$

To show the consistency of $\hat{\varvec{\Sigma }}_{wcr},$ it suffices to show that $\Omega -E(\tilde{\Omega })=(\Omega -\tilde{\Omega })+ (\tilde{\Omega }-E(\tilde{\Omega }))\rightarrow 0$ in probability as $n\rightarrow \infty .$ For the first term, note that $\Omega -\tilde{\Omega }=\tilde{\Omega }/(Q-1)$ and that each $\hat{\varvec{\beta }}_q-\hat{\varvec{\beta }}_{wcr}=O_p(n^{-1/2})$ by the asymptotic normality established above, so $\Omega -\tilde{\Omega }\rightarrow 0$ in probability. For the second term, the same arguments as in the proof of Cong et al. (2007) show that $\tilde{\Omega }-E(\tilde{\Omega })\rightarrow 0$ in probability as $n\rightarrow \infty .$ This completes the proof of Theorem 2. $\square $
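The resampling step and the combination rule for $\hat{\varvec{\beta }}_{wcr}$ and $\hat{\varvec{\Sigma }}_{wcr}$ can be sketched in a few lines of Python. The sketch assumes that each resample has already been fitted, giving $\hat{\varvec{\beta }}_q$ and $\tilde{\varvec{\Sigma }}_q=\hat{\varvec{\Sigma }}_q/n$, for instance with an off-the-shelf additive hazards routine for independent right-censored data. The helper names `draw_one_per_cluster` and `combine_wcr` are hypothetical, and the numbers in the usage lines are toy inputs, not results from the paper.

```python
import numpy as np

def draw_one_per_cluster(cluster, rng):
    """Return one randomly chosen row index from each cluster (one WCR resample)."""
    cluster = np.asarray(cluster)
    return np.array([rng.choice(np.flatnonzero(cluster == g))
                     for g in np.unique(cluster)])

def combine_wcr(beta_q, sigma_tilde_q):
    """Combine Q resample fits into beta_wcr and Sigma_wcr.

    beta_q:        (Q, p) array of resample estimates beta_q.
    sigma_tilde_q: (Q, p, p) array of per-resample variances Sigma_hat_q / n.
    """
    beta_q = np.asarray(beta_q, dtype=float)
    sigma_tilde_q = np.asarray(sigma_tilde_q, dtype=float)
    Q = beta_q.shape[0]
    if Q < 2:
        raise ValueError("need at least two resamples")
    beta_wcr = beta_q.mean(axis=0)                  # average of the Q estimates
    dev = beta_q - beta_wcr
    omega = dev.T @ dev / (Q - 1)                   # Omega: between-resample covariance
    sigma_wcr = sigma_tilde_q.mean(axis=0) - omega  # mean within-resample variance minus Omega
    return beta_wcr, sigma_wcr

# Toy inputs for illustration only.
rng = np.random.default_rng(0)
idx = draw_one_per_cluster([0, 0, 1, 2, 2, 2], rng)
beta_wcr, sigma_wcr = combine_wcr([[1.0], [1.2], [0.8], [1.0]],
                                  np.full((4, 1, 1), 0.05))
```

Because $\hat{\varvec{\Sigma }}_{wcr}$ is a difference of two matrices, it is worth checking in practice that the result is positive definite.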


About this article

Cite this article

Chen, L., Feng, Y. & Sun, J. Regression analysis of clustered failure time data with informative cluster size under the additive transformation models. Lifetime Data Anal 23, 651–670 (2017). https://doi.org/10.1007/s10985-016-9384-x

Received: 02 September 2015
Accepted: 12 October 2016
Issue Date: October 2017
DOI: https://doi.org/10.1007/s10985-016-9384-x
