Fast accelerated failure time modeling for case-cohort data

Abstract

Semiparametric accelerated failure time (AFT) models directly relate expected failure times to covariates and are a useful alternative to models that work on the hazard function or the survival function. For case-cohort data, however, AFT models have received much less development. In addition to the covariates that are missing for controls outside the sub-cohort, the challenges of AFT model inference with a full cohort remain. The regression parameter estimator is hard to compute because the most widely used rank-based estimating equations are not smooth. Further, its variance depends on the unspecified error distribution, and most methods rely on a computationally intensive bootstrap to estimate it. We propose fast rank-based inference procedures for AFT models, applying recent methodological advances to the context of case-cohort data. Parameters are estimated with an induced smoothing approach that smooths the estimating functions and facilitates the numerical solution. Variance estimators are obtained through efficient resampling methods for nonsmooth estimating functions that avoid a full-blown bootstrap. Simulation studies suggest that the recommended procedure provides fast and valid inferences among several competing procedures. Application to a tumor study demonstrates the utility of the proposed method in routine data analysis.

References

  1. Barlow, W.E.: Robust variance estimation for the case-cohort design. Biometrics 50, 1064–1072 (1994)

  2. Breslow, N.E., Lumley, T., Ballantyne, C.M., Chambless, L.E., Kulich, M.: Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Stat. Biosci. 1, 32–49 (2009)

  3. Brown, B.M., Wang, Y.-G.: Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 149–158 (2005)

  4. Brown, B.M., Wang, Y.-G.: Induced smoothing for rank regression with censored survival times. Stat. Med. 26, 828–836 (2007)

  5. Chen, H.Y.: Fitting semiparametric transformation regression models to data from a modified case-cohort design. Biometrika 88, 255–268 (2001a)

  6. Chen, H.Y.: Weighted semiparametric likelihood method for fitting a proportional odds regression model to data from the case cohort design. J. Am. Stat. Assoc. 96, 1446–1458 (2001b)

  7. Chiou, S., Kang, S., Yan, J.: aftgee: Accelerated failure time model with generalized estimating equations. R package version 0.2-27 (2012)

  8. D’Angio, G.J., Breslow, N., Beckwith, J.B., Evans, A., Baum, E., Delorimier, A., Fernbach, D., Hrabovsky, E., Jones, B., Kelalis, P., Othersen, H.B., Tefft, M., Thomas, P.R.M.: Treatment of Wilms’ tumor. Results of the third national Wilms’ tumor study. Cancer 64, 349–360 (1989)

  9. Green, D., Breslow, N., Beckwith, J., Finklestein, J., Grundy, P., Thomas, P., Kim, T., Shochat, S., Haase, G., Ritchey, M., Kelalis, P., D’Angio, G.: Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms’ tumor: a report from the National Wilms’ Tumor Study Group. J. Clin. Oncol. 16, 237–245 (1998)

  10. Hasselman, B.: nleqslv: Solve systems of non linear equations. R package version 1.9.3 (2012). http://CRAN.R-project.org/package=nleqslv

  11. Huang, Y.: Calibration regression of censored lifetime medical cost. J. Am. Stat. Assoc. 97, 318–327 (2002)

  12. Jin, Z., Lin, D.Y., Wei, L.J., Ying, Z.: Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353 (2003)

  13. Johnson, L.M., Strawderman, R.L.: Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96, 577–590 (2009)

  14. Kalbfleisch, J.D., Lawless, J.F.: Likelihood analysis of multistate models for disease incidence and mortality. Stat. Med. 7, 149–160 (1988)

  15. Kang, S., Cai, J.: Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika 96, 887–901 (2009)

  16. Kong, L., Cai, J.: Case-cohort analysis with accelerated failure time model. Biometrics 65, 135–142 (2009)

  17. Kong, L., Cai, J., Sen, P.K.: Weighted estimating equations for semiparametric transformation models with censored data from a case-cohort design. Biometrika 91, 305–319 (2004)

  18. Kulich, M., Lin, D.: Additive hazards regression for case-cohort studies. Biometrika 87, 73–87 (2000)

  19. Kulich, M., Lin, D.: Improving the efficiency of relative-risk estimation in case-cohort studies. J. Am. Stat. Assoc. 99, 832–844 (2004)

  20. Lin, D.Y., Ying, Z.: Cox regression with incomplete covariate measurements. J. Am. Stat. Assoc. 88, 1341–1349 (1993)

  21. Lu, W., Tsiatis, A.A.: Semiparametric transformation models for the case-cohort study. Biometrika 93, 207–214 (2006)

  22. Nan, B., Yu, M., Kalbfleisch, J.D.: Censored linear regression for case-cohort studies. Biometrika 93, 747–762 (2006)

  23. Nan, B., Kalbfleisch, J.D., Yu, M.: Asymptotic theory for the semiparametric accelerated failure time model with missing data. Ann. Stat. 37, 2351–2376 (2009)

  24. Prentice, R.L.: Linear rank tests with right censored data (Corr: V70 p304). Biometrika 65, 167–180 (1978)

  25. Prentice, R.L.: A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11 (1986)

  26. Self, S.G., Prentice, R.L.: Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Stat. 16, 64–81 (1988)

  27. Sun, J., Sun, L., Flournoy, N.: Additive hazards model for competing risks analysis of the case-cohort design. Commun. Stat., Theory Methods 33, 351–366 (2004)

  28. Therneau, T.M., Li, H.: Computing the Cox model for case cohort designs. Lifetime Data Anal. 5, 99–112 (1999)

  29. Tsiatis, A.A.: Estimating regression parameters using linear rank tests for censored data. Ann. Stat. 18, 354–372 (1990)

  30. Varadhan, R., Gilbert, P.: BB: an R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. J. Stat. Softw. 32, 1–26 (2009). http://www.jstatsoft.org/v32/i04/

  31. Wacholder, S., Gail, M.H., Pee, D., Brookmeyer, R.: Alternative variance and efficiency calculations for the case-cohort design. Biometrika 76, 117–123 (1989)

  32. Wang, Y.-G., Fu, L.: Rank regression for the accelerated failure time model with clustered and censored data. Comput. Stat. Data Anal. 55, 2334–2343 (2011)

  33. Ying, Z.: A large sample study of rank estimation for censored regression data. Ann. Stat. 21, 76–99 (1993)

  34. Yu, M.: Buckley-James type estimator in censored data with covariates missing by design. Scand. J. Stat. 38, 252–267 (2011)

  35. Yu, Q., Wong, G.Y.C., Yu, M.: Buckley-James-type of estimators under the classical case cohort design. Ann. Inst. Stat. Math. 59, 675–695 (2007)

  36. Zeng, D., Lin, D.Y.: Efficient resampling methods for nonsmooth estimating functions. Biostatistics 9, 355–363 (2008)

Author information

Corresponding author

Correspondence to Jun Yan.

Appendix: Analytical details

We give the analytical form of the \(S_i(\beta)\)’s here. Define the general rank-based weighted estimating function (Jin et al. 2003)

$$ U_n(\beta)=\sum_{i=1}^n \Delta_i \varphi_{n,i}(\beta) \biggl[ X_i- \frac{W^{(1)}_{n,i}(\beta)}{W^{(0)}_{n,i}(\beta)} \biggr], $$

where \(\varphi_{n,i}(\beta)\) is a nonnegative weight function and

$$ W^{(k)}_{n,i}(\beta)=\frac{1}{n}\sum _{j=1}^n X_j^k I \bigl[e_j(\beta) \geq e_i(\beta)\bigr], \quad k = 0,1. $$
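As a reading aid, the two sums can be written out explicitly. The display below is only a sketch that follows from the definition above, reading \(X_j^0\) as 1 and \(X_j^1\) as \(X_j\); it introduces no new quantities. The last line shows what the summand becomes under the particular weight choice \(\varphi_{n,i}(\beta) = W^{(0)}_{n,i}(\beta)\), commonly called the Gehan weight.

$$ \begin{aligned} W^{(0)}_{n,i}(\beta) &= \frac{1}{n}\,\#\bigl\{j : e_j(\beta) \geq e_i(\beta)\bigr\}, \\ \frac{W^{(1)}_{n,i}(\beta)}{W^{(0)}_{n,i}(\beta)} &= \frac{\sum_{j=1}^n X_j I[e_j(\beta) \geq e_i(\beta)]}{\sum_{j=1}^n I[e_j(\beta) \geq e_i(\beta)]}, \\ W^{(0)}_{n,i}(\beta) \biggl[ X_i - \frac{W^{(1)}_{n,i}(\beta)}{W^{(0)}_{n,i}(\beta)} \biggr] &= \frac{1}{n} \sum_{j=1}^n (X_i - X_j) I\bigl[e_j(\beta) \geq e_i(\beta)\bigr]. \end{aligned} $$

In words, \(W^{(0)}_{n,i}(\beta)\) is the fraction of subjects whose residual is at least \(e_i(\beta)\), and the ratio \(W^{(1)}_{n,i}(\beta)/W^{(0)}_{n,i}(\beta)\) is the average covariate among those subjects.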

Equation (1) can be obtained by setting \(\varphi_{n,i}(\beta) = W^{(0)}_{n,i}(\beta)\). On the other hand, the general rank-based weighted estimating function for case-cohort samples has the following form:

$$ U_n^c(\beta) = \sum_{i=1}^n \Delta_i\varphi_{n,i}(\beta) \biggl[X_i- \frac{\hat{W}^{(1)}_{n, i}(\beta)}{\hat{W}^{(0)}_{n, i}(\beta)} \biggr], $$

where

$$ \hat{W}^{(k)}_{n, i}(\beta)=\frac{1}{n} \sum _{j=1}^n h_j X_j^k I\bigl[e_j(\beta)\geq e_i(\beta)\bigr], \quad k = 0,1. $$

Similarly, Eq. (2) can be obtained by setting \(\varphi_{n,i}(\beta) = \hat{W}^{(0)}_{n,i}(\beta)\).
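The two estimating functions above can be evaluated numerically with a few lines of NumPy. The sketch below is illustrative only and is not the authors' implementation. It assumes the usual AFT residual \(e_i(\beta) = \log Y_i - X_i^\top \beta\), with \(Y_i\) the observed time, and it takes the case-cohort weights \(h_j\) as a user-supplied vector; neither is defined in this appendix. The function names `rank_estimating_function` and `gehan_pairwise` are hypothetical. The second function uses the pairwise form \(n^{-1}\sum_i\sum_j \Delta_i h_j (X_i - X_j) I[e_j(\beta) \geq e_i(\beta)]\), which is what the general form reduces to when \(\varphi_{n,i}(\beta) = \hat{W}^{(0)}_{n,i}(\beta)\).

```python
import numpy as np


def rank_estimating_function(beta, log_time, delta, X, h=None, gehan=True):
    """Evaluate the weighted rank estimating function at beta.

    Assumptions (not stated in the appendix): residuals are
    e_i(beta) = log_time_i - X_i' beta, and h holds the case-cohort
    weights h_j. With h=None every h_j is 1, which gives the
    full-cohort function U_n.
    """
    log_time = np.asarray(log_time, dtype=float)
    delta = np.asarray(delta)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    h = np.ones(n) if h is None else np.asarray(h, dtype=float)

    e = log_time - X @ beta
    # at_risk[i, j] = I[e_j(beta) >= e_i(beta)]
    at_risk = (e[None, :] >= e[:, None]).astype(float)
    W0 = at_risk @ h / n                  # W-hat^(0)_{n,i}
    W1 = at_risk @ (h[:, None] * X) / n   # W-hat^(1)_{n,i}, one row per i

    # Only events (Delta_i = 1) contribute. Each event is assumed to
    # have h_i > 0, so W0 is strictly positive wherever it is used.
    ev = delta == 1
    W0_ev = W0[ev][:, None]
    phi = W0_ev if gehan else np.ones_like(W0_ev)  # Gehan or log-rank weight
    return (phi * (X[ev] - W1[ev] / W0_ev)).sum(axis=0)


def gehan_pairwise(beta, log_time, delta, X, h=None):
    """Pairwise form of the same function under the Gehan-type weight."""
    log_time = np.asarray(log_time, dtype=float)
    delta = np.asarray(delta, dtype=float)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    h = np.ones(n) if h is None else np.asarray(h, dtype=float)

    e = log_time - X @ beta
    at_risk = (e[None, :] >= e[:, None]).astype(float)
    diff = X[:, None, :] - X[None, :, :]          # diff[i, j] = X_i - X_j
    w = delta[:, None] * h[None, :] * at_risk     # Delta_i * h_j * indicator
    return (w[:, :, None] * diff).sum(axis=(0, 1)) / n
```

Both functions build an n-by-n indicator matrix, so they are meant for checking the algebra on small samples rather than for large cohorts. Because the indicator is a step function of \(\beta\), the returned value is piecewise constant in \(\beta\), which is the nonsmoothness that the induced smoothing approach is designed to remove.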

With these settings, an explicit form of \(S_i(\beta_0)\) is

where

\(N_i(\beta;t)=\Delta_i I(e_i(\beta)\leq t)\) and \(\lambda(u)\) is the common hazard function of \(\epsilon_i\).

The unknown quantities in \(S_i(\beta_0)\) include \(\beta_0\), \(w^{(0)}\), \(w^{(1)}\) and \(\lambda(t)\). With the explicit form of \(S_i(\beta_0)\), \(\hat{S}_{i}(\hat{\beta})\) is obtained by replacing these unknown quantities by their sample estimators.

About this article

Chiou, S.H., Kang, S. & Yan, J. Fast accelerated failure time modeling for case-cohort data. Stat Comput 24, 559–568 (2014). https://doi.org/10.1007/s11222-013-9388-2

Keywords

  • Induced smoothing
  • Multiplier bootstrap
  • Resampling