Abstract
Semiparametric accelerated failure time (AFT) models directly relate the expected failure times to covariates and are a useful alternative to models that work on the hazard function or the survival function. For case-cohort data, AFT models have received much less development. In addition to the covariates missing for controls outside the sub-cohort, the challenges of AFT model inference with a full cohort remain. The regression parameter estimator is hard to compute because the most widely used rank-based estimating equations are not smooth. Further, its variance depends on the unspecified error distribution, and most methods rely on a computationally intensive bootstrap to estimate it. We propose fast rank-based inference procedures for AFT models, applying recent methodological advances to the context of case-cohort data. Parameters are estimated with an induced smoothing approach that smooths the estimating functions and facilitates the numerical solution. Variance estimators are obtained through efficient resampling methods for nonsmooth estimating functions that avoid a full-blown bootstrap. Simulation studies suggest that, among several competing procedures, the recommended procedure provides fast and valid inferences. Application to a tumor study demonstrates the utility of the proposed method in routine data analysis.
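To make the induced smoothing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gehan-type rank estimating function, residuals \(e_i(\beta) = \log(\text{observed time}_i) - X_i^\top\beta\), and a smoothing scale \(r_{ij}^2 = (X_i - X_j)^\top (X_i - X_j)/n\) (an identity working covariance, in the spirit of Brown and Wang 2007). The `weights` argument is a hypothetical stand-in for case-cohort sampling weights, and `solve_1d` is an illustrative bisection solver for a single covariate.

```python
import math

def residuals(log_time, X, beta):
    """AFT residuals e_i(beta) = log(observed time_i) - X_i' beta."""
    return [y - sum(x * b for x, b in zip(row, beta))
            for y, row in zip(log_time, X)]

def smoothed_gehan(beta, log_time, delta, X, weights=None):
    """Induced-smoothed Gehan-type estimating function (sketch).

    The indicator I(e_j >= e_i) of the nonsmooth version is replaced by
    Phi((e_j - e_i) / r_ij), with r_ij^2 = |X_i - X_j|^2 / n (assumed
    identity working covariance). `weights` is a hypothetical vector of
    sampling weights applied to the at-risk term; None means weight 1.
    """
    n, p = len(log_time), len(beta)
    e = residuals(log_time, X, beta)
    w = weights if weights is not None else [1.0] * n
    U = [0.0] * p
    for i in range(n):
        if not delta[i]:          # only observed failures contribute
            continue
        for j in range(n):
            d = [X[i][k] - X[j][k] for k in range(p)]
            r = math.sqrt(sum(v * v for v in d) / n)
            if r == 0.0:          # identical covariates: term is zero
                continue
            z = (e[j] - e[i]) / r
            phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF
            for k in range(p):
                U[k] += w[j] * d[k] * phi
    return [u / n for u in U]

def solve_1d(log_time, delta, X, lo=-10.0, hi=10.0, iters=100):
    """Bisection root for one covariate.

    Assumes the smoothed function is negative at `lo` and positive at
    `hi`; it is nondecreasing in beta for a single covariate.
    """
    f = lambda b: smoothed_gehan([b], log_time, delta, X)[0]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Because the smoothed function is continuous and monotone in a single covariate, a bracketing root finder suffices here; with several covariates a general nonlinear equation solver would be needed instead.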
References
Barlow, W.E.: Robust variance estimation for the case-cohort design. Biometrics 50, 1064–1072 (1994)
Breslow, N.E., Lumley, T., Ballantyne, C.M., Chambless, L.E., Kulich, M.: Improved Horvitz–Thompson estimation of model parameters from two-phase stratified samples: applications in epidemiology. Stat. Biosci. 1, 32–49 (2009)
Brown, B.M., Wang, Y.-G.: Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92, 149–158 (2005)
Brown, B.M., Wang, Y.-G.: Induced smoothing for rank regression with censored survival times. Stat. Med. 26, 828–836 (2007)
Chen, H.Y.: Fitting semiparametric transformation regression models to data from a modified case-cohort design. Biometrika 88, 255–268 (2001a)
Chen, H.Y.: Weighted semiparametric likelihood method for fitting a proportional odds regression model to data from the case cohort design. J. Am. Stat. Assoc. 96, 1446–1458 (2001b)
Chiou, S., Kang, S., Yan, J.: aftgee: Accelerated failure time model with generalized estimating equations. R package version 0.2-27 (2012)
D’Angio, G.J., Breslow, N., Beckwith, J.B., Evans, A., Baum, E., Delorimier, A., Fernbach, D., Hrabovsky, E., Jones, B., Kelalis, P., Othersen, H.B., Tefft, M., Thomas, P.R.M.: Treatment of Wilms’ tumor. Results of the third national Wilms’ tumor study. Cancer 64, 349–360 (1989)
Green, D., Breslow, N., Beckwith, J., Finklestein, J., Grundy, P., Thomas, P., Kim, T., Shochat, S., Haase, G., Ritchey, M., Kelalis, P., D’Angio, G.: Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms’ tumor: a report from the National Wilms’ Tumor Study Group. J. Clin. Oncol. 16, 237–245 (1998)
Hasselman, B.: nleqslv: Solve systems of non linear equations. R package version 1.9.3 (2012). http://CRAN.R-project.org/package=nleqslv
Huang, Y.: Calibration regression of censored lifetime medical cost. J. Am. Stat. Assoc. 97, 318–327 (2002)
Jin, Z., Lin, D.Y., Wei, L.J., Ying, Z.: Rank-based inference for the accelerated failure time model. Biometrika 90, 341–353 (2003)
Johnson, L.M., Strawderman, R.L.: Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96, 577–590 (2009)
Kalbfleisch, J.D., Lawless, J.F.: Likelihood analysis of multistate models for disease incidence and mortality. Stat. Med. 7, 149–160 (1988)
Kang, S., Cai, J.: Marginal hazards model for case-cohort studies with multiple disease outcomes. Biometrika 96, 887–901 (2009)
Kong, L., Cai, J.: Case-cohort analysis with accelerated failure time model. Biometrics 65, 135–142 (2009)
Kong, L., Cai, J., Sen, P.K.: Weighted estimating equations for semiparametric transformation models with censored data from a case-cohort design. Biometrika 91, 305–319 (2004)
Kulich, M., Lin, D.: Additive hazards regression for case-cohort studies. Biometrika 87, 73–87 (2000)
Kulich, M., Lin, D.: Improving the efficiency of relative-risk estimation in case-cohort studies. J. Am. Stat. Assoc. 99, 832–844 (2004)
Lin, D.Y., Ying, Z.: Cox regression with incomplete covariate measurements. J. Am. Stat. Assoc. 88, 1341–1349 (1993)
Lu, W., Tsiatis, A.A.: Semiparametric transformation models for the case-cohort study. Biometrika 93, 207–214 (2006)
Nan, B., Yu, M., Kalbfleisch, J.D.: Censored linear regression for case-cohort studies. Biometrika 93, 747–762 (2006)
Nan, B., Kalbfleisch, J.D., Yu, M.: Asymptotic theory for the semiparametric accelerated failure time model with missing data. Ann. Stat. 37, 2351–2376 (2009)
Prentice, R.L.: Linear rank tests with right censored data (Corr: V70 p304). Biometrika 65, 167–180 (1978)
Prentice, R.L.: A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika 73, 1–11 (1986)
Self, S.G., Prentice, R.L.: Asymptotic distribution theory and efficiency results for case-cohort studies. Ann. Stat. 16, 64–81 (1988)
Sun, J., Sun, L., Flournoy, N.: Additive hazards model for competing risks analysis of the case-cohort design. Commun. Stat., Theory Methods 33, 351–366 (2004)
Therneau, T.M., Li, H.: Computing the Cox model for case cohort designs. Lifetime Data Anal. 5, 99–112 (1999)
Tsiatis, A.A.: Estimating regression parameters using linear rank tests for censored data. Ann. Stat. 18, 354–372 (1990)
Varadhan, R., Gilbert, P.: BB: an R package for solving a large system of nonlinear equations and for optimizing a high-dimensional nonlinear objective function. J. Stat. Softw. 32, 1–26 (2009). http://www.jstatsoft.org/v32/i04/
Wacholder, S., Gail, M.H., Pee, D., Brookmeyer, R.: Alternative variance and efficiency calculations for the case-cohort design. Biometrika 76, 117–123 (1989)
Wang, Y.-G., Fu, L.: Rank regression for the accelerated failure time model with clustered and censored data. Comput. Stat. Data Anal. 55, 2334–2343 (2011)
Ying, Z.: A large sample study of rank estimation for censored regression data. Ann. Stat. 21, 76–99 (1993)
Yu, M.: Buckley-James type estimator in censored data with covariates missing by design. Scand. J. Stat. 38, 252–267 (2011)
Yu, Q., Wong, G.Y.C., Yu, M.: Buckley-James-type of estimators under the classical case cohort design. Ann. Inst. Stat. Math. 59, 675–695 (2007)
Zeng, D., Lin, D.Y.: Efficient resampling methods for nonsmooth estimating functions. Biostatistics 9, 355–363 (2008)
Appendix: Analytical details
We give the analytical form of the \(S_i(\beta)\)'s here. Define the general rank-based weighted estimating function (Jin et al. 2003)
where \(\varphi_{n,i}(\beta)\) is a nonnegative weight function and
Equation (1) can be obtained by setting \(\varphi_{n,i}(\beta) = W^{(0)}_{n,i}(\beta)\). On the other hand, the general rank-based weighted estimating function for case-cohort samples has the following form:
where
Similarly, Eq. (2) can be obtained by setting \(\varphi_{n,i}(\beta) = \hat{W}^{(0)}_{n,i}(\beta)\).
With these settings, an explicit form of \(S_i(\beta_0)\) is
where
\(N_i(\beta; t) = \Delta_i I(e_i(\beta) \le t)\) and \(\lambda(u)\) is the common hazard function of \(\epsilon_i\).
The unknown quantities in \(S_i(\beta_0)\) include \(\beta_0\), \(w^{(0)}\), \(w^{(1)}\) and \(\lambda(t)\). With the explicit form of \(S_i(\beta_0)\), \(\hat{S}_{i}(\hat{\beta})\) is obtained by replacing these unknown quantities by their sample estimators.
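For orientation, the general weighted rank-based estimating function of Jin et al. (2003) is commonly written as sketched below. The notation is an assumption and may differ from that used in this appendix: \(e_i(\beta)\) denotes the residual of the log observed time, the normalization by \(n\) is a convention, and \(h_j\) is a hypothetical sampling weight introduced only to indicate how a case-cohort version might weight the at-risk sums.

```latex
% Assumed notation: e_i(\beta) = \log(\text{observed time}_i) - X_i^\top \beta
U_{n,\varphi}(\beta)
  = \sum_{i=1}^{n} \Delta_i \, \varphi_{n,i}(\beta)
    \left\{ X_i -
      \frac{\sum_{j=1}^{n} X_j \, I\{e_j(\beta) \ge e_i(\beta)\}}
           {\sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\}}
    \right\}

% Gehan-type weight: \varphi_{n,i}(\beta) = n^{-1} \sum_j I\{e_j(\beta) \ge e_i(\beta)\}
U_{n,G}(\beta)
  = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \Delta_i \, (X_i - X_j) \, I\{e_j(\beta) \ge e_i(\beta)\}

% Hypothetical weighted (case-cohort style) analogue with weights h_j
U^{c}_{n,G}(\beta)
  = n^{-1} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \Delta_i \, h_j \, (X_i - X_j) \, I\{e_j(\beta) \ge e_i(\beta)\}
```

Under this reading, the weight \(W^{(0)}_{n,i}(\beta)\) named above would play the role of the Gehan-type weight, since substituting it cancels the denominator and yields the double-sum form.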
Cite this article
Chiou, S.H., Kang, S. & Yan, J. Fast accelerated failure time modeling for case-cohort data. Stat Comput 24, 559–568 (2014). https://doi.org/10.1007/s11222-013-9388-2