Abstract
Survival modeling with time-varying coefficients has proven useful in analyzing time-to-event data with one or more distinct failure types. When studying the cause-specific etiology of breast and prostate cancers using the large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two major challenges that existing methods for estimating time-varying coefficients cannot tackle. First, these methods, dependent on expanding the original data in a repeated measurement format, result in formidable time and memory consumption as the sample size escalates to over one million. In this case, even a well-configured workstation cannot accommodate their implementations. Second, when the large-scale data under analysis include binary predictors with near-zero variance (e.g., only 0.6% of patients in our SEER prostate cancer data had tumors regional to the lymph nodes), existing methods suffer from numerical instability due to ill-conditioned second-order information. The estimation accuracy deteriorates further with multiple competing risks. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme and tests of significance and nonproportionality for the time-varying effects. A simulation study shows that our scalable approach reduces the time and memory costs by orders of magnitude and enjoys improved estimation accuracy compared with alternative approaches. Applications to the SEER cancer data demonstrate the real-world performance of the proximal Newton algorithm.
Acknowledgements
The authors would like to thank Dr. Kirsten F. Herold (University of Michigan), the Associate Editor and two referees for helpful comments on the manuscript. This work was partially supported by The University of Michigan Office of Research, the University of Michigan Rogel Cancer Center (Project Number P30CA046592) and the National Institutes of Health (Grant Number UL1TR002240).
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Appendix
This appendix is devoted to the derivation of the gradient \(\nabla \ell _j(\varvec{\gamma }_j)\) and Hessian matrix \(\nabla ^2 \ell _j(\varvec{\gamma }_j)\) of \(\ell _j(\varvec{\gamma }_j)\) as in (4). We define
where for a vector \(\mathbf {v}\in {\mathbb {R}}^p\), \(\mathbf {v}^{\odot 0} :=1\), \(\mathbf {v}^{\odot 1} :=\mathbf {v}\), and \(\mathbf {v}^{\odot 2} :=\mathbf {v}\mathbf {v}^\top \). The gradient \(\nabla \ell _j(\varvec{\gamma }_j)\) and Hessian \(\nabla ^2\ell _j(\varvec{\gamma }_j)\) of \(\ell _j(\varvec{\gamma }_j)\) are hence given by
in which
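As an illustration of the structure this derivation relies on, the sketch below computes the gradient and Hessian of a generic Cox-type log partial likelihood from risk-set sums built with \(\mathbf {v}^{\odot 0}\), \(\mathbf {v}^{\odot 1}\), and \(\mathbf {v}^{\odot 2}\). It is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the argument layout, the use of NumPy, the ordering of `gamma` (chosen to match `np.kron(x, b)`), and the lack of any tie handling or stratification are all assumptions made for illustration. The vector \(\mathbf {v}\) is taken here to be the Kronecker product of a subject's covariates with the basis functions evaluated at the event time.

```python
import numpy as np


def _risk_set_design(time, X, B, i):
    """Rows v_r = kron(x_r, B(t_i)) for all subjects still at risk at time[i]."""
    at_risk = time >= time[i]
    return np.kron(X[at_risk], B[i][None, :])


def cox_tv_loglik(time, delta, X, B, gamma):
    """Log partial likelihood for one cause (assumed form, no tie correction)."""
    value = 0.0
    for i in np.flatnonzero(delta):
        Z = _risk_set_design(time, X, B, i)
        v_i = np.kron(X[i], B[i])
        value += v_i @ gamma - np.log(np.exp(Z @ gamma).sum())
    return value


def cox_tv_grad_hess(time, delta, X, B, gamma):
    """Gradient and Hessian of cox_tv_loglik with respect to gamma.

    time  : (n,) observed times
    delta : (n,) 1 if the subject failed from the cause of interest, else 0
    X     : (n, p) covariates
    B     : (n, K) basis functions evaluated at each subject's own time
    gamma : (p*K,) coefficients
    """
    d = gamma.size
    grad = np.zeros(d)
    hess = np.zeros((d, d))
    for i in np.flatnonzero(delta):
        Z = _risk_set_design(time, X, B, i)
        w = np.exp(Z @ gamma)            # relative risks over the risk set
        s0 = w.sum()                     # weighted sum of v^{(.)0} = 1
        s1 = Z.T @ w                     # weighted sum of v^{(.)1} = v
        s2 = (Z * w[:, None]).T @ Z      # weighted sum of v^{(.)2} = v v^T
        v_bar = s1 / s0
        grad += np.kron(X[i], B[i]) - v_bar
        hess -= s2 / s0 - np.outer(v_bar, v_bar)
    return grad, hess
```

Each event contributes the difference between its own \(\mathbf {v}\) and the risk-set weighted mean to the gradient, and the negative of the risk-set weighted covariance to the Hessian, so the Hessian in this sketch is negative semidefinite. A finite-difference comparison against `cox_tv_loglik` is a convenient check on the gradient.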
Cite this article
Wu, W., Taylor, J.M.G., Brouwer, A.F. et al. Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients. Lifetime Data Anal 28, 194–218 (2022). https://doi.org/10.1007/s10985-021-09544-2
DOI: https://doi.org/10.1007/s10985-021-09544-2