Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients

Abstract

Survival modeling with time-varying coefficients has proven useful in analyzing time-to-event data with one or more distinct failure types. When studying the cause-specific etiology of breast and prostate cancers using the large-scale data from the Surveillance, Epidemiology, and End Results (SEER) Program, we encountered two major challenges that existing methods for estimating time-varying coefficients cannot tackle. First, these methods, dependent on expanding the original data in a repeated measurement format, result in formidable time and memory consumption as the sample size escalates to over one million. In this case, even a well-configured workstation cannot accommodate their implementations. Second, when the large-scale data under analysis include binary predictors with near-zero variance (e.g., only 0.6% of patients in our SEER prostate cancer data had tumors regional to the lymph nodes), existing methods suffer from numerical instability due to ill-conditioned second-order information. The estimation accuracy deteriorates further with multiple competing risks. To address these issues, we propose a proximal Newton algorithm with a shared-memory parallelization scheme and tests of significance and nonproportionality for the time-varying effects. A simulation study shows that our scalable approach reduces the time and memory costs by orders of magnitude and enjoys improved estimation accuracy compared with alternative approaches. Applications to the SEER cancer data demonstrate the real-world performance of the proximal Newton algorithm.

Acknowledgements

The authors thank Dr. Kirsten F. Herold (University of Michigan), the Associate Editor, and two referees for helpful comments on the manuscript. This work was partially supported by the University of Michigan Office of Research, the University of Michigan Rogel Cancer Center (Project Number P30CA046592), and the National Institutes of Health (Grant Number UL1TR002240).

Author information

Corresponding author

Correspondence to Kevin He.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 298 KB)

Appendix

This appendix derives the gradient \(\nabla \ell _j(\varvec{\gamma }_j)\) and Hessian matrix \(\nabla ^2 \ell _j(\varvec{\gamma }_j)\) of \(\ell _j(\varvec{\gamma }_j)\) defined in (4). We define

$$\begin{aligned} S_{ij}^{(u)}(\varvec{\gamma }_j, X_i) :=\sum _{r \in R(X_i)} \exp \{[\mathbf {Z}_r \otimes \mathbf {B}(X_{i})]^\top \varvec{\gamma }_j\} \mathbf {Z}_r^{\odot u}, \quad u = 0, 1 ,2, \end{aligned}$$

where for a vector \(\mathbf {v}\in {\mathbb {R}}^p\), \(\mathbf {v}^{\odot 0} :=1\), \(\mathbf {v}^{\odot 1} :=\mathbf {v}\), and \(\mathbf {v}^{\odot 2} :=\mathbf {v}\mathbf {v}^\top \). The gradient \(\nabla \ell _j(\varvec{\gamma }_j)\) and Hessian \(\nabla ^2\ell _j(\varvec{\gamma }_j)\) of \(\ell _j(\varvec{\gamma }_j)\) are hence given by

$$\begin{aligned} \nabla \ell _j(\varvec{\gamma }_j)&=\frac{1}{n}\sum _{i=1}^{n} \varDelta _{ij} \left\{ \mathbf {Z}_i - {\overline{\mathbf {Z}}}_{ij} (\varvec{\gamma }_j, X_i) \right\} \otimes \mathbf {B}(X_i), \end{aligned} \tag{10}$$
$$\begin{aligned} \nabla ^2\ell _j(\varvec{\gamma }_j)&=-\frac{1}{n}\sum _{i=1}^{n} \varDelta _{ij} \mathbf {V}_{ij}(\varvec{\gamma }_j, X_i) \otimes \left\{ \mathbf {B}(X_i) \mathbf {B}^\top (X_i) \right\} , \end{aligned} \tag{11}$$

in which

$$\begin{aligned} {\overline{\mathbf {Z}}}_{ij}(\varvec{\gamma }_j, X_i) :=\frac{S_{ij}^{(1)}(\varvec{\gamma }_j, X_i)}{S_{ij}^{(0)}(\varvec{\gamma }_j, X_i)}, \quad \mathbf {V}_{ij}(\varvec{\gamma }_j, X_i) :=\frac{S_{ij}^{(2)}(\varvec{\gamma }_j, X_i)}{ S_{ij}^{(0)}(\varvec{\gamma }_j, X_i)} - {\overline{\mathbf {Z}}}^{\odot 2}_{ij}(\varvec{\gamma }_j, X_i). \end{aligned}$$
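
As a rough numerical illustration of (10) and (11), the following NumPy sketch evaluates the gradient and Hessian for a single failure type \(j\) by looping over the subjects with \(\varDelta _{ij} = 1\). It is a minimal sketch under stated assumptions and not the authors' implementation: the function name `gradient_and_hessian`, its argument layout, and the toy linear basis are hypothetical, and the risk set is assumed to be \(R(X_i) = \{r : X_r \ge X_i\}\). The coefficient vector is assumed to be ordered to match \(\mathbf {Z}_r \otimes \mathbf {B}(t)\), so that reshaping it to a \(p \times K\) matrix gives \([\mathbf {Z}_r \otimes \mathbf {B}(X_i)]^\top \varvec{\gamma }_j = \mathbf {Z}_r^\top \varvec{\Theta }\mathbf {B}(X_i)\).

```python
import numpy as np


def gradient_and_hessian(gamma, Z, X, delta, basis):
    """Evaluate the gradient (10) and Hessian (11) for one failure type j.

    gamma : (p*K,) coefficients, ordered to match kron(Z_r, B(t))
    Z     : (n, p) covariate matrix
    X     : (n,) observed times
    delta : (n,) 1 if subject i had a type-j event, else 0
    basis : callable t -> (K,) basis vector B(t) (hypothetical helper)
    """
    n, p = Z.shape
    K = gamma.size // p
    theta = gamma.reshape(p, K)  # row a: basis coefficients of covariate a
    grad = np.zeros(p * K)
    hess = np.zeros((p * K, p * K))
    for i in np.flatnonzero(delta):
        b = basis(X[i])
        beta_t = theta @ b  # coefficient vector evaluated at time X_i
        Zr = Z[X >= X[i]]  # ASSUMPTION: R(X_i) = {r : X_r >= X_i}
        eta = Zr @ beta_t
        w = np.exp(eta - eta.max())  # common factor cancels in S1/S0, S2/S0
        w /= w.sum()
        z_bar = w @ Zr  # S^(1) / S^(0)
        V = (Zr.T * w) @ Zr - np.outer(z_bar, z_bar)  # S^(2)/S^(0) - z_bar z_bar^T
        grad += np.kron(Z[i] - z_bar, b)
        hess -= np.kron(V, np.outer(b, b))
    return grad / n, hess / n


def linear_basis(t):
    """Toy two-function basis (1, t); a stand-in for a B-spline basis."""
    return np.array([1.0, t])


# Toy data, purely illustrative.
rng = np.random.default_rng(1)
Z_demo = rng.normal(size=(50, 2))
X_demo = rng.uniform(0.1, 1.0, size=50)
delta_demo = rng.integers(0, 2, size=50)
g_demo, H_demo = gradient_and_hessian(
    np.zeros(4), Z_demo, X_demo, delta_demo, linear_basis
)
```

Each \(\mathbf {V}_{ij}\) is a weighted covariance matrix of the covariates over the risk set, so every summand in (11) is a Kronecker product of two positive semidefinite matrices and the returned Hessian is negative semidefinite. Subtracting the maximum linear predictor before exponentiating leaves the ratios \(S^{(1)}_{ij}/S^{(0)}_{ij}\) and \(S^{(2)}_{ij}/S^{(0)}_{ij}\) unchanged while guarding against overflow.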

About this article

Cite this article

Wu, W., Taylor, J.M.G., Brouwer, A.F. et al. Scalable proximal methods for cause-specific hazard modeling with time-varying coefficients. Lifetime Data Anal 28, 194–218 (2022). https://doi.org/10.1007/s10985-021-09544-2

  • DOI: https://doi.org/10.1007/s10985-021-09544-2

Keywords

  • Kronecker product
  • B-spline
  • Proximal algorithm
  • Parallel computing
  • Breast cancer
  • Prostate cancer