Abstract
Flexible incorporation of both geographical patterning and risk effects in cancer survival models is becoming increasingly important, due in part to the recent availability of large cancer registries. Most spatial survival models stochastically order survival curves from different subpopulations. However, survival curves from two subpopulations often cross in epidemiological cancer studies, so standard interpretable survival models cannot be used without some modification. Common fixes are to include time-varying regression effects in the proportional hazards model or to model the data fully nonparametrically, either of which sacrifices easy interpretation of the fitted model. To address this issue, we develop a generalized accelerated failure time model that allows stratification on continuous or categorical covariates and provides per-variable tests, via novel approximate Bayes factors, of whether stratification is necessary. The model is interpretable in terms of changes in median survival and can capture crossing survival curves in the presence of spatial correlation. A detailed Markov chain Monte Carlo algorithm is presented for posterior inference, and a freely available function, frailtyGAFT, is provided in the R package spBayesSurv to fit the model. We apply our approach to a subset of the prostate cancer data gathered for Louisiana by the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute.
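The contrast between stochastically ordered and crossing survival curves can be seen in a toy log-normal accelerated failure time setting. This is our illustration, not the frailtyGAFT model, and the parameter values are arbitrary. If two groups differ only in the location of log T, their survival curves never cross. If the scale differs as well, the curves can cross once while the median survival time exp(mu) stays directly interpretable. The helper names below are hypothetical.

```python
import math

def lognormal_survival(t, mu, sigma):
    """S(t) = P(T > t) when log T ~ N(mu, sigma^2); median survival is exp(mu)."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def sign_changes(values):
    """Count sign changes in a sequence of nonzero numbers."""
    return sum(1 for a, b in zip(values, values[1:]) if (a > 0) != (b > 0))

times = [0.25 * k for k in range(1, 81)]   # time grid on (0, 20]

# Location shift only (common sigma): the two curves are stochastically ordered.
shift = [lognormal_survival(t, 1.0, 0.5) - lognormal_survival(t, 1.2, 0.5) for t in times]

# Location and scale both differ: group B has the longer median exp(1.2),
# yet its survival curve starts below group A's and ends above it.
cross = [lognormal_survival(t, 1.0, 0.5) - lognormal_survival(t, 1.2, 1.5) for t in times]

print("shift-only sign changes:", sign_changes(shift))
print("location+scale sign changes:", sign_changes(cross))
```

In the second pair the curves meet where (log t - 1.0)/0.5 = (log t - 1.2)/1.5, that is at t = exp(0.9), so a proportional-hazards or location-only model would have to be modified to represent this pattern.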
Acknowledgments
This work was supported by NCI grant 5R03CA176739. The authors thank the editor, the associate editor, and the two referees for their valuable comments, which greatly improved the paper.
Appendix
Proposition 1
Assume that \({\varvec{\gamma }}_{l,k}|\alpha \overset{ind.}{\sim } N_{q+1}\left( {\mathbf {0}}, \frac{2n}{\alpha \rho (l+1)} ({\mathbf {X}}'{\mathbf {X}})^{-1}\right) \) under \(H_1\) and \({\varvec{\gamma }}_{l,k,-j}|\alpha \overset{ind.}{\sim } N_{q}\left( {\mathbf {0}}, \frac{2n}{\alpha \rho (l+1)} ({\mathbf {X}}_{-j}'{\mathbf {X}}_{-j})^{-1}\right) \) under \(H_0\), where \(\alpha \) is fixed and \({\mathbf {X}}_{-j}\) is the design matrix \({\mathbf {X}}\) with its \((j+1)\)th column removed. Then Assumption (2.12) holds, and
\[ p({\varvec{\varUpsilon }}_j={\mathbf {0}}) = \prod _{l,k} \phi \left( 0 \,\Big|\, 0, \frac{2n}{\alpha \rho (l+1)} ({\mathbf {X}}'{\mathbf {X}})_{jj}^{-1}\right) , \]
where \(({\mathbf {X}}'{\mathbf {X}})_{jj}^{-1}\) is the \((j+1,j+1)\)th element of \(({\mathbf {X}}'{\mathbf {X}})^{-1}\), and \(\phi (\cdot |\mu , \sigma ^2)\) denotes the normal density with mean \(\mu \) and variance \(\sigma ^2\).
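Because the \({\varvec{\gamma }}_{l,k}\) are independent and each component \({\varvec{\gamma }}_{l,k,j}\) is marginally normal with mean 0 and variance \(\frac{2n}{\alpha \rho (l+1)}({\mathbf {X}}'{\mathbf {X}})_{jj}^{-1}\), the log prior density of \({\varvec{\varUpsilon }}_j\) at zero is a sum of univariate normal log-densities evaluated at 0. The following is a minimal NumPy sketch, assuming the user supplies the level \(l\) of every vector \({\varvec{\gamma }}_{l,k}\) and the function \(\rho (\cdot )\). The helper name `log_prior_density_at_zero`, the tree layout, and the choice \(\rho (m)=m^2\) in the demo are hypothetical. Python's 0-based indexing matches the paper's convention that \(({\mathbf {X}}'{\mathbf {X}})_{jj}^{-1}\) is the \((j+1,j+1)\)th element.

```python
import numpy as np

def log_prior_density_at_zero(X, j, alpha, levels, rho):
    """log p(Upsilon_j = 0 | alpha) under the independent normal priors of Proposition 1.

    X      : n x (q+1) design matrix; 0-based column j is the (j+1)th column
    j      : covariate index, so [(X'X)^{-1}]_{jj} is element [j, j] here
    alpha  : fixed precision parameter
    levels : one level l per vector gamma_{l,k} (hypothetical layout)
    rho    : user-supplied function rho(.) (hypothetical choice in the demo)
    """
    n = X.shape[0]
    xtx_inv_jj = np.linalg.inv(X.T @ X)[j, j]
    total = 0.0
    for l in levels:
        # gamma_{l,k,j} | alpha ~ N(0, 2n / (alpha * rho(l+1)) * [(X'X)^{-1}]_{jj})
        var = 2.0 * n / (alpha * rho(l + 1)) * xtx_inv_jj
        total += -0.5 * np.log(2.0 * np.pi * var)   # log phi(0 | 0, var)
    return float(total)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
    # Hypothetical: levels 1..3 with 2**l vectors per level, rho(m) = m**2
    levels = [l for l in range(1, 4) for _ in range(2 ** l)]
    print(log_prior_density_at_zero(X, j=1, alpha=1.0, levels=levels, rho=lambda m: m ** 2))
```

Working on the log scale avoids underflow when the product runs over many \((l,k)\) pairs.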
Proof
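Since \({\varvec{\gamma }}_{l,k}|\alpha \) is multivariate normal, the conditional distribution \(({\varvec{\gamma }}_{l,k,-j}|{\varvec{\gamma }}_{l,k,j}=0,\alpha )\) is again multivariate normal with mean \({\mathbf {0}}\), and its covariance is \(\frac{2n}{\alpha \rho (l+1)}\) times the Schur complement of \(({\mathbf {X}}'{\mathbf {X}})_{jj}^{-1}\) in \(({\mathbf {X}}'{\mathbf {X}})^{-1}\). By the block-inverse formula, this Schur complement equals the inverse of the corresponding principal submatrix of \({\mathbf {X}}'{\mathbf {X}}\), namely \(({\mathbf {X}}_{-j}'{\mathbf {X}}_{-j})^{-1}\), so
\[ {\varvec{\gamma }}_{l,k,-j}|{\varvec{\gamma }}_{l,k,j}=0,\alpha \sim N_{q}\left( {\mathbf {0}}, \frac{2n}{\alpha \rho (l+1)} ({\mathbf {X}}_{-j}'{\mathbf {X}}_{-j})^{-1}\right) . \]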
This implies that \(p({\varvec{\gamma }}_{l,k,-j}|{\varvec{\gamma }}_{l,k,j}=0,\alpha )=p_0({\varvec{\gamma }}_{l,k,-j}|\alpha )\) and, by independence across \((l,k)\), \(p({\varvec{\varUpsilon }}_{-j}|{\varvec{\varUpsilon }}_j=0,\alpha )=p_0({\varvec{\varUpsilon }}_{-j}|\alpha )\). In addition, \(\alpha \) is fixed and \({\varvec{\varUpsilon }}_{-j}\) is independent of all other parameters in \({\varvec{\psi }}\), so Assumption (2.12) holds. Finally, \(p({\varvec{\varUpsilon }}_j=0)\) is easy to evaluate because each \({\varvec{\gamma }}_{l,k,j}\) is marginally normal with mean 0 and variance \(\frac{2n}{\alpha \rho (l+1)}({\mathbf {X}}'{\mathbf {X}})_{jj}^{-1}\), and these components are independent across \((l,k)\). \(\square \)
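The key matrix fact in this proof, that conditioning \(N_{q+1}({\mathbf {0}}, c({\mathbf {X}}'{\mathbf {X}})^{-1})\) on its \(j\)th component being zero yields \(N_q({\mathbf {0}}, c({\mathbf {X}}_{-j}'{\mathbf {X}}_{-j})^{-1})\), can be checked numerically. The sketch below uses a random design matrix; the constant `c` stands in for \(2n/(\alpha \rho (l+1))\), and the values of \(\alpha\), \(\rho\), and \(j\) are arbitrary choices for illustration.

```python
import numpy as np

def conditional_cov_given_zero(Sigma, j):
    """Covariance of the other components of a N(0, Sigma) vector given component j = 0."""
    keep = [i for i in range(Sigma.shape[0]) if i != j]
    S_kk = Sigma[np.ix_(keep, keep)]
    S_kj = Sigma[np.ix_(keep, [j])]
    # Schur complement: Sigma_{-j,-j} - Sigma_{-j,j} Sigma_{jj}^{-1} Sigma_{j,-j}
    return S_kk - S_kj @ S_kj.T / Sigma[j, j]

rng = np.random.default_rng(1)
n, q = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, q))])   # n x (q+1), intercept first
c = 2.0 * n / (1.5 * 4.0)   # stands in for 2n / (alpha * rho(l+1)); arbitrary values
j = 2

Sigma = c * np.linalg.inv(X.T @ X)                 # prior covariance of gamma_{l,k} under H1
lhs = conditional_cov_given_zero(Sigma, j)          # covariance of gamma_{l,k,-j} given gamma_{l,k,j} = 0
X_minus_j = np.delete(X, j, axis=1)                 # drop 0-based column j
rhs = c * np.linalg.inv(X_minus_j.T @ X_minus_j)    # prior covariance of gamma_{l,k,-j} under H0
print(np.allclose(lhs, rhs))
```

Because the conditional mean is \(\Sigma_{-j,j}\Sigma_{jj}^{-1}\cdot 0={\mathbf {0}}\), matching covariances is all that is needed for the two normal densities to coincide.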
Proposition 2
Assume the same priors on \({\varvec{\gamma }}_{l,k}\) as in Proposition 1 and, in addition, the prior \(\pi (\alpha )={\varGamma }(\alpha |a_0, b_0)\) on \(\alpha \) under both \(H_1\) and \(H_0\). Then, provided all expectations involved exist, \(BF_{10}\) can be written as
where the expectation is with respect to \(p(\alpha |{\varvec{\varUpsilon }}_j={\mathbf {0}}, {\mathcal {D}})\).
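A toy numerical check of a generalized Savage-Dickey identity of this type may help fix ideas. It replaces \({\varvec{\varUpsilon }}_j\) by a single normal mean \(\theta\) and \({\varvec{\psi }}\) by \(\alpha\) alone, which is our simplifying assumption and not the paper's model: \(y|\theta \sim N(\theta ,1)\), \(\theta |\alpha \sim N(0,1/\alpha )\), \(\alpha \sim {\varGamma }(a_0,b_0)\) with rate \(b_0\), and \(H_0:\theta =0\). Here \(\alpha |\theta =0,y \sim {\varGamma }(a_0+1/2,b_0)\), so \(E[1/p(\theta =0|\alpha )]\) has a closed form. The sketch compares \(BF_{01}\) computed from the two marginal likelihoods with \(p(\theta =0|y)\,E[1/p(\theta =0|\alpha )]\); \(BF_{10}\) is the reciprocal. Function names are hypothetical, and the integrals over \(\alpha\) use a plain midpoint rule.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gamma_pdf(a, shape, rate):
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(a)
                    - rate * a - math.lgamma(shape))

def toy_bayes_factors(y, a0, b0, upper=60.0, steps=100_000):
    """Toy model: y|theta ~ N(theta,1), theta|alpha ~ N(0,1/alpha), alpha ~ Gamma(a0, rate b0).
    Returns (BF01 from marginal likelihoods, BF01 from the Savage-Dickey-type form)."""
    h = upper / steps
    m1 = 0.0       # marginal likelihood under H1
    joint0 = 0.0   # integral of p(theta=0 | alpha, y) p(y | alpha) pi(alpha) d alpha
    for i in range(steps):
        a = (i + 0.5) * h                            # midpoint of the i-th alpha cell
        w = gamma_pdf(a, a0, b0) * h
        py = normal_pdf(y, 0.0, 1.0 + 1.0 / a)       # p(y | alpha), theta integrated out
        m1 += py * w
        # theta | alpha, y ~ N(y/(1+alpha), 1/(1+alpha))
        joint0 += normal_pdf(0.0, y / (1.0 + a), 1.0 / (1.0 + a)) * py * w
    m0 = normal_pdf(y, 0.0, 1.0)                     # marginal likelihood under H0
    post0 = joint0 / m1                              # posterior density p(theta = 0 | y) under H1
    # alpha | theta=0, y ~ Gamma(a0 + 1/2, b0), so
    # E[1/p(theta=0|alpha)] = sqrt(2 pi) E[alpha^{-1/2}] = sqrt(2 pi b0) Gamma(a0) / Gamma(a0 + 1/2)
    e_inv = math.sqrt(2.0 * math.pi * b0) * math.exp(math.lgamma(a0) - math.lgamma(a0 + 0.5))
    return m0 / m1, post0 * e_inv

bf_exact, bf_sd = toy_bayes_factors(y=1.3, a0=2.0, b0=1.0)
print(bf_exact, bf_sd)
```

The two numbers should agree up to quadrature error; in a real sampler, the expectation would instead be estimated by averaging \(1/p(\theta =0|\alpha )\) over draws from \(p(\alpha |\theta =0,y)\).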
Proof
First note that \({\varvec{\psi }}\) denotes all model parameters other than \({\varvec{\varUpsilon }}_j\), and the prior for \({\varvec{\varUpsilon }}_j\) depends only on the precision parameter \(\alpha \), so \(p({\varvec{\varUpsilon }}_j={\mathbf {0}}|{\varvec{\psi }})=p({\varvec{\varUpsilon }}_j={\mathbf {0}}|\alpha )\). Also, since \({\mathcal {L}}({\varvec{\varUpsilon }}_j, {\varvec{\psi }})\) is the likelihood function, we may write it as \(p({\mathcal {D}}|{\varvec{\varUpsilon }}_j, {\varvec{\psi }}) \). It follows that
Then we have
where \(({\varvec{\psi }}_{-\alpha }, \alpha )={\varvec{\psi }}\). \(\square \)
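For orientation, here is a sketch of how an expectation over \(\alpha\) can arise in a generalized Savage-Dickey argument. It uses only two facts stated above: \(p({\varvec{\varUpsilon }}_j={\mathbf {0}}|{\varvec{\psi }})=p({\varvec{\varUpsilon }}_j={\mathbf {0}}|\alpha )\), and \(p({\varvec{\varUpsilon }}_{-j}|{\varvec{\varUpsilon }}_j={\mathbf {0}},\alpha )=p_0({\varvec{\varUpsilon }}_{-j}|\alpha )\) from Proposition 1, which holds for each \(\alpha\). The notation \(m_1({\mathcal {D}})\), \(m_0({\mathcal {D}})\) for the marginal likelihoods under \(H_1\) and \(H_0\), and \(p_0({\varvec{\psi }})\) for the \(H_0\) prior, is introduced only for this sketch. The result is our reconstruction and not necessarily the exact display of Proposition 2.

```latex
\begin{aligned}
p(\boldsymbol{\Upsilon}_j=\mathbf{0}, \boldsymbol{\psi})
  &= p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\alpha)\, p_0(\boldsymbol{\psi}), \\
p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\mathcal{D})
  &= \frac{1}{m_1(\mathcal{D})}\int p(\mathcal{D}\mid\boldsymbol{\Upsilon}_j=\mathbf{0},\boldsymbol{\psi})\,
     p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\alpha)\, p_0(\boldsymbol{\psi})\, d\boldsymbol{\psi}, \\
E\!\left[\frac{1}{p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\alpha)}\right]
  &= \frac{\int p(\mathcal{D}\mid\boldsymbol{\Upsilon}_j=\mathbf{0},\boldsymbol{\psi})\, p_0(\boldsymbol{\psi})\, d\boldsymbol{\psi}}
          {\int p(\mathcal{D}\mid\boldsymbol{\Upsilon}_j=\mathbf{0},\boldsymbol{\psi})\, p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\alpha)\, p_0(\boldsymbol{\psi})\, d\boldsymbol{\psi}}
   = \frac{m_0(\mathcal{D})}{m_1(\mathcal{D})\, p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\mathcal{D})}, \\
\Rightarrow\quad BF_{10} = \frac{m_1(\mathcal{D})}{m_0(\mathcal{D})}
  &= \left\{ p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\mathcal{D})\;
     E\!\left[\frac{1}{p(\boldsymbol{\Upsilon}_j=\mathbf{0}\mid\alpha)}\right]\right\}^{-1}.
\end{aligned}
```

The expectation is taken under \(p({\varvec{\psi }}|{\varvec{\varUpsilon }}_j={\mathbf {0}},{\mathcal {D}})\). Because the integrand depends on \({\varvec{\psi }}\) only through \(\alpha\), it reduces to an expectation under \(p(\alpha |{\varvec{\varUpsilon }}_j={\mathbf {0}},{\mathcal {D}})\).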
Cite this article
Zhou, H., Hanson, T. & Zhang, J. Generalized accelerated failure time spatial frailty model for arbitrarily censored data. Lifetime Data Anal 23, 495–515 (2017). https://doi.org/10.1007/s10985-016-9361-4