Abstract
Missing covariates are commonly encountered when evaluating covariate effects on survival outcomes. Excluding subjects with missing data from the analysis may lead to biased parameter estimates and misleading conclusions. The inverse probability weighting method is widely used to handle missing covariates. However, obtaining the asymptotic variance for frequentist inference is complicated because it must account for the estimation of the propensity score parameters. In this paper, we propose a new approach, based on an approximate Bayesian method that does not use a Taylor expansion, to handle missing covariates in survival data. We consider a stratified proportional hazards model so that the method can accommodate non-proportional hazards across strata. Two missing-data settings are studied: a single missing pattern and multiple missing patterns. The proposed estimators are shown to be consistent and asymptotically normal, matching the frequentist asymptotic properties. Simulation studies show that the proposed estimators are asymptotically unbiased and that the credible regions obtained from the posterior distribution are close to the frequentist confidence intervals. The algorithm is straightforward and computationally efficient. We apply the proposed method to a stem cell transplantation data set.
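The inverse probability weighting idea mentioned above can be illustrated with a small Python sketch. This is not the authors' approximate Bayesian procedure. The sketch fits a logistic propensity model \(\pi_i(\varvec{\phi})\) for the probability that subject \(i\) has fully observed covariates. It then evaluates a weighted stratified Cox partial-likelihood score on the complete cases, using weights \(1/\widehat{\pi}_i\). The logistic model, the weight form, and the function names `fit_logistic_propensity` and `ipw_stratified_cox_score` are illustrative assumptions.

```python
import numpy as np


def fit_logistic_propensity(X, R, n_iter=50, tol=1e-10):
    """Newton-Raphson for phi solving U2(phi) = sum_i (R_i - pi_i(phi)) X_i = 0,
    where pi_i(phi) = P(R_i = 1 | X_i) under a logistic model (an assumption)."""
    phi = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ phi))
        score = X.T @ (R - pi)                          # logistic score U2(phi)
        info = (X * (pi * (1.0 - pi))[:, None]).T @ X   # Fisher information
        step = np.linalg.solve(info, score)
        phi = phi + step
        if np.max(np.abs(step)) < tol:
            break
    return phi


def ipw_stratified_cox_score(beta, time, event, Z, stratum, weight):
    """Weighted stratified Cox score: sum over events in each stratum l of
    w_i * (Z_i - Zbar_l(t_i)), where Zbar_l is a weighted risk-set average."""
    U = np.zeros(Z.shape[1])
    for l in np.unique(stratum):
        idx = stratum == l
        t, d, z, w = time[idx], event[idx], Z[idx], weight[idx]
        risk = w * np.exp(z @ beta)                     # w_j * exp(beta^T Z_j)
        for i in np.flatnonzero(d == 1):
            at_risk = t >= t[i]                         # risk set within stratum l
            s0 = risk[at_risk].sum()
            s1 = (risk[at_risk][:, None] * z[at_risk]).sum(axis=0)
            U += w[i] * (z[i] - s1 / s0)
    return U
```

In this sketch, only complete cases are passed to `ipw_stratified_cox_score`, each with weight \(1/\widehat{\pi}_i\). Solving the score for \(\varvec{\beta}\) with any root finder gives a complete-case IPW estimate. Any valid variance for that estimate must reflect that \(\varvec{\phi}\) was estimated, which is the complication the abstract refers to.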
Acknowledgements
We would like to thank the Associate Editor and two reviewers for their constructive comments, which significantly improved the paper. This work was supported in part by the Medical College of Wisconsin Cancer Center, the Advancing a Healthier Wisconsin Endowment (Project # 5520461), and the US National Cancer Institute (U24CA076518).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available with the online version of this article.
Appendix
We derive \(\varvec{\varSigma }\) and its estimator in this appendix. Let \(dM_{li}(t) = dN_{li}(t) - Y_{li}(t) \exp \{\varvec{\beta }^T \varvec{Z}_{li}\} d\varLambda _{0l}(t)\). The posterior distribution is
where
We can estimate \(Var\{\varvec{U}_{1}(\varvec{\beta },\varvec{\phi }) \}\) given \(\varvec{\beta }\) and \(\varvec{\phi }\) as follows:
where \(d\widehat{\varLambda }_{0l}(t) = \sum _{i=1}^{n_l} dN_{li}(t) / \{ n_l S_{l}^{(0)}(\varvec{\beta },t) \}\).
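The following minimal Python sketch makes the Breslow-type increment \(d\widehat{\varLambda }_{0l}(t)\) and the martingale increments \(dM_{li}(t)\) concrete for a single stratum \(l\). It assumes the unweighted definition \(S_{l}^{(0)}(\varvec{\beta },t) = n_l^{-1}\sum _{i} Y_{li}(t)\exp \{\varvec{\beta }^T \varvec{Z}_{li}\}\) and fully observed covariates. The paper's version may include inverse probability weights. The function name is hypothetical. The per-subject score residuals \(W_{li} = \sum _k \{\varvec{Z}_{li} - \bar{\varvec{Z}}_l(t_k)\} dM_{li}(t_k)\) are a standard robust-variance ingredient. They are shown only as one plausible building block for estimating \(Var\{\varvec{U}_1\}\), not as the paper's formula.

```python
import numpy as np


def breslow_martingale_stratum(beta, time, event, Z):
    """Within one stratum l: Breslow increments dLambda_0l(t_k) at the distinct
    event times t_k, martingale increments dM_li(t_k), and per-subject score
    residuals W_li = sum_k {Z_li - Zbar_l(t_k)} dM_li(t_k)."""
    t_k = np.unique(time[event == 1])                        # distinct event times
    r = np.exp(Z @ beta)                                     # exp(beta^T Z_li)
    Y = (time[:, None] >= t_k[None, :]).astype(float)        # Y_li(t_k), n_l x K
    dN = ((time[:, None] == t_k[None, :]) & (event[:, None] == 1)).astype(float)
    nS0 = (Y * r[:, None]).sum(axis=0)                       # n_l S_l^(0)(beta, t_k)
    nS1 = (Y * r[:, None]).T @ Z                             # n_l S_l^(1)(beta, t_k), K x p
    dLambda = dN.sum(axis=0) / nS0                           # Breslow increments
    dM = dN - Y * r[:, None] * dLambda[None, :]              # dM_li(t_k), n_l x K
    Zbar = nS1 / nS0[:, None]                                # S^(1)/S^(0), K x p
    W = dM.sum(axis=1)[:, None] * Z - dM @ Zbar              # score residuals, n_l x p
    return t_k, dLambda, dM, W
```

To use it, call the function once per stratum and stack the rows of `W` across strata. At each event time, the estimated martingale increments sum to zero over the risk set. For that reason, summing `W` over the subjects of a stratum reproduces that stratum's Cox partial-likelihood score.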
We can obtain \(\widehat{Var}(\varvec{U}_2)\) given \(\varvec{\beta }\) and \(\varvec{\phi }\) as follows:
Next, \(Cov(\varvec{U}_{1},\varvec{U}_{2})\) given \(\varvec{\beta }\) and \(\varvec{\phi }\) can be estimated by
The estimator \(\widehat{\varvec{\varSigma }}\) is
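The variance pieces \(Var(\varvec{U}_1)\), \(Var(\varvec{U}_2)\) and \(Cov(\varvec{U}_1,\varvec{U}_2)\) can be organized numerically as blocks of one covariance matrix for the stacked estimating functions. The Python sketch below assumes that \(\varvec{U}_1\) and \(\varvec{U}_2\) are each sums of independent per-subject contributions. Examples are score residuals for \(\varvec{U}_1\) and logistic score terms \((R_i - \pi_i)\varvec{X}_i\) for \(\varvec{U}_2\). The exact contributions, any centering, and the normalization used for \(\widehat{\varvec{\varSigma }}\) in the paper may differ. The function name is hypothetical.

```python
import numpy as np


def stacked_variance_blocks(u1_contrib, u2_contrib, scale=1.0):
    """Estimate Var(U1), Var(U2) and Cov(U1, U2) by (scaled) sums of outer
    products of per-subject contributions, and assemble the block matrix."""
    A = np.asarray(u1_contrib, dtype=float)   # n x p contributions to U1
    B = np.asarray(u2_contrib, dtype=float)   # n x q contributions to U2
    V11 = scale * A.T @ A                     # estimate of Var(U1)
    V22 = scale * B.T @ B                     # estimate of Var(U2)
    C12 = scale * A.T @ B                     # estimate of Cov(U1, U2)
    full = np.block([[V11, C12], [C12.T, V22]])
    return V11, V22, C12, full
```

Passing `scale=1.0 / n` returns averages rather than sums. Which convention matches \(\widehat{\varvec{\varSigma }}\) depends on how the paper normalizes its estimating functions.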
About this article
Cite this article
Kim, S., Kim, JK. & Ahn, K.W. A calibrated Bayesian method for the stratified proportional hazards model with missing covariates. Lifetime Data Anal 28, 169–193 (2022). https://doi.org/10.1007/s10985-021-09542-4
DOI: https://doi.org/10.1007/s10985-021-09542-4