A calibrated Bayesian method for the stratified proportional hazards model with missing covariates

Abstract

Missing covariates are commonly encountered when evaluating covariate effects on survival outcomes. Excluding observations with missing data from the analysis may lead to biased parameter estimates and misleading conclusions. The inverse probability weighting method is widely used to handle missing covariates. However, obtaining the asymptotic variance in frequentist inference is complicated because it involves estimating the parameters of the propensity scores. In this paper, we propose a new approach to handling missing covariates in survival data, based on an approximate Bayesian method that does not use a Taylor expansion. We consider a stratified proportional hazards model so that the method can accommodate non-proportional hazards structures. Two cases are studied: a single missing pattern and multiple missing patterns. The proposed estimators are shown to be consistent and asymptotically normal, which matches the frequentist asymptotic properties. Simulation studies show that the proposed estimators are asymptotically unbiased and that the credible region obtained from the posterior distribution is close to the frequentist confidence interval. The algorithm is straightforward and computationally efficient. We apply the proposed method to a stem cell transplantation data set.

Acknowledgements

We would like to thank the Associate Editor and two reviewers for their constructive comments which significantly improved the paper. This work was supported in part by the Medical College of Wisconsin Cancer Center, the Advancing a Healthier Wisconsin Endowment (Project # 5520461), and the US National Cancer Institute (U24CA076518).

Author information

Corresponding author

Correspondence to Soyoung Kim.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 104 KB)

Appendix

In this Appendix, we derive \(\varvec{\varSigma }\) and its estimator. Let \(dM_{li}(t) = dN_{li}(t) - Y_{li}(t) \exp \{\varvec{\beta }^T \varvec{Z}_{li}\} d\varLambda _{0l}(t)\). The posterior distribution is

$$\begin{aligned} p(\varvec{\eta }|\varvec{U}_n)\sim N \left[ \left( \begin{array}{c} \mathbf{0} \\ \mathbf{0} \end{array}\right) , \frac{\varvec{\varSigma }}{n} \right] , \quad \frac{\varvec{\varSigma }}{n}=\left( \begin{array}{cc} Var(\varvec{U}_{1}) & Cov(\varvec{U}_{1},\varvec{U}_{2})\\ Cov(\varvec{U}_{1},\varvec{U}_{2})^T & Var(\varvec{U}_{2}) \end{array}\right) , \end{aligned}$$

where

$$\begin{aligned}&Var\{\varvec{U}_{1}(\varvec{\beta },\varvec{\phi }) \}\\&\quad = Var\left[ n^{-1} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \frac{\xi _{li}}{\pi _{li}} \int _{0}^{\tau } \Big \{\varvec{Z}_{li} - \frac{\varvec{S}_{l}^{(1)}(\varvec{\beta },t)}{S_{l}^{(0)}(\varvec{\beta },t) }\Big \} dM_{li}(t) \right] \\&\quad = Var\left[ n^{-1} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \frac{\xi _{li}}{\pi _{li}} \int _{0}^{\tau } \Big \{\varvec{Z}_{li} - \varvec{e}_{l}(\varvec{\beta },t)\Big \} dM_{li}(t)\right] \\&\quad = E\left[ \Big \{ n^{-1} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \frac{\xi _{li}}{\pi _{li}} \int _{0}^{\tau } \Big \{\varvec{Z}_{li} - \varvec{e}_{l}(\varvec{\beta },t)\Big \} dM_{li}(t) \Big \}^{\otimes 2} \right] . \end{aligned}$$

We can estimate \(Var\{\varvec{U}_{1}(\varvec{\beta },\varvec{\phi }) \}\) given \(\varvec{\beta }\) and \(\varvec{\phi }\) as follows:

$$\begin{aligned} \widehat{ Var}\{\varvec{U}_{1}(\varvec{\beta },\varvec{\phi }) \} = \frac{1}{n^2} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \frac{\xi _{li}}{\pi ^2_{li}} \int _{0}^{\tau } \left[ \left\{ \varvec{Z}_{li} - \frac{\varvec{S}_{l}^{(1)}(\varvec{\beta },t)}{S_{l}^{(0)}(\varvec{\beta },t) }\right\} \{dN_{li}(t) - d\widehat{\varLambda }_{0l}(t)\}\right] ^{\otimes 2}, \end{aligned}$$

where \(d\widehat{\varLambda }_{0l}(t) = \sum _{i=1}^{n_l} dN_{li}(t)/ \{n_l S_{l}^{(0)}(\varvec{\beta },t)\}\).
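As an illustration of the baseline hazard increments above, the following is a minimal sketch for a single stratum \(l\). It assumes that \(S_{l}^{(0)}(\varvec{\beta },t) = n_l^{-1} \sum _{i} w_{li} Y_{li}(t) \exp \{\varvec{\beta }^T \varvec{Z}_{li}\}\), which is not defined in this Appendix; the function name, its arguments, and the optional weights \(w_{li}\) (for example \(\xi _{li}/\pi _{li}\)) are hypothetical and are not taken from the paper.

```python
import numpy as np


def baseline_hazard_increments(time, event, beta, Z, weights=None):
    """Sketch of dLambda_hat_0l(t) at each distinct event time in ONE stratum.

    Assumption (not from the paper): S0(t) = (1/n_l) * sum_i w_i * Y_i(t) * exp(beta' Z_i),
    with w_i = 1 unless `weights` is supplied. Rows of Z must not contain NaN;
    subjects with weight 0 should have their missing entries filled with 0.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    Z = np.asarray(Z, dtype=float).reshape(len(time), -1)
    beta = np.asarray(beta, dtype=float).reshape(-1)
    n_l = len(time)
    w = np.ones(n_l) if weights is None else np.asarray(weights, dtype=float)

    risk = w * np.exp(Z @ beta)                 # w_i * exp(beta' Z_i)
    event_times = np.unique(time[event == 1])   # sorted distinct event times
    increments = np.empty(len(event_times))
    for k, t in enumerate(event_times):
        at_risk = time >= t                     # Y_i(t): still under observation at t
        s0 = risk[at_risk].sum() / n_l          # assumed S0(beta, t)
        d = np.sum((time == t) & (event == 1))  # sum_i dN_i(t)
        increments[k] = d / (n_l * s0)          # dN(t) / {n_l * S0(beta, t)}
    return event_times, increments
```

With \(\varvec{\beta } = \mathbf{0}\) and unit weights, each increment reduces to the number of events divided by the number at risk, which is the Nelson-Aalen increment.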

We can obtain \(\widehat{Var}(\varvec{U}_2)\) given \(\varvec{\beta }\) and \(\varvec{\phi }\) as follows:

$$\begin{aligned} \widehat{Var}(\varvec{U}_{2}) = \frac{1}{n^2} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \pi _{li}(1-\pi _{li}) \varvec{\omega }_{li} \varvec{\omega }_{li}^T. \end{aligned}$$
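The estimator \(\widehat{Var}(\varvec{U}_{2})\) is a weighted sum of outer products, so it can be computed in one matrix product. The sketch below assumes that all subjects have been stacked across strata into arrays of length \(n\), and that \(\varvec{\omega }_{li}\) is supplied by the user, since it is not defined in this Appendix; the function name is hypothetical.

```python
import numpy as np


def var_u2_hat(pi, omega):
    """Sketch of (1/n^2) * sum_i pi_i * (1 - pi_i) * omega_i omega_i^T.

    pi    : (n,) propensity scores, subjects stacked over all strata
    omega : (n, q) rows omega_i (taken as given; not defined in the Appendix)
    """
    pi = np.asarray(pi, dtype=float)
    omega = np.asarray(omega, dtype=float).reshape(len(pi), -1)
    n = len(pi)
    w = pi * (1.0 - pi)                          # Bernoulli variance of xi_i
    # (omega * w[:, None]).T @ omega equals sum_i w_i * omega_i omega_i^T
    return (omega * w[:, None]).T @ omega / n**2
```

For example, with two subjects, \(\pi = (0.5, 0.5)\), and \(\varvec{\omega }\) equal to the rows of the \(2 \times 2\) identity matrix, the result is \(0.0625\) times the identity.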

Next, \(Cov(\varvec{U}_{1},\varvec{U}_{2})\) given \(\varvec{\beta }\) and \(\varvec{\phi }\) can be estimated by

$$\begin{aligned} \widehat{Cov}(\varvec{U}_{1},\varvec{U}_{2})&= \widehat{Cov}\Big [\frac{1}{n}\sum _{l=1}^{L} \sum _{i=1}^{n_l} \frac{\xi _{li}}{\pi _{li}}\int _{0}^{\tau } \left\{ \varvec{Z}_{li} - \frac{\varvec{S}_{l}^{(1)}(\varvec{\beta },t)}{S_{l}^{(0)}(\varvec{\beta },t) }\right\} dM_{li}(t), \ \frac{1}{n} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \{\xi _{li} - \pi _{li}\} \varvec{\omega }_{li}^T\Big ]\\&= \frac{1}{n^2} \sum _{l=1}^{L} \sum _{i=1}^{n_l} \frac{\xi _{li} (1-\pi _{li})}{\pi _{li}} \int _{0}^{\tau } \left\{ \varvec{Z}_{li} -\frac{\varvec{S}_{l}^{(1)}(\varvec{\beta },t)}{S_{l}^{(0)}(\varvec{\beta },t)}\right\} dM_{li}(t) \times \varvec{\omega }_{li}^T. \end{aligned}$$
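The second equality may be easier to follow with the intermediate step written out. The following is a sketch of the reasoning, assuming that \(\xi _{li}\) is a binary indicator (so that \(\xi _{li}^2 = \xi _{li}\)), that subjects are independent, and that both estimating functions have mean zero; these assumptions are not restated in this Appendix, and \(\varvec{a}_{li}\) is shorthand introduced here only.

$$\begin{aligned} \varvec{a}_{li}&= \int _{0}^{\tau } \left\{ \varvec{Z}_{li} - \frac{\varvec{S}_{l}^{(1)}(\varvec{\beta },t)}{S_{l}^{(0)}(\varvec{\beta },t) }\right\} dM_{li}(t), \\ \frac{\xi _{li}}{\pi _{li}} (\xi _{li} - \pi _{li})&= \frac{\xi _{li}^2 - \xi _{li}\pi _{li}}{\pi _{li}} = \frac{\xi _{li}(1-\pi _{li})}{\pi _{li}}. \end{aligned}$$

Under these assumptions, the product of the \(i\)-th summands is \(\xi _{li}(1-\pi _{li})\pi _{li}^{-1} \varvec{a}_{li} \varvec{\omega }_{li}^T\), products of summands from distinct subjects have expectation zero, and summing over \(l\) and \(i\) with the factor \(n^{-2}\) gives the expression above.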

The estimator \(\widehat{\varvec{\varSigma }}\) is

$$\begin{aligned} \frac{\widehat{\varvec{\varSigma }}}{n}=\left( \begin{array}{cc} \widehat{Var}(\varvec{U}_{1}) & \widehat{Cov}(\varvec{U}_{1},\varvec{U}_{2})\\ \widehat{Cov}(\varvec{U}_{1},\varvec{U}_{2})^T & \widehat{Var}(\varvec{U}_{2}) \end{array}\right) . \end{aligned}$$
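The block structure of \(\widehat{\varvec{\varSigma }}/n\) can be assembled and sanity-checked numerically. The sketch below takes the three estimated blocks as given inputs and fills the lower-left block with the transpose of the upper-right block so that the result is symmetric; the function names and the tolerance are hypothetical choices, not part of the paper.

```python
import numpy as np


def assemble_sigma_over_n(v11, c12, v22):
    """Stack Var_hat(U1) (p x p), Cov_hat(U1, U2) (p x q), Var_hat(U2) (q x q)."""
    v11 = np.atleast_2d(np.asarray(v11, dtype=float))
    c12 = np.atleast_2d(np.asarray(c12, dtype=float))
    v22 = np.atleast_2d(np.asarray(v22, dtype=float))
    if c12.shape != (v11.shape[0], v22.shape[0]):
        raise ValueError("c12 must have shape (p, q)")
    # Lower-left block is the transpose of the upper-right block.
    return np.block([[v11, c12], [c12.T, v22]])


def is_valid_covariance(m, tol=1e-10):
    """Check symmetry and positive semi-definiteness up to a small tolerance."""
    m = np.asarray(m, dtype=float)
    symmetric = np.allclose(m, m.T)
    smallest = np.linalg.eigvalsh((m + m.T) / 2.0).min()
    return bool(symmetric and smallest >= -tol)
```

A check of this kind is useful before drawing from a normal distribution with this covariance, because a finite-sample estimate assembled block by block is not guaranteed to be positive semi-definite.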

About this article

Cite this article

Kim, S., Kim, JK. & Ahn, K.W. A calibrated Bayesian method for the stratified proportional hazards model with missing covariates. Lifetime Data Anal 28, 169–193 (2022). https://doi.org/10.1007/s10985-021-09542-4

  • DOI: https://doi.org/10.1007/s10985-021-09542-4

Keywords

  • Bayesian computation
  • Cox model
  • Missing data
  • Posterior distribution
  • Survival data