Mark-specific hazard ratio model with missing multivariate marks

Abstract

An objective of randomized placebo-controlled preventive HIV vaccine efficacy (VE) trials is to assess the relationship between vaccine effects to prevent HIV acquisition and continuous genetic distances of the exposing HIVs to multiple HIV strains represented in the vaccine. The set of genetic distances, only observed in failures, is collectively termed the ‘mark.’ The objective has motivated a recent study of a multivariate mark-specific hazard ratio model in the competing risks failure time analysis framework. Marks of interest, however, are commonly subject to substantial missingness, largely due to rapid post-acquisition viral evolution. In this article, we investigate the mark-specific hazard ratio model with missing multivariate marks and develop two inferential procedures based on (i) inverse probability weighting (IPW) of the complete cases, and (ii) augmentation of the IPW estimating functions by leveraging auxiliary data predictive of the mark. Asymptotic properties and finite-sample performance of the inferential procedures are presented. This research also provides general inferential methods for semiparametric density ratio/biased sampling models with missing data. We apply the developed procedures to data from the HVTN 502 ‘Step’ HIV VE trial.
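
To make strategies (i) and (ii) concrete, here is a minimal Python sketch on a much simpler problem than the one treated in the article: estimating the mean of a scalar variable that is missing at random. It is not the authors' mark-specific hazard ratio estimator. The simulated data, the function names, the observation probability `pi` (treated as known), and the linear working model used in the augmentation term are all assumptions introduced for illustration only.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n, rng):
    """Toy data (hypothetical): auxiliary a is always observed, v is sometimes missing."""
    a = rng.normal(size=n)                        # auxiliary covariate predictive of v
    v = 1.0 + a + rng.normal(scale=0.5, size=n)   # variable of interest, true mean 1.0
    pi = expit(0.5 + a)                           # P(v observed | a): missing at random
    r = rng.random(n) < pi                        # True where v is observed
    return a, v, pi, r


def complete_case_mean(v, r):
    """Naive mean over observed values; biased here because r depends on a."""
    return v[r].mean()


def ipw_mean(v, r, pi):
    """Inverse probability weighting: each complete case is weighted by 1 / pi."""
    return np.sum(np.where(r, v / pi, 0.0)) / len(v)


def aipw_mean(v, r, pi, a):
    """IPW plus an augmentation term built from a working model for v given a."""
    # Working model (assumption): linear regression of v on a, fit on complete cases.
    X = np.column_stack([np.ones_like(a[r]), a[r]])
    beta, *_ = np.linalg.lstsq(X, v[r], rcond=None)
    m = beta[0] + beta[1] * a                     # predictions for every subject
    ipw_term = np.where(r, v / pi, 0.0)
    augmentation = (r.astype(float) - pi) / pi * m
    return np.mean(ipw_term - augmentation)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, v, pi, r = simulate(20000, rng)
    print("complete-case:", complete_case_mean(v, r))
    print("IPW:          ", ipw_mean(v, r, pi))
    print("augmented IPW:", aipw_mean(v, r, pi, a))
```

In this toy setting the complete-case mean should sit well above the true value of 1.0, because subjects with larger `a` are more likely to be observed, while both weighted estimates should land close to 1.0. The augmented version typically varies less across seeds because it also uses the auxiliary covariate for subjects whose value is missing.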

Acknowledgments

The authors thank the participants, investigators, and sponsors of the HVTN 502 Step HIV vaccine trial. Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under Award Numbers UM1AI068635 and R37AI054165 and by the Bill and Melinda Gates Foundation (BMGF) Award Number OPP1110049. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or BMGF.

Author information

Corresponding author

Correspondence to Michal Juraska.

Ethics declarations

Ethical statement

Institutional review board approval for the HVTN 502 Step study was obtained at all study sites. The study was undertaken in conformance with applicable local and country requirements, and participants gave written informed consent.

Electronic supplementary material

Supplementary material 1 (pdf 862 KB)

About this article

Cite this article

Juraska, M., Gilbert, P.B. Mark-specific hazard ratio model with missing multivariate marks. Lifetime Data Anal 22, 606–625 (2016). https://doi.org/10.1007/s10985-015-9353-9

Keywords

  • Augmented inverse probability weighting
  • Biased sampling model
  • Competing risks
  • Cox model
  • Density ratio model
  • Missing data
  • Semiparametric model