Hazard function estimation with cause-of-death data missing at random

  • Qihua Wang
  • Gregg E. Dinse
  • Chunling Liu


Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.
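The abstract names the three estimators without defining them, so the Python sketch below is only an illustration of one common construction from the missing-data literature, not the authors' definitions. It assumes observed times X = min(T, C), an indicator delta that equals 1 for a death from the cause of interest and may be missing, and a flag xi that equals 1 when delta is observed. Each estimator plugs a different pseudo-indicator into a kernel-smoothed Nelson-Aalen increment. The function names (`hazard_estimates`, `nw_regression`), the Gaussian kernel, the Nadaraya-Watson smoother, and both bandwidths are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def nw_regression(x_eval, x, y, bandwidth):
    """Nadaraya-Watson estimate of E[y | x] evaluated at the points x_eval."""
    w = gaussian_kernel((x_eval[:, None] - x[None, :]) / bandwidth)
    return (w @ y) / np.maximum(w.sum(axis=1), 1e-12)


def hazard_estimates(t_grid, x, delta, xi, h_hazard, h_reg):
    """Kernel hazard estimates on t_grid from three pseudo-indicators.

    x     : observed times min(T, C)
    delta : 1 if death from the cause of interest; ignored where xi == 0
    xi    : 1 if delta is observed, 0 if it is missing
    """
    t_grid = np.asarray(t_grid, dtype=float)
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    xi = np.asarray(xi, dtype=float)

    # Sort by observed time so that position i has n - i subjects still at risk.
    order = np.argsort(x)
    x, delta, xi = x[order], delta[order], xi[order]
    n = len(x)
    observed = xi == 1

    # m_hat(x): P(delta = 1 | X = x), fitted on complete cases (valid under MAR).
    m_hat = nw_regression(x, x[observed], delta[observed], h_reg)
    # pi_hat(x): P(xi = 1 | X = x), clipped away from zero for stable weights.
    pi_hat = np.clip(nw_regression(x, x, xi, h_reg), 1e-3, 1.0)

    # Missing indicators are set to 0 so they never enter through xi * d_obs.
    d_obs = np.where(observed, delta, 0.0)
    pseudo = {
        "regression_surrogate": m_hat,
        "imputation": xi * d_obs + (1.0 - xi) * m_hat,
        "ipw": xi * d_obs / pi_hat + (1.0 - xi / pi_hat) * m_hat,
    }

    at_risk = n - np.arange(n)  # n, n - 1, ..., 1
    k = gaussian_kernel((t_grid[:, None] - x[None, :]) / h_hazard) / h_hazard
    return {name: k @ (d / at_risk) for name, d in pseudo.items()}
```

Calling `hazard_estimates(t_grid, x, delta, xi, h_hazard, h_reg)` returns a dictionary with one array of hazard values per estimator. The bandwidths are supplied by hand here; the data-driven bandwidth selection mentioned in the abstract is not reproduced.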


Keywords: Imputation estimator; Inverse probability weighted estimator; Kernel estimator; Regression surrogate estimator




Copyright information

© The Institute of Statistical Mathematics, Tokyo 2010

Authors and Affiliations

  1. Academy of Mathematics and Systems Science, Chinese Academy of Science, Beijing, China
  2. Department of Mathematics and Statistics, Yunnan University, Kunming, China
  3. Department of Statistics and Actuarial Science, The University of Hong Kong, Pokfulam, Hong Kong
  4. Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, USA
  5. Department of Applied Mathematics, Hong Kong Polytechnic University, Hung Hom, Hong Kong
