Journal of Statistical Theory and Practice, Volume 7, Issue 2, pp 381–400

Bias Correction Methods for Misclassified Covariates in the Cox Model: Comparison of Five Correction Methods by Simulation and Data Analysis

  • Heejung Bang
  • Ya-Lin Chiu
  • Jay S. Kaufman
  • Mehul D. Patel
  • Gerardo Heiss
  • Kathryn M. Rose


Measurement error and misclassification are commonplace in research when variables cannot be measured accurately. A number of statistical methods have been developed to tackle this problem in a variety of settings and contexts. However, relatively few methods are available for handling misclassified categorical exposure variables in the Cox proportional hazards regression model. In this article, we review and compare, by simulation, several approaches to this problem: naive methods, regression calibration, pooled estimation, multiple imputation, corrected score estimation, and MC-SIMEX. These methods are also applied to a life course study with recalled data and historical records. In practice, measurement error and misclassification should be accounted for in both design and analysis whenever possible. In the analysis, it may also be preferable to apply more than one correction method for estimation and inference, with a proper understanding of the underlying assumptions.
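MC-SIMEX is one of the correction methods compared above. As a rough illustration only, the sketch below simulates a binary exposure subject to nondifferential misclassification, fits a naive one-covariate Cox model, and applies an MC-SIMEX-style correction. It is not the authors' code or simulation design. Its assumptions are: the misclassification matrix Π is known, event times have no ties, the extrapolant is quadratic, and the sensitivity (0.8), specificity (0.9), hazard ratio (2), censoring rate, and sample size are arbitrary. All function names (`cox_binary_beta`, `misclass_matrix_power`, `mc_simex`, `simulate`) are hypothetical.

```python
import numpy as np


def cox_binary_beta(time, event, x, n_iter=25, tol=1e-10):
    """Newton-Raphson estimate of a one-covariate Cox log-hazard ratio.

    Maximizes the Cox partial likelihood, assuming no tied event times
    (true here because all times are continuous draws).
    """
    order = np.argsort(-time)              # descending time
    d = event[order]
    xs = x[order].astype(float)
    beta = 0.0
    for _ in range(n_iter):
        w = np.exp(beta * xs)
        # After a descending sort, the cumulative sum up to row i runs over
        # the risk set {j : t_j >= t_i}.
        s0 = np.cumsum(w)
        s1 = np.cumsum(w * xs)
        s2 = np.cumsum(w * xs * xs)
        mean_x = s1 / s0
        score = np.sum(d * (xs - mean_x))
        info = np.sum(d * (s2 / s0 - mean_x ** 2))
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return float(beta)


def misclass_matrix_power(pi, lam):
    """Pi**lam via eigendecomposition; pi[i, j] = P(W = i | X = j)."""
    vals, vecs = np.linalg.eig(pi)
    return np.real(vecs @ np.diag(vals ** lam) @ np.linalg.inv(vecs))


def mc_simex(time, event, w, pi, lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=20, rng=None):
    """MC-SIMEX-style correction with a quadratic extrapolant (one common choice)."""
    rng = np.random.default_rng() if rng is None else rng
    lam_grid = [0.0]
    est = [cox_binary_beta(time, event, w)]  # lambda = 0 is the naive fit
    for lam in lambdas:
        p_one = misclass_matrix_power(pi, lam)[1]   # P(new W = 1 | current W = k)
        fits = []
        for _ in range(n_sim):
            # Add extra misclassification with matrix Pi**lam, then refit.
            w_b = (rng.random(w.size) < p_one[w]).astype(int)
            fits.append(cox_binary_beta(time, event, w_b))
        lam_grid.append(lam)
        est.append(float(np.mean(fits)))
    coef = np.polyfit(lam_grid, est, deg=2)
    return float(np.polyval(coef, -1.0)), est[0]  # (corrected, naive)


def simulate(n=5000, beta=np.log(2.0), sens=0.8, spec=0.9, seed=2013):
    """Hypothetical data: binary X, hazard exp(beta * X), misclassified W."""
    rng = np.random.default_rng(seed)
    x = rng.binomial(1, 0.5, n)
    t = rng.exponential(1.0 / np.exp(beta * x))     # event times
    c = rng.exponential(1.0 / 0.3, n)               # independent censoring, rate 0.3
    time = np.minimum(t, c)
    event = (t <= c).astype(int)
    # Nondifferential misclassification: P(W=1|X=1) = sens, P(W=0|X=0) = spec
    w = (rng.random(n) < np.where(x == 1, sens, 1.0 - spec)).astype(int)
    pi = np.array([[spec, 1.0 - sens], [1.0 - spec, sens]])
    return time, event, w, pi, rng


if __name__ == "__main__":
    time, event, w, pi, rng = simulate()
    corrected, naive = mc_simex(time, event, w, pi, rng=rng)
    print(f"true {np.log(2.0):.3f}  naive {naive:.3f}  MC-SIMEX {corrected:.3f}")
```

Each λ > 0 adds extra misclassification through the matrix Π^λ. The naive estimates are averaged over the resamples, and the resulting curve in λ is extrapolated back to λ = -1, the point that formally corresponds to no misclassification. With a quadratic extrapolant, this step typically removes much, but not necessarily all, of the attenuation seen in the naive estimate.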


Keywords: ARIC; Childhood SES; Cox proportional hazards regression; Measurement error; Misclassification; Recalled error



Copyright information

© Grace Scientific Publishing 2013

Authors and Affiliations

  • Heejung Bang (1; corresponding author)
  • Ya-Lin Chiu (2)
  • Jay S. Kaufman (3)
  • Mehul D. Patel (4)
  • Gerardo Heiss (4)
  • Kathryn M. Rose (5)

  1. Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, USA
  2. Division of Biostatistics and Epidemiology, Department of Public Health, Weill Cornell Medical College, New York, USA
  3. Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada
  4. Department of Epidemiology, University of North Carolina, Chapel Hill, USA
  5. SRA International, Inc., Durham, USA
