Accounting for Differential Error in Time-to-Event Analyses Using Imperfect Electronic Health Record-Derived Endpoints

New Advances in Statistics and Data Science

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Estimates of the relationship between an outcome and an exposure are biased by imperfect ascertainment of the outcome of interest. In studies using data derived from electronic health records (EHRs), misclassification of outcomes is common and is often related to patient characteristics. For instance, patients with greater comorbid disease burden may use the healthcare system more frequently, making it more likely that the EHR will contain a record of their diagnosis and possibly resulting in poorer outcome classification for healthier patients who seek care less often. This is particularly problematic in studies of time-to-event outcomes, in which both the occurrence of an event and its timing, if it occurs, may be captured with error in the EHR. Misclassification-adjusted estimators for time-to-event outcomes are available using discrete time proportional hazards models but may be biased if the operating characteristics of the EHR-derived endpoint vary across exposure categories. Motivated by an algorithm for identifying second breast cancer events using EHR data, we investigated the implications of using an imperfectly assessed outcome with differential measurement error in time-to-event analyses. We used simulation studies to demonstrate the magnitude of bias induced by failure to account for error in the status and timing of recurrence and compared alternative methods for correcting this bias. We conclude with general guidance on accounting for outcome misclassification in time-to-event studies using EHR data.
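The kind of bias described in the abstract can be illustrated with a small simulation. The sketch below is not the chapter's simulation design. It is a minimal illustration under assumptions chosen here: a constant discrete-time hazard that is identical in both exposure groups (so the true hazard ratio is 1), perfect specificity, no error in event timing, and a detection sensitivity that differs by exposure. All function names and parameter values are hypothetical, and a crude ratio of per-interval event rates stands in for a fitted discrete time proportional hazards model.

```python
import random


def simulate_group(n, hazard, sensitivity, n_intervals, rng):
    """Simulate one exposure group.

    Returns (true_events, true_person_time, ehr_events, ehr_person_time),
    where person-time is counted in discrete intervals at risk.
    """
    true_events = true_pt = ehr_events = ehr_pt = 0
    for _ in range(n):
        # True event time: first interval in which the event occurs, if any.
        event_time = None
        for t in range(1, n_intervals + 1):
            if rng.random() < hazard:
                event_time = t
                break
        if event_time is None:
            # No event: censored at end of follow-up in both data sources.
            true_pt += n_intervals
            ehr_pt += n_intervals
            continue
        true_events += 1
        true_pt += event_time
        if rng.random() < sensitivity:
            # EHR captures the event (assumed here to have the correct timing).
            ehr_events += 1
            ehr_pt += event_time
        else:
            # Missed event: the patient appears event-free for all of follow-up.
            ehr_pt += n_intervals
    return true_events, true_pt, ehr_events, ehr_pt


def rate_ratio(events1, pt1, events0, pt0):
    """Crude ratio of per-interval event rates (exposed vs. unexposed)."""
    return (events1 / pt1) / (events0 / pt0)


def run_simulation(n_per_group=20000, hazard=0.05, n_intervals=10,
                   sens_unexposed=0.9, sens_exposed=0.6, seed=2017):
    """Return (true_hr, ehr_hr) when the true hazard ratio equals 1."""
    rng = random.Random(seed)
    unexposed = simulate_group(n_per_group, hazard, sens_unexposed, n_intervals, rng)
    exposed = simulate_group(n_per_group, hazard, sens_exposed, n_intervals, rng)
    true_hr = rate_ratio(exposed[0], exposed[1], unexposed[0], unexposed[1])
    ehr_hr = rate_ratio(exposed[2], exposed[3], unexposed[2], unexposed[3])
    return true_hr, ehr_hr


true_hr, ehr_hr = run_simulation()
print(f"Hazard ratio using true outcomes:        {true_hr:.2f}")
print(f"Hazard ratio using EHR-derived outcomes: {ehr_hr:.2f}")
```

Under these assumptions, the ratio based on true outcomes stays close to 1, while the ratio based on EHR-derived outcomes falls well below 1 purely because events are missed more often in the group with lower sensitivity. Setting `sens_exposed` equal to `sens_unexposed` makes the error non-differential, which largely removes the distortion of the ratio in this null setting.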

Acknowledgements

Research reported in this paper was supported by the National Cancer Institute of the National Institutes of Health under award numbers R01CA120562 and R21CA143242. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Corresponding author

Correspondence to Rebecca A. Hubbard.

Copyright information

© 2017 Springer International Publishing AG

Cite this chapter

Hubbard, R.A., Harton, J., Zhu, W., Wang, L., Chubak, J. (2017). Accounting for Differential Error in Time-to-Event Analyses Using Imperfect Electronic Health Record-Derived Endpoints. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_14
