Abstract
Estimates of the relationship between an outcome and an exposure are biased by imperfect ascertainment of the outcome of interest. In studies using data derived from electronic health records (EHRs), misclassification of outcomes is common and is often related to patient characteristics. For instance, patients with greater comorbid disease burden may use the healthcare system more frequently, making it more likely that the EHR will contain a record of their diagnosis and possibly resulting in poorer outcome classification for healthier patients who seek care less frequently. This is particularly problematic in studies of time-to-event outcomes, in which both the occurrence of an event and the timing of the event, if it occurs, may be captured with error in the EHR. Misclassification-adjusted estimators for time-to-event outcomes are available using discrete-time proportional hazards models, but they may be biased if the operating characteristics of the EHR-derived endpoint vary across exposure categories. Motivated by an algorithm for identifying second breast cancer events using EHR data, we investigated the implications of using an imperfectly assessed outcome with differential measurement error in time-to-event analyses. We used simulation studies to demonstrate the magnitude of bias induced by failure to account for error in the status and timing of recurrence, and we compared alternative methods for correcting this bias. We conclude with general guidance on accounting for outcome misclassification in time-to-event studies using EHR data.
Acknowledgements
Research reported in this paper was supported by the National Cancer Institute of the National Institutes of Health under award numbers R01CA120562 and R21CA143242. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Hubbard, R.A., Harton, J., Zhu, W., Wang, L., Chubak, J. (2017). Accounting for Differential Error in Time-to-Event Analyses Using Imperfect Electronic Health Record-Derived Endpoints. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_14
DOI: https://doi.org/10.1007/978-3-319-69416-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69415-3
Online ISBN: 978-3-319-69416-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)