Accounting for Differential Error in Time-to-Event Analyses Using Imperfect Electronic Health Record-Derived Endpoints

Hubbard, Rebecca A.; Harton, Joanna; Zhu, Weiwei; Wang, Le; Chubak, Jessica

doi:10.1007/978-3-319-69416-0_14

Part of the book series: ICSA Book Series in Statistics (ICSABSS)

Abstract

Estimates of the relationship between an outcome and an exposure are biased by imperfect ascertainment of the outcome of interest. In studies using data derived from electronic health records (EHRs), misclassification of outcomes is common and is often related to patient characteristics. For instance, patients with greater comorbid disease burden may use the healthcare system more frequently, making it more likely that the EHR will contain a record of their diagnosis and possibly resulting in poorer outcome classification for healthier patients who do not seek care as frequently. This is particularly problematic in studies of time-to-event outcomes, in which both the occurrence of an event and the timing of the event, if it occurs, may be captured with error in the EHR. Misclassification-adjusted estimators for time-to-event outcomes are available using discrete-time proportional hazards models but may be biased if the operating characteristics of the EHR-derived endpoint vary across exposure categories. Motivated by an algorithm for identifying second breast cancer events using EHR data, we investigated the implications of using an imperfectly assessed outcome with differential measurement error in time-to-event analyses. We used simulation studies to demonstrate the magnitude of bias induced by failure to account for error in the status and timing of recurrence and compared alternative methods for correcting this bias. We conclude with general guidance on accounting for outcome misclassification in time-to-event studies using EHR data.
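
As a rough illustration of the differential-error mechanism described in the abstract, the following Python sketch simulates a discrete-time cohort in which the EHR-derived endpoint misses some true events, with sensitivity that depends on exposure group. It is not the chapter's simulation design or estimator. The hazards (0.05 and 0.10 per interval), the sensitivities, the ten-interval follow-up, and the function names `simulate_group` and `rate_ratio` are all hypothetical. Specificity is assumed perfect, detected events are assumed to carry the correct event time, and a crude ratio of per-interval event rates stands in for a discrete-time proportional hazards model.

```python
import random


def simulate_group(n, hazard, sensitivity, n_intervals, rng):
    """Return (observed events, observed person-intervals) for one exposure group."""
    events = 0
    person_intervals = 0
    for _ in range(n):
        # True event time: the first interval in which the event occurs, if any.
        true_time = None
        for k in range(1, n_intervals + 1):
            if rng.random() < hazard:
                true_time = k
                break
        # The imperfect endpoint records a true event with probability `sensitivity`.
        detected = true_time is not None and rng.random() < sensitivity
        if detected:
            events += 1
            person_intervals += true_time
        else:
            # A missed event looks like event-free follow-up to the end of the study.
            person_intervals += n_intervals
    return events, person_intervals


def rate_ratio(se_unexposed, se_exposed, n=20000, seed=1):
    """Crude exposed/unexposed ratio of observed per-interval event rates."""
    rng = random.Random(seed)
    # Hypothetical true hazards: 0.05 (unexposed) and 0.10 (exposed), ratio 2.
    e0, t0 = simulate_group(n, 0.05, se_unexposed, 10, rng)
    e1, t1 = simulate_group(n, 0.10, se_exposed, 10, rng)
    return (e1 / t1) / (e0 / t0)


if __name__ == "__main__":
    print("perfect ascertainment:   ", round(rate_ratio(1.0, 1.0), 2))
    print("differential sensitivity:", round(rate_ratio(0.6, 0.9), 2))
```

Under these assumptions, the ratio computed with perfect ascertainment should sit close to the true value of 2, while the ratio computed with lower sensitivity in the unexposed group should be noticeably larger, because events are missed preferentially in that group and the missed cases are counted as additional event-free follow-up.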

Acknowledgements

Research reported in this paper was supported by the National Cancer Institute of the National Institutes of Health under award numbers R01CA120562 and R21CA143242. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Department of Biostatistics, Epidemiology & Informatics, University of Pennsylvania, Philadelphia, PA, USA
Rebecca A. Hubbard, Joanna Harton & Le Wang
Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
Weiwei Zhu & Jessica Chubak

Corresponding author

Correspondence to Rebecca A. Hubbard.

Editor information

Editors and Affiliations

University of North Carolina, Chapel Hill, North Carolina, USA
Ding-Geng Chen
Columbia University, New York, New York, USA
Zhezhen Jin
University of California, Los Angeles, California, USA
Gang Li
University of Michigan-Ann Arbor, Ann Arbor, Michigan, USA
Yi Li
National Institutes of Health, Bethesda, Maryland, USA
Aiyi Liu
Georgia State University, Atlanta, Georgia, USA
Yichuan Zhao

About this chapter

Cite this chapter

Hubbard, R.A., Harton, J., Zhu, W., Wang, L., Chubak, J. (2017). Accounting for Differential Error in Time-to-Event Analyses Using Imperfect Electronic Health Record-Derived Endpoints. In: Chen, DG., Jin, Z., Li, G., Li, Y., Liu, A., Zhao, Y. (eds) New Advances in Statistics and Data Science. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-69416-0_14

DOI: https://doi.org/10.1007/978-3-319-69416-0_14
Published: 18 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69415-3
Online ISBN: 978-3-319-69416-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)