Accounting for misclassification in electronic health records-derived exposures using generalized linear finite mixture models

Hubbard, Rebecca A.; Johnson, Eric; Chubak, Jessica; Wernli, Karen J.; Kamineni, Aruna; Bogart, Andy; Rutter, Carolyn M.

doi:10.1007/s10742-016-0149-5

Published: 03 June 2016

Volume 17, pages 101–112 (2017)

Health Services and Outcomes Research Methodology

Rebecca A. Hubbard¹,
Eric Johnson²,
Jessica Chubak²,³,
Karen J. Wernli²,
Aruna Kamineni²,
Andy Bogart⁴ &
…
Carolyn M. Rutter⁴

Abstract

Exposures derived from electronic health records (EHR) may be misclassified, leading to biased estimates of their association with outcomes of interest. An example of this problem arises in the context of cancer screening where test indication, the purpose for which a test was performed, is often unavailable. This poses a challenge to understanding the effectiveness of screening tests because estimates of screening test effectiveness are biased if some diagnostic tests are misclassified as screening. Prediction models have been developed for a variety of exposure variables that can be derived from EHR, but no previous research has investigated appropriate methods for obtaining unbiased association estimates using these predicted probabilities. The full likelihood incorporating information on both the predicted probability of exposure-class membership and the association between the exposure and outcome of interest can be expressed using a finite mixture model. When the regression model of interest is a generalized linear model (GLM), the expectation–maximization algorithm can be used to estimate the parameters using standard software for GLMs. Using simulation studies, we compared the bias and efficiency of this mixture model approach to alternative approaches including multiple imputation and dichotomization of the predicted probabilities to create a proxy for the missing predictor. The mixture model was the only approach that was unbiased across all scenarios investigated. Finally, we explored the performance of these alternatives in a study of colorectal cancer screening with colonoscopy. These findings have broad applicability in studies using EHR data where gold-standard exposures are unavailable and prediction models have been developed for estimating proxies.
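The estimation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a binary outcome, a binary latent exposure, no additional covariates, and well-calibrated predicted probabilities. The function names (`simulate`, `em_mixture_logistic`, `dichotomized_log_or`), the Beta(0.5, 0.5) distribution for the predicted probabilities, and the simulated risks are all hypothetical choices made for illustration. With a single binary exposure the logistic model is saturated, so the weighted GLM fit in the M-step reduces to weighted proportions; with covariates, that step would instead call a weighted GLM routine.

```python
import math
import random


def simulate(n, theta0, theta1, seed=0):
    """Simulate outcomes y and predicted exposure probabilities p (illustrative only)."""
    rng = random.Random(seed)
    ys, ps = [], []
    for _ in range(n):
        p = rng.betavariate(0.5, 0.5)      # assumed output of a calibrated prediction model
        z = 1 if rng.random() < p else 0   # true exposure class, never observed by the analyst
        risk = theta1 if z == 1 else theta0
        ys.append(1 if rng.random() < risk else 0)
        ps.append(p)
    return ys, ps


def logit(x):
    return math.log(x / (1.0 - x))


def em_mixture_logistic(ys, ps, max_iter=1000, tol=1e-8):
    """EM for a two-class mixture of Bernoulli outcomes with known class probabilities ps."""
    ybar = sum(ys) / len(ys)
    theta0 = theta1 = ybar                 # start from the pooled outcome risk
    for _ in range(max_iter):
        # E-step: posterior probability that each subject is in exposure class 1.
        w = []
        for y, p in zip(ys, ps):
            f1 = theta1 if y == 1 else 1.0 - theta1
            f0 = theta0 if y == 1 else 1.0 - theta0
            w.append(p * f1 / (p * f1 + (1.0 - p) * f0))
        # M-step: weighted fit of the outcome model. For this saturated logistic
        # model the weighted maximum likelihood estimates are weighted proportions.
        sw = sum(w)
        new1 = sum(wi * y for wi, y in zip(w, ys)) / sw
        new0 = sum((1.0 - wi) * y for wi, y in zip(w, ys)) / (len(ys) - sw)
        change = abs(new1 - theta1) + abs(new0 - theta0)
        theta0, theta1 = new0, new1
        if change < tol:
            break
    return {"theta0": theta0, "theta1": theta1,
            "log_or": logit(theta1) - logit(theta0)}


def dichotomized_log_or(ys, ps, cutoff=0.5):
    """Comparison approach: treat p > cutoff as exposed and compute a crude log odds ratio."""
    exposed = [y for y, p in zip(ys, ps) if p > cutoff]
    unexposed = [y for y, p in zip(ys, ps) if p <= cutoff]
    r1 = sum(exposed) / len(exposed)
    r0 = sum(unexposed) / len(unexposed)
    return logit(r1) - logit(r0)
```

Calling `simulate(10000, 0.2, 0.5, seed=1)` and passing the result to both `em_mixture_logistic` and `dichotomized_log_or` gives a rough sense of how the two approaches compare under these assumed conditions: the dichotomized proxy mixes exposed and unexposed subjects in each group, which pulls its log odds ratio toward zero, while the mixture fit weights each subject by the posterior probability of exposure.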

Acknowledgments

This work was supported by the National Cancer Institute of the National Institutes of Health (Grant Number U01CA152959). The collection of cancer incidence data used in this study was supported by the Cancer Surveillance System of the Fred Hutchinson Cancer Research Center, which is funded by Contract Nos. N01-CN-67009 and N01-PC-35142 from the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute with additional support from the Fred Hutchinson Cancer Research Center and the State of Washington.

Author information

Authors and Affiliations

Department of Biostatistics and Epidemiology, University of Pennsylvania, 604 Blockley Hall, 423 Guardian Dr, Philadelphia, PA, 19104, USA
Rebecca A. Hubbard
Group Health Research Institute, Seattle, WA, USA
Eric Johnson, Jessica Chubak, Karen J. Wernli & Aruna Kamineni
Department of Epidemiology, University of Washington, Seattle, WA, USA
Jessica Chubak
RAND Corporation, Santa Monica, CA, USA
Andy Bogart & Carolyn M. Rutter

Corresponding author

Correspondence to Rebecca A. Hubbard.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the Group Health Institutional Review Board and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.

Informed consent

The Group Health Institutional Review Board approved a waiver of consent for this study.

About this article

Cite this article

Hubbard, R.A., Johnson, E., Chubak, J. et al. Accounting for misclassification in electronic health records-derived exposures using generalized linear finite mixture models. Health Serv Outcomes Res Method 17, 101–112 (2017). https://doi.org/10.1007/s10742-016-0149-5

Received: 04 December 2015
Revised: 27 April 2016
Accepted: 26 May 2016
Published: 03 June 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10742-016-0149-5
