Abstract
Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determination based on administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify because of its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared the accuracy of HA-VTE diagnoses made by administrative coding with manual review of gold standard diagnostic test results. We performed a retrospective analysis of EHR data on 3680 adult stepdown unit patients to identify HA-VTE. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for VTE were identified. A total of 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm the diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 were identified with acute onset type codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of ICD-9-CM coded cases (n = 87) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by a confirmatory diagnostic test. Additionally, 45% of diagnostic test-confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding missed diagnostic test-confirmed HA-VTE cases and inaccurately assigned cases without confirmed VTE, suggesting that dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods to develop more sensitive and specific VTE phenotype solutions that are portable across EHR vendor data are needed to support case-finding in big-data analytics.
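The two percentages reported in the abstract can be reproduced from its counts. The sketch below is illustrative only and is not the authors' analysis code. It assumes that the 40% figure uses the 219 acute-onset coded cases as its denominator and that the 87 confirmed coded cases are the overlap with the 158 test-confirmed cases; both readings are inferred from the arithmetic rather than stated in the abstract. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingAgreement:
    """Counts comparing administrative codes with diagnostic test review."""
    coded_acute: int     # cases carrying acute-onset ICD-9-CM VTE codes
    test_confirmed: int  # cases confirmed by diagnostic test report review
    both: int            # coded cases also confirmed by a test (assumed overlap)

    def ppv(self) -> float:
        # Share of coded cases that a positive test report confirmed.
        return self.both / self.coded_acute

    def sensitivity(self) -> float:
        # Share of test-confirmed cases that also carried a code.
        return self.both / self.test_confirmed

    def missed_fraction(self) -> float:
        # Share of test-confirmed cases with no corresponding code.
        return 1.0 - self.sensitivity()


# Counts taken from the abstract; the denominators are an assumption.
abstract_counts = CodingAgreement(coded_acute=219, test_confirmed=158, both=87)

print(f"Coded cases confirmed by test: {abstract_counts.ppv():.1%}")                 # 39.7%
print(f"Confirmed cases carrying a code: {abstract_counts.sensitivity():.1%}")       # 55.1%
print(f"Confirmed cases lacking a code: {abstract_counts.missed_fraction():.1%}")    # 44.9%
```

Under these assumptions the results round to the 40% and 45% quoted above, which suggests the two figures describe the positive predictive value and the complement of the sensitivity of acute-onset coding, respectively.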
Data availability
The data are available in the tables and figures.
Acknowledgements
This research was supported by National Institutes of Health R01NR01391 (all), F31NR01810 (TP, MH).
Author information
Contributions
Study design and implementation (TP, MS, GC, MRP, MH). Data collection and annotation (TP, MS). Data analysis (TP, GC, MH). Manuscript development and review (TP, MS, GC, AWD, MRP, MH).
Ethics declarations
Conflict of interest
The authors have no commercial conflicts of interest with this work.
Ethical approval
The study was approved by the institutional review boards of the University of Pittsburgh and Carnegie Mellon University.
About this article
Cite this article
Pellathy, T., Saul, M., Clermont, G. et al. Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research. J Clin Monit Comput 36, 397–405 (2022). https://doi.org/10.1007/s10877-021-00664-6
DOI: https://doi.org/10.1007/s10877-021-00664-6