Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research

Pellathy, Tiffany; Saul, Melissa; Clermont, Gilles; Dubrawski, Artur W.; Pinsky, Michael R.; Hravnak, Marilyn

doi:10.1007/s10877-021-00664-6


Original Research
Published: 08 February 2021

Volume 36, pages 397–405 (2022)

Journal of Clinical Monitoring and Computing

Tiffany Pellathy ORCID: orcid.org/0000-0003-3643-5308¹,
Melissa Saul²,
Gilles Clermont²,
Artur W. Dubrawski³,
Michael R. Pinsky² &
Marilyn Hravnak¹


Abstract

Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on ground truth determined from administrative data can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify because of its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared the accuracy of HA-VTE diagnoses made by administrative coding with manual review of gold-standard diagnostic test results. We performed a retrospective analysis of EHR data from 3680 adult stepdown unit patients to identify HA-VTE. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for VTE were identified. A total of 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm the diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 had acute-onset codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of the acute-onset ICD-9-CM coded cases (n = 87 of 219) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by a confirmatory diagnostic test. Additionally, 45% of diagnostic test-confirmed HA-VTE cases lacked corresponding ICD-9-CM codes. ICD-9-CM coding thus both missed test-confirmed HA-VTE cases and assigned VTE to cases without a confirmed diagnosis, suggesting that dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods that yield more sensitive and specific VTE phenotypes, portable across EHR vendor data, are needed to support case finding in big data analytics.
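If diagnostic test report review is treated as the reference standard, the two headline percentages correspond to the positive predictive value (PPV) of acute-onset ICD-9-CM coding and to the complement of its sensitivity. The minimal Python sketch below reproduces both from the counts reported above. It assumes that the 40% figure uses the 219 acute-onset coded cases as its denominator (87/219 ≈ 40%) and that the 45% figure uses the 158 test-confirmed cases (71/158 ≈ 45%). The class and method names are illustrative only. Specificity is deliberately left out, because the abstract does not report how cases map onto the 3680 patients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgreementCounts:
    """Illustrative 2x2 agreement counts (names are hypothetical)."""
    coded: int      # cases flagged by acute-onset ICD-9-CM codes
    confirmed: int  # cases confirmed by diagnostic test report review
    both: int       # cases both coded AND test-confirmed

    def ppv(self) -> float:
        """Share of coded cases that the reference standard confirms."""
        return self.both / self.coded

    def sensitivity(self) -> float:
        """Share of test-confirmed cases that also carry a code."""
        return self.both / self.confirmed

    def missed_share(self) -> float:
        """Share of test-confirmed cases with no corresponding code."""
        return (self.confirmed - self.both) / self.confirmed


counts = AgreementCounts(coded=219, confirmed=158, both=87)
print(f"PPV of codes:          {counts.ppv():.1%}")           # 39.7%
print(f"Sensitivity of codes:  {counts.sensitivity():.1%}")   # 55.1%
print(f"Confirmed but uncoded: {counts.missed_share():.1%}")  # 44.9%
```

PPV and sensitivity need to be read together. The coded and confirmed totals (219 vs. 158) look broadly comparable, yet the two sets overlap on only 87 cases.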
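The abstract describes a two-stage review. Terminology extraction first screened the 4544 radiology reports, and a clinical expert then confirmed each diagnosis manually. The study's actual terminology set and extraction tool are not given here. The Python sketch below is therefore only a hypothetical illustration of what a first-pass keyword screen might look like: it flags a report when a VTE term appears with no negation cue earlier in the same sentence. The term list, the negation cues, and the `screen_report` function are all assumptions, and any flagged report would still need expert review.

```python
import re

# Hypothetical VTE term list for illustration; NOT the study's terminology set.
VTE_TERMS = re.compile(
    r"\b(?:deep\s+vein\s+thrombosis|dvt|pulmonary\s+embol(?:ism|us|i)"
    r"|thrombus|thrombosis|embolus|embolism)\b",
    re.IGNORECASE,
)

# Hypothetical, deliberately naive negation cues.
NEGATION_CUES = re.compile(r"\b(?:no|without|negative\s+for)\b", re.IGNORECASE)


def screen_report(text: str) -> bool:
    """Return True if any VTE term appears without a negation cue
    earlier in the same sentence (sentences split on '.' or newline)."""
    for match in VTE_TERMS.finditer(text):
        # Start of the sentence containing the match (0 if none found).
        sentence_start = max(
            text.rfind(".", 0, match.start()),
            text.rfind("\n", 0, match.start()),
        ) + 1
        preceding = text[sentence_start:match.start()]
        if not NEGATION_CUES.search(preceding):
            return True  # affirmed mention: send to manual review
    return False  # no terms, or every mention negated


print(screen_report("No DVT. Acute pulmonary embolism in the right lower lobe."))
```

A screen like this favors recall over precision. Sentence-level negation is crude: it still flags hedged phrases such as "cannot be excluded", and it ignores temporal context such as chronic versus new-onset clot. This is one reason the manual review stage remains necessary.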


Data availability

Data are available in the tables and figures.


Acknowledgements

This research was supported by National Institutes of Health R01NR01391 (all), F31NR01810 (TP, MH).

Author information

Authors and Affiliations

University of Pittsburgh School of Nursing, 336 Victoria Hall; 3500 Victoria Street, Pittsburgh, PA, 15213, USA
Tiffany Pellathy & Marilyn Hravnak
University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213, USA
Melissa Saul, Gilles Clermont & Michael R. Pinsky
School of Computer Science, Auton Lab, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Artur W. Dubrawski


Contributions

Study design and implementation (TP, MS, GC, MRP, MH); data collection and annotation (TP, MS); data analysis (TP, GC, MH); manuscript development and review (TP, MS, GC, AWD, MRP, MH).

Corresponding author

Correspondence to Tiffany Pellathy.

Ethics declarations

Conflict of interest

The authors have no commercial conflicts of interest with this work.

Ethical approval

The study was approved by the institutional review boards of the University of Pittsburgh and Carnegie Mellon University.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Pellathy, T., Saul, M., Clermont, G. et al. Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research. J Clin Monit Comput 36, 397–405 (2022). https://doi.org/10.1007/s10877-021-00664-6


Received: 19 September 2020
Accepted: 20 January 2021
Published: 08 February 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s10877-021-00664-6
