
Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research

  • Original Research

Journal of Clinical Monitoring and Computing

Abstract

Big data analytics research using heterogeneous electronic health record (EHR) data requires accurate identification of disease phenotype cases and controls. Overreliance on administrative data to determine ground truth can lead to biased and inaccurate findings. Hospital-acquired venous thromboembolism (HA-VTE) is challenging to identify because of its temporal evolution and variable EHR documentation. To establish ground truth for machine learning modeling, we compared the accuracy of HA-VTE diagnoses made by administrative coding with manual review of gold-standard diagnostic test results. We performed a retrospective analysis of EHR data from 3680 adult stepdown unit patients to identify HA-VTE. International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes for VTE were identified. A total of 4544 radiology reports associated with VTE diagnostic tests were screened using terminology extraction and then manually reviewed by a clinical expert to confirm the diagnosis. Of 415 cases with ICD-9-CM codes for VTE, 219 had acute-onset codes. Test report review identified 158 new-onset HA-VTE cases. Only 40% of the acute-onset coded cases (n = 87 of 219) were confirmed by a positive diagnostic test report, leaving the majority of administratively coded cases unsubstantiated by a confirmatory diagnostic test. Additionally, 45% of test-confirmed HA-VTE cases lacked corresponding ICD codes. ICD-9-CM coding missed test-confirmed HA-VTE cases and assigned codes to cases without confirmed VTE, suggesting that dependence on administrative coding leads to inaccurate HA-VTE phenotyping. Alternative methods are needed to develop more sensitive and specific VTE phenotype solutions that are portable across EHR vendor data and can support case-finding in big-data analytics.
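The abstract's counts can be restated in standard diagnostic-accuracy terms, treating manual review of diagnostic test reports as the reference standard. The sketch below is an illustration, not the authors' analysis code. It assumes that the 87 confirmed cases are the overlap between the 219 acute-onset coded cases and the 158 test-confirmed cases, an assumption consistent with both the 40% and 45% figures. The class and field names are hypothetical. Specificity is omitted because it requires a count of true negatives, which the abstract does not report directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingAgreement:
    """Hypothetical container comparing acute-onset ICD-9-CM coding
    against HA-VTE confirmed by manual review of diagnostic test reports.

    Assumption: `both` is the overlap of the two groups of cases.
    """

    coded: int           # cases carrying an acute-onset VTE ICD-9-CM code
    test_confirmed: int  # cases confirmed by diagnostic test report review
    both: int            # coded cases that were also test-confirmed

    def __post_init__(self) -> None:
        # The overlap cannot exceed the size of either group.
        if not 0 <= self.both <= min(self.coded, self.test_confirmed):
            raise ValueError("overlap must not exceed either group")

    @property
    def ppv(self) -> float:
        # Positive predictive value: share of coded cases backed by a positive test.
        return self.both / self.coded

    @property
    def sensitivity(self) -> float:
        # Share of test-confirmed cases that also received a code.
        return self.both / self.test_confirmed

    @property
    def missed_fraction(self) -> float:
        # Share of test-confirmed cases with no corresponding code.
        return 1 - self.sensitivity

    @property
    def unsubstantiated(self) -> int:
        # Coded cases without a confirming diagnostic test.
        return self.coded - self.both

    @property
    def uncoded(self) -> int:
        # Test-confirmed cases that were never coded.
        return self.test_confirmed - self.both


ha_vte = CodingAgreement(coded=219, test_confirmed=158, both=87)
print(f"PPV of acute-onset codes: {ha_vte.ppv:.1%}")
print(f"Sensitivity of codes: {ha_vte.sensitivity:.1%}")
print(f"Confirmed cases lacking a code: {ha_vte.missed_fraction:.1%}")
```

Running it prints a positive predictive value of 39.7% and a sensitivity of 55.1%, so 44.9% of test-confirmed cases carry no matching code; rounded, these reproduce the abstract's 40% and 45%.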


Fig. 1


Data availability

The data supporting the findings are presented in the tables and figures.


Acknowledgements

This research was supported by National Institutes of Health grants R01NR01391 (all authors) and F31NR01810 (TP, MH).

Author information


Contributions

Study design and implementation (TP, MS, GC, MRP, MH). Data collection and annotation (TP, MS). Data analysis (TP, GC, MH). Manuscript development and review (TP, MS, GC, AWD, MRP, MH).

Corresponding author

Correspondence to Tiffany Pellathy.

Ethics declarations

Conflict of interest

The authors have no commercial conflicts of interest with this work.

Ethical approval

The study was approved by the institutional review boards of the University of Pittsburgh and Carnegie Mellon University.


About this article


Cite this article

Pellathy, T., Saul, M., Clermont, G. et al. Accuracy of identifying hospital acquired venous thromboembolism by administrative coding: implications for big data and machine learning research. J Clin Monit Comput 36, 397–405 (2022). https://doi.org/10.1007/s10877-021-00664-6



  • DOI: https://doi.org/10.1007/s10877-021-00664-6
