Skip to main content

Advertisement

Log in

Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing

  • Original Paper
  • Published:
Journal of Digital Imaging Aims and scope Submit manuscript

Abstract

The ideal radiology report reduces diagnostic uncertainty, while avoiding ambiguity whenever possible. The purpose of this study was to characterize the use of uncertainty terms in radiology reports at a single institution and compare the use of these terms across imaging modalities, anatomic sections, patient characteristics, and radiologist characteristics. We hypothesized that there would be variability among radiologists and between subspecialities within radiology regarding the use of uncertainty terms and that the length of the impression of a report would be a predictor of use of uncertainty terms. Finally, we hypothesized that use of uncertainty terms would often be interpreted by human readers as “hedging.” To test these hypotheses, we applied a natural language processing (NLP) algorithm to assess and count the number of uncertainty terms within radiology reports. An algorithm was created to detect usage of a published set of uncertainty terms. All 642,569 radiology report impressions from 171 reporting radiologists were collected from 2011 through 2015. For validation, two radiologists without knowledge of the software algorithm reviewed report impressions and were asked to determine whether the report was “uncertain” or “hedging.” The relationship between the presence of 1 or more uncertainty terms and the human readers’ assessment was compared. There were significant differences in the proportion of reports containing uncertainty terms across patient admission status and across anatomic imaging subsections. Reports with uncertainty were significantly longer than those without, although report length was not significantly different between subspecialities or modalities. There were no significant differences in rates of uncertainty when comparing the experience of the attending radiologist. When compared with reader 1 as a gold standard, accuracy was 0.91, sensitivity was 0.92, specificity was 0.9, and precision was 0.88, with an F1-score of 0.9. When compared with reader 2, accuracy was 0.84, sensitivity was 0.88, specificity was 0.82, and precision was 0.68, with an F1-score of 0.77. Substantial variability exists among radiologists and subspecialities regarding the use of uncertainty terms, and this variability cannot be explained by years of radiologist experience or differences in proportions of specific modalities. Furthermore, detection of uncertainty terms demonstrates good test characteristics for predicting human readers’ assessment of uncertainty.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

Similar content being viewed by others

References

  1. Schoenfeld AJ, Harris MB, Davis M. Clinical uncertainty at the intersection of advancing technology, evidence-based medicine, and health care policy. JAMA Surg. 2014;149(12):1221–2.

    Article  Google Scholar 

  2. Platt ML, Huettel SA. Risky business: the neuroeconomics of decision making under uncertainty. Nat Neurosci. 2008;11(4):398–403.

    Article  CAS  Google Scholar 

  3. Saposnik G, Sempere AP, Prefasi D, Selchen D, Ruff CC, Maurino J, et al. Decision-making in Multiple Sclerosis: The Role of Aversion to Ambiguity for Therapeutic Inertia among Neurologists (DIScUTIR MS). Front Neurol. 2017;8:65.

    PubMed  PubMed Central  Google Scholar 

  4. Rosenkrantz AB, Kiritsy M, Kim S. How “consistent” is “consistent”? A clinician-based assessment of the reliability of expressions used by radiologists to communicate diagnostic confidence. Clin Radiol. 2014;69(7):745–9.

    Article  CAS  Google Scholar 

  5. Khorasani R, Bates DW, Teeger S, Rothschild JM, Adams DF, Seltzer SE. Is terminology used effectively to convey diagnostic certainty in radiology reports? Acad Radiol. 2003;10(6):685–8.

    Article  Google Scholar 

  6. Carney PA, Yi JP, Abraham LA, Miglioretti DL, Aiello EJ, Gerrity MS, et al. Reactions to uncertainty and the accuracy of diagnostic mammography. J Gen Intern Med. 2007;22(2):234–41.

    Article  Google Scholar 

  7. Clinger NJ, Hunter TB, Hillman BJ. Radiology reporting: attitudes of referring physicians. Radiology. 1988;169(3):825–6.

    Article  CAS  Google Scholar 

  8. Panicek DM, Hricak H. How Sure Are You, Doctor? A Standardized Lexicon to Describe the Radiologist’s Level of Certainty. AJR Am J Roentgenol. 2016;207(1):2–3.

    Article  Google Scholar 

  9. Wallis A, McCoubrie P. The radiology report—are we getting the message across? Clin Radiol. 2011;66(11):1015–22.

    Article  CAS  Google Scholar 

  10. Sobel JL, Pearson ML, Gross K, Desmond KA, Harrison ER, Rubenstein LV, et al. Information content and clarity of radiologists’ reports for chest radiography. Acad Radiol. 1996;3(9):709–17.

    Article  CAS  Google Scholar 

  11. Valls C. Pitfalls of the vague radiology report. AJR Am J Roentgenol. 2001;176(1):253–4.

    Article  CAS  Google Scholar 

  12. Levinson W. Physician-patient communication. A key to malpractice prevention. JAMA. 1994;272(20):1619–20.

    Article  CAS  Google Scholar 

  13. Hoang JK. Do not hedge when there is certainty. J Am Coll Radiol. 2017;14(1):5.

    Article  Google Scholar 

  14. Burnside ES, Sickles EA, Bassett LW, Rubin DL, Lee CH, Ikeda DM, et al. The ACR BI-RADS experience: learning from history. J Am Coll Radiol. 2009;6(12):851–60.

    Article  Google Scholar 

  15. Tessler FN, Middleton WD, Grant EG, Hoang JK, Berland LL, Teefey SA, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee. J Am Coll Radiol. 2017;14(5):587–95.

    Article  Google Scholar 

  16. Berlin L. Radiologic errors and malpractice: a blurry distinction. AJR Am J Roentgenol. 2007;189(3):517–22.

    Article  Google Scholar 

  17. Hripcsak G, Austin JHM, Alderson PO, Friedman C. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology. 2002;224(1):157–63.

    Article  Google Scholar 

  18. Do BH, Wu A, Biswal S, Kamaya A, Rubin DL. Informatics in Radiology: RADTF: A Semantic Search–enabled, Natural Language Processor–generated Radiology Teaching File. Radiographics. 2010;30(7):2039–48.

    Article  Google Scholar 

  19. Dutta S, Long WJ, Brown DFM, Reisner AT. Automated detection using natural language processing of radiologists recommendations for additional imaging of incidental findings. Ann Emerg Med. 2013;62(2):162–9.

    Article  Google Scholar 

  20. Dang PA, Kalra MK, Blake MA, Schultz TJ, Stout M, Lemay PR, et al. Natural language processing using online analytic processing for assessing recommendations in radiology reports. J Am Coll Radiol. 2008;5(3):197–204.

    Article  Google Scholar 

  21. Dogra N, Giordano J, France N. Cultural diversity teaching and issues of uncertainty: the findings of a qualitative study. BMC Med Educ. 2007;7(1):8.

    Article  Google Scholar 

  22. Wu AS, Do BH, Kim J, Rubin DL. Evaluation of negation and uncertainty detection and its impact on precision and recall in search. J Digit Imaging. 2011;24(2):234–42.

    Article  Google Scholar 

  23. Hanauer DA, Liu Y, Mei Q, Manion FJ, Balis UJ, Zheng K. Hedging their mets: the use of uncertainty terms in clinical documents and its potential implications when sharing the documents with patients. AMIA Annu Symp Proc. 2012;2012:321–30.

    PubMed  PubMed Central  Google Scholar 

  24. Bhise V, Rajan SS, Sittig DF, Morgan RO, Chaudhary P, Singh H. Defining and measuring diagnostic uncertainty in medicine: a systematic review. J Gen Intern Med. 2018;33(1):103–15.

    Article  Google Scholar 

  25. Zwaan L, Singh H. The challenges in defining and measuring diagnostic error. Diagnosis (Berl). 2015;2(2):97–103.

    Article  Google Scholar 

  26. Singh H, Giardina TD, Meyer AND, Forjuoh SN, Reis MD, Thomas EJ. Types and origins of diagnostic errors in primary care settings. JAMA Intern Med. 2013;173(6):418–25.

    Article  Google Scholar 

  27. Hoang JK, Middleton WD, Farjat AE, Langer JE, Reading CC, Teefey SA, et al. Reduction in thyroid nodule biopsies and improved accuracy with American college of radiology thyroid imaging reporting and data system. Radiology. 2018;287(1):185–93.

    Article  Google Scholar 

  28. Tang A, Bashir MR, Corwin MT, Cruite I, Dietrich CF, Do RKG, et al. Evidence supporting LI-RADS major features for CT- and MR imaging-based diagnosis of hepatocellular carcinoma: a systematic review. Radiology. 2018;286(1):29–48.

    Article  Google Scholar 

  29. MacMahon H, Naidich DP, Goo JM, Lee KS, Leung ANC, Mayo JR, et al. Guidelines for Management of Incidental Pulmonary Nodules Detected on CT Images: From the Fleischner Society 2017. Radiology. 2017;284(1):228–43.

    Article  Google Scholar 

  30. Pons E, Braun LMM, Hunink MGM, Kors JA. Natural language processing in radiology: a systematic review. Radiology. 2016;279(2):329–43.

    Article  Google Scholar 

  31. Lacson R, Khorasani R. Practical examples of natural language processing in radiology. J Am Coll Radiol. 2011;8(12):872–4.

    Article  Google Scholar 

  32. Srinivasa Babu A, Brooks ML. The malpractice liability of radiology reports: minimizing the risk. Radiographics. 2015;35(2):547–54.

    Article  Google Scholar 

  33. Dreyer KJ, Kalra MK, Maher MM, Hurier AM, Asfaw BA, Schultz T, et al. Application of recently developed computer algorithm for automatic classification of unstructured radiology reports: validation study. Radiology. 2005;234(2):323–9.

    Article  Google Scholar 

  34. Abbass IM, Krause TM, Virani SS, Swint JM, Chan W, Franzini L. Revisiting the economic efficiencies of observation units. Manag Care. 2015;24(3):46–52.

    PubMed  Google Scholar 

  35. Prabhakar AM, Misono AS, Harvey HB, Yun BJ, Saini S, Oklu R. Imaging utilization from the ED: no difference between observation and admitted patients. Am J Emerg Med. 2015;33(8):1076–9.

    Article  Google Scholar 

  36. Molins E, Macià F, Ferrer F, Maristany M-T, Castells X. Association between radiologists’ experience and accuracy in interpreting screening mammograms. BMC Health Serv Res. 2008;8:91.

    Article  Google Scholar 

  37. Miglioretti DL, Gard CC, Carney PA, Onega TL, Buist DSM, Sickles EA, et al. When radiologists perform best: the learning curve in screening mammogram interpretation. Radiology. 2009;253(3):632–40.

    Article  Google Scholar 

  38. Saposnik G, Redelmeier D, Ruff CC, Tobler PN. Cognitive biases associated with medical decisions: a systematic review. BMC Med Inform Decis Mak. 2016;16(1):138.

    Article  Google Scholar 

Download references

Funding

Dr. Callen was supported by the National Institutes of Health (NIBIB) T32 Training Grant, T32EB001631.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Andrew L. Callen.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

Table 1 Uncertainty Terms

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Callen, A.L., Dupont, S.M., Price, A. et al. Between Always and Never: Evaluating Uncertainty in Radiology Reports Using Natural Language Processing. J Digit Imaging 33, 1194–1201 (2020). https://doi.org/10.1007/s10278-020-00379-1

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10278-020-00379-1

Keywords

Navigation