
Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence

  • Imaging Informatics and Artificial Intelligence
  • Published in European Radiology

Abstract

Objectives

To externally validate the performance of a commercial artificial intelligence (AI) software program for interpreting chest radiographs (CXRs) in a large, consecutive, real-world cohort from primary healthcare centres.

Methods

A total of 3047 CXRs were collected between January and December 2018 from two primary healthcare centres characterised by low disease prevalence. All CXRs were labelled as normal or abnormal according to CT findings. Four radiology residents read all CXRs twice, once without and once with AI assistance. The performance of the AI and of the readers, with and without AI assistance, was measured in terms of the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity.
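
As a concrete illustration of the evaluation described above, the sketch below computes AUROC, sensitivity, and specificity from per-image abnormality scores and binary labels. The simulated arrays and the 0.5 operating threshold are assumptions for illustration only; they are not the study's data or the vendor's threshold.

```python
# Illustrative sketch (not the study's actual code): computing AUROC,
# sensitivity, and specificity from per-image AI scores and labels.
# The simulated data and the 0.5 threshold are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.022, size=3047)                # ~2.2% prevalence
ai_scores = np.clip(rng.normal(0.20 + 0.15 * y_true, 0.15), 0.0, 1.0)

auroc = roc_auc_score(y_true, ai_scores)                  # threshold-free metric

threshold = 0.5                                           # assumed operating point
y_pred = ai_scores >= threshold
sensitivity = (y_pred & (y_true == 1)).sum() / (y_true == 1).sum()
specificity = (~y_pred & (y_true == 0)).sum() / (y_true == 0).sum()
print(f"AUROC={auroc:.3f}  sens={sensitivity:.1%}  spec={specificity:.1%}")
```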

Results

The prevalence of clinically significant lesions was 2.2% (68 of 3047). The AUROC, sensitivity, and specificity of the AI were 0.648 (95% confidence interval [CI] 0.630–0.665), 35.3% (CI 24.7–47.8%), and 94.2% (CI 93.3–95.0%), respectively. The AI detected 12 of 41 pneumonia cases, 3 of 5 tuberculosis cases, and 9 of 22 tumours. Lesions missed by the AI tended to be smaller than those it detected. The readers’ AUROCs ranged from 0.534 to 0.676 without AI and from 0.571 to 0.688 with AI (all p values < 0.05). For all readers, the mean reading time was 2.96–10.27 s longer with AI assistance (all p values < 0.05).
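
A back-of-envelope calculation (ours, not a figure reported in the paper) makes the practical impact of these numbers concrete: at 2.2% prevalence, the reported sensitivity and specificity imply a positive predictive value of only about 12%, i.e. most positive AI calls would be false alarms.

```python
# Back-of-envelope PPV/NPV implied by the reported figures; these derived
# values are our own arithmetic, not results stated in the paper.
prevalence = 0.022    # 68 / 3047
sensitivity = 0.353
specificity = 0.942

tp = sensitivity * prevalence
fp = (1 - specificity) * (1 - prevalence)
fn = (1 - sensitivity) * prevalence
tn = specificity * (1 - prevalence)

ppv = tp / (tp + fp)  # ~12%: most positive AI calls are false positives
npv = tn / (tn + fn)  # ~98.5%: negatives are mostly, but not always, safe
print(f"PPV={ppv:.1%}  NPV={npv:.1%}")
```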

Conclusions

The performance of the commercial AI in these high-volume, low-prevalence settings was poorer than expected, although it modestly improved the performance of less-experienced readers. AI performance demonstrated in experimental settings and endorsed by regulatory approval may not translate directly to real-world practice, especially where the demand for AI assistance is highest.

Key Points

This study shows the limited applicability of commercial AI software for detecting abnormalities in CXRs in a health screening population.

When using AI software in a clinical setting that differs from its training setting, it may be necessary to adjust the operating threshold or to perform additional training with data that reflect the deployment environment (a minimal sketch of threshold recalibration follows these key points).

Prospective test accuracy studies, randomised controlled trials, or cohort studies are needed to evaluate AI software before it is implemented in routine clinical practice.
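
As the second key point suggests, one practical remedy is to re-derive the operating threshold on a local validation set drawn from the deployment population. The sketch below is a minimal illustration of that idea; the function name and the 90% sensitivity target are assumptions for illustration, not values specified in the paper.

```python
# Hypothetical sketch of site-specific threshold selection: pick the
# highest score threshold that still achieves a target sensitivity on
# a local validation set. The 90% target is an assumed example.
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold(y_val, scores_val, target_sensitivity=0.90):
    """Return the highest score threshold meeting the target sensitivity."""
    fpr, tpr, thresholds = roc_curve(y_val, scores_val)
    assert tpr.max() >= target_sensitivity, "target unreachable on this set"
    # thresholds are in decreasing order and tpr is nondecreasing, so the
    # first index meeting the target gives the highest qualifying threshold
    idx = int(np.argmax(tpr >= target_sensitivity))
    print(f"threshold={thresholds[idx]:.3f}  "
          f"sens={tpr[idx]:.1%}  spec={1 - fpr[idx]:.1%}")
    return thresholds[idx]
```

Such a recalibration trades specificity for sensitivity; whether that trade is acceptable depends on the downstream workload of false positives in a screening setting like the one studied here.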


Abbreviations

AI: Artificial intelligence
AUROC: Area under the receiver operating characteristic curve
CIs: 95% confidence intervals
CXRs: Chest radiographs
FNs: False negatives
FPs: False positives
GT: Ground truth
TPs: True positives


Acknowledgements

Eunjin Noh kindly provided statistical advice for this manuscript.

Funding

This study has received funding from DongKook Life Science Co., Lunit Inc., and the Department of Radiology of Korea University Medical Center.

Author information


Corresponding author

Correspondence to Hwan Seok Yong.

Ethics declarations

Guarantor

The scientific guarantor of this publication is Hwan Seok Yong.

Conflict of interest

The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Statistics and biometry

Eunjin Noh kindly provided statistical advice for this manuscript.

Informed consent

Written informed consent was waived by the Institutional Review Board.

Ethical approval

Institutional Review Board approval was obtained.

Methodology

• retrospective

• observational

• multicentre study

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(DOCX 15405 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kim, C., Yang, Z., Park, S.H. et al. Multicentre external validation of a commercial artificial intelligence software to analyse chest radiographs in health screening environments with low disease prevalence. Eur Radiol 33, 3501–3509 (2023). https://doi.org/10.1007/s00330-022-09315-z

