Introduction

Pneumothorax is a condition characterized by the abnormal collection of air in the pleural space that requires urgent diagnosis and appropriate care [1,2]. Chest radiographs (CXRs) are the primary diagnostic tool for pneumothorax [1,3]. Tension pneumothorax is a life-threatening condition caused by persistent air leakage through a one-way valve formed by damaged tissue. Occult pneumothorax refers to pneumothorax that is initially missed, often because only a small amount of air is present or because the CXR was obtained in the supine position. Moreover, the large volume of radiographs in routine clinical practice can lengthen turnaround times and delay diagnosis by radiologists. These delays can allow respiratory compromise to progress, particularly in patients discharged from the emergency department or in unstable patients in the intensive care unit [1].

To mitigate the risk of missed diagnosis (up to 20% of occult pneumothoraces), artificial intelligence (AI)-based diagnostic tools or triage systems may offer a viable solution [4,5,6,7]. Recent studies have shown that AI algorithms perform well in detecting pneumothorax, with reported areas under the receiver operating characteristic curve ranging from 0.91 to 0.97 [7,8,9,10,11,12].

While AI algorithms have the potential to reduce missed diagnoses of pneumothorax and shorten the turnaround time for CXRs of critically ill patients, false-positive (FP) results remain a challenge. FP results can lead to unnecessary additional examinations or diagnostic dilemmas, complicating the implementation of AI-based diagnostic tools in daily clinical practice [13]. Although a low cutoff value for pneumothorax detection might be justified despite the risk of misdiagnosis, failing to limit FP results can increase radiologists' workloads [14]. Hospitals that have implemented commercial AI-based diagnostic tools have reported that the benefits of AI vary depending on the clinical context and on how FP results are managed [6,15,16,17,18,19]. This also applies to pneumothorax, yet few studies have examined the real-world performance of AI-based tools for diagnosing pneumothorax or explored the factors that contribute to FP results.

Therefore, the purpose of this study was to evaluate the real-world positive predictive value (PPV) of an AI-based tool for pneumothorax detection on CXRs. Additionally, we examined factors that might influence the AI results to better understand their impact on diagnostic accuracy.

Materials and methods

Subjects

The Institutional Review Board (IRB) of Yongin Severance Hospital approved this retrospective study (Yongin Severance Hospital, Yonsei University College of Medicine: IRB number 9–2022-0073) and waived the requirement for informed consent. All methods were carried out in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guideline and with relevant guidelines and regulations. Patients whose initial CXR was flagged by the AI as showing pneumothorax between March and December 2021 were retrospectively included. Both posteroanterior (PA) and anteroposterior (AP) CXRs were included. We excluded patients < 15 years of age and any repeated CXRs obtained during the same visit; that is, only the initial CXR taken at the time of the event was included, and any follow-up CXRs from the same visit were excluded. If a patient visited again with recurrent pneumothorax, the initial CXR of that subsequent visit was also included. Because only initial, pretreatment CXRs were included to focus on the performance of the AI as a screening tool at first detection, patients who already had a chest tube inserted for pneumothorax were automatically excluded. Patient age and sex and the projection view of each CXR were recorded for analysis.

Usage of AI-based lesion-detection software for CXR

Our hospital uses commercially available AI-based lesion-detection software (Lunit INSIGHT CXR, version 3; Lunit Inc., South Korea) for all CXRs. This software was integrated directly into the picture archiving and communication system (PACS) in March 2020. It can detect a total of eight lesions, including pneumothorax, atelectasis, consolidation, fibrosis, nodule, and pleural effusion, on PA and AP CXRs. For each lesion, the abnormality score represents the AI-determined probability that the CXR contains that lesion and ranges from 0 to 100%. In accordance with the vendor's guideline and previous studies, our hospital uses 15% as the cutoff value for determining the presence of a lesion [19,20,21]. When a lesion had an abnormality score greater than 15%, the AI displayed a contour map, the lesion abbreviation, and the abnormality score on the CXR as a secondary capture image on PACS (Fig. 1) [5,15]. For lesions with a score less than 15%, the AI generated no contour map or abbreviation, and the CXR was considered not to contain the corresponding lesion.
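As a minimal sketch of the cutoff rule described above, the following Python function flags a lesion as present only when its abnormality score exceeds 15%. The function name, the score dictionary, and the example scores are hypothetical illustrations, not part of the vendor's software or API. The text does not say how a score of exactly 15% is handled; the sketch treats it as negative by assumption.

```python
CUTOFF_PERCENT = 15.0  # operating cutoff used at our hospital (abnormality score, %)


def flag_lesions(scores: dict[str, float], cutoff: float = CUTOFF_PERCENT) -> dict[str, bool]:
    """Hypothetical helper: map each lesion's abnormality score (0-100%) to present/absent.

    A lesion counts as present only when its score is strictly greater than the
    cutoff (assumption: a score of exactly 15% is treated as absent). Absent
    lesions correspond to CXRs on which no contour map would be displayed.
    """
    return {lesion: score > cutoff for lesion, score in scores.items()}


# Illustrative (made-up) scores for a single CXR
example_scores = {"pneumothorax": 42.0, "pleural effusion": 72.0, "nodule": 4.0}
print(flag_lesions(example_scores))
```

Here pneumothorax and pleural effusion would be flagged and the nodule would not.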

Fig. 1

Examples of pneumothorax diagnosis by the AI software on CXRs. (a) An FP case of pneumothorax on an AP-view CXR from a 36-year-old male patient; the suggested reason for the AI's FP diagnosis was a monitoring patch on the chest wall. (b) An FP case of pneumothorax on an AP-view CXR from a 78-year-old male patient; the suggested reason for the AI's FP diagnosis was a skin fold. (c) An FP case of pneumothorax on an AP-view CXR from a 53-year-old female patient, with a large concurrent pleural effusion occupying more than two-thirds of the hemithorax on the same side as the pneumothorax marked by the AI.

Analysis of AI-based CXR results

To determine the PPV of the AI for diagnosing pneumothorax on CXR, we extracted from the software server the data of all CXRs obtained during the study period that had a pneumothorax abnormality score greater than 15%. For the extracted CXRs, abnormality scores for pneumothorax, atelectasis, consolidation, fibrosis, nodule, and pleural effusion were evaluated, and a score greater than 15% was used to determine the presence of each lesion.

To establish the ground truth for pneumothorax, two board-certified radiologists with more than 12 years of experience in radiology independently reviewed the images and classified each AI result as true-positive (TP) or FP for pneumothorax. The radiologists referred to all available images, including other CXRs and computed tomography (CT) scans, to confirm the results and were not blinded to the electronic medical records or the AI results. For CXRs with uncertain results, the radiologists reached a consensus by reviewing the images together. If pneumothorax was present in both lungs on one CXR, each pneumothorax was analyzed separately.

The abnormality score and the area of the contour map for each pneumothorax were included in the analysis. The area of the contour map was measured by one radiologist, who drew a region of interest (ROI) along the border of the contour map directly on the PACS. The area was measured only on CXRs in which the pneumothorax did not overlap other lung lesions, because the contour map of a pneumothorax can merge with those of overlapping lesions, preventing measurement of its own area. The same radiologist recorded the location of the pneumothorax (right or left) on each CXR. To determine whether concurrent lesions affected the detection of pneumothorax by the AI, the presence or absence of atelectasis, pleural effusion, consolidation, nodule, and fibrosis was determined. When one of these lesions was present, it was classified as ipsilateral or contralateral relative to the pneumothorax. In addition, when pleural effusion, consolidation, or fibrosis was ipsilateral, its extent was analyzed.
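The PACS area tool is not described further; purely as an illustrative sketch, the area of a polygonal ROI can be computed with the shoelace formula and converted to cm² using the pixel spacing. The function name, the vertex list, and the isotropic pixel spacing are assumed inputs, and this is not necessarily how the PACS implements its measurement.

```python
def polygon_area_cm2(vertices_px: list[tuple[float, float]], pixel_spacing_mm: float) -> float:
    """Hypothetical helper: area of a closed polygonal ROI via the shoelace formula.

    vertices_px: (x, y) vertex coordinates in pixels, in drawing order.
    pixel_spacing_mm: physical size of one pixel edge in mm (isotropic pixels assumed).
    """
    n = len(vertices_px)
    if n < 3:
        raise ValueError("an ROI needs at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices_px[i]
        x2, y2 = vertices_px[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    area_px = abs(twice_area) / 2.0  # orientation (clockwise or not) does not matter
    area_mm2 = area_px * pixel_spacing_mm ** 2
    return area_mm2 / 100.0  # 1 cm^2 = 100 mm^2


# A 100 x 100-pixel square at 0.1 mm/pixel is 10 mm x 10 mm = 1 cm^2
print(polygon_area_cm2([(0, 0), (100, 0), (100, 100), (0, 100)], 0.1))
```

On real radiographs, the effective pixel spacing depends on the detector and geometric magnification, which this sketch ignores.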

Radiologists' reports about pneumothorax

We retrospectively evaluated whether radiologists accepted the AI results and reported pneumothorax in the official reading, separately for TP and FP AI results. When a radiologist missed a pneumothorax, we reviewed the images to identify the reason for the discordance between the radiologist's report and the AI result [22]. In addition, for FP AI results, we reviewed the images again to discern the reason for the AI's suggestion.

Statistical analysis

For the statistical analysis, we used R (version 4.1.3; R Foundation for Statistical Computing, Vienna, Austria; packages: furniture, geepack, and doBy). Demographics of the included CXRs were compared using the two-sample t test or the chi-square test. The proportions of TP and FP results were evaluated, and the PPV was calculated with a 95% confidence interval (CI) according to patient demographics; CXR projection view; and the type, location, and amount of concurrent lung lesions. For categorical variables, the difference in PPV from the reference group and the odds ratio (OR) were evaluated and compared using logistic regression with generalized estimating equations, which accounted for repeated CXRs from different visits for recurrent episodes and for bilateral pneumothoraces analyzed separately in the same patient. Whether the ORs for age and for the pneumothorax abnormality score differed by projection view was tested by adding an interaction term between each variable and the CXR projection view to the logistic regression model. P < 0.05 was considered statistically significant.
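The PPV and its 95% CI can be sketched in a few lines of Python. The text does not state which interval method was used, so the Wilson score interval below is an assumption; it also ignores the clustering that the generalized estimating equations account for, so its bounds may differ from the published ones. The function name is hypothetical.

```python
import math


def ppv_wilson(tp: int, fp: int, z: float = 1.96) -> tuple[float, float, float]:
    """Hypothetical helper: PPV = TP / (TP + FP) with a Wilson score interval.

    z = 1.96 gives an approximate 95% interval. Observations are treated as
    independent (no adjustment for repeated CXRs or bilateral pneumothoraces).
    """
    n = tp + fp
    if n == 0:
        raise ValueError("PPV is undefined when there are no positive calls")
    p = tp / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, center - half_width, center + half_width


ppv, lower, upper = ppv_wilson(136, 195)  # overall TP and FP counts from the Results
print(f"PPV {ppv:.1%} (95% CI {lower:.1%}-{upper:.1%})")
```

For the overall result (136 TP, 195 FP), this sketch gives a PPV of 41.1% with a Wilson interval of roughly 35.9% to 46.5%.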

Results

Comparison of TP and FP cases

During the study period, a total of 87,658 CXRs were obtained and analyzed by the AI software at our hospital. Of these, 460 CXRs (0.52%) had a pneumothorax abnormality score greater than 15%. Eleven CXRs from patients < 15 years of age and 141 repeated CXRs from the same episode were excluded. Thus, 308 CXRs from 283 patients (M:F = 213:70; mean age, 59.5 years) were finally included; examinations from separate visits were repeated up to nine times per patient, and 23 CXRs showed bilateral pneumothoraces. Therefore, a total of 331 pneumothoraces detected by the AI were included. Of the 308 included CXRs, 32.5% were PA views (PA:AP = 100:208). A flowchart of CXR inclusion is presented in Fig. 2.
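As a quick consistency check, the inclusion arithmetic can be reproduced from the counts reported in this paragraph; every number in the sketch below is taken from the text, and the variable names are only illustrative. Note that 460/87,658 rounds to 0.52%.

```python
# Counts reported in the Results
total_cxrs = 87_658
ai_positive = 460                   # pneumothorax abnormality score > 15%
excluded_under_15 = 11              # patients < 15 years of age
excluded_same_episode = 141         # repeated CXRs from the same episode
bilateral_cxrs = 23                 # CXRs with pneumothorax on both sides
pa_cxrs = 100

included_cxrs = ai_positive - excluded_under_15 - excluded_same_episode
# Each bilateral CXR contributes a second, separately analyzed pneumothorax
pneumothoraces = included_cxrs + bilateral_cxrs
positive_rate = ai_positive / total_cxrs
pa_share = pa_cxrs / included_cxrs

print(included_cxrs, pneumothoraces)
print(f"AI-positive rate {positive_rate:.2%}, PA share {pa_share:.1%}")
```

This yields 308 included CXRs, 331 pneumothoraces, an AI-positive rate of 0.52%, and a PA share of 32.5%.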

Fig. 2

Flowchart of CXR inclusion. Abbreviations: CXR, chest radiograph; AI, artificial intelligence.

Of the 331 pneumothoraces detected by the AI, 136 were TP and 195 were FP, for an overall PPV of 41.1%. A comparison of demographics between TP and FP cases is presented in Table 1. Sex had no significant effect on the diagnosis of pneumothorax (OR, 1.481; 95% CI, 0.901–2.434; P = 0.121). However, patients were significantly older in FP cases than in TP cases (70.9 ± 16.7 vs. 41.9 ± 24.9 years, P < 0.001), and age was significantly associated with the diagnosis (OR, 0.945; 95% CI, 0.934–0.956; P < 0.001).

Table 1 Comparison of demographics between TP and FP cases.

By CXR projection view, 79.9% of AP-view cases were FP, whereas 88.2% of PA-view cases were TP (P < 0.001) (Fig. 3). The PPV differed significantly between PA and AP views (88.2% vs. 20.1%, P < 0.001), and the PA view was significantly associated with a TP diagnosis (OR, 29.837; 95% CI, 15.062–59.107; P < 0.001).
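The per-view cell counts are not reported, but they can be back-calculated from the reported PPVs (88.2% and 20.1%) and totals (136 TP, 195 FP): 90 TP / 12 FP for PA and 46 TP / 183 FP for AP reproduce both PPVs. Treating these hypothetical counts as independent observations, a crude OR with a Woolf (log-scale) CI can be sketched as follows. The published estimate came from the GEE model, whose robust standard errors can differ, so this is a plausibility check rather than a reanalysis.

```python
import math


def crude_odds_ratio(tp_exp: int, fp_exp: int, tp_ref: int, fp_ref: int,
                     z: float = 1.96) -> tuple[float, float, float]:
    """Hypothetical helper: crude OR of a TP result (exposed vs. reference group)
    with a Woolf confidence interval computed on the log-odds scale."""
    odds_ratio = (tp_exp * fp_ref) / (fp_exp * tp_ref)
    # Standard error of log(OR) from the four cell counts (Woolf's method)
    se_log = math.sqrt(1 / tp_exp + 1 / fp_exp + 1 / tp_ref + 1 / fp_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Back-calculated (not reported) counts: PA 90 TP / 12 FP, AP 46 TP / 183 FP
pa_tp, pa_fp, ap_tp, ap_fp = 90, 12, 46, 183
or_pa, lo_pa, hi_pa = crude_odds_ratio(pa_tp, pa_fp, ap_tp, ap_fp)
print(f"PPV PA {pa_tp / (pa_tp + pa_fp):.1%}, AP {ap_tp / (ap_tp + ap_fp):.1%}")
print(f"crude OR {or_pa:.3f} (95% CI {lo_pa:.2f}-{hi_pa:.2f})")
```

With these counts the crude OR is about 29.84 (95% CI about 15.06 to 59.11), close to the reported values.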

Fig. 3

Waterfall plots of the pneumothorax abnormality score for TP (blue) and FP (red) diagnoses in (a) PA-view and (b) AP-view CXRs. Abbreviations: AP, anteroposterior; CXR, chest radiograph; FP, false-positive; PA, posteroanterior; TP, true-positive.

The pneumothorax abnormality score was significantly higher in TP cases than in FP cases (88.9 ± 19.8% vs. 38.6 ± 20.7%, P < 0.001) and had a significant effect on the diagnosis of pneumothorax (OR, 1.081; 95% CI, 1.066–1.097; P < 0.001). The amount of pneumothorax, measured as the area of the contour map, could be evaluated in 250 patients and was significantly larger in TP cases than in FP cases (135.4 ± 95.6 vs. 53.0 ± 66.3 cm², P < 0.001), with a significant association (OR, 1.005; 95% CI, 1.003–1.007; P < 0.001).
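Because logistic regression is linear on the log-odds scale, a per-unit OR for a continuous predictor can be rescaled to a more interpretable increment k as OR^k. The text does not state the units of these ORs; the sketch assumes they are per year of age, per percentage point of abnormality score, and per cm² of contour area, and the increments chosen (10 years, 10 points, 100 cm²) are illustrative.

```python
def scale_odds_ratio(or_per_unit: float, units: float) -> float:
    """OR for a `units`-sized increase, given a per-one-unit OR.

    Valid under the logistic model's assumption that log-odds change linearly
    with the predictor across the range considered.
    """
    return or_per_unit ** units


# Assumed units: per percentage point, per cm^2, per year (not stated in the text)
print(round(scale_odds_ratio(1.081, 10), 2))   # abnormality score, +10 points
print(round(scale_odds_ratio(1.005, 100), 2))  # contour-map area, +100 cm^2
print(round(scale_odds_ratio(0.945, 10), 2))   # age, +10 years
```

Under these assumptions, a 10-point higher abnormality score corresponds to roughly 2.2 times the odds of a TP result, and 10 additional years of age to roughly 0.57 times the odds.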

We compared the ORs for patient age and abnormality score between AP and PA views to assess their interaction with projection view. The ORs for age did not differ significantly between AP and PA views (0.967 vs. 0.946, P = 0.18), nor did the ORs for the abnormality score (1.084 vs. 1.059, P = 0.17).

Effect of concurrent lung lesions on the diagnosis of pneumothorax by AI

The effects of concurrent lung lesions detected by the AI are summarized in Table 2. Notably, the PPV for pneumothorax was significantly higher in the presence of atelectasis (58.1% vs. 38.5%, P = 0.025). In contrast, the PPV did not differ significantly according to the presence of pleural effusion (42.7% vs. 40.2%, P = 0.667). The PPV was significantly higher in the absence of consolidation (65.5% vs. 22.0%, P < 0.001), fibrosis (46.7% vs. 25.8%, P < 0.001), and nodules (48.3% vs. 24.8%, P < 0.001). In the logistic regression analysis, the presence of atelectasis was a significant factor for an increased PPV (OR, 2.215; 95% CI, 1.092–4.49), whereas the presence of consolidation, fibrosis, or nodules was a significant factor for a decreased PPV.

Table 2 Effect of concurrent lung lesions for the diagnosis of pneumothorax by AI.

The effects of the location and amount of concurrent lesions are shown in Table 3. For atelectasis, the PPV for pneumothorax did not differ significantly between cases with contralateral atelectasis and cases without atelectasis (27.3% vs. 38.5%, P = 0.413). However, the PPV was significantly higher when atelectasis was ipsilateral to the pneumothorax (68.8% vs. 38.5%, P = 0.001). For pleural effusion, the PPV did not differ significantly between cases with contralateral effusion and cases without effusion. In contrast, when the pleural effusion was ipsilateral to the pneumothorax, the PPV was significantly higher when the effusion occupied less than one-third of the hemithorax and significantly lower when it occupied more than one-third (Fig. 1). When consolidation or nodules were present, the PPV for pneumothorax was significantly lower regardless of the side and amount of the concurrent lesions. Compared with cases without fibrosis, the PPV for pneumothorax was significantly lower when fibrosis was contralateral, or ipsilateral and occupying less than half of the hemithorax.

Table 3 Effect of location and amount of concurrent lung lesions for the diagnosis of pneumothorax by AI.

Radiologists' reports about pneumothorax

In their official reports, radiologists correctly reported the absence of pneumothorax in all (100%) FP cases but missed pneumothorax in 9.6% (13/136) of TP cases (P < 0.001).

Review of the radiologists' false-negative diagnoses showed that the missed cases comprised four patients with minimal pneumothorax after trauma (bilateral in three patients), three patients with underlying fibrotic lung disease and small, persistent, unchanged chronic pneumothoraces, and three patients with iatrogenic or primary spontaneous pneumothorax.

Regarding the AI's FP diagnoses, we identified a likely reason in 31.3% of FP cases: skin folds in 20.5%, a monitoring patch on the chest wall in 5.6%, bullae in 3.1%, and rib opacities in 2.1% (Fig. 1). Table 4 shows the differences in suggested reasons according to the CXR projection view.
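The reported percentages can be checked against the case counts given in the Discussion (40 skin folds, 11 patches, and 4 rib opacities among the 195 FP cases). The bulla count of 6 is not stated directly; it is back-calculated here from 3.1% of 195 and is consistent with the three PA-view bullae mentioned in the Discussion plus three on AP views. The sketch also shows the share of FP cases whose cause remained unknown.

```python
fp_total = 195
# Skin folds, patches, and rib opacities as counted in the Discussion;
# the bulla count (6) is back-calculated from 3.1% of 195 FP cases.
reasons = {"skin fold": 40, "monitoring patch": 11, "bulla": 6, "rib opacity": 4}

for reason, count in reasons.items():
    print(f"{reason}: {count}/{fp_total} = {count / fp_total:.1%}")

explained = sum(reasons.values())
unknown = fp_total - explained
print(f"reason identified: {explained}/{fp_total} = {explained / fp_total:.1%}")
print(f"unknown cause: {unknown}/{fp_total} = {unknown / fp_total:.1%}")
```

These counts reproduce 20.5%, 5.6%, 3.1%, and 2.1%, a total of 31.3% (61/195) with an identified reason, and 68.7% (134/195) with an unknown cause.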

Table 4 Suggested reasons for the FP diagnosis of pneumothorax determined by AI.

Discussion

In our study, the overall PPV for diagnosing pneumothorax using commercially available AI software was 41.1%. The PPV was 88.2% for the PA view but only 20.1% for the AP view. Significant factors associated with an increased PPV included younger patient age (OR, 0.945; 95% CI, 0.934–0.956), the PA view (OR, 29.837; 95% CI, 15.062–59.107), a high pneumothorax abnormality score (OR, 1.081; 95% CI, 1.066–1.097), and a larger pneumothorax (OR, 1.005; 95% CI, 1.003–1.007). Additionally, concurrent atelectasis, a small ipsilateral pleural effusion, and the absence of consolidation, fibrosis, and nodules were significant factors that increased the PPV for pneumothorax.

The effect of age may be attributable to a higher proportion of AP-view CXRs or concurrent lesions in older patients with comorbidities. However, the interaction analysis showed no significant difference in the ORs for age between projection views. Regarding projection view, the low PPV for the AP view likely resulted from the many misdiagnoses caused by various artifacts. Of the 195 FP cases, a reason was suggested in 61; apart from three cases of bullae on PA views, all of these were on AP views (40 due to skin folds, 11 due to patches, 3 due to bullae, and 4 due to rib opacities). Another possible explanation is that the visceral pleural edge is generally less distinct on AP than on PA projections, and the location and amount of pneumothorax may appear different. Other contributing factors include concurrent comorbidities in patients who were unable to stand for a PA view and therefore had to be imaged supine in the AP view. These findings may indicate a need for more dedicated training tailored to the CXR projection view.

Regarding concurrent lung lesions on CXR, the presence of consolidation, nodules, or fibrosis decreased the PPV, whereas concurrent atelectasis, especially in the same lung, increased the PPV for AI-detected pneumothorax. Although concurrent pleural effusion was not significantly associated with PPV when its amount and location were not considered, a small ipsilateral pleural effusion significantly increased the PPV. This may be because ipsilateral atelectasis and small pleural effusions are frequently associated with pneumothorax, regardless of which finding develops first. Conversely, pleural effusion occupying more than one-third of the ipsilateral hemithorax was strongly associated with FP diagnoses of pneumothorax, suggesting that large pleural effusions substantially reduce the accuracy of the AI. In contrast, consolidation, nodules, and fibrosis were associated with lower PPVs irrespective of lesion location. This trend likely reflects the confounding effect of underlying abnormalities on CXR and the greater comorbidity burden of these patients.

Notably, the PPV varied according to patient factors, the image-acquisition environment, and the types of concurrent lesions on CXR. Therefore, when implementing AI software, it is important to consider the specific circumstances of image acquisition and to consult published data on the specific software. During the planning stage, it should be recognized that diagnostic performance in an actual hospital may differ from the vendor's findings or from the results of previous studies. At the implementation stage, factors that may influence the PPV of each AI model, such as patient characteristics, image-acquisition protocols, and lesion types, should be considered, and clinical correlation is needed to interpret AI results accurately [10]. After implementation, post-deployment modification tailored to each hospital is essential for the continued effective use of AI software [13].

User education is another crucial post-implementation task. The cause of FP results often could not be determined, but fortunately, no radiologist mistakenly reported these cases as pneumothorax. However, all radiologists were board-certified faculty members at our institution, and no residents or trainees were involved. This may have influenced the results, and diagnoses might differ among physicians with different levels of radiology experience. In addition, inexperienced clinicians may misinterpret AI results, underscoring the importance of acknowledging the risk of FP results. Promoting the use of AI is another crucial task after implementation. For example, apart from the missed cases involving small or chronic pneumothorax in chronic lung disease, the three remaining pneumothoraces missed by radiologists but detected by the AI were clinically significant. This could be an important issue after the actual implementation of AI, and this study is meaningful in demonstrating the real-world performance of AI and the factors that influence it.

This study has several limitations. First, because this was a retrospective study that included only AI-positive cases, we could not analyze false-negative cases. Second, it was difficult to establish a causal relationship between concurrent lung lesions and FP results. Because of the black-box nature of the AI algorithm, determining the reasons for the AI's diagnoses was difficult [22]. We attempted to identify the reasons for the AI's FP diagnoses by reviewing the CXRs again; however, most FP cases (68.7%) were categorized as having unknown causes. Third, the concurrent lesions analyzed in this study were identified by the AI rather than by radiologists. Because the diagnostic accuracy of the AI for concurrent lung lesions was beyond the scope of this study, we focused on the factors affecting its accuracy for pneumothorax when concurrent results were presented by the same AI software. A large-scale cohort study is needed as a follow-up to address these limitations. Our findings suggest that cutoff values may need to be adjusted according to the projection view, lesion types, and patient characteristics such as age; further studies that include false-negative cases are needed to investigate this point.

In conclusion, the overall PPV of the commercially available AI software for diagnosing pneumothorax was 41.1%, with PPVs of 88.2% for the PA view and 20.1% for the AP view. The PPV of AI for pneumothorax on CXR can vary with patient factors, image-acquisition protocols, and the presence of concurrent lung lesions. Caution is essential when implementing AI and interpreting its results in clinical practice, considering the purpose of AI use and the imaging conditions. Further research is needed to clarify the diagnostic performance of AI software for CXR in different clinical situations.