Introduction

Since the approval of ipilimumab for the treatment of metastatic melanoma in 2011 [1], immune checkpoint inhibitors (ICI) have revolutionized cancer therapy and have come to represent the standard of care for the treatment of several solid and haematological malignancies [2]. Nevertheless, the hyperstimulation of the immune system engendered by ICI may cause adverse events driven by an anomalous response against normal tissues, such as ICI-related pneumonitis (IRP), which is mainly related to drugs targeting the programmed cell death protein 1 (PD-1) [3]. While IRP represents a rather frequent, clinically relevant, and potentially lethal condition, no dedicated test is available to confirm its diagnosis. IRP is therefore suspected in patients receiving immunotherapy who develop pneumonia-like symptoms and exhibit computed tomography (CT) findings suggestive of this condition, after other causes of pneumonia have been excluded [4,5,6]. In addition, although several CT patterns of lung damage associated with IRP have been described, the most common findings of IRP are rather non-specific and may include ground-glass opacities, septal thickening, and traction bronchiectasis, which broadly overlap with typical signs of viral pneumonia and interstitial lung disease [7].

During the COVID-19 pandemic, several authors reported the close clinical and imaging similarity of SARS-CoV-2 pneumonia and IRP, pointing out the rising challenge in distinguishing between the two conditions in patients receiving ICI [8,9,10]. Although preliminary data suggested a potential role of artificial intelligence and perfusion CT in this task [11], the current epidemiological transition of COVID-19 towards an endemic status, the frequency of undetermined ground-glass opacities detected during routine follow-up CT studies of oncological patients without pneumonia symptoms, and the potential use of CT to disclose signs of COVID-19 pneumonia in patients with negative reverse transcription polymerase chain reaction (RT-PCR) tests in high prevalence scenarios [12] are all likely to sizably increase the burden of doubtful cases. Indeed, the distinction between IRP and COVID-19 pneumonia is critical, not only for its implications on disease containment and patient isolation in SARS-CoV-2 infections, but also because a diagnosis of IRP requires a prompt suspension of immunotherapy and, in selected cases, the administration of corticosteroids.

CO-RADS is a standardized assessment scheme that provides a level of suspicion for pulmonary involvement of COVID-19 based on the features seen at unenhanced chest CT. The system was developed in a moderate-to-high prevalence setting with the aim of facilitating the recognition of COVID-19 infection in patients with clinically evident pneumonia for whom a positive RT-PCR test is not available [13]. Therefore, this study aimed to retrospectively compare radiological findings of IRP and COVID-19 pneumonia in an age- and sex-matched cohort with closely comparable clinical presentation, also evaluating the potential of the CO-RADS score to discriminate between these two conditions.

Methods

Approval for this monocentric study, performed at IRCCS Ospedale Policlinico San Martino (Genoa, Italy), was obtained from the competent Ethics Committee (Comitato Etico Regione Liguria, protocol code 12,306, approved on 02/05/2022). Informed consent to participate in the study was waived due to its retrospective nature; however, unless an emergency situation had occurred, all patients signed informed consent to undergo the diagnostic examinations and to have their data used for research purposes.

Study design and population

We screened our institutional imaging database and electronic medical charts of the Radiology Unit of IRCCS Ospedale Policlinico San Martino to identify cancer patients who developed symptoms suspicious for pneumonia during ICI therapy, were referred for a CT examination that demonstrated findings compatible with IRP, and were ultimately diagnosed with this condition after clinical exclusion of alternative etiologies in the five-year period from March 1, 2016, to February 28, 2021. Due to the retrospective nature of the study, the clinical work-up of patients diagnosed with IRP comprised various combinations of blood tests, bacterial cultures, and bronchoalveolar lavage. Patients with examinations performed after the start of the COVID-19 pandemic in Italy (February 21, 2020) were included in this study only if a negative RT-PCR result for SARS-CoV-2 infection, obtained within ± 3 days of their CT examination, was retrievable in institutional electronic medical charts. After recording their demographic and clinical characteristics (age, sex, cancer type, ICI target), enrolled patients were matched for sex and age (tolerance: ± 6 months) with patients who had a diagnosis of COVID-19 pneumonia confirmed by a positive RT-PCR test and had undergone chest CT (both within 48 h from hospital admission) in a period (March 17, 2020, to November 27, 2020) encompassing the first and second wave of the COVID-19 pandemic in Italy before the start of vaccination campaigns.
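The matching step described above can be illustrated with a minimal sketch. The source does not state which matching algorithm was used, so the greedy nearest-age strategy, the function name `match_controls`, the tuple layout, and the toy identifiers below are all assumptions made for illustration only.

```python
def match_controls(cases, controls, tolerance_years=0.5):
    """Greedy 1:1 matching on sex and age (hypothetical sketch).

    cases, controls: lists of (patient_id, sex, age_in_years) tuples.
    Returns a dict mapping each case id to a control id, or to None
    when no unused control of the same sex lies within the tolerance.
    """
    available = list(controls)  # controls not yet assigned to a case
    matches = {}
    for case_id, sex, age in cases:
        candidates = [
            c for c in available
            if c[1] == sex and abs(c[2] - age) <= tolerance_years
        ]
        if not candidates:
            matches[case_id] = None
            continue
        # Pick the control closest in age, then remove it from the pool
        best = min(candidates, key=lambda c: abs(c[2] - age))
        available.remove(best)
        matches[case_id] = best[0]
    return matches


# Toy data, not taken from the study
irp_cases = [("irp1", "M", 68.0), ("irp2", "F", 70.2)]
covid_controls = [("cov1", "M", 68.4), ("cov2", "F", 71.0), ("cov3", "M", 67.9)]
matched = match_controls(irp_cases, covid_controls)
```

A greedy pass is order-dependent; an optimal assignment method could pair more patients when the control pool is small, which is one reason this sketch should not be read as the authors' procedure.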

Image acquisition and analysis

All chest CT examinations were conducted on a dual-source 128 × 2 slice CT scanner (Somatom Definition Flash, Siemens, Germany). Acquisition parameters were as follows: slice thickness 2.0 mm, 120 kVp, mAs according to patient body size, spiral pitch factor 0.98, and collimation width 0.625. The CT acquisition protocol was adapted to the clinical question (e.g. with contrast medium if the examination was aimed to confirm or rule out pulmonary embolism or in patients during oncological follow-up; without contrast medium if lung infiltrates or pneumonia were suspected); multiplanar and high-resolution reconstructions were made, as per institutional protocol.

Chest CT examinations of all included patients were reviewed independently and in a random order by two board-certified radiologists (F.Z., Reader 1, and R.P., Reader 2) with 5 and 4 years of clinical experience in thoracic imaging and with 1 year of experience each in the application of the CO-RADS classification [13]. Both readers had access to standard chest CT interpretation settings and tools provided by our institutional PACS system but were blinded to exam identifiers, patients’ names, and medical history.

First, both readers performed a semiquantitative assessment of the extent of lung involvement according to a five-category scheme [14] (0%, 0; 1–25%, 1; 26–50%, 2; 51–75%, 3; over 75%, 4) applied on each lung, with a maximum involvement score of 8. Then, 13 CT features were evaluated according to the Fleischner Society Glossary of Terms for Thoracic Imaging [15]: craniocaudal (lower, upper, or mixed) and axial (peripheral, central, or mixed) distributions of lung findings, their laterality (unilateral or bilateral), the presence of ground-glass opacities and their appearance (round/circumscribed, patchy, or diffuse), interlobular septal thickening (present or absent), crazy paving (present or absent), consolidations (present or absent), air bronchogram (present or absent), tree-in-bud nodules (present or absent), traction bronchiectasis (present or absent), linear band-like/strip-like opacities (present or absent), mediastinal lymphadenopathy (present or absent [16]), and pleural effusion (present or absent). Finally, the readers assigned a CO-RADS category to each case.
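As a worked illustration of the semiquantitative scheme, the sketch below maps a per-lung involvement percentage to its category and sums the two lungs. The function names are hypothetical, and the treatment of values between 0% and 1% (assigned to category 1) is an assumption, since the scheme lists only integer boundaries.

```python
def involvement_category(percent):
    """Map the involved percentage of one lung to a 0-4 category."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    if percent == 0:
        return 0
    if percent <= 25:   # assumption: any non-zero involvement up to 25%
        return 1
    if percent <= 50:
        return 2
    if percent <= 75:
        return 3
    return 4            # over 75%


def total_involvement_score(right_percent, left_percent):
    """Sum of the two per-lung categories, ranging from 0 to 8."""
    return involvement_category(right_percent) + involvement_category(left_percent)


# Example: 30% of the right lung and 10% of the left lung involved
example_score = total_involvement_score(30, 10)
```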

Statistical analysis

The Shapiro–Wilk test was used to perform distribution analysis. Consequently, normal distributions were reported using mean ± standard deviation and non-normal distributions were reported as median with their interquartile range (IQR). Inter-reader reliability in the assessment of lung involvement extent, of CT features, and of the CO-RADS categories was evaluated with Cohen’s κ, reported with its 95% confidence interval (CI) and interpreted according to the Landis and Koch classification [17]. Analyses of distribution differences were performed on the whole number of observations with the application of patient clustering: distribution differences of ordinal items (lung involvement and CO-RADS scores) between the IRP group and the COVID-19 pneumonia group were evaluated with the Wilcoxon–Mann–Whitney U test, whereas the χ2 and the Fisher’s tests were used to evaluate distribution differences of all remaining nominal items.
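For readers unfamiliar with Cohen’s κ, the following minimal sketch computes the unweighted and the linear-weighted statistic from two readers' ratings. It is not the SPSS routine used in the study: the function name `cohen_kappa` and the toy ratings are assumptions, and the confidence interval is not computed here.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b, categories, weighted=False):
    """Cohen's kappa for two raters.

    categories: ordered list of all possible ratings.
    weighted=True applies linear disagreement weights |i - j| / (k - 1).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    n = len(ratings_a)
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(a, b):
        i, j = index[a], index[b]
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    pairs = Counter(zip(ratings_a, ratings_b))
    marg_a, marg_b = Counter(ratings_a), Counter(ratings_b)
    # Observed and chance-expected (weighted) disagreement
    observed = sum(disagreement(a, b) * c / n for (a, b), c in pairs.items())
    expected = sum(
        disagreement(a, b) * marg_a[a] * marg_b[b] / n ** 2
        for a in categories for b in categories
    )
    if expected == 0:
        raise ValueError("kappa is undefined when no disagreement is expected")
    return 1 - observed / expected


# Toy 2x2 example: 20 yes/yes, 5 yes/no, 10 no/yes, 15 no/no
reader_1 = ["y"] * 25 + ["n"] * 25
reader_2 = ["y"] * 20 + ["n"] * 5 + ["y"] * 10 + ["n"] * 15
kappa = cohen_kappa(reader_1, reader_2, ["y", "n"])
```

In this toy example the observed agreement is 0.70 and the chance agreement is 0.50, giving κ = 0.40, which would fall at the boundary between fair and moderate on the Landis and Koch scale.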

The diagnostic performance of the CO-RADS classification in distinguishing IRP and COVID-19 pneumonia was explored separately for the two readers by dichotomizing CO-RADS categories either as “positive for COVID-19” (CO-RADS scores 4 and 5) or “negative for COVID-19” (CO-RADS scores 1–3). Considering the case–control nature of this study and its potential implications on diagnostic performance indexes [18], evaluations of the two readers were reported in a descriptive fashion, while only a preliminary evaluation of specificity was performed.
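The dichotomization and the specificity estimate can be sketched as follows. The helper names and the toy scores are hypothetical; the only elements taken from the text are the cut-off (CO-RADS 4 and 5 positive, 1 to 3 negative) and the definition of specificity as the proportion of IRP cases rated negative for COVID-19.

```python
def is_covid_positive(co_rads):
    """Dichotomize a CO-RADS score: 4-5 positive, 1-3 negative."""
    if co_rads not in (1, 2, 3, 4, 5):
        raise ValueError("this sketch expects CO-RADS scores from 1 to 5")
    return co_rads >= 4


def specificity(irp_scores):
    """Proportion of IRP cases correctly rated negative for COVID-19."""
    if not irp_scores:
        raise ValueError("at least one score is required")
    true_negatives = sum(1 for s in irp_scores if not is_covid_positive(s))
    return true_negatives / len(irp_scores)


# Toy scores for six hypothetical IRP patients
toy_irp_scores = [1, 2, 3, 4, 5, 3]
toy_specificity = specificity(toy_irp_scores)
```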

All analyses were performed with SPSS v.26.0 (IBM Corp., Armonk, N.Y., USA), the p value significance threshold being lowered to p < 0.003 after applying the Bonferroni–Holm correction to account for multiple statistical testing.
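The Bonferroni–Holm step-down procedure mentioned above can be illustrated with a short sketch. This is a generic textbook version with a hypothetical function name and toy p values, not the exact computation performed in SPSS for this study.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the input order, that is True where
    the null hypothesis is rejected at family-wise error rate alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest p value is compared with alpha / m, the next
        # with alpha / (m - 1), and so on until the first failure
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


# Toy p values, not taken from the study
decisions = holm_reject([0.01, 0.04, 0.03, 0.005])
```

The first comparison uses the strictest threshold, alpha divided by the number of tests, which is why the effective significance threshold falls well below 0.05 when many features are tested.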

Results

Study population

A total of 33 patients with IRP (24 males, 73%) with an average age of 68.1 ± 11.9 years were retrieved, according to the database search described above, in the period between January 11, 2017, and February 02, 2021. A diagnosis of melanoma had been made in 11/33 patients (33%), while the remaining 22/33 (67%) had been diagnosed with non-small-cell lung cancer. Anti-PD-1 monotherapy was administered to 31/33 patients (94%), one patient with non-small-cell lung cancer (3%) received anti-PD-L1 monotherapy, and a combined anti-PD-1/anti-CTLA-4 therapy was administered to another patient (3%) with non-small-cell lung cancer. These 33 patients with IRP were matched for sex and age to 33 patients with an RT-PCR confirmed diagnosis of COVID-19 pneumonia (average age 68.2 ± 11.4 years). As detailed in Table 1, no statistically significant difference was found between the two groups in terms of length of symptoms presence before chest CT, peripheral oxygen saturation before supplemental oxygen administration, need of supplemental oxygen administration, and composite adverse patient outcome (i.e. need of intensive care unit admission or death at 30 days from hospitalization).

Table 1 Patient characteristics according to pneumonia type

Chest CT features

The analysis of inter-reader reliability (Table 2) for the evaluation of the 13 chest CT features showed a substantial agreement for 9 features, ranging from κ = 0.638 (95% CI 0.467–0.808) for the assessment of axial distribution to κ = 0.743 (95% CI 0.503–0.982) for the evaluation of the presence of tree-in-bud nodules. The remaining 4 features had an almost perfect agreement ranging from κ = 0.833 (95% CI 0.716–0.949) for evaluation of craniocaudal distribution of pulmonary findings to κ = 1.000 (95% CI 0.759–1.000) for the assessment of the unilateral or bilateral distribution of findings. The semiquantitative evaluation of lung involvement extent (with a visual score ranging from 1 to 8) also showed a high inter-reader agreement (75.8%, 95% CI 64.2–84.5%) and substantial or higher inter-reader reliability, with a non-weighted κ = 0.683 (95% CI 0.558–0.809) and a linear-weighted κ = 0.805 (95% CI 0.705–0.905).

Table 2 Assessment of inter-reader reliability in the evaluation of the 13 chest CT features

As detailed in Table 3, among the 13 chest CT features, a statistically significant association with COVID-19 pneumonia or IRP was observed only for the laterality of findings, with a unilateral presentation being observed in 21.2% of IRP cases and in no COVID-19 cases (p < 0.001). As shown in Fig. 1, IRP patients also had a significantly lower total extent of lung involvement (median 2.5, IQR 2–4) compared to patients with COVID-19 pneumonia (median 4, IQR 4–6, Mann–Whitney U 1174, p < 0.001).
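To make the Mann–Whitney U statistic concrete, the sketch below computes it from two samples using midranks for ties. It does not reproduce the patient-clustered analysis or the p values reported here; the function name and the toy scores are assumptions, and the source does not state which of the two U values was reported.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for ties.

    U_x counts the (x, y) pairs in which the x value is larger,
    with tied pairs counted as one half.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    pooled = sorted(
        [(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0]
    )
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values with ranks i+1..j
        midrank = (i + 1 + j) / 2
        rank_sum_x += midrank * sum(1 for t in pooled[i:j] if t[1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    return u_x, n1 * n2 - u_x


# Toy involvement scores, not taken from the study
toy_irp = [1, 2]
toy_covid = [2, 3]
u_irp, u_covid = mann_whitney_u(toy_irp, toy_covid)
```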

Table 3 Distribution of CT features according to pneumonia type
Fig. 1 Distribution of the extent of lung involvement across the two pneumonia groups. IRP: immune checkpoint inhibitor-related pneumonitis

CO-RADS assessment

As expected, overall CO-RADS scores were significantly higher (Mann–Whitney U 980, p < 0.001) in the COVID-19 pneumonia group (median 5, IQR 4–5) than in the IRP group (median 3, IQR 3–4), as depicted in Fig. 2. Table 4 details category-specific CO-RADS scores assigned by the two readers, highlighting a 77.3% agreement (51/66 cases, 95% CI 65.8–85.7%) with an overall substantial inter-reader reliability (κ = 0.664, 95% CI 0.512–0.814) that had its lowest category-wise value in the CO-RADS 3 category (κ = 0.536, 95% CI 0.295–0.778). After dichotomization of the CO-RADS classification, inter-reader agreement improved to 84.8% (56/66 cases, 95% CI 74.3–91.6%), while inter-reader reliability decreased slightly to κ = 0.651 (95% CI 0.453–0.848).
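The agreement percentage and its confidence interval can be checked with a short calculation. The source does not name the interval method; the Wilson score interval used below is an assumption, chosen because it reproduces the interval reported for the 51/66 agreement. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Overall CO-RADS agreement: 51 concordant ratings out of 66 cases
agreement = 51 / 66
lower, upper = wilson_interval(51, 66)
```

Under this assumption the interval evaluates to roughly 65.8% to 85.7% around an agreement of 77.3%.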

Fig. 2 Distribution of CO-RADS scores across the two pneumonia groups. IRP: immune checkpoint inhibitor-related pneumonitis

Table 4 Cross-tabulation of CO-RADS score assignments between readers

The distribution of CO-RADS scores in the two groups is listed in Table 5: Reader 1 correctly identified 17/33 IRP patients (52%) with a CO-RADS 1–3 score, while the remaining 16/33 patients were incorrectly assigned a CO-RADS 4 (9/33, 27%) or a CO-RADS 5 score (7/33, 21%), for a resulting 51.5% specificity (95% CI 33.5–69.2%). Reader 2 showed a closely comparable 54.6% specificity (95% CI 36.4–71.9%), having correctly assigned a CO-RADS 1–3 score to 18/33 IRP patients (55%) and having incorrectly assigned a CO-RADS 4 score to 7/33 patients (21%) and a CO-RADS 5 score to 8/33 patients (24%) (Fig. 3).

Table 5 Distribution of CO-RADS scores among pneumonia groups according to each reader assessment
Fig. 3 Six examples of CO-RADS system application in COVID-19 pneumonia and IRP cases. a COVID-19 pneumonia classified as CO-RADS 5, true positive; b IRP classified as CO-RADS 5, false positive; c IRP classified as CO-RADS 4, false positive; d IRP classified as CO-RADS 2, true negative; e IRP classified as CO-RADS 2, true negative; f COVID-19 pneumonia classified as CO-RADS 3, false negative

Discussion

The introduction of ICI in cancer care during the last decade is widely recognized as a major milestone in cancer research and treatment. Currently, more than two-thirds of drug-related trials in oncology concern ICI, and further growth of clinical indications for single or combined ICI therapy is easily foreseeable in the near future [19]. However, IRP represents a frequent adverse effect of ICI therapy: a recent meta-analysis [20] found an overall 2.7% incidence of IRP in patients treated with anti-PD-1 molecules, rising to 10% when ICI are part of combination therapies [21]. Although its pathogenesis remains largely unknown, IRP is widely hypothesized to be a multi-layered autoimmune process, including abnormal T-cell reactions against self-peptides, production of autoantibodies, overexpression of inflammatory cytokines, and development of complement-mediated inflammation [22]. Bronchoalveolar lavage specimens from IRP patients demonstrated a decrease of T regulatory cells, proliferation of CD8 lymphocytes, and CD4/CD8 ratio inversion [23, 24], but to date bronchoalveolar lavage and lung biopsy are not routinely indicated due to the absence of specific pathological findings [25]. The increasing diffusion of ICI-based therapies, the relatively high incidence of IRP in treated patients, and the current COVID-19 pandemic are posing serious challenges in the interpretation of chest CTs of patients who develop pneumonia-like symptoms during ICI treatments. This holds particularly true when considering that IRP patients may be silent carriers of SARS-CoV-2 infection, that imaging features of IRP and COVID-19 pneumonia can substantially overlap, and that classification systems devised to aid the interpretation of CT scans in suspected COVID-19 patients were developed in a medium-to-high prevalence scenario. The current transition to an endemic SARS-CoV-2 circulation warrants further investigations about the potential of these scores in the differential diagnosis of COVID-19 and other interstitial lung diseases such as IRP.

In this retrospective study, two radiologists blindly reviewed 66 CT examinations from two groups of age- and sex-matched patients with COVID-19 pneumonia and IRP: of note, the two groups did not significantly differ in major clinical characteristics. For every CT examination, each reader assigned a CO-RADS score and evaluated the presence of a series of predetermined descriptive CT features. Finally, the extent of lung involvement was graded through a semiquantitative scale consisting of five consecutive classes. While we observed a high overall inter-reader agreement (77.3%) and overall substantial inter-reader reliability (Cohen’s κ = 0.664) in CO-RADS assignments, class-specific reliability analysis showed only a moderate reliability for the CO-RADS 3 category (Cohen’s κ = 0.536), reflecting uncertainties in the differential diagnosis of COVID-19 pneumonia in less-than-typical cases. Likewise, the two readers had at least substantial agreement in the evaluation of descriptive CT features, with high agreement in the semiquantitative estimation of lung involvement. Aside from the bilateral presentation found in all COVID-19 patients but only in about 80% of IRP patients, no other descriptive CT feature was significantly associated with one of the two groups, highlighting the close overlap of CT appearance of the two conditions. However, our results also enable us to hypothesize that in larger samples the presence of linear band-like opacities and of interlobular septal thickening may turn out to be significantly more frequent in COVID-19 patients, also considering that COVID-19 patients had a significantly higher extent of lung involvement (p < 0.001).

Even if—as expected—the overall CO-RADS scores were significantly higher in the COVID-19 pneumonia group, exploratory analysis of specificity revealed medium-to-low estimates: among IRP patients, a CO-RADS score ≥ 4 was incorrectly assigned in at least 45% of patients (48% for Reader 1 and 45% for Reader 2, respectively). In addition, no IRP cases which were assigned a CO-RADS ≥ 4 score by one reader had a CO-RADS < 3 score assigned by the other, underlining the challenging nature of this differential diagnosis. Post hoc case revision revealed that “patchy” ground-glass opacities and septal thickening were the features most commonly driving the reader towards erroneous diagnosis of COVID-19 pneumonia. This appears in line with the fact that in the CO-RADS system, the detection of multifocal bilateral ground-glass opacities is pivotal in shifting from CO-RADS 3 to higher classes: however, a comparatively high number of IRP patients in our study demonstrated that specific feature, suggesting that its relative diagnostic weight should be reconsidered when interpreting CT scans of patients receiving ICI therapy.

A recent meta-analysis [26] showed that from 2 to 58% of SARS-CoV-2 patients (mean summary estimate 12%) may have an initial false negative RT-PCR test and that in these patients CT currently plays a pivotal role as a complementary tool to diagnose COVID-19 pneumonia. Thus, evidence of a low performance of CO-RADS in the differential diagnosis of IRP and COVID-19, coupled with the absence of significantly associated CT features (save for semiquantitatively scored lung involvement and bilateral presentation), may prompt a reconsideration of the diagnostic pathway of ICI patients during the current pandemic.

Our study presents several limitations, chiefly related to its single-centre nature and relatively small sample size: however, we tried to mitigate the effects of these limitations with a case-matched analysis of IRP patients—who did not have a significantly different clinical presentation compared to COVID-19 patients—and by building a highly self-consistent group of IRP patients, all with a clinically confirmed IRP diagnosis which is essentially a diagnosis of exclusion and is reached in a minority of patients. In addition, the retrospective nature of the study prevented any investigation on the impact of chest CT findings in modifying the clinical course of IRP patients.

In conclusion, in our case–control analysis, significant differences in chest CT appearance of IRP and COVID-19 pneumonia were represented by bilateral presentation and lung involvement extent, while 12 other CT features variously overlapped between the two groups. The CO-RADS score exhibited a medium-to-low differential diagnostic potential, suggesting that in the current SARS-CoV-2 pandemic context clinical and imaging findings may not be sufficient to appropriately reach a differential diagnosis when evaluating patients receiving ICI and developing pneumonia-like symptoms. Extensive multimodal investigation including blood and culture tests, bronchoalveolar lavage analysis, and CT is warranted in doubtful cases, and therapeutic decision making (e.g. suspending ICI and starting steroids) should be based on a careful and personalized cost–benefit analysis that must consider the probability of IRP and the individual patient’s conditions. Further studies are warranted to investigate if the development of advanced imaging techniques and innovative CT-based biomarkers may assist the differential diagnosis between IRP and COVID-19 pneumonia.