Introduction

Chest radiography (CXR) is generally considered the entry-level imaging technique for screening many pulmonary diseases, with good performance as a first-line screening examination [1, 2]. The interface between the air-containing bronchial tree and airless structures gives the radiographic image a natural contrast, which radiologists use to advantage to depict abnormal findings [3]. These intrinsic anatomical features, along with continuous technical advancements in the field of digital radiography, have significantly contributed to making chest radiography one of the most requested radiological investigations [1, 4].

Over the last decades, digital chest radiography has improved incrementally, with numerous processing tools being developed to support radiologists in the detection of pathological findings [2, 4, 5]. Most of these tools have been implemented to improve nodule detection, including digital tomosynthesis [6,7,8], dual energy and temporal subtraction techniques [9,10,11], computer-aided detection systems [12, 13] and dark-field CXR. More recently, the latter has also been demonstrated to be a valuable complementary tool for the assessment of pulmonary infiltrates, cardiomegaly and hemopericardium [14, 15]. Such techniques are not yet widely available, and their use requires further validation. In comparison, the grey-scale inversion technique is universally available, being a built-in feature on most Picture Archiving and Communication System (PACS) display workstations. Based on the evidence that viewing the inverted image (black on white) improves human contrast perception [16], grey-scale inversion has been proposed as a valid supplementary tool to increase the diagnostic accuracy of radiographic imaging [17,18,19,20,21]. In chest radiography, the diagnostic value of inverted images has been investigated mostly for the detection of parenchymal nodules [17, 22,23,24,25,26], pneumothorax [20] and rib fractures [27]. The clinical advantages of using this display method, however, are still debated, and no general consensus has been reached.

The purpose of this study is to investigate whether the additional use of the grey-scale inversion technique improves the interpretation of the main chest abnormalities, in terms of both diagnostic performance and interobserver variability.

Material and methods

Ethics statement

This study was approved by the Institutional Review Board of the University Hospital of Parma (Prot. 51059). Given the retrospective nature of the study, informed consent was waived.

Study group

The study selection criteria were as follows: chest CT examination and CXR obtained within 24 h of each other, in patients older than 18 years of age admitted to the University Hospital of Parma between October 2017 and October 2019. CT and CXR images affected by motion artefacts or other technical limitations (e.g. chest structures only partially included within the CT acquisition volume or the CXR projection) were excluded. Chest CT served as the standard of reference (the CT technique is reported in the Supplementary material).

CXR imaging technique

Posteroanterior (PA) and left-lateral (LL) images were obtained with the patient standing and in full inspiration, using three digital radiography systems (Axiom Aristos FX, Siemens Healthineers; Essenta DR, Philips and DigitalDiagnost, Philips). Acquisition parameters were as follows: 125 kV, 1.6 mAs, an antiscatter grid and a 180 cm focus–detector distance.

Anteroposterior (AP) images were acquired with the patient either lying down or sitting up, using two computed radiography systems (Practix 33 Plus, Philips and Practix 300, Philips). Acquisition parameters were as follows: 95–98 kV, 3.2 mAs and a 120 cm focus–detector distance.

Images were visualized on a dedicated workstation (BARCO visualization system, Kortrijk, Belgium), and grey-scale inversion was performed using the built-in software of our PACS workstations (suite Estensa, Esaote, Genova, Italy) (Figs. 1 and 2).

Fig. 1

Representative example of right apical pneumothorax (arrows) in standard (A) and inverted grey-scale (B) CXR images (posteroanterior projection)

Fig. 2

Representative example of bilateral parenchymal nodules (arrows) in standard (A) and inverted grey-scale (B) CXR images (anteroposterior projection)
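As an illustration of the display operation shown in Figs. 1 and 2, grey-scale inversion can be sketched as a linear remapping in which each pixel value v becomes (maximum value − v). The internals of the PACS software used in this study are not described here, so the function name `invert_grey_scale`, the `bit_depth` parameter and the nested-list image format below are assumptions made purely for illustration.

```python
def invert_grey_scale(image, bit_depth=8):
    """Return a grey-scale-inverted copy of `image`.

    `image` is a list of rows, each row a list of integer pixel values in
    the range [0, 2**bit_depth - 1]. This is an illustrative linear
    inversion, not the vendor's implementation.
    """
    max_value = (1 << bit_depth) - 1
    inverted = []
    for row in image:
        new_row = []
        for pixel in row:
            if not 0 <= pixel <= max_value:
                raise ValueError(f"pixel value {pixel} outside 0..{max_value}")
            # Bright ("white bones") pixels become dark ("black bones").
            new_row.append(max_value - pixel)
        inverted.append(new_row)
    return inverted


# Hypothetical 2 x 3 image with 8-bit pixel values.
standard = [[0, 128, 255], [10, 20, 30]]
black_bones = invert_grey_scale(standard)
```

Because the mapping is its own inverse, applying it twice returns the original image, which is why a reader can toggle between the two display modes without any loss of information.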

Data collection and interpretation

CXR: CXR images were retrieved from the local PACS and independently reviewed by one general radiologist with 18 years of experience (Reader 1) and two third-year radiology residents (Readers 2 and 3), for the presence of eight predefined findings: atelectasis, consolidation, interstitial abnormality, nodule, mass, pleural effusion, pneumothorax and rib fractures. Chest abnormalities were classified based on the Fleischner Society glossary [28]. Standard grey-scale (also called “white bones”) and inverted grey-scale (“black bones”) CXRs were evaluated in two separate reading sessions, as follows:

  • Session 1: standard setting first, followed by inverted grey-scale

  • Session 2: inverted grey-scale first, followed by standard.

There was a washout interval of at least 4 weeks between the two reading sessions, and images were evaluated in random order. In each session, findings were recorded separately for the standard and inverted grey-scale images, so that the findings of the first-line reading (standard or inverted) could be analysed on their own. Subsequently, any additional findings from the consecutive reading with the other display mode were recorded. This database allowed testing of CXR accuracy and interobserver agreement under different reading settings and combinations (see Statistical analysis). Reading time was recorded for each reader and session.

Standard of reference: The diagnostic performance of CXR with different visualization modes was tested against CT as the standard of reference. CT images were reviewed independently by two resident radiologists (Readers 4 and 5, respectively), who had access to the radiological reports and classified each CT as positive or negative, as follows:

  • Positive CT was assigned in case of at least one of the eight above-mentioned findings;

  • Negative CT was assigned when none of them was present.

Any discrepancy between Readers 4 and 5 was resolved by a chest radiologist with 13 years of experience.

The same classification system was applied to dichotomize the CXR outcome into positive or negative.
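The positive/negative rule described above can be sketched in a few lines. The names `FINDINGS` and `is_positive` and the representation of a report as a collection of finding labels are hypothetical, chosen only to make the rule concrete.

```python
# The eight predefined findings listed in the Methods.
FINDINGS = (
    "atelectasis",
    "consolidation",
    "interstitial abnormality",
    "nodule",
    "mass",
    "pleural effusion",
    "pneumothorax",
    "rib fractures",
)


def is_positive(reported_findings):
    """Return True if at least one of the eight predefined findings is reported.

    `reported_findings` is any iterable of finding labels (hypothetical
    format). Labels outside the predefined list are rejected rather than
    silently ignored.
    """
    reported = set(reported_findings)
    unknown = reported - set(FINDINGS)
    if unknown:
        raise ValueError(f"unexpected finding labels: {sorted(unknown)}")
    return len(reported) > 0


# Hypothetical examples: one examination with two findings, one with none.
positive_exam = is_positive(["nodule", "pleural effusion"])
negative_exam = is_positive([])
```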

Statistical analysis

Continuous data were expressed as median with its 95% confidence interval (95% CI), whereas categorical data were expressed as absolute and relative frequencies, with the corresponding 95% CI calculated using the Wilson method.
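For readers unfamiliar with the Wilson method, the following is a minimal sketch of the standard Wilson score interval for a proportion. It is not the MedCalc implementation; the function name `wilson_interval` and the default `z` value (the two-sided 95% normal quantile) are choices made for this illustration.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score confidence interval for a binomial proportion.

    Returns (lower, upper) as fractions between 0 and 1.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must lie between 0 and total")
    p = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p + z2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom
    return centre - half_width, centre + half_width


# Proportion of excluded patients reported in the Results: 46 of 553.
lower, upper = wilson_interval(46, 553)
print(f"{lower:.2%} to {upper:.2%}")
```

Applied to the 46 of 553 excluded patients, this sketch should land close to the interval reported in the Results (6.3% to 10.92%). Unlike the simple normal approximation, the Wilson interval never extends below 0% or above 100%, which matters for the rarer findings in this sample.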

The following reading settings were assembled for comparison with CT standard of reference:

  • Setting 1: standard reading only, derived from session 1

  • Setting 2: inverted reading only, derived from session 2

  • Setting 3: combined reading, first standard followed by inverted reading as per full session 1

  • Setting 4: combined reading, first inverted followed by standard reading as per full session 2

Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for each reader and all reading settings; accuracy was assessed with the area under the curve (AUC) and its 95% CI. Interobserver agreement was tested with Cohen’s kappa with quadratic weights (kw) and its 95% CI: kw ≤ 0.20 was considered to indicate poor agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 good agreement, and 0.81–1.00 very good agreement.
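The measures above can be sketched as follows for binary (0 = negative, 1 = positive) readings. This is an illustration under stated assumptions rather than the MedCalc procedure: the function names and the example data are hypothetical, confidence intervals are omitted, and the AUC is computed as (sensitivity + specificity) / 2, which is the area under the ROC curve when a test has a single binary operating point.

```python
from collections import Counter


def diagnostic_metrics(cxr, ct):
    """Sensitivity, specificity, PPV, NPV and single-point AUC of CXR against CT.

    Assumes both the CT-positive and CT-negative groups, and both the
    CXR-positive and CXR-negative groups, are non-empty.
    """
    pairs = Counter(zip(cxr, ct))
    tp, fp = pairs[(1, 1)], pairs[(1, 0)]
    fn, tn = pairs[(0, 1)], pairs[(0, 0)]
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "auc": (sensitivity + specificity) / 2.0,
    }


def quadratic_weighted_kappa(reader_a, reader_b, n_categories=2):
    """Cohen's kappa with quadratic weights for categories 0..n_categories-1.

    Undefined (division by zero) if both readers use a single, identical category.
    """
    n = len(reader_a)
    observed_counts = Counter(zip(reader_a, reader_b))
    a_counts, b_counts = Counter(reader_a), Counter(reader_b)
    max_distance = (n_categories - 1) ** 2
    observed = expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = (i - j) ** 2 / max_distance  # disagreement weight
            observed += weight * observed_counts[(i, j)] / n
            expected += weight * (a_counts[i] / n) * (b_counts[j] / n)
    return 1.0 - observed / expected


def agreement_label(kw):
    """Map a kw value to the agreement categories used in this study."""
    if kw <= 0.20:
        return "poor"
    if kw <= 0.40:
        return "fair"
    if kw <= 0.60:
        return "moderate"
    if kw <= 0.80:
        return "good"
    return "very good"


# Hypothetical data for ten examinations (not study data).
ct_reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_1 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
reader_2 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]

metrics = diagnostic_metrics(reader_1, ct_reference)
kw = quadratic_weighted_kappa(reader_1, reader_2)
```

With only two categories, the quadratic weights reduce to 0 for agreement and 1 for disagreement, so kw coincides with the unweighted Cohen's kappa; the weighting only changes the result when findings are graded on more than two levels.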

A p value < 0.05 was deemed statistically significant. Statistical analysis was performed with MedCalc (version 19.1, 64-bit; MedCalc Software bvba, Ostend, Belgium).

Results

Study population

A total of 553 consecutive patients underwent a chest CT and CXR within 24 h of each other at the University Hospital of Parma between October 2017 and October 2019. Forty-six (8.32%, 95%CI 6.3% to 10.92%) patients were excluded because of CT and/or CXR technical limitations: 31/46 (67.39%, 95%CI 52.97% to 79.13%) because of CT motion artefacts and 15/46 (32.61%, 95%CI 20.87% to 47.03%) because chest structures were only partially included within the CT acquisition volume or CXR projections. A total of 507 patients (median age 69, 95%CI 66.94 to 71; 285/507 men, 56.2%) were enrolled (Fig. 3). The main clinical indications for both CT and CXR included trauma, chest pain, dyspnea, fever and persistent cough.

Fig. 3

Flow chart of patient selection

CT findings

A total of 393/507 (77.5%, 95%CI 73.68% to 80.93%) CTs were scored positive and 114/507 (22.5%, 95%CI 19.07% to 26.32%) negative. The detailed distribution of CT pathological findings is reported in Table 1, whereas CT acquisition data are provided in the Supplementary material.

Table 1 Distribution of chest CT pathological findings

CXR acquisition data

PA and LL projections were performed in 254/507 patients (50.1%, 95%CI 45.76% to 54.44%), whereas the AP projection was performed in 253/507 patients (49.9%, 95%CI 45.56% to 54.24%). The effect of reading setting was comparable for both standing and supine CXR imaging.

Reading time

The median reading time of session 1 was 79 s [95%CI, 77 to 85 s] for Reader 1, 84 s [95%CI, 80 to 88 s] for Reader 2 and 83 s [95%CI, 80.5 to 87 s] for Reader 3, whereas that of session 2 was 61 s [95%CI, 57 to 64 s] for Reader 1 and 59 s for both Reader 2 [95%CI, 57 to 60.7 s] and Reader 3 [95%CI, 55 to 60 s].

Diagnostic performance of CXR

Overall, sensitivity of CXR for any finding ranged from 7.1% to 60% for setting 1, and from 8.2% to 60% for each of settings 2, 3 and 4. Specificity ranged 76.5–99.8%, 78–99.8%, 73.3–100% and 73–100% for settings 1, 2, 3 and 4, respectively. PPV ranged 16.7–90%, 17.2–90.9%, 20–100% and 17.8–100% for settings 1, 2, 3 and 4, respectively. NPV ranged 69.6–99.2%, 69.3–99.2%, 69.5–99.2% and 69.8–99.2% for settings 1, 2, 3 and 4, respectively. AUC ranged 0.529–0.781, 0.527–0.779, 0.531–0.779 and 0.529–0.779 for settings 1, 2, 3 and 4, respectively. Overall, CXR accuracy was not significantly improved by the inverted images compared to setting 1. For Reader 3, the combined reading improved CXR sensitivity for the detection of consolidation in setting 4 and of pneumothorax and rib fractures in setting 3, whereas for Reader 1 the combined approach improved CXR PPV for the detection of pleural effusion in setting 3. Sensitivity, specificity, PPV, NPV and AUC values are detailed for each radiological finding in Table 2.

Table 2 Diagnostic performance of CXR—only standard (setting 1); only inverted (setting 2); standard + inverted (setting 3) and inverted + standard (setting 4) (AUC: area under the curve; NPV: negative predictive value; PPV: positive predictive value)

Interobserver agreement

Kw values for any finding ranged 0.23–0.63, 0.13–0.73, 0.21–0.66, 0.14–0.75 for settings 1, 2, 3 and 4, respectively. Regardless of size, interobserver agreement at the detection of pneumothorax between the residents and the senior radiologist showed a slight improvement in both settings 3 and 4 (Table 3). Kw values were generally higher for large pneumothorax—sized ≥ 3 cm [29]—with only two exceptions of greater values observed for small pneumothoraces (sized < 3 cm). Details are reported in Table 4.

Table 3 Interobserver agreement between the three Readers (Reader 1: experienced radiologist; Readers 2 and 3: radiology residents)
Table 4 Interobserver agreement between the three Readers for small and large pneumothoraces

Discussion

We observed that grey-scale inversion display mode did not significantly improve diagnostic performance or interobserver agreement compared with standard viewing mode. Combinations of standard and inverted modes could help in reducing the interobserver variability across different levels of expertise.

CXR images are usually viewed in “white bones” mode on a display terminal; however, some readers prefer the “black bones” mode. The latter represents a subjective adaptation of the standard setting, based on the individual feeling that the detection of abnormal findings is eased by the inverted images. We undertook this study to evaluate this perception systematically and showed that there is no actual diagnostic difference. Our results partially confirm the previous observation by Park et al., who investigated the sensitivity and accuracy of the grey-scale inversion technique, limited to the detection of rib fractures. Park reported that the combination of the two reading modalities could improve chest radiography sensitivity and accuracy among residents and medical students, namely among readers with limited experience [27]. In our study, for one resident, the combined use of the two approaches increased CXR sensitivity for the detection of consolidation with reading setting 4 (i.e. first inverted, followed by standard), and for the detection of pneumothorax and rib fractures with settings 3 (i.e. first standard, followed by inverted) and 4. However, the improvement in accuracy did not reach statistical significance.

Interobserver agreement for the detection of pneumothorax between the residents and the senior radiologist showed a modest improvement in both combined settings (3 and 4) and, as expected, was generally higher for large pneumothoraces in all settings and among all readers. Since the required reading time for both sessions was relatively short (not greater than 84 s), the combined use of the two display modes might be worth exploiting when pneumothorax is suspected. However, pneumothorax was scarcely represented among the enrolled patients (5.5%, 28 cases).

The combined reading approach improved the PPV for the detection of pleural effusion by the senior radiologist, but showed a general drop in diagnostic performance as compared to the standard approach for the same reader. The unfamiliarity with the “black bones” images might have affected their interpretation by the senior radiologist. As pointed out by McMahon et al., when a new type of image results in lower accuracy, the unfamiliarity with the new approach must be taken into account before blaming intrinsic properties of the new modality [11]. This “unfamiliarity effect” tends to have a minor impact on younger radiologists, who are inevitably less affected by a long-lasting habit.

Thompson et al. reported that two display modes can improve nodule detection [26]. These authors hypothesized that the advantage of using two display modes might lie in the fast-flicking between the two images, namely standard and inverted, which would draw attention to suspicious areas (e.g. lung periphery). This fast-flicking technique was not employed by our readers, for whom the detection was already slightly improved, suggesting that it might only partly explain the advantages of such a combination. Even if limited in number, the majority of studies that have applied the grey-scale inversion display mode to chest radiography have attempted to demonstrate its additional value in detecting lung nodules, either real or simulated, with conflicting results [17, 22,23,24,25,26]. Nodules were fairly represented in our sample (16.8%, 85 cases), and significant differences were not observed in accuracy or interobserver agreement with the combination of the two techniques. Their detection rate was generally low among the three readers, ranging from 7.1% to 17.7%. One of the reasons for such low percentages can be found in their relatively small size (nodule median diameter of 7 mm, 95%CI, 6 to 8 mm), which has likely contributed to reducing their detectability by CXR. Previous studies reported better performance in nodule detection, notably with relatively larger solid nodules [17]. In contrast to previous analyses, a nodule size range was not set at the time of patient selection [22], since the general intent of this investigation was to reproduce a real clinical setting, without focusing on a pre-defined finding.

To our knowledge, this is the first study testing eight different abnormal findings at the same time and within such a large population. Indeed, the majority of studies that have investigated the application of the grey-scale inversion display mode to chest radiography only tested one selected finding at a time, enrolling no more than 300 subjects. Furthermore, we included bedside CXRs, with the aim of reproducing a real clinical setting, where a good proportion of patients are unable to stand (e.g. trauma patients or severely ill ones). Of note, the effect of reading setting was comparable for both standing and supine CXR imaging.

Our study, however, has several limitations. First, the retrospective design is prone to confounding factors, such as selection of patients. Second, CXRs were obtained with different technical equipment and parameters, which can ultimately affect the detectability of findings, although this reflects the actual routine of this imaging modality. Third, some of the findings included in the analysis were barely represented within the sample, such as mass (1.97%, 10 cases). Finally, the presence of only one senior radiologist limited the possibility of investigating the impact of different levels of expertise.

In conclusion, we observed no significant advantage in the use of the grey-scale inversion technique for an expert radiologist. The combination of the grey-scale inversion display mode with the standard mode could reduce interobserver variability among readers with limited expertise.