Patients
This series is part of the COVID-19 Institutional clinical-biological cohort assessing patients with COVID-19 (COVID-BioB, ClinicalTrials.gov NCT04318366) at a 1350-bed tertiary care academic hospital in Milan, Italy. The study was approved by the ethics committee (EC) (protocol number 34/INT/2020). All procedures were conducted in agreement with the 1964 Helsinki declaration and its later amendments; informed consent was collected from all patients according to the EC guidelines.
All consecutive patients aged ≥ 18 years, admitted to the Institution’s Emergency Department (ED) with a positive RT-PCR nasopharyngeal swab between February 25 and April 9, 2020, were initially considered. Patients with a CXR obtained on presentation were included in the study. Patients were excluded if they acquired the infection during hospitalization, were transferred to the institution from other hospitals or later transferred to other hospitals, or had a positive RT-PCR as outpatients. A complete exclusion flow diagram is provided in Fig. 1.
Clinical data collection
All prospectively collected clinical data were retrospectively extracted from the study’s dedicated electronic database.
The time-to-event for clinical outcomes, i.e., death, admission to intensive care unit (ICU), and discharge, was calculated from the date of hospital admission to the date of the event; follow-up was right-censored on May 5, 2020.
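As an illustration of the right-censoring rule above, the following minimal Python sketch computes the time-to-event in days for a single patient. The function name `time_to_event` and the choice to treat events recorded after the censoring date as censored are assumptions made for this example; they are not taken from the study's analysis code.

```python
from datetime import date

# Administrative censoring date stated in the text.
CENSOR_DATE = date(2020, 5, 5)


def time_to_event(admission, event_date=None, censor_date=CENSOR_DATE):
    """Return (days, observed) for one patient.

    admission  : date of hospital admission (time origin)
    event_date : date of the event (death, ICU admission, or discharge),
                 or None if no event was recorded
    observed   : True if the event occurred on or before the censoring date,
                 False if follow-up was right-censored
    """
    # Assumption: events recorded after the censoring date count as censored.
    if event_date is not None and event_date <= censor_date:
        return (event_date - admission).days, True
    return (censor_date - admission).days, False


# A patient admitted on March 1 who died on March 11 has an observed event.
print(time_to_event(date(2020, 3, 1), date(2020, 3, 11)))
# A patient admitted on April 1 with no recorded event is censored on May 5.
print(time_to_event(date(2020, 4, 1)))
```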
Clinical outcome categories were defined as (i) death (primary outcome) and (ii) critical COVID-19, which included patients admitted to ICU and deaths occurring before ICU admission.
Imaging data collection and evaluation
Conventional chest X-ray (CXR) images were acquired in the posteroanterior (PA) or in the anteroposterior (AP) projection for patients not able to stand. All AP projection images were acquired with portable X-ray machines with patients in a supine position or sitting up.
Radiographs obtained on ED presentation were reviewed by two radiologists (F.D.C. and C.M.A.M., with 30 and 24 years of experience in thoracic imaging, respectively); agreement was reached by consensus. To minimize bias, reviewers had no knowledge of clinical data other than COVID-19 positivity.
The following radiographic findings were evaluated: hazy opacities, consolidation, hilar enlargement, and pleural effusion [15]. The distribution of lung opacities was categorized by predominance (peripheral or peri-hilar; upper or lower quadrant; or no predominance) and by laterality (bilateral or unilateral involvement).
The severity of lung involvement, on all baseline CXRs, was quantified by a deep learning artificial intelligence (AI) system (qXR v2.1 c2, Qure.ai Technologies) and compared with a radiologist-assessed score.
qXR is a CE-certified deep learning AI system based on a set of convolutional neural networks (CNNs) trained to detect a number of specific abnormalities on frontal CXRs (blunting of costophrenic angle, cardiomegaly, cavitation, consolidation, fibrosis, hilar enlargement, nodules, opacities, and pleural effusion). The architectures that form the basic blocks of the system and detect individual abnormalities are versions of residual neural networks (ResNets) with squeeze-excitation modules and abnormality-specific modifications. The AI system identifies normal CXRs, detects and localizes suspected abnormalities, reports results as a percentage of involvement and, if necessary, reports the pre-defined tags.
The algorithm was trained on a set of 2.3 million CXRs collected from different centers in different geographical locations [16]. Two different datasets (respectively consisting of more than 89,000 and 2000 distinct CXRs) were used for validation and another set of images for algorithm development. A validated natural language processing algorithm identified the defined abnormalities in the original radiology reports, in the largest dataset, which were considered the gold standard. The developers report that the algorithm, using the radiologists’ assessment as the gold standard, achieved an area-under-the-curve (AUC) for the detection of the specific abnormalities varying from 0.89 to 0.98; notably, AUCs of 0.95 (95% CI 0.92–0.98) for consolidation and 0.94 (95% CI 0.93–0.96) for opacities [16]. The algorithm was additionally tuned with recent images from COVID-19-positive and COVID-19-negative patients [17].
For the purpose of our study, the software output was configured to report only the extent of consolidation and lung opacities. The AI system calculated the severity of lung involvement as the percentage of pixels involved by opacity or consolidation in each lung (cutoff 3%). The two values were then averaged ((percentage of right lung involvement + percentage of left lung involvement)/2) to obtain the Qure AI “score,” which reflects total lung involvement (minimum score 0 = no lung involvement; maximum score 100 = complete opacification/consolidation of both lungs), as described in Fig. 2.
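The averaging step can be sketched in a few lines of Python. The function name `qure_ai_score` is hypothetical, and the way the 3% cutoff is applied here (per-lung values below the cutoff are treated as no involvement) is an assumption, since the text does not specify how the vendor software applies it.

```python
def qure_ai_score(right_pct, left_pct, cutoff=3.0):
    """Average the per-lung involvement percentages into a 0-100 score.

    right_pct, left_pct : percentage of pixels involved by opacity or
                          consolidation in each lung (0-100)
    cutoff              : per-lung threshold in percent. ASSUMPTION: values
                          below the cutoff are treated as 0 (no involvement).
    """
    for pct in (right_pct, left_pct):
        if not 0 <= pct <= 100:
            raise ValueError("involvement must be a percentage in [0, 100]")

    def apply_cutoff(pct):
        return pct if pct >= cutoff else 0.0

    return (apply_cutoff(right_pct) + apply_cutoff(left_pct)) / 2


# 40% right-lung and 20% left-lung involvement average to a score of 30.
print(qure_ai_score(40, 20))
```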
The same CXRs were then evaluated by the radiologists using the Radiographic Assessment of Lung Edema (RALE) score to quantify the severity of lung abnormalities [18]. Each CXR was divided into quadrants, and a radiologist assigned each quadrant two scores describing (i) the extent of opacities (0–4; absence, < 25%, 25–50%, 50–75%, and > 75% involvement) and (ii) the density of opacities (1–3; hazy, moderate, or dense). The final score (maximum 48) was obtained by summing the product of the extent and density scores for each of the four quadrants.
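The scoring rule described above amounts to a sum of four products. The sketch below is illustrative only; the function name `rale_score` and the input layout (one `(extent, density)` pair per quadrant) are assumptions for this example.

```python
def rale_score(quadrants):
    """Compute the RALE score from four (extent, density) pairs.

    extent  : 0-4 (absence, <25%, 25-50%, 50-75%, >75% involvement)
    density : 1-3 (hazy, moderate, dense); ignored when extent is 0
    """
    quadrants = list(quadrants)
    if len(quadrants) != 4:
        raise ValueError("exactly four quadrants are required")

    total = 0
    for extent, density in quadrants:
        if extent not in (0, 1, 2, 3, 4):
            raise ValueError("extent must be an integer from 0 to 4")
        if extent == 0:
            continue  # no opacity in this quadrant, so it contributes 0
        if density not in (1, 2, 3):
            raise ValueError("density must be an integer from 1 to 3")
        total += extent * density
    return total


# Dense opacities involving >75% of every quadrant give the maximum of 48.
print(rale_score([(4, 3)] * 4))
```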
Statistical analysis
Patients’ characteristics were assessed with standard descriptive statistics. Frequencies presented as percentages were used to express categorical values; median values with respective interquartile ranges (IQR) were used for continuous variables. Imputation for missing data was not performed.
To evaluate the sensitivity of the initial CXR, radiological scores of > 0 were interpreted as positive.
The correlation between the two radiological scores was assessed by Kendall’s rank correlation test.
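For readers unfamiliar with rank correlation, the sketch below computes Kendall's tau from paired scores by counting concordant and discordant pairs. The tie-corrected tau-b variant is assumed here because both scores can contain tied values; the text does not state which variant was used, and the analysis itself was run in statistical software rather than with this code.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair count)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from all counts
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1

    untied = concordant + discordant
    denom = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Hypothetical paired scores (not study data) with identical ordering.
print(kendall_tau_b([0, 10, 25, 60], [0, 4, 12, 30]))
```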
Baseline CXR lung opacity characteristics and radiological scores of patients with symptoms suggestive of COVID-19 for < 7 days or ≥ 7 days at ED presentation were compared using the chi-square test and Mann-Whitney U test, respectively; the cutoff at 7 days was selected as the median value.
The ability of the AI-calculated total lung involvement and the radiologist-assessed RALE score to predict mortality and critical COVID-19 was determined by the area-under-the-curve (AUC) of receiver operating characteristic (ROC) curves. The optimal cutoff values were those with the highest Youden index for the primary outcome (mortality); they were used to stratify patients for Kaplan-Meier estimates of survival and ICU-free survival, which were compared by the log-rank test.
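The Youden index is J = sensitivity + specificity - 1, and the optimal cutoff is the threshold that maximizes J. The following sketch shows one way to compute the AUC and the Youden-optimal cutoff from raw scores. The function names, the convention that a score at or above the threshold counts as test-positive, and the example values are assumptions for illustration and are not study data.

```python
def roc_auc(scores, outcomes):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, o in zip(scores, outcomes) if o]
    neg = [s for s, o in zip(scores, outcomes) if not o]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, outcomes):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    ASSUMPTION: a score >= threshold is classified as test-positive.
    If several thresholds share the maximum J, the lowest is returned.
    """
    n_pos = sum(1 for o in outcomes if o)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")

    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, o in zip(scores, outcomes) if o and s >= threshold)
        fp = sum(1 for s, o in zip(scores, outcomes) if not o and s >= threshold)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (threshold, j)
    return best


# Hypothetical scores with outcome 1 = died, 0 = survived.
scores = [5, 10, 20, 30, 40, 50]
died = [0, 0, 0, 1, 1, 1]
print(roc_auc(scores, died), youden_cutoff(scores, died))
```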
A Cox proportional hazards model including sex and age (model 1) was used to evaluate the association between radiological scores and clinical outcomes. A second, more comprehensive Cox proportional hazards model (model 2), which included important comorbidities or known risk factors in addition to the model 1 variables, was also used. Effect estimates were reported as hazard ratios (HRs) with 95% confidence intervals (CIs).
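For reference, model 1 can be written in the standard Cox form shown below. The symbols are introduced here for illustration: h_0(t) is the unspecified baseline hazard and x_score is the radiological score (RALE or Qure AI). Whether the score entered the model as a continuous or a dichotomized covariate is not stated in this section, so the single-covariate form is an assumption.

```latex
h(t \mid \mathbf{x}) = h_0(t)\,
  \exp\!\left(\beta_{\text{score}}\,x_{\text{score}}
            + \beta_{\text{age}}\,x_{\text{age}}
            + \beta_{\text{sex}}\,x_{\text{sex}}\right),
\qquad
\mathrm{HR}_{\text{score}} = e^{\beta_{\text{score}}}
```

Under this form, the reported HR for a score is the multiplicative change in hazard per unit increase of the covariate, holding age and sex fixed; model 2 adds further terms for comorbidities and risk factors to the exponent.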
The correlation between each radiological score (RALE or Qure AI) and clinical signs (temperature and PaO2/FiO2 ratio) was evaluated using two-tailed Pearson’s correlation or Kendall’s tau, depending on the distribution of the variables.
Two-tailed tests were performed, and a p value of < 0.05 was considered statistically significant.
Statistical analyses were performed using SPSS 26 (SPSS Inc./IBM) and SAS version 9.4.