Introduction

In the early days of December 2019, the first pneumonia cases of a new coronavirus named SARS-CoV-2 were identified in Wuhan, the capital city of Hubei province (China) [1]. As of April 30, 2020, 1,112,667 cases have been reported in the EU/EEA and the UK: Spain (212,917) and Italy (203,591) had the largest contagion [2].

Since the initial spread of this new illness, known as coronavirus disease 2019 (COVID-19), many patients have been hospitalized with respiratory symptoms [3]. The clinical spectrum is broad, including asymptomatic infection, mild upper respiratory tract disease and severe interstitial pneumonia with respiratory failure requiring oxygenation support or intubation [4, 5].

Computed tomography (CT) is the most sensitive radiological technique for the diagnosis of COVID-19, showing diffuse lung alterations ranging from ground-glass opacities to parenchymal consolidations; several radiological patterns are observed at different times throughout the disease course [6, 7]. However, CT has limited specificity for distinguishing between viruses [8]. Early diagnosis is hence mainly performed with nasopharyngeal swabs and virus RNA extraction with real-time reverse transcriptase–polymerase chain reaction (rtRT-PCR) [9].

Recently, the visual assessment of lung damage on CT scans has been proven valuable in determining the prognostic implications [10]. Nonetheless, computer-aided quantitative analysis of the CT exam (Quantitative Computed Tomography [QCT]) can also be used for this purpose, as already demonstrated by the research on the acute respiratory distress syndrome (ARDS) [11, 12].

The main objective of our research was to explore the role of QCT analysis as an outcome predictor for COVID-19. To this end, we conducted a monocentric retrospective study on positive patients in a tertiary referral Institution in Milan (Lombardy, Italy) during the recent outbreak, where we experienced hospital overcrowding and shortage of intensive care unit (ICU) beds, mechanical ventilation devices and oxygen [13].

Materials and methods

Study oversight

This is an observational retrospective study conducted in a tertiary referral University Hospital in Milan (Lombardy, Italy). The study protocol followed the ethical guidelines established in the 1975 Declaration of Helsinki, complied with the procedures of the local ethical committee, and was approved by the Institutional Review Board. This study received no financial support.

Data sources

We obtained laboratory, clinical and radiological data of hospitalised patients affected by COVID-19 from the electronic medical records; the inclusion period for the analyses was between January 25, 2020, and April 28, 2020.

COVID-19 was diagnosed based on a positive result on rtRT-PCR assay on nasal and pharyngeal swab specimens [14]. We included only laboratory-confirmed cases who received a non-contrast chest CT at admittance in the emergency department.

We analysed patients’ demographics and clinical and laboratory findings at admittance from electronic medical records. Patient demographics included age, sex, body mass index (BMI), concomitant/previous diseases and smoking habit. Clinical and laboratory assessments consisted of body temperature, PaO2, PaCO2, and C-reactive protein (CRP). We also noted the time from symptom onset, days of hospitalisation, ICU admittance, medical therapy administered and the most invasive level of oxygenation support provided. In particular, we distinguished between low-flow oxygenation (nasal cannula, face mask), high-flow oxygenation (Venturi mask, helmet CPAP) and mechanical ventilation through an endotracheal tube. We collected the time interval between CT and the start of oxygenation support, as well as the first PaO2/FiO2 ratio available. In-hospital deaths and healed patients’ discharge dates were also noted.

The clinical features of confusion (mental test score of 8 or less), urea, respiratory rate and blood pressure were also acquired to calculate the CURB-65, a validated score to predict the severity of community-acquired pneumonia [15] that stratifies patients in groups from 1 to 3 according to the risk of mortality.
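As an illustration of how such a score is tallied, the following minimal sketch counts one point per criterion. The numeric thresholds (urea above 7 mmol/L, respiratory rate of 30/min or more, systolic pressure below 90 mmHg or diastolic of 60 mmHg or less, age 65 or over) are the conventional CURB-65 definitions and are not listed in the text; the mapping of scores 0–1, 2 and 3–5 to groups 1, 2 and 3 is likewise assumed from the usual convention. All class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Admission:
    """Admission findings needed for CURB-65 (field names are illustrative)."""
    confusion: bool        # mental test score of 8 or less
    urea_mmol_l: float     # blood urea, mmol/L
    respiratory_rate: int  # breaths per minute
    systolic_bp: int       # mmHg
    diastolic_bp: int      # mmHg
    age: int               # years


def curb65_score(a: Admission) -> int:
    """Return the CURB-65 score (0-5): one point per criterion met."""
    criteria = [
        a.confusion,
        a.urea_mmol_l > 7.0,                         # assumed standard threshold
        a.respiratory_rate >= 30,                    # assumed standard threshold
        a.systolic_bp < 90 or a.diastolic_bp <= 60,  # assumed standard threshold
        a.age >= 65,
    ]
    return sum(criteria)


def curb65_group(score: int) -> int:
    """Map a score to risk group 1 (low), 2 (medium) or 3 (high); assumed grouping."""
    if not 0 <= score <= 5:
        raise ValueError("CURB-65 score must be between 0 and 5")
    if score <= 1:
        return 1
    if score == 2:
        return 2
    return 3
```

For example, a 70-year-old patient with normal urea, respiratory rate and blood pressure and no confusion scores 1 point and falls in group 1.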

Chest CT and quantitative analysis

All patients received a standard non-contrast chest CT with a multidetector CT scanner (Philips Brilliance) with the following setup: collimation, 64 × 0.25; voltage, 120 kV; tube current, 130–200 mAs, 240 mA, pitch 1.4, slice thickness after reformat, 2.5 mm. The field of view included the whole chest and was acquired during forced inspiration, in keeping with patient compliance. The dataset was anonymised and exported to a dedicated segmentation suite for medical image computing (3D Slicer, www.slicer.org) [16] equipped with a semi-automated segmentation algorithm (Chest Imaging Platform) [17]. This software, validated as useful in the surgical setting [18], performed a first-pass automated segmentation; then, lung volumes were manually perfected using three-dimensional tools such as spherical brushes or erasers.

As a rule, a complete segmentation included both lungs with interstitial structures, segmental vessels and bronchi; the main pulmonary arteries and bronchi, all mediastinal structures and any pleural effusion were excluded, as well as lung masses (e.g. tumours, fungal disease).

Lung volumes, considered percentages of the total volume, were extracted according to different Hounsfield unit (HU) intervals into non-aerated (%NNL, density between 100, − 100 HU), poorly aerated (%PAL, − 101, − 500 HU), normally aerated (%NAL, − 501, − 900 HU), and hyperinflated (− 901, − 1000 HU) [19]. The additional volume “compromised lung” (%CL) was considered the sum of %PAL and %NNL (− 500, 100 HU) (Fig. 1). The authors in charge of segmentation (E.L., C.L., R.M.) were unaware of the laboratory and clinical parameters or hospitalisation outcomes. Conflicts were resolved in consensus. The principal investigator reviewed and confirmed all segmentations before data entry. The time needed to complete each analysis was recorded.
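The thresholding step can be sketched as follows. This is not the Chest Imaging Platform code: it is a minimal NumPy illustration that assumes the segmented lung voxels are available as an array of integer HU values on a uniform voxel grid, so that voxel counts are proportional to volume. The function and dictionary names are hypothetical.

```python
import numpy as np

# HU intervals as stated in the text (inclusive bounds, lower to upper).
HU_INTERVALS = {
    "non_aerated": (-100, 100),        # %NNL
    "poorly_aerated": (-500, -101),    # %PAL
    "normally_aerated": (-900, -501),  # %NAL
    "hyperinflated": (-1000, -901),
}


def lung_compartments(hu_values):
    """Return each compartment as a percentage of the segmented lung volume.

    `hu_values` holds the HU value of every voxel inside the lung mask.
    Voxels outside the -1000..100 HU window are ignored (an assumption).
    """
    hu = np.asarray(hu_values)
    in_window = (hu >= -1000) & (hu <= 100)
    total = int(in_window.sum())
    if total == 0:
        raise ValueError("no lung voxels inside the -1000..100 HU window")

    percentages = {}
    for name, (low, high) in HU_INTERVALS.items():
        count = int(((hu >= low) & (hu <= high)).sum())
        percentages[name] = 100.0 * count / total

    # Compromised lung (%CL) is the sum of poorly aerated and non-aerated lung.
    percentages["compromised"] = (
        percentages["poorly_aerated"] + percentages["non_aerated"]
    )
    return percentages
```

With five voxels at −950, −700, −700, −300 and 0 HU, the sketch returns 20% hyperinflated, 40% normally aerated, 20% poorly aerated, 20% non-aerated and therefore 40% compromised lung.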

Fig. 1

Quantitative lung CT analysis of an 81-year-old male patient affected by COVID-19. a Non-contrast chest CT at admission showing bilateral ground-glass opacities, common findings of the novel coronavirus pneumonia. b Semi-automated segmentation using 3D Slicer. Blue areas are normal lung parenchyma; yellow areas represent poorly aerated lung in the − 500, − 100 HU interval. c 3D volumetric representation of both lungs. d Comparison between normal and compromised lung volumes. This patient had 6% of compromised lung volume, required no oxygenation support and was discharged after 15 days of observation and supportive therapy

Outcomes

The primary objective was to identify and validate the most accurate lung volume derived by QCT, to predict the two main study outcomes: the need for oxygenation support and the need for intubation in patients affected by COVID-19.

Other objectives included correlation with pulmonary dysfunction as measured by the PaO2/FiO2 ratio and prediction of in-hospital death (Table 1).

Table 1 Multivariate analyses of risk factors for oxygenation support, intubation and in-hospital death

Statistical analysis

Development of prediction models

All analyses were performed using Stata 13 (StataCorp LP). Multiple binomial logistic regressions were performed to explore the association of the lung volumes (%NAL, %PAL and %CL) with the two outcomes of interest. All clinically relevant predictors without missing data were included in the final model as covariates: age, sex, smoking habit, CRP, heart disease, chronic lung disease, cancer, diabetes, chronic kidney failure, urea levels and CURB-65 group. Three similar models were thus developed: %NAL-model, %PAL-model and %CL-model.

BMI was available for 161 patients and was tested in a separate model.

A Pearson’s product-moment correlation was run to assess the relationship between the selected lung interval and PaO2/FiO2 in 106 patients (nasal cannula = 26, Venturi mask = 28, helmet CPAP = 21, endotracheal tube = 27).
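For reference, the correlation coefficient and its test statistic can be computed as in the sketch below. It uses only the standard library, the function names are hypothetical, and it is not the Stata routine used for the analysis. The degrees of freedom of the test are n − 2, so a sample of 106 patients gives 104 degrees of freedom.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df / (1.0 - r * r))
```

The t statistic is then compared with a Student t distribution with n − 2 degrees of freedom to obtain the p value.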

A Cox regression survival analysis was performed to explore potential predictors of mortality. All potential candidates were tested with univariate analyses; the cutoff for inclusion in the final model was set at p < 0.2.

Model validation

All simulations were run using the Python programming language (Python Software Foundation, https://www.python.org/). Categorical variables were preliminarily tested for correlation using chi-square tests; Wilcoxon rank-sum tests were performed on continuous covariates to inspect the probability of being sampled from the same distribution. Two separate multivariate regressions, without regularisation, were performed on both outcomes over the space of covariates. Models’ coefficients, confidence intervals and their associated significance levels (p values) were investigated to assess whether the selected lung interval remained significant despite adjusting for possible confounders. Predictive machine learning models were built using logistic regression with regularisation. To adjust for class imbalance, on both outcomes, and preserve the limited amount of available observations, the logistic regression was stacked upon a SMOTE model during training [20].
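The idea behind SMOTE is to create synthetic minority-class observations by interpolating between a minority sample and one of its nearest minority-class neighbours. The text does not say which implementation was used, so the NumPy sketch below is an illustration of the technique rather than the authors' code; the function name, the default of five neighbours and the fixed seed are assumptions.

```python
import numpy as np


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by nearest-neighbour interpolation.

    minority    : array of shape (n_samples, n_features), minority class only
    n_synthetic : number of synthetic rows to create
    k           : number of nearest neighbours considered (assumed default)
    """
    X = np.asarray(minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample, one of its k neighbours, and a random position
    # on the segment joining them.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))
    return X[base] + gap * (X[picked] - X[base])
```

When oversampling is combined with cross-validation, the synthetic samples should be generated from the training folds only, so that the held-out fold contains real observations exclusively.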

Hyperparameters were chosen by randomised search over 500 candidate configurations, each evaluated by 10-fold cross-validation, for a total of 5000 trained models. The aim was to alleviate class imbalance by maximising the class-weighted F1 score, where F1 is the harmonic mean of precision and recall (Fig. 2).
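The selection metric can be illustrated with a short, dependency-free sketch. It assumes that "class weighted" means the per-class F1 scores are averaged with weights proportional to each class's number of true observations; the function name is hypothetical.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    total = 0.0
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        support = tp + fn  # number of true observations of this class
        total += f1 * support / n
    return total
```

With true labels [1, 1, 1, 0] and predictions [1, 1, 0, 0], the per-class F1 scores are 0.80 (three observations) and about 0.67 (one observation), giving a weighted score of about 0.77.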

Fig. 2

Ten-fold cross-validation for receiver operating characteristic (a) and precision-recall curves (b) showing performance of compromised lung volume as a predictor of oxygenation therapy and of intubation (c and d), based on quantitative analysis of chest CT at hospital admittance

Cross-validated receiver operating characteristic (CV-ROC) and mean areas under the curve (CV-AUC) were calculated [21]. Different cross-validated cut points at 90% sensitivity and at 90% specificity were estimated for both outcomes.
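A cut point at a target sensitivity or specificity can be located as in the sketch below. It works on a single sample rather than reproducing the cross-validated procedure, assumes that a patient is predicted positive when the measured value is at or above the cutoff, and requires both outcome classes to be present. The function names are hypothetical.

```python
def sens_spec(values, outcomes, cutoff):
    """Sensitivity and specificity when predicting positive for value >= cutoff."""
    pairs = list(zip(values, outcomes))
    tp = sum(1 for v, o in pairs if v >= cutoff and o)
    fn = sum(1 for v, o in pairs if v < cutoff and o)
    tn = sum(1 for v, o in pairs if v < cutoff and not o)
    fp = sum(1 for v, o in pairs if v >= cutoff and not o)
    return tp / (tp + fn), tn / (tn + fp)


def cut_point_for_sensitivity(values, outcomes, target=0.90):
    """Highest observed cutoff whose sensitivity is still at least `target`."""
    best = None
    for cutoff in sorted(set(values)):
        sensitivity, _ = sens_spec(values, outcomes, cutoff)
        if sensitivity >= target:
            best = cutoff
    return best


def cut_point_for_specificity(values, outcomes, target=0.90):
    """Lowest observed cutoff whose specificity is at least `target`."""
    for cutoff in sorted(set(values)):
        _, specificity = sens_spec(values, outcomes, cutoff)
        if specificity >= target:
            return cutoff
    return None
```

Raising the cutoff trades sensitivity for specificity, which is why the two targets generally yield two different cut points for the same outcome.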

Results

Demographic and clinical characteristics

A cohort of 222 patients (163 males, 59 females, median age 66, interquartile range [IQR] 54–6) was identified (Table 2). The median interval between symptom onset and admission CT was 7 days (IQR, 3–8). The median body temperature during the CT exam was 37.7 °C (IQR, 36.9°–38.4°), and the median PaO2 was 67 mmHg (IQR, 57–90). The median hospitalisation length was 11 days (IQR, 7–17, maximum 56).

Table 2 Patients’ characteristics

During the hospital stay, 75% of patients required oxygenation support (Table 3) as follows: 29% low-flow oxygenation (26% nasal cannula, 3% facial mask); 26% high-flow oxygenation (16% Venturi mask, 10% helmet CPAP); and 20% mechanical ventilation with an endotracheal tube. Median PaO2/FiO2 ratio, as recorded on the first day of oxygenation support, was 192.0 (IQR, 122.2–251.5). The median time interval from admission CT to oxygenation support was 1 day (IQR, 0–2).

Table 3 Details of oxygenation support and results of quantitative lung CT analysis

During the observation, antiviral therapy with lopinavir-ritonavir was administered in 55% of patients and darunavir plus cobicistat in 36%. Hydroxychloroquine was used in 91%. All patients also received the best medical therapy tailored to the individual case, including broad-spectrum antibiotics.

Up to the reference date of April 28, 2020, out of a total of 222 patients, 8 (3.6%) were still hospitalised, whereas the remaining 214 (96.4%) had left hospital care: 150 (67.6%) were discharged healed and 64 (28.8%) died after a median of 9 days of hospitalisation (IQR, 5–14); their median age was 75.7 years (IQR, 71–80).

Regarding the CURB-65 score, 129 patients were categorised as low risk (group 1), and 57 patients were medium risk (group 2); no patients were scored as high risk (group 3). Death rates were 19% and 36% respectively.

Diagnostic performance of the different lung volumes

All CT scans were considered of diagnostic quality and successfully analysed. All lung volumes were correctly extracted (Table 3). The median time for segmentation was 11 min (IQR, 7–14).

The need for oxygenation support was accurately predicted both by the %PAL model (LR χ2(13), 86.05; p < 0.0001) and by the %CL model (LR χ2(13), 86.90; p < 0.0001). In both, %PAL and %CL showed a strong predictive value (p < 0.0001) and were the only significant covariates. The %NAL model showed worse performance (LR χ2(13), 39.18; p < 0.0002) and %NAL was not a significant predictor (p = 0.315).

The need for intubation was accurately predicted by all three models with minimal differences in LR χ2(13) (%NAL model, 81.47; %PAL model, 85.31, %CL model, 89.88; p < 0.0001 each) and all three lung volumes showed strong predictive value (p < 0.0001). The compromised lung volume was considered the best single predictor of both outcomes, and the %CL model was chosen for validation.

A separate model was fitted adding BMI as a covariate in a subgroup of 161 patients, showing no effect on either outcome (oxygenation support, p = 0.130; intubation, p = 0.428).

Validation of compromised lung volume

For the outcome of oxygenation support, the %CL model reached CV-AUROC 0.83 (weighted f1-score, 0.72; standard deviation [SD], 0.098). Cross-validated cut points were identified: 6%CL at high sensitivity (90%; specificity, 43%); 13%CL at high specificity (91%; sensitivity, 53%, Fig. 3).

Fig. 3

Quantitative lung CT analysis of a 35-year-old male patient affected by COVID-19. a Non-contrast chest CT at admission showing bilateral ground-glass opacities, interstitial thickening and consolidation in the posterior lung zones. b Semi-automated segmentation using 3D Slicer. Blue areas are normal lung parenchyma; yellow areas represent poorly aerated lung in the − 500, − 100 HU interval; red areas represent non-aerated lung and interstitium, in the − 100, 100 HU interval. c 3D volumetric representation of both lungs showing multiple red areas in keeping with moderate lung impairment. d Comparison between normal and compromised lung volumes. This patient had 18% of compromised lung volume, required high-flow oxygenation support through a Venturi Mask. He was discharged after 17 days of sub-intensive care

For the intubation outcome, the %CL model reached CV-AUROC 0.86 (weighted f1-score, 0.79; SD, 0.07). Cut points were 10%CL at high sensitivity (90%; specificity, 56%) and 23%CL at high specificity (91%; sensitivity, 69%; Fig. 4).

Fig. 4

Quantitative lung CT analysis of a 43-year-old male patient affected by COVID-19. a Non-contrast chest CT showing extensive areas of bilateral lung consolidation, multiple ground-glass opacities and interstitial thickening and consolidation. b Semi-automated segmentation using 3D Slicer. Blue areas are normal lung parenchyma; yellow areas represent poorly aerated lung in the − 500, − 100 HU interval; red areas represent non-aerated lung and interstitium, in the − 100, 100 HU interval. c 3D volumetric representation of both lungs showing extensive red areas of consolidation in keeping with severe pneumonia. d Comparison between normal and compromised lung volumes. This patient had 50% of compromised lung volume and required immediate intubation and mechanical ventilation. He died after 13 days of intensive care, due to multi-organ failure

Compromised lung volume and survival

A moderate negative correlation between %CL and PaO2/FiO2 ratio, r(104) = − 0.39, p < 0.001 was observed, highlighting how pulmonary function worsened as %CL increased.

As a result of multivariate survival analysis, %CL was predictive of in-hospital mortality (hazard ratio [HR], 1.02; 95% CI, 1.02 to 1.05, p = 0.01), together with age (HR, 1.06; 95% CI, 1.03–1.10, p < 0.0001), cancer (HR, 3.27; 95% CI, 1.54–6.94), CRP (HR, 1.03, 95% CI 1.01–1.05) and CKD (HR, 5.59; 95% CI, 1.97–15.86). Detailed results are reported in Table 1.

All other variables proved non-significant at univariate analyses and were excluded from the final model.
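To read a per-unit hazard ratio over a larger difference, recall that a Cox model is linear on the log-hazard scale, so the ratio compounds multiplicatively. The sketch below assumes that %CL entered the model as a continuous covariate measured in percentage points, which the text does not state explicitly; the function name is hypothetical.

```python
def hazard_ratio_for_increase(hr_per_unit: float, increase: float) -> float:
    """Hazard ratio implied by an `increase` in a continuous Cox covariate.

    The log-hazard is linear in the covariate, so the per-unit hazard ratio
    is raised to the power of the increase.
    """
    if hr_per_unit <= 0:
        raise ValueError("a hazard ratio must be positive")
    return hr_per_unit ** increase
```

Under that assumption, a per-unit hazard ratio of 1.02 corresponds to a ratio of about 1.22 for a difference of 10 percentage points in %CL.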

Discussion

COVID-19 is a new disease outbreak reaching a pandemic level and a threat to global health [22]. Hospital overcrowding and shortages of ICU beds, oxygen and ventilators have been a large-scale concern in Lombardy (Italy) [13]; thus, accurate and rapid triaging is key to avoid a crisis of the health care network.

Chest x-ray has been proposed as a low-cost tool for detecting lung impairment in patients with suspected COVID-19 [23]. However, its low sensitivity [24] makes it more appropriate for follow-up rather than early diagnosis [25]. Chest CT is instead pivotal to the early diagnosis of COVID-19 due to the ability to detect all disease characteristics [26], and its use has been recommended by the European Society of Radiology in selected patients [27], using a dedicated scanner whenever possible [28]. However, its value is currently limited to visual findings [25], whereas QCT proved to be the tool that allowed for considerable advancements in understanding the ARDS pathophysiology and establishing adequate oxygenation support [29]. These observations led us to perform QCT analysis in all chest CT scans performed at admission on confirmed COVID-19 cases. We identified the volume of compromised lung included in the − 500, 100 HU interval as a predictor of oxygenation support and invasive ventilation; this was also significantly correlated with pulmonary dysfunction as measured by the PaO2/FiO2 ratio and represented a risk factor for in-hospital mortality.

Typical CT appearance of COVID-19 was well described by Chung et al in February 2020 [7] and has been confirmed by several reports [6, 7]. Based on these findings, Yuan et al have proposed a scoring method to screen patients based on the admittance CT scan [10]. More recently, Li et al also described a visual, quantitative analysis of lung damage, based on a “total severity score” reflecting the degree of parenchymal loss, which correlated with a score of clinical severity [30]. However, these visual characterisations are subjective and unsuited to a systematic disease evaluation and data sharing. Conversely, we reported on a QCT method based on 3D Slicer, which is fast and standardised, and ensures a consistently repeatable evaluation of parenchymal impairment. Some advantages are that (a) it is a free, open-source software untied to any workstation; (b) it has a shallow learning curve [16]; (c) documentation and Internet support are easily found; and (d) it allows for rapid deployment in the ever-changing epicentres of the Coronavirus pandemic.

Colombi et al [31] have used a similar approach to predict the outcomes of COVID-19. They reported good performance of the well-aerated volume (%WAL, − 950, − 700 HU) in predicting the combined outcome of ICU admission and death. Such an outcome, however, would not be informative in our cohort. In fact, 42/64 (66%) of our deaths happened outside the ICU, and 23/48 (48%) of patients successfully healed after ICU admittance, indicating that ICU admission should not be generalised as a “worse” outcome. Moreover, they reported the negative predictive value of a BMI surrogate, the adipose tissue measured at the T7-T8 level, whereas in our study BMI was not predictive of the need for oxygenation support or intubation.

Compromised lung was significantly correlated with the PaO2/FiO2 ratio and increasing pulmonary dysfunction. One reason may be that hypoxemia refractory to oxygenation support is mainly due to intrapulmonary shunt [32] that happens in poorly and non-aerated lung areas, which are well quantified in the %CL. On the contrary, aerated lung volumes such as %NAL or %WAL represent an indirect quantification of this phenomenon and may be more prone to variation during respiratory movements, or ventilation; this may also explain the poor performance of %NAL model in our analysis.

Another approach to quantitative analysis has been proposed by Huang et al [33], who have successfully deployed a commercial deep learning algorithm to quantify lung impairment. However, the prediction of oxygenation support was not among their reported outcomes. Despite the promising power of such an approach, the same authors also report that in 8.7% of cases, the algorithm was unable to correctly identify the lung borders without the help of the radiologist. This need is also shared by our semi-automated approach.

Regardless of the technique used, QCT is seen as an ideal tool for predicting which patients will soon need a ventilator and which no longer require one. This might help especially in situations with limited hospital resources; in fact, in the process of triaging, the highest priority is to be accorded to those patients whose prognosis is good with intensive care, but poor without it [34]. QCT, particularly the compromised lung volume, may hence be used as a new decision tool, considering its strong correlation not only with respiratory needs but also with in-hospital survival. The need for new metrics specific to COVID-19 is underscored by the low performance of a validated score for pneumonia (CURB-65) that was tested for comparison; most patients (58%) were graded as low-mortality risk (group 1); still, we observed a 15% death rate, much higher than the presumed 1.5% [15].

This study has several limitations. Firstly, it has a retrospective design. Secondly, we did not perform a repeat CT on the healed patients before discharge, to check whether the physiological lung volumes had been restored; this could be the subject of a future study.

Regarding the segmentation process, we included the interstitial structures in the analysed volume (e.g. segmental arteries and bronchi). By partially falling in the same threshold as the non-aerated lung, these may have reduced precision. However, the event might have been irrelevant to the outcome. Firstly, because the poorly aerated volume alone was a strong outcome predictor; secondly, because the same method was applied to all patients thus counterbalancing the potential inaccuracy; thirdly, the added volume of the lung interstitium may be considered negligible compared to the compromised lung volume in the setting of severe pneumonia. On the contrary, this choice allowed for a fast segmentation process.

Regarding respiratory support, 20% of our patients required invasive mechanical ventilation, a higher proportion than the 2.5% reported by Wu et al and the 17% reported by Wang et al and Zhou et al. These differences may be partially due to the higher age of the Italian population [35]. Indeed, the median age of our cohort (66 years) was above the ones previously reported in China by Wu et al [36] (51 years), Wang et al [37] and Fei Zhou et al [4] (56 years).

The main strengths of our study are the high number of patients enrolled, the completeness of clinical data, the high statistical significance of all tests conducted and the use of a rapidly reproducible and scalable method for QCT.

In conclusion, lung QCT provides new metrics of COVID-19 and has a promising role in predicting its clinical outcome. The percentage of lung parenchyma in the − 500, 100 HU interval, namely the compromised lung volume (%CL), has shown high accuracy in predicting the need for oxygenation support and mechanical ventilation and is a risk factor for in-hospital death.

We empirically identified different cut points of compromised lung volume: patients presenting with %CL values in the 6–23% range are at risk for needing oxygenation therapy; values above 23% are at risk for intubation. These results strengthen the evidence of QCT as an ideal tool for easing the triaging process of COVID-19.
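The reported cut points can be expressed as a simple lookup, shown here only to make the ranges explicit. This is an illustrative sketch and not a validated clinical decision rule; the function name, the wording of the labels and the treatment of exactly 6% as inside the 6–23% range are assumptions.

```python
def cl_risk_band(percent_cl: float) -> str:
    """Map a compromised lung volume (%CL) to the risk band reported in the text."""
    if not 0 <= percent_cl <= 100:
        raise ValueError("%CL must be between 0 and 100")
    if percent_cl > 23:
        return "at risk for intubation"
    if percent_cl >= 6:
        return "at risk for oxygenation support"
    return "below reported cut points"
```

Because the cut points were chosen for high sensitivity or high specificity rather than certainty, individual patients may not follow the band they fall into, as the patient in Fig. 1 (6% compromised lung, no oxygenation support required) shows.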