Since its emergence in December 2019, COVID-19, caused by SARS-CoV-2, has been responsible for over 763 million confirmed cases and at least 6.9 million cumulative deaths worldwide as of 25 April 2023.1 The wide clinical spectrum of COVID-19 ranges from asymptomatic or mild infection to critical illness leading to severe respiratory failure, requiring admission to the intensive care unit (ICU).

Predictive scoring systems allow clinicians to estimate a variety of clinical outcomes, including mortality, that may facilitate decision-making. Existing scores such as the Acute Physiology and Chronic Health Evaluation (APACHE II) and Sequential Organ Failure Assessment (SOFA) scoring systems are prognostic tools that are widely used within the ICU to assess the risk of mortality in critically ill patients.2,3,4 These scores were initially developed for general critical care admission, so their prognostic utility for COVID-19-related critical illness is uncertain. The few studies that have explored the role of these scores in COVID-19 are limited by small sample sizes and variable conclusions. They were also conducted at the onset of the pandemic and thus may not reflect current therapeutic options.5,6,7,8

The 4C Mortality Score was developed by the International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) to predict mortality among patients admitted to the hospital with COVID-19.9 The variables used in the ISARIC 4C Mortality Score include patient demographics, clinical observations such as vital signs, and common laboratory values available at hospital admission.9 In contrast, the APACHE II score requires 12 physiologic variables in addition to the patient’s age and chronic health status, while the SOFA score includes variables such as partial pressure of arterial oxygen/fraction of inhaled oxygen (PaO2/FIO2) that are not always available or easily attainable in this population.2,3,4 While there are clear advantages in the ease of use of the 4C Mortality Score, there are currently limited data on whether this disease-specific risk stratification tool can predict mortality in populations with more severe COVID-19 who are being treated in the ICU, since it was developed and validated in a general hospital population.9

The primary objective of this historical cohort study was to externally validate the ISARIC 4C Mortality Score as an effective predictive scoring system among critically ill patients admitted to a Canadian ICU with COVID-19. The secondary objective was to compare the 4C Mortality Score’s discriminative ability with that of the APACHE II and SOFA scoring systems for in-hospital mortality.

Methods

Study design and setting

This historical cohort study was conducted at the ICU of the Jewish General Hospital in Montreal, QC, Canada. The medical-surgical ICU is located within a university-affiliated tertiary care hospital and operates using the closed, intensivist-led model.

Ethics

This study was approved by the institutional Research Ethics Committee of the Centre Intégré Universitaire en Santé et Services Sociaux (CIUSSS) West-Central Montreal, Jewish General Hospital; Study Identifier 2021-2362; approved on 3 February 2022. Due to the nature of the study and anonymous data collection, the need for informed consent was waived.

Cohort assembly

At the onset of the pandemic, a clinical database was prospectively established for all patients with a positive nasopharyngeal polymerase chain reaction test result for SARS-CoV-2 admitted to the ICU at the Jewish General Hospital, either directly from our emergency department, COVID-19 wards, or another institution. Our institution was designated as the regional ICU for COVID-19 admissions in the first wave of the pandemic, and transfers occurred within 24 hr of arrival at the referring centre. All patients listed in this database between 5 March 2020 and 5 March 2022 were eligible for inclusion in this study. No patients were excluded from the cohort based on pre-existing limitations of care. Patients were excluded if their COVID-19 status was determined to be incidental to the indication for ICU admission (i.e., not causing respiratory failure), as adjudicated independently by two team members (S. D. and B. S.).

Medical care

All elements of medical care were left to the discretion of the treating intensivist. Patients with pre-established care directives available at the time of ICU admission that limited life-sustaining therapies, specifically endotracheal intubation, were still recipients of other forms of respiratory support such as high-flow nasal oxygen therapy or noninvasive ventilation. They are referred to as “not for intubation” in this manuscript. Moreover, decisions regarding therapies, including corticosteroid dose and timing, were left to the discretion of the treating intensivist, though we did have an institutional protocol to use high-dose steroids (dexamethasone 20 mg iv daily for five days, followed by 10 mg iv daily for five days) for patients on greater than 70% FIO2 during the first wave.10

Data sources and study variables

Each patient’s complete electronic medical record (ChartMaxx® version 7.00; Quest Diagnostics® Incorporated, Secaucus, NJ, USA) was reviewed and relevant data were extracted into an encrypted computerized spreadsheet. The data were extracted independently by two team members (S. D. and T. V.). Resulting data were compared, and discrepancies were rectified using a consensus process, with a goal of ensuring valid and high-quality data extraction. Additionally, the ICU admission note was abstracted for demographic information, including age, sex, and patient comorbidities. Patient vital signs, supplemental oxygenation treatments, and laboratory values were extracted from the critical care flowsheet and electronic laboratory system (Open Architecture Clinical Information System [Oacis], Telus Health, Montreal, QC, Canada). Information regarding limitations to therapy was extracted from care directives available at the time of ICU admission. Treatments received were determined from the medication administration records, and other clinical outcomes, such as length of stay, intubation, and mortality, were abstracted from the clinical progress notes. Extracted data were then used to calculate the 4C Mortality, APACHE II, and SOFA scores for each patient in our cohort. The variables within the 4C Mortality Score include age, sex, number of comorbidities from the Charlson index, respiratory rate, peripheral oxygen saturation on room air, Glasgow Coma Scale, serum urea, and level of C-reactive protein. The 4C Mortality Score was calculated using the worst value of the variables within the first 24-hr period of hospital admission. In cases of hospital transfer, data from the external institution were used.
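As an illustration of how the eight variables listed above combine into a single score, the following is a minimal sketch in Python. The point weights are not given in this manuscript; those shown follow the published 4C Mortality Score as we understand it and should be verified against the original publication9 before any use. The function name and argument names are hypothetical, and the units (urea in mmol·L-1, C-reactive protein in mg·L-1) are assumptions.

```python
def four_c_mortality_score(age, male, n_comorbidities, resp_rate,
                           spo2_room_air, gcs, urea_mmol_l, crp_mg_l):
    """Illustrative 4C Mortality Score (assumed point weights; verify before use)."""
    points = 0

    # Age in years
    if age >= 80:
        points += 7
    elif age >= 70:
        points += 6
    elif age >= 60:
        points += 4
    elif age >= 50:
        points += 2

    # Male sex
    if male:
        points += 1

    # Number of comorbidities (capped at 2 points)
    points += min(n_comorbidities, 2)

    # Respiratory rate in breaths per minute
    if resp_rate >= 30:
        points += 2
    elif resp_rate >= 20:
        points += 1

    # Peripheral oxygen saturation on room air (%)
    if spo2_room_air < 92:
        points += 2

    # Glasgow Coma Scale below 15
    if gcs < 15:
        points += 2

    # Serum urea (assumed mmol/L)
    if urea_mmol_l > 14:
        points += 3
    elif urea_mmol_l >= 7:
        points += 1

    # C-reactive protein (assumed mg/L)
    if crp_mg_l >= 100:
        points += 2
    elif crp_mg_l >= 50:
        points += 1

    return points


# Hypothetical patient, not drawn from the study cohort
example = four_c_mortality_score(age=72, male=True, n_comorbidities=1,
                                 resp_rate=24, spo2_room_air=90, gcs=15,
                                 urea_mmol_l=9.0, crp_mg_l=60)
```

In keeping with the methods described above, each argument would be the worst value recorded within the first 24 hr of hospital admission.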

The SOFA score and the APACHE II score were calculated for each patient based on the worst values obtained within the first 24-hr period of ICU admission. The respiratory component of the SOFA score requires an arterial blood gas value. However, many patients in our cohort did not have an arterial blood gas value available from the first 24-hr window of ICU admission, leading to missing PaO2/FIO2 values. Therefore, the respiratory and total SOFA scores for all cohort patients were calculated using a validated technique,11 which uses the imputed SpO2/FIO2 ratio in place of PaO2/FIO2. The SpO2/FIO2 ratio for these patients was determined by averaging the three lowest SpO2 values in the first 24 hr of ICU admission and using the highest delivered FIO2 recorded with these three SpO2 values as the denominator.
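The SpO2/FIO2 calculation described above can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, SpO2 is assumed to be recorded as a percentage and FIO2 as a fraction, and ties between equal SpO2 readings are broken by order of recording, which the text does not specify.

```python
def imputed_sf_ratio(readings):
    """Return the imputed SpO2/FIO2 ratio from first-24-hr ICU readings.

    readings: list of (spo2_percent, fio2_fraction) pairs.
    """
    if len(readings) < 3:
        raise ValueError("at least three paired SpO2/FIO2 readings are needed")

    # Three lowest SpO2 values (stable sort: ties keep recording order)
    lowest = sorted(readings, key=lambda r: r[0])[:3]

    # Average of the three lowest SpO2 values
    mean_spo2 = sum(spo2 for spo2, _ in lowest) / 3

    # Highest FIO2 delivered at the time of those three readings
    max_fio2 = max(fio2 for _, fio2 in lowest)

    return mean_spo2 / max_fio2


# Hypothetical readings, not drawn from the study cohort
ratio = imputed_sf_ratio([(95, 0.4), (88, 0.6), (90, 0.5), (86, 0.8), (97, 0.3)])
```

Here the three lowest SpO2 values are 86%, 88%, and 90% (mean 88%), and the highest FIO2 among them is 0.8.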

Outcomes

The primary outcome of our study was in-hospital mortality, with the objective of evaluating the ISARIC 4C Mortality Score’s model calibration and discrimination for predicting in-hospital mortality in a cohort of intensive care patients.

Missing data

In the event that a variable required for calculating the ISARIC 4C Mortality Score was missing, the median value for this variable from our cohort was imputed.
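A minimal sketch of this single-value median imputation is shown below. The function name is hypothetical, and missing observations are assumed to be coded as `None`.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values available for imputation")
    cohort_median = median(observed)
    return [cohort_median if v is None else v for v in values]


# Hypothetical C-reactive protein values with one missing observation
crp_completed = impute_median([120, None, 80, 200])
```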

Statistical analysis

Descriptive statistics were used to summarize baseline demographic and clinical characteristics. Continuous variables are presented as median [interquartile range (IQR)] values because the data were not normally distributed. Categorical variables are presented as count and percentage unless stated otherwise.

We conducted comparisons between groups using the Wilcoxon rank-sum test for nonnormally distributed continuous data and the chi-square test for categorical variables. All analyses were two-tailed with statistical significance set at P < 0.05 and were performed using SAS 9.4 (SAS Institute Inc., Cary, NC, USA), STATA/MP version 15 (StataCorp LLC, College Station, TX, USA), and R version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria) with the rms package.

We fitted univariate logistic regression models to examine the association between each mortality prediction score and the outcome of interest, in-hospital mortality. We used the Wald method to estimate 95% confidence intervals (CIs).
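For a fitted logistic regression coefficient and its standard error, a Wald-based 95% CI for the odds ratio can be sketched as below. The function name and the example coefficient are hypothetical and are not results from this study.

```python
import math


def wald_ci_odds_ratio(beta, se, z=1.959964):
    """Return (odds ratio, lower limit, upper limit) from a log-odds coefficient.

    The interval is computed on the log-odds scale as beta +/- z * se
    and then exponentiated.
    """
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Hypothetical coefficient per one-point increase in a score
odds_ratio, lower, upper = wald_ci_odds_ratio(beta=0.25, se=0.04)
```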

To validate the ISARIC 4C Mortality Score model, we measured its discrimination and calibration within our cohort. Discrimination is the ability of the model to correctly distinguish between hospital decedents and survivors, and was assessed using the C-statistic. Calibration compares the observed proportion of patients who died in hospital against the proportion expected based on the 4C Mortality Score model. We assessed calibration using calibration-in-the-large, examining the slope and intercept of the calibration model.
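The two validation measures can be illustrated with a short sketch. It assumes that a score and a model-predicted probability of death are available for each patient; the function names are hypothetical, and the logistic recalibration shown (regressing the outcome on the logit of the predicted probability) is one common way to obtain a calibration slope and intercept, not necessarily the exact procedure used by the rms package.

```python
import math


def c_statistic(scores, died):
    """Proportion of decedent-survivor pairs in which the decedent scores higher.

    Tied pairs count as one half.
    """
    decedents = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not decedents or not survivors:
        raise ValueError("both outcomes must be present")
    concordant = 0.0
    for sd in decedents:
        for ss in survivors:
            if sd > ss:
                concordant += 1.0
            elif sd == ss:
                concordant += 0.5
    return concordant / (len(decedents) * len(survivors))


def calibration_intercept_slope(pred_probs, died, max_iter=50):
    """Fit logit(P(death)) = a + b * logit(predicted) by Newton-Raphson.

    Returns (a, b); a = 0 and b = 1 indicate perfect logistic calibration.
    """
    x = [math.log(p / (1.0 - p)) for p in pred_probs]
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, died):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Hypothetical data, not drawn from the study cohort
auc = c_statistic([10, 12, 12, 16], [0, 1, 0, 1])
```

A slope below 1 would suggest predictions that are too extreme, and a non-zero intercept would suggest systematic over- or underestimation of risk.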

We conducted prespecified sensitivity analyses to explore the stability of the 4C Mortality Score’s discrimination after excluding the following individuals: 1) patients > 80 yr old; 2) patients admitted 24 hr or more post calculation of their 4C Mortality Score; and 3) patients with “not for intubation” status.

Furthermore, we constructed a multivariable logistic regression model to adjust for the effect of “not for intubation” status on the predictive ability of the 4C Mortality Score for in-hospital mortality. Internal validation of this model was completed using a bootstrapping technique of 1,000 samples to determine its calibration and discrimination and correct for optimism.
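The general idea of bootstrap optimism correction can be sketched as follows. This is a generic outline of the approach as we understand it, not the rms implementation: `fit` and `performance` are hypothetical caller-supplied functions (for example, a logistic regression fit and its C-statistic), and the seed is arbitrary.

```python
import random


def optimism_corrected(data, fit, performance, n_boot=1000, seed=0):
    """Return the optimism-corrected performance of a modelling procedure.

    data:        list of patient records
    fit:         function(records) -> model
    performance: function(model, records) -> float (e.g., C-statistic)
    """
    rng = random.Random(seed)

    # Apparent performance: model fitted and evaluated on the same data
    apparent = performance(fit(data), data)

    total_optimism = 0.0
    for _ in range(n_boot):
        # Resample patients with replacement, same size as the cohort
        sample = [rng.choice(data) for _ in data]
        model = fit(sample)
        # Optimism: performance in the bootstrap sample minus
        # performance of the same model in the original cohort
        total_optimism += performance(model, sample) - performance(model, data)

    return apparent - total_optimism / n_boot
```

With 1,000 resamples, the average difference between bootstrap-sample and original-cohort performance estimates how much the apparent value overstates performance in new patients.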

Finally, we measured and compared the discriminative properties of the 4C Mortality Score, the APACHE II score, and the SOFA score within our cohort.

Results

During our study period, 459 patients were admitted to the ICU with a diagnosis of COVID-19, of whom 30 (6.5%) were excluded from this study because of incidental COVID-19 infection at the time of ICU admission. Of 429 patients in our study, 211 (49.2%) were admitted from our emergency department, 186 (43.4%) from our COVID-19 medical ward, and 32 (7.5%) were transferred from the emergency department of another hospital within 24 hr of arrival. Patients spent a median of 1 day [IQR, 0 to 2] on the COVID-19 medical ward prior to ICU transfer.

The clinical and demographic characteristics of the overall cohort and a breakdown between survivors and patients who died are shown in Table 1.

Table 1 Demographic and clinical characteristics of the overall cohort, and comparison between survivors and patients who died

In two (0.5%) patients, C-reactive protein was not measured at admission. No other variables were missing.

The primary outcome of in-hospital mortality occurred in 102 (23.8%) patients in our cohort. Patients who died were noted to be older, be more hypoxemic on ICU admission, and have more underlying comorbidities compared with survivors.

Five percent of our patients had a 4C Mortality Score between 0 and 3, and 17.4% had a score of 4–8; both these subgroups had 0 deaths. The 242 (56.4%) patients with 4C Mortality Scores between 9 and 14 had a mortality rate of 21.9% whereas 107 (24.9%) of our patients had scores ≥ 15 and a mortality rate of 45.8%. This is comparable with the original validation cohort in which 52.2% of patients had scores between 9 and 14 and 18.6% had scores ≥ 15, with mortality rates of 31.4% and 61.5%, respectively.9
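A stratified summary of this kind can be produced with a short tabulation, sketched below. The score bands are those used in the text; the function name and the example data are hypothetical.

```python
# Score bands as reported in the text: (label, lower bound, upper bound)
BANDS = [("0-3", 0, 3), ("4-8", 4, 8), ("9-14", 9, 14), (">=15", 15, float("inf"))]


def mortality_by_band(scores, died):
    """Return {band: (n patients, n deaths, mortality rate or None)}."""
    table = {}
    for label, low, high in BANDS:
        outcomes = [d for s, d in zip(scores, died) if low <= s <= high]
        n = len(outcomes)
        deaths = sum(outcomes)
        rate = deaths / n if n else None
        table[label] = (n, deaths, rate)
    return table


# Hypothetical scores and outcomes (1 = died), not drawn from the study cohort
summary = mortality_by_band([2, 5, 7, 10, 12, 14, 16, 18],
                            [0, 0, 0, 0, 1, 0, 1, 0])
```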

The fitted receiver operating characteristic curve of the ISARIC 4C Mortality Score had an area under the curve (AUC) of 0.762 (95% CI, 0.717 to 0.811) for in-hospital mortality in our cohort. The SOFA score had an AUC of 0.705 (95% CI, 0.648 to 0.761) and the APACHE II score had an AUC of 0.722 (95% CI, 0.667 to 0.777) for the prediction of in-hospital mortality (Fig. 1). Calibration of the ISARIC 4C Mortality Score appeared adequate, with a calibration slope of 1.06 and an intercept of -0.65 (Fig. 2).

Fig. 1

Model performance comparison for discrimination of in-hospital mortality using receiver operating characteristic curves for the ISARIC 4C Mortality Score, the APACHE II score, and the SOFA score.

APACHE = Acute Physiology and Chronic Health Evaluation; ISARIC = International Severe Acute Respiratory and Emerging Infection Consortium; SOFA = Sequential Organ Failure Assessment

Fig. 2

Calibration plot for cohort mortality (observed) vs ISARIC 4C Mortality Score (predicted). The black line represents ideal fit; the blue line represents logistic calibration with 95% confidence interval bands.

Sensitivity analyses suggested that the 4C Mortality Score had stable discriminatory ability across subsets of data that excluded patients who were > 80 yr old, patients admitted 24 hr or more post calculation of their 4C Mortality Score, and patients with “not for intubation” status (Table 2).

Table 2 Prespecified sensitivity analyses

The 50 (11.7%) patients who were “not for intubation” were significantly older, with a median age of 83 [74–87] yr vs 65 [54–72] yr, and had higher median 4C Mortality Scores, at 15 [12–17] vs 12 [9–14], compared with those with no limits to care. This group had a mortality of 68%, representing 33% of all patients who died.

The AUC of the 4C Mortality Score after adding “not for intubation” status as a variable into the logistic regression model was 0.802 [0.757–0.847, optimism-corrected 0.800]. The calibration of the new model showed good fit with our ICU cohort (Fig. 3).

Fig. 3

Calibration plot for the new model including ISARIC 4C Mortality Score adjusted for “not for intubation” status. Perfect predicting accuracy is represented by the “ideal” line. The “apparent” line represents the predicting accuracy within our study. The “bias-corrected” line represents the prediction accuracy following bootstrap resampling (1,000 resamples).

Discussion

The ISARIC 4C Mortality Score was derived and validated in a cohort that consisted mostly of noncritically ill patients.9 Therefore, we wanted to assess whether the ISARIC 4C Mortality Score was able to maintain its calibration and discrimination in a cohort of solely critically ill patients within the ICU setting and to compare this model with existing prognostic models such as the APACHE II and the SOFA scores, which are already critical illness specific.

Findings from our study provide additional validation and support for the use of the ISARIC 4C Mortality Score as a prognostic tool for mortality among patients admitted with COVID-19 to the ICU. The AUC for the 4C Mortality Score obtained in our study of 0.762 was similar to that obtained in the original validation study (0.767 [95% CI, 0.76 to 0.77]),9 despite the lower incidence of mortality at 23.8% in our cohort compared with 30–32%. Furthermore, our cohort model showed no evidence of poor model fit as suggested by evaluation of the calibration slope and intercept of our calibration curve. In addition to showing the utility of this score in an exclusively critically ill population, given the timeframe of the cohort, our study also validated the use of the 4C Mortality Score across the spectrum of different SARS-CoV-2 variants, whereas the original ISARIC score was derived exclusively from cases of the initial SARS-CoV-2 wild-type strain.

We also found that this predictive ability remained robust across several sensitivity analyses that are important to the critical care environment and were not initially explored in the validation cohort. Finally, we observed that this predictive ability is preserved, and possibly even slightly enhanced, when adjusting for patients’ preference to not undergo endotracheal intubation. This finding may be of significant relevance for practicing critical care providers who are often faced with sparse data regarding the outcomes of patients with such limitations to therapy.

Recently, another validation study of the ISARIC 4C Mortality Score was conducted in a critically ill cohort of 1,493 ICU patients in Saudi Arabia.12 In this cohort, a similar AUC to ours was reported (0.81), providing additional support for the use of the 4C Mortality Score in a critical care setting.12 In contrast to our study, they reported an overall higher mortality (38% vs 24%) despite a younger median age of their population (51.1 yr for survivors and 56.8 yr for nonsurvivors vs 64 and 75 yr old) and lower 4C Mortality Scores (6.3 and 12.4 vs 11 and 14).12 This may reflect differences in patient demographics or variations in health care systems or practices. Nonetheless, the results of our study confirm the utility of the 4C Mortality Score not only in critically ill cohorts but also in different critical care settings, adding to the landscape by providing support for its use within Canadian critical care units.

The discrimination of the ISARIC 4C Mortality Score (0.762) showed good predictive ability in comparison with the APACHE II (0.722) and SOFA (0.705) scoring systems. In addition, we feel that the 4C Mortality Score has other advantages over the traditional scoring systems in the context of COVID-19-related admissions. It is an easy-to-use tool that requires only eight variables, all of which are commonly available at first assessment in the hospital. In contrast, many of the variables in the APACHE II and SOFA scores require measurements that are available only after admission to a critical care setting and necessitate more invasive interventions.

Additionally, recent studies have shown that male sex, older age, a higher number of comorbidities, and more inflammatory changes within the respiratory system are key determinants of prognosis related to COVID-19.10,11,12,13,14 The ISARIC 4C Mortality Score includes all these important variables, thus reflecting COVID-19-related prognosis effectively. On the other hand, the SOFA score was created primarily to assess organ dysfunction related to sepsis and, of the six organ systems it assesses, only the respiratory, renal, and hepatobiliary systems have been shown to be associated with mortality in COVID-19.15,16 The APACHE II score has features that argue against its use for COVID-19-related pneumonia. Firstly, all current evaluations of ICU performance are weighed against the original reference population of 5,815 patients from 13 American hospitals in 1985; therefore, its use for present-day populations is often associated with issues such as poor calibration.17,18 Secondly, the inclusion of many complex variables meant to capture general critical illness does not capture the predominant involvement of the respiratory system present in our population.17,18 Taking these factors into account, as well as the 4C Mortality Score’s ease of use and ability to generate meaningful prognostic data prior to ICU admission, the 4C Mortality Score is, in our opinion, a more appropriate tool for use in cases of COVID-19 respiratory failure.

Our study is limited by its single-centre nature and mixed prospective and retrospective design, with a significant strength being the inclusion of all unselected cases of COVID-19-associated respiratory failure. As with any retrospective study, there is the risk of misclassification and missing data for certain variables in the primary record. These limitations were mitigated by having the data abstracted by trained researchers and possible discrepancies discussed among the authors. Our study had an extremely small number of 2 (0.5%) missing observations for C-reactive protein values, and thus a low risk of any bias from missing data. Our use of imputation of the median value would not have been ideal in a situation with more missing data, but we are confident that a complete case analysis would have yielded similar results and any bias of our approach would be towards reduced discrimination. Finally, our single-centre design could potentially limit generalizability to other Canadian ICUs, but should be considered representative of units with similar populations and operational models.

In conclusion, the ISARIC 4C Mortality Score is an easy-to-use tool that showed good predictive performance for in-hospital mortality in a cohort of COVID-19 patients admitted to an intensive care unit for respiratory failure. Our results suggest good external validity of the score in a more severely ill population.