Background

There is a growing demand for healthcare providers to improve and provide “value-based” services to critically ill patients (Toua, de Kock, and Welzel 2016). Nowadays, clinical medicine relies heavily on the accurate prediction of patient outcomes. Patients with poor outcomes may benefit from severity of illness scoring systems, which provide objective information for stratifying and prioritizing patients. Having access to a reliable outcome prediction model would aid efforts to enhance patient care (Rahmatinejad et al. 2020).

After its publication in the early 1980s, the APACHE score became widely used in ICUs (Le Gall 2005). APACHE IV is based on physiologic abnormalities and was proven in 2006 to be effective in determining the severity of illness in critically ill patients (Yamin, Vaswani, and Afreedi 2011). In 2002, a group of researchers from all over the world gathered new data on the physiologic changes, clinical presentation, and outcomes of critically ill patients in over 300 ICUs around the world. These findings led to the creation of the SAPS III as a new prognostic model (Metnitz et al. 2005; Moreno et al. 2005). In addition, an international panel of experts created the SOFA score in 1994, and it was first published in 1996. It was designed to assess multiple organ failure in sepsis patients, but it has also been used as a prediction scoring system (Vincent et al. 1996).

In a retrospective clinical trial, researchers investigated how well the APACHE II, APACHE IV, and SAPS III scoring systems predicted death in a 16-bed surgical medical ICU. They found that the sensitivity and specificity of the scoring systems were comparable in terms of mortality prediction, but that the accuracy of SAPS III and APACHE II was greater than that of APACHE IV (Evran et al. 2016). On the other hand, the discriminatory performance of the APACHE IV model was excellent and comparable to that of the APACHE II and SAPS III models in another retrospective assessment of surgical ICU patients (Lee et al. 2014). Moreover, Sun et al. (2017) found that the SAPS-II score was the most accurate, followed by the APACHE IV and SOFA scores, in predicting short-term mortality in the cardiac intensive care unit (CICU). In addition, a prospective cohort study in a CICU showed that the APACHE II and SOFA scores have a good and comparable discriminative capacity for predicting outcomes, but that the APACHE II has better calibration and accuracy indices (Argyriou et al. 2015).

Therefore, there has been a lack of consensus over the efficacy of APACHE IV, SAPS III, and SOFA scores in predicting ICU mortality and length of stay in surgical ICUs. Furthermore, the vast majority of previous research that used illness severity scores to evaluate patient outcomes was retrospective cohort research. This analytical cross-sectional study was designed to determine how well APACHE IV, SAPS III, and SOFA scores predict patient outcomes, including ICU mortality and the length of ICU stay in a surgical/trauma ICU.

Methods

Ethics and registration

The study was approved by the institutional ethics committee (approval number: 449/3/20) and registered on ClinicalTrials.gov (NCT04683094). The research complied with the ethical principles outlined in the Declaration of Helsinki of 1964 and its subsequent modifications. Written informed consent was obtained from all patients, or from a close family member when the participant was unable to give consent.

Patient inclusion and exclusion criteria

The current study adopted a cross-sectional design that enrolled patients admitted to a 12-bed surgical/trauma intensive care unit (ICU). Patients under the age of 16; those with an estimated ICU stay of less than 24 h; those with incomplete APACHE IV, SAPS III, and SOFA data; and those who were readmitted to the ICU during the study period were excluded.

The following admission data were recorded:

  • Age, sex, and comorbidity

  • Admission clinical diagnosis

  • Systolic and diastolic arterial blood pressure (mm Hg), heart rate (beats/min), respiratory rate (breaths/min), body temperature (°C), initial Glasgow Coma Scale (GCS), mechanical ventilation or not, and amount of vasopressor if present

  • Arterial blood gas analysis (pH, PaO2, PaCO2, oxygen saturation, and base excess), FiO2, and PaO2/FiO2

  • Scores for APACHE IV, SAPS III, and SOFA on day 1 (including the worst values during the first 24 h)

  • Laboratory data (white blood cell count, hematocrit, platelet count, and serum levels of sodium, potassium, creatinine, and bilirubin)

  • A 24-h urine output

Outcome measures

The primary outcome was the validity of the APACHE IV, SAPS III, and SOFA scores to predict a patient’s death and survival following surgical/trauma ICU admission. Secondary objectives included the effectiveness of various severity scoring systems in predicting the length of ICU stay and the association between patients’ mortality and baseline clinical and laboratory data on ICU admission.

Sample size calculation

The sample size calculation was carried out using G*Power 3 software (Faul, Erdfelder, and Lang 2007). A minimum calculated sample of 145 patients admitted to the surgical/trauma intensive care unit (ICU) was needed to detect an effect size of 0.12 in the accuracy of APACHE IV and SAPS III (AUC = 0.836 and 0.741, respectively) (Ma et al. 2017; Jahn et al. 2019) in the prediction of mortality, with an error probability of 0.05 and 95% power on a one-tailed test.

Statistical analysis

Data were verified, coded by the researcher, and analyzed using IBM-SPSS 24.0 (IBM-SPSS Inc., Chicago, IL, USA). Descriptive statistics (means, standard deviations, medians, ranges, and percentages) were calculated.

Test of significance

The chi-square test was used to compare the distribution of frequencies among groups. Student’s t-test (for parametric data) or the Mann–Whitney U test (for non-parametric data) was used to compare continuous variables between two groups. Logistic regression and Cox hazard regression models (backward stepwise likelihood ratio method) were used to assess how well the ICU severity scoring systems (SSS) predicted mortality, and the Hosmer and Lemeshow calibration test was used to test the model fit. Receiver operating characteristic (ROC) curves were plotted to investigate the diagnostic performance of the ICU SSS for the prediction of mortality, reported as the area under the curve (AUC), standard error (SE), and 95% confidence interval (CI). Validity statistics (sensitivity, specificity, and positive and negative predictive values) were calculated. Spearman’s rank correlation analysis was used to test the association between variables. The agreement between ICU SSS in the prediction of mortality was examined using Bland–Altman plots. A P-value < 0.05 was considered statistically significant.
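The discrimination and validity statistics named above can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function names, the example scores, and the cutoff of 60 are hypothetical and chosen only for illustration. The AUC is computed as the proportion of non-survivor/survivor pairs in which the non-survivor has the higher score (ties counted as one half), which is equivalent to the area under the empirical ROC curve.

```python
def auc_rank(scores, died):
    """AUC as the probability that a random non-survivor scores higher
    than a random survivor (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def validity(scores, died, cutoff):
    """Sensitivity, specificity, PPV, and NPV when score >= cutoff predicts death."""
    tp = sum(1 for s, d in zip(scores, died) if s >= cutoff and d)
    fp = sum(1 for s, d in zip(scores, died) if s >= cutoff and not d)
    fn = sum(1 for s, d in zip(scores, died) if s < cutoff and d)
    tn = sum(1 for s, d in zip(scores, died) if s < cutoff and not d)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical severity scores and outcomes (1 = died); NOT study data.
scores = [35, 42, 50, 58, 61, 66, 72, 80, 88, 95]
died = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

auc = auc_rank(scores, died)
stats = validity(scores, died, cutoff=60)  # cutoff is an arbitrary assumption
print(round(auc, 3), stats)
```

In practice the cutoff would be chosen from the ROC curve rather than fixed in advance; the sketch only shows how the four validity statistics follow from a 2 × 2 table once a cutoff is set.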

Results

The study enrolled 200 patients admitted to the surgical/trauma ICU between April 2020 and December 2021. Fifty-two patients were excluded from the study because they were under the age of 16 (n = 20); stayed less than 24 h in the ICU (n = 16); had incomplete data for APACHE IV, SAPS III, and SOFA scores (n = 10); or were readmitted during the study time (n = 6), as only the first admission was included.

Clinical characteristics of patients

The clinical characteristics of the 148 participants, as well as their APACHE IV, SAPS III, and SOFA scores and admission diagnosis, are displayed in Table 1. The mean age of patients was 48.45 years, 54.7% were male (N = 81), 44.6% (N = 66) had no comorbidity, and 34.5% (N = 51) were mechanically ventilated on admission. The most common admission diagnosis was polytrauma (40%), followed by postoperative complications (35.8%) and sepsis (14.2%). The mortality rate among the studied patients was 32.4% (N = 48), and the mean length of ICU stay was 6.94 ± 4.7 days. Table 2 compares the clinical and laboratory data between survivors and non-survivors and shows that non-survivors scored higher on all severity scoring systems than survivors.
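The enrollment counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in the text; the variable names and the `pct` helper are illustrative.

```python
enrolled = 200
excluded = {
    "age < 16 years": 20,
    "ICU stay < 24 h": 16,
    "incomplete score data": 10,
    "readmission": 6,
}
analyzed = enrolled - sum(excluded.values())


def pct(n, total):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * n / total, 1)


reported_counts = {
    "male": 81,
    "no comorbidity": 66,
    "mechanically ventilated": 51,
    "died": 48,
}
percentages = {k: pct(v, analyzed) for k, v in reported_counts.items()}
print(analyzed, percentages)
```

The result agrees with the 148 analyzed patients and with the reported percentages (54.7%, 44.6%, 34.5%, and 32.4%).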

Table 1 Clinical characteristics and severity scoring data of the studied patients
Table 2 Clinical, laboratory, and severity scoring in survivor and non-survivor patients

Logistic/Cox hazard regression analysis of the APACHE IV, SAPS III, and SOFA scores for mortality and survival prediction

The logistic regression analysis for mortality prediction shows that, after adjusting for the three scores, APACHE IV was the only significant predictor of mortality; each 1-point increase in the APACHE IV score was associated with a 5% increase in the odds of death (AOR = 1.049, 95% CI 1.028–1.069; P-value < 0.001) (Table 3). On the other hand, the Cox hazard regression analysis for survival prediction shows that SAPS III was the only significant predictor of survival after adjusting for the three scores; each 1-point increase in the SAPS III score was associated with a 2% increase in the hazard of death (AHR = 1.021, 95% CI 1.008–1.035; P-value = 0.002) (Table 4).
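To make the per-point adjusted odds ratio easier to interpret, the following sketch converts the reported APACHE IV AOR and its confidence interval into a log-odds coefficient and an implied standard error, and shows how the effect compounds over several points. The 10-point difference and the 20% baseline risk are hypothetical values chosen for illustration, and the calculation assumes the usual Wald-type interval on the log scale.

```python
import math

# Reported adjusted odds ratio and 95% CI for APACHE IV (Table 3).
aor, ci_low, ci_high = 1.049, 1.028, 1.069

beta = math.log(aor)  # change in log-odds per 1-point increase
# Standard error implied by the CI width, assuming a Wald interval.
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)


def odds_ratio_for(points):
    """Odds ratio for a score difference of `points` (effects multiply)."""
    return math.exp(beta * points)


def prob_after(baseline_prob, points):
    """Predicted probability after raising the score by `points`."""
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio_for(points)
    return odds / (1 + odds)


# Hypothetical illustration: a 10-point difference and a 20% baseline risk.
print(round(odds_ratio_for(10), 2), round(prob_after(0.20, 10), 3))
```

Under these assumptions, a 10-point higher APACHE IV score corresponds to an odds ratio of about 1.6, not 1.49, because odds ratios multiply rather than add. The change in probability depends on the baseline risk, which is why the odds ratio should not be read directly as a change in the probability of death.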

Table 3 Logistic regression analysis of predictive power of mortality for APACHE IV, SAPS III, and SOFA scores
Table 4 Cox proportional hazard regression analysis for survival predictive power of APACHE IV, SAPS III, and SOFA scores

The diagnostic performance of APACHE IV, SAPS III, and SOFA scores in predicting ICU mortality

The diagnostic performance of the ICU severity scoring systems for prediction of mortality was measured as the area under the receiver operating characteristic curve (AUC). The three scores showed good discrimination; the AUC was 0.716 (95% CI, 0.625–0.806, P < 0.001) for SAPS III, 0.734 (95% CI, 0.659–0.825, P < 0.001) for SOFA, and 0.766 (95% CI, 0.670–0.862, P < 0.001) for APACHE IV, as presented in Fig. 1. The sensitivity, specificity, PPV, and NPV of the three scores are presented in Table 5.
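As a rough plausibility check on the width of these confidence intervals, the sketch below applies the Hanley–McNeil approximation for the standard error of an AUC to the reported point estimates, using 48 non-survivors and 100 survivors (148 analyzed patients, 48 deaths). This is an assumption about method: the study reports SPSS output, whose intervals need not match this approximation exactly.

```python
import math


def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate SE of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (
        auc * (1 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_nonevents - 1) * (q2 - auc ** 2)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)


def auc_ci(auc, n_events, n_nonevents, z=1.96):
    """Normal-approximation 95% CI, clipped to the [0, 1] range."""
    se = hanley_mcneil_se(auc, n_events, n_nonevents)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


n_died, n_survived = 48, 100  # from the reported cohort
reported_auc = {"APACHE IV": 0.766, "SOFA": 0.734, "SAPS III": 0.716}
intervals = {
    name: auc_ci(a, n_died, n_survived) for name, a in reported_auc.items()
}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.3f} to {high:.3f}")
```

The approximate intervals are of similar width to the reported ones and overlap substantially across the three scores, so the ranking of the scores by AUC alone should be read with caution.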

Fig. 1

The receiver operating characteristic curves of APACHE IV, SAPS III, and SOFA scoring systems for mortality prediction. The three scores demonstrated a good discrimination performance, with an AUC of 0.766 (95% CI 0.670–0.862; P-value < 0.001), 0.716 (95% CI 0.625–0.806; P-value < 0.001), and 0.734 (95% CI 0.659–0.825; P-value < 0.001), respectively

Table 5 Diagnostic criteria of APACHE IV, SAPS III, and SOFA scoring system for prediction of mortality

The Hosmer and Lemeshow calibration test showed that the expected and observed event rates in subgroups were similar for the three scores (P > 0.05), i.e., the three scores were well calibrated for the prediction of mortality. The agreement between the three ICU severity scores for the prediction of mortality is illustrated in Fig. 2 using a Bland–Altman plot, which revealed good agreement between each pair of scores.
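The logic of the Hosmer and Lemeshow test can be sketched as follows: patients are sorted by predicted probability, split into groups, and the observed and expected numbers of deaths and survivals in each group are compared with a chi-square-type statistic. The data below are hypothetical, and five groups are used only to keep the example small (statistical packages commonly use ten). The statistic is compared with the chi-square critical value for groups − 2 degrees of freedom, which is 7.815 for 3 degrees of freedom at the 0.05 level.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic over `groups` equal-sized risk groups."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed deaths
        expected = sum(p for p, _ in chunk)   # expected deaths
        # Contributions from deaths and from survivals in this group.
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat


# Hypothetical cohort: 5 risk strata of 10 patients each; NOT study data.
risks = [0.1, 0.3, 0.5, 0.7, 0.9]
deaths_per_stratum = [1, 3, 5, 7, 9]
probs, outcomes = [], []
for risk, deaths in zip(risks, deaths_per_stratum):
    probs += [risk] * 10
    outcomes += [1] * deaths + [0] * (10 - deaths)

well_calibrated = hosmer_lemeshow(probs, outcomes, groups=5)
# Same outcomes, but the model now predicts half the true risk.
miscalibrated = hosmer_lemeshow([p / 2 for p in probs], outcomes, groups=5)

CRITICAL_DF3 = 7.815  # chi-square critical value, 3 df, alpha = 0.05
print(well_calibrated < CRITICAL_DF3, miscalibrated > CRITICAL_DF3)
```

A statistic below the critical value (P > 0.05) means the test does not detect a departure from good calibration, which is the sense in which the three scores are described as well calibrated.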

Fig. 2

Bland–Altman plot for agreement of APACHE IV, SAPS III, and SOFA scores in terms of predicting mortality. There is a good agreement between each pair of the three scores
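For readers unfamiliar with the method, a Bland–Altman analysis summarizes agreement between two measurements by the mean of their paired differences (the bias) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The sketch below assumes that the compared quantities are predicted mortality probabilities from two scores; the values and names are hypothetical, not study data.

```python
import statistics


def bland_altman(a, b, z=1.96):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return bias, bias - z * sd, bias + z * sd


# Hypothetical predicted mortality probabilities from two scoring systems.
model_a = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
model_b = [0.12, 0.20, 0.45, 0.50, 0.72, 0.80]

bias, lower, upper = bland_altman(model_a, model_b)
within = sum(lower <= x - y <= upper for x, y in zip(model_a, model_b))
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f}), within={within}")
```

Agreement is usually judged good when the bias is close to zero and nearly all differences fall within limits that are narrow enough to be clinically acceptable.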

Correlation analysis

The univariate Spearman’s rank correlations between the APACHE IV, SAPS III, and SOFA scores and patient length of stay revealed a significant positive correlation between APACHE IV score and ICU length of stay (r = 0.22, P = 0.004). However, neither the SAPS III nor the SOFA score correlated significantly with ICU length of stay (r = 0.07 and 0.06, respectively, P > 0.05). Likewise, the APACHE IV score had a significant positive correlation with both the SAPS III and SOFA scores (r = 0.52 and 0.58, respectively, P < 0.001). There was also a significant positive correlation between SAPS III and SOFA scores (r = 0.63, P < 0.001) (Fig. 3).
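Spearman’s coefficient is the Pearson correlation computed on ranks, with tied values given their average rank. The following sketch shows the calculation on hypothetical scores and lengths of stay; the data and function names are illustrative and are not taken from the study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical severity scores and ICU length of stay in days.
severity = [40, 55, 62, 70, 85, 90]
stay_days = [3, 2, 6, 5, 9, 8]
rho = spearman(severity, stay_days)
print(round(rho, 3))
```

Because it works on ranks, the coefficient is insensitive to the skewed distribution that length of stay typically has, which is one reason it is often preferred over Pearson’s r for this kind of outcome.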

Fig. 3

The univariate Spearman’s ranked correlations of APACHE IV, SAPS III, and SOFA scoring system and length of stay. There was a significant positive correlation between APACHE IV score and ICU length of stay. Also, there was a significant positive correlation between the APACHE IV score and both SAPS III and SOFA scores and between SAPS III and SOFA scores

Discussion

This analytical cross-sectional study of 148 patients admitted to a surgical/trauma ICU assessed how effectively APACHE IV, SAPS III, and SOFA scores predicted patient outcomes such as ICU mortality and the length of ICU stay. Our findings demonstrate that the APACHE IV score was the only significant discriminating predictor of mortality, and the SAPS III was the only significant discriminating predictor of survival based on 24-h values after admission to the ICU. Moreover, the APACHE IV score was superior to the SAPS III and SOFA scores regarding accuracy, while the three scores were similar in calibration for mortality prediction. We also observed a significant positive correlation between the APACHE IV score and the length of ICU stay, whereas the SAPS III and SOFA scores showed no significant correlation with the length of ICU stay.

Our results are consistent with those of a recent study that compared the predictive value of the APACHE II, APACHE IV, and SAPS II scores for in-hospital mortality in the emergency department and found that APACHE IV was superior to the APACHE II and SAPS II in terms of discrimination and calibration (Rahmatinejad et al. 2020). Furthermore, Bennett et al. (2019) demonstrated the predictive ability of the APACHE IV score in CICU patients, concluding that the APACHE IV mortality prediction model at 24 h had the highest AUC for hospital death, 0.82 (95% CI, 0.81–0.84), with good discrimination, followed by APACHE III at 0.81 (95% CI, 0.80–0.83) (P = 0.001). However, calibration for hospital death prediction was suboptimal for both the APACHE III score (P = 0.01) and the APACHE IV (P < 0.001). According to a retrospective study evaluating electronic medical records for patients admitted to the SICU, the discriminatory performance of the APACHE IV model was extremely good and similar to that of the APACHE II, SAPS III, and Korean SAPS III models, although all of the models had poor calibration (Lee et al. 2014).

The SAPS III score was originally designed to assess disease severity and predict mortality of patients treated in surgical ICUs and not as a predictor of survival following ICU admission. Interestingly, we found that SAPS III was also useful in predicting survival in our population of SICU patients. Using prospectively collected data from 1851 patients hospitalized in a surgical intensive care unit, Sakr et al. (2008) found that SAPS III had good discriminating capability but poor calibration. The difference between their findings and ours could be because their study only included postoperative patients, whereas ours also included polytrauma patients. Furthermore, the smaller population size and prospective SAPS III data calculation in our study may contribute to the discrepancies.

In a retrospective cohort study, both APACHE III and APACHE IV had better discriminatory capability but were less well calibrated than SAPS III in predicting in-hospital mortality (Keegan et al. 2012). This contrasts with our findings, although our study estimated ICU mortality rather than hospital mortality.

Our results are generally in line with a prospective cohort study of outcome prediction in a CICU, in which both SOFA and APACHE II scores exhibited excellent discriminative capacity, with AUCs ranging from 0.84 (SOFA) to 0.92 (APACHE II) (Argyriou et al. 2015). Furthermore, in a study of 706 patients admitted to the ICU with significant trauma, the SOFA score was highly recommended for predicting outcomes because it was easier to calculate than the APACHE II and Trauma and Injury Severity Scores (TRISS) (Hwang et al. 2012).

Additionally, Ma et al. (2017) found that the SAPS III score of non-survivors was significantly greater than that of survivors. Our findings revealed that non-survivors scored higher on the APACHE IV, SAPS III, and SOFA scores, which is consistent with a clinical study in a respiratory ICU (RICU) that found that APACHE IV and SAPS II scores on admission were significantly higher in non-survivors than in survivors (El-naggar, Raafat, and Mohamed 2018). In that study, however, the APACHE IV score showed a negative correlation with RICU stay, in contrast to our findings, which showed a significant mild positive correlation between the APACHE IV score and ICU stay. This disparity could be attributed to the different case mix of their RICU and our SICU. Another study in an RICU revealed that the mean ± SD admission SOFA score differed significantly between survivors and non-survivors (4.95 ± 2.49 and 6.11 ± 2.76, respectively; P = 0.028) (Galal et al. 2013).

Moreover, non-survivors exhibited significantly lower systolic and diastolic blood pressure, as well as a higher body temperature, than survivors. In terms of laboratory findings, arterial blood gas analysis showed that the mean PaO2, SpO2, and PaO2/FiO2 levels were significantly lower in non-survivors than in survivors, whereas the average PaCO2 and FiO2 values were higher in non-survivors. Non-survivors also had lower 24-h urine output and a considerably greater need for vasopressors and mechanical ventilation on ICU admission than survivors. Likewise, Hwang et al. (2012) found that in major trauma patients admitted to the ICU, non-survivors had a lower O2 index, systolic blood pressure, and GCS score.

Our study has several limitations. First, it involved a single SICU, so its generalizability to other ICUs may be limited by bias in the case mix. Second, the sample size was relatively small.

Conclusions

Our findings indicate that the APACHE IV score was the only significant predictor of mortality in surgical/trauma ICU patients. It outperformed the SAPS III and SOFA scores in terms of accuracy, while the three scores were similar in calibration for mortality prediction. The APACHE IV score had a significant positive correlation with the length of ICU stay, whereas the SAPS III and SOFA scores had no significant correlation.