Background

The Sequential Organ Failure Assessment (SOFA) score, conceived in 1996, was intended to increase our knowledge about organ dysfunction, help us better understand interactions between failing organs, and play a role in the design of clinical trials [1]. Thousands of scientific reports on critically ill patients have incorporated the SOFA score in various ways since then. Despite its popularity, SOFA has been criticised for having failed to reach the goals originally set out for this tool by the Working Group on Sepsis Related Problems of the European Society of Intensive Care Medicine [2]. Accumulating evidence indicates that diagnostic and therapeutic advancements in critical care over the last 30 years have significantly weakened the clinical value of the SOFA score [3].

Simple scoring rules such as SOFA invite the addition of points to arrive at a cumulative score. Notwithstanding the explicit recommendation against the use of the total SOFA as a proxy for the overall severity of multiorgan failure, much of the available literature relies on the total score as a summary measure of organ function [1]. This is problematic as two strong assumptions are necessary for the validity of the total SOFA score: first, each individual organ failure would need to carry the same prognosis; second, categories used by SOFA would have to accurately reflect the degree of organ dysfunction [4].

As is the case for multiple scoring systems in critical care, little is known about the performance of SOFA in older adults. Therefore, we aimed to explore the prognostic impact of physiological disturbances in the six organ systems included in the SOFA score in patients ≥ 80 years old admitted to intensive care units.

Methods

The Very Old Intensive Care Patient (VIP2) study was a prospective, multicentre study conducted in 22 countries and registered on ClinicalTrials.gov (ID: NCT03370692). The study enrolled patients aged 80 years or older acutely admitted to intensive care units (ICUs) without any exclusion criteria. Sources of data, methods of measurement, and results of analyses based on VIP databases were described in detail in a previous paper [5]. Participating ICUs were asked to enrol consecutive patients over a 6-month period, with the option of ending patient accrual after including the 20th participant in the study. A patient’s vital status within 30 days of admission to the ICU was ascertained by inspecting hospital records, direct contact with the patient, or querying a national registry. Participants were recruited between May 2018 and May 2019. Each country had a national coordinator responsible for securing the required ethical and regulatory approvals. A waiver of informed consent for participation in the study was granted in some countries.

Outcomes in this study were ICU and 30-day mortality. Baseline characteristics included patients’ age, sex, and reason for admission to the ICU. We used the SOFA score to assess the severity of organ dysfunction within the first 24 h after admission to the ICU. Six organ systems are included in the SOFA score: cardiovascular, respiratory, renal, neurologic, hepatic, and coagulation. Between 0 and 4 points were assigned in each organ system, with an increasing number of points corresponding to more severe organ failure. The highest score observed within the first 24 h was reported. We used the Clinical Frailty Scale (CFS) to describe a patient’s frailty before admission to the hospital, with nine possible classes ranging from very fit prior to the acute illness to terminally ill. The necessary information was provided by the patient or their proxy, or obtained from the medical records.

Descriptive statistics on baseline variables were presented as medians (interquartile ranges [IQR]) or counts and percentages. The relation between the SOFA score in each organ system and mortality was adjusted for age, sex, reason for admission to the ICU, and the CFS score, which were selected as potential confounders based on the authors’ clinical expertise and availability in the dataset. Statistical adjustment was performed using logistic regression, with age and CFS kept as continuous variables, in separate models that used either ICU or 30-day mortality as the dependent variable. The SOFA score was modelled in three ways: as the original, categorical variable (i.e., 1, 2, 3, or 4 points assigned in each organ system with a score of 0 as the reference), as a dichotomous indicator of organ failure (i.e., SOFA score ≥ 3 points in each domain), and as a total score, as reported in previous papers. We performed an analogous sensitivity analysis after exclusion of patients in whom a limitation of life-sustaining treatment (LST) was introduced. The required sample size was not calculated a priori. We decided that a complete-case analysis was justified considering the high completeness of data. All analyses were performed using R version 3.6.0 (R Project). Reporting conforms to the STROBE statement [6] (Additional file 1: Table S1).
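
The three encodings of the SOFA score described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the analysis; the function names and the example patient are hypothetical, and the logistic regression itself is not fitted here.

```python
from typing import Dict, List

# The six organ systems scored by SOFA (0-4 points each).
ORGAN_SYSTEMS = [
    "respiratory", "cardiovascular", "neurological",
    "renal", "hepatic", "coagulation",
]


def validate(scores: Dict[str, int]) -> None:
    """Check that every organ system has an integer score between 0 and 4."""
    for organ in ORGAN_SYSTEMS:
        value = scores[organ]
        if not isinstance(value, int) or not 0 <= value <= 4:
            raise ValueError(f"{organ}: expected an integer in 0..4, got {value!r}")


def categorical_encoding(score: int) -> List[int]:
    """Dummy-code one component: indicators for 1, 2, 3 and 4 points (0 is the reference)."""
    return [1 if score == level else 0 for level in (1, 2, 3, 4)]


def organ_failure(score: int) -> int:
    """Dichotomous indicator of organ failure: 1 if the component is >= 3 points."""
    return 1 if score >= 3 else 0


def total_sofa(scores: Dict[str, int]) -> int:
    """Total SOFA: plain sum of the six components (range 0-24)."""
    return sum(scores[organ] for organ in ORGAN_SYSTEMS)


# Hypothetical patient, for illustration only (not taken from the VIP2 data).
patient = {
    "respiratory": 3, "cardiovascular": 4, "neurological": 1,
    "renal": 2, "hepatic": 0, "coagulation": 0,
}
validate(patient)
dummies = {organ: categorical_encoding(patient[organ]) for organ in ORGAN_SYSTEMS}
failures = {organ: organ_failure(patient[organ]) for organ in ORGAN_SYSTEMS}
total = total_sofa(patient)
```

In the categorical encoding each level gets its own coefficient, so the model does not force a 2-point score to carry twice the effect of a 1-point score; the total score, by contrast, treats every point from every organ system as interchangeable.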

Results

Of 3920 patients enrolled in the VIP2 study, 3813 contributed data to analyses of the prognostic impact of the SOFA score (Fig. 1). Patient characteristics are shown in Table 1. The distribution of the SOFA score stratified by organ system is presented in Fig. 2.

Fig. 1 Study flow-chart. CFS, Clinical Frailty Scale; SOFA, Sequential Organ Failure Assessment

Table 1 Patient characteristics
Fig. 2 Histograms of the SOFA score by organ system. ICU, intensive care unit; SOFA, Sequential Organ Failure Assessment. A Respiratory SOFA, B Cardiovascular SOFA, C Neurological SOFA, D Renal SOFA, E Liver SOFA, F Coagulation SOFA

Estimates of both crude and adjusted effects of different organs’ failure on ICU and 30-day mortality are summarised in Table 2. Organ failure defined as a SOFA score ≥ 3 was associated with variable adjusted odds ratios (aORs) for ICU mortality dependent on the affected organ system: respiratory, 1.53 (95% CI 1.29–1.81); cardiovascular, 1.69 (95% CI 1.43–2.01); hepatic, 1.74 (95% CI 0.97–3.15); renal, 1.87 (95% CI 1.48–2.35); central nervous system, 2.79 (95% CI 2.34–3.33); coagulation, 2.72 (95% CI 1.66–4.48). Modelling consecutive levels of organ dysfunction separately resulted in aORs equal to 0.57 (95% CI 0.33–1.00) when patients scored 2 points in the cardiovascular system and 1.01 (95% CI 0.79–1.30) when the cardiovascular SOFA equalled 3. Adjusted odds ratios for mortality estimated for different categories of the SOFA score are shown in Fig. 3 and Table 3. The total SOFA score was associated with ICU mortality (OR 1.26, 95% CI 1.23 to 1.29) and 30-day mortality (OR 1.20, 95% CI 1.18 to 1.23). Results of the sensitivity analysis including 2468 patients in whom LST limitation was not introduced are summarised in Additional file 1: Tables S2 and S3.
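
As a reading aid, the reported estimates can be re-expressed on the log-odds scale. The sketch below uses only the odds ratios quoted in the paragraph above. It assumes, which the text does not state, that the confidence intervals are Wald-type (symmetric on the log scale), so the back-calculated standard errors are approximations; the helper names are hypothetical.

```python
import math

# Adjusted odds ratios for ICU mortality with SOFA >= 3, as quoted in the text:
# organ system -> (aOR, lower 95% CI bound, upper 95% CI bound)
reported = {
    "respiratory": (1.53, 1.29, 1.81),
    "cardiovascular": (1.69, 1.43, 2.01),
    "hepatic": (1.74, 0.97, 3.15),
    "renal": (1.87, 1.48, 2.35),
    "central nervous system": (2.79, 2.34, 3.33),
    "coagulation": (2.72, 1.66, 4.48),
}

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def log_or_and_se(odds_ratio, lower, upper):
    """Log odds ratio and approximate standard error, assuming a Wald-type CI."""
    return math.log(odds_ratio), (math.log(upper) - math.log(lower)) / (2 * Z_95)


summary = {}
for organ, (aor, lower, upper) in reported.items():
    beta, se = log_or_and_se(aor, lower, upper)
    excludes_one = lower > 1 or upper < 1  # does the interval exclude "no association"?
    summary[organ] = (beta, se, excludes_one)
    print(f"{organ:24s} log(aOR)={beta:.2f}  SE~{se:.2f}  CI excludes 1: {excludes_one}")


def compounded_or(per_point_or, points):
    """Odds ratio implied by a per-point OR for a difference of `points` points."""
    return per_point_or ** points


# Total SOFA, ICU mortality: OR 1.26 per point, so a 3-point difference implies about 2.0.
three_point_icu = compounded_or(1.26, 3)
# Total SOFA, 30-day mortality: OR 1.20 per point, so a 3-point difference implies about 1.73.
three_point_30d = compounded_or(1.20, 3)
```

The total-score model implies the same odds ratio for a 3-point difference regardless of which organ system contributes the points, whereas the organ-specific estimates above differ by organ system. The two sets of figures are not directly comparable, because the organ-failure models contrast ≥ 3 points with < 3 points in a single domain.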

Table 2 Logistic regression models, odds ratio for mortality estimated for organ failure (SOFA ≥ 3 in each organ system)
Table 3 Logistic regression models, odds ratio for mortality estimated for original SOFA categories (reference = 0 in each category)
Fig. 3 Association between each component of SOFA score with ICU mortality. SOFA, Sequential Organ Failure Assessment. A Respiratory SOFA, B Cardiovascular SOFA, C Neurological SOFA, D Renal SOFA, E Liver SOFA, F Coagulation SOFA

Discussion

In this multicentre cohort study of patients ≥ 80 years old acutely admitted to ICUs between the years 2018 and 2019, corresponding degrees of organ dysfunction in several organ systems included in the SOFA score translated to substantially different odds of death in the ICU and within 30 days of admission. An increasing number of points assigned in the cardiovascular component of the SOFA score was not uniformly associated with a poorer prognosis.

Our results corroborate existing evidence of the potential complexity of using the total SOFA score as a summary measure of multiorgan failure. Pölkki and colleagues have shown that the maximum daily SOFA score measured within the first day after admission to the ICU was not a valid surrogate of mortality in over 60,000 Finnish patients [7]. In their study, the risk of in-hospital death associated with failure of different organs diagnosed using the SOFA score varied widely. Further, the cardiovascular component of the SOFA score did not work as intended due to the rarity and specificity of situations which prompted a dopamine infusion. It is increasingly clear that reliance on dopamine administration as a measure of dysfunction of the cardiovascular system is no longer justifiable, leaving the cardiovascular domain of the SOFA score in urgent need of revision [8, 9]. This reflects the rapidly declining role of dopamine in clinical practice: it went from being the first-choice vasopressor in the 2002 Surviving Sepsis Campaign guidelines to not being mentioned in the 2021 update, having been replaced by noradrenaline, vasopressin and adrenaline [10]. Recent attempts at creating a unified measure of vasoactive support should facilitate future work on cardiovascular system assessment in the setting of critical illness [2].

Assumptions underlying the use of the total SOFA score may raise some concerns. Equating physiological disturbances in different organ systems in terms of prognosis goes against clinical gestalt and plainly contradicts the current stride towards precision medicine. How does one square the application of sophisticated machine learning algorithms with the use of crude, arbitrary categories to evaluate organ dysfunction? Parameters such as platelet count, PaO2/FiO2, bilirubin, and creatinine concentration can, and intuitively should, be analysed in a way that respects their continuous nature while maximising the amount of information gained from these measurements [11]. From a clinical point of view, it is also apparent that different combinations of organ dysfunction can have different implications. In the language of statistics, the complex interplay between organ systems can be expressed and properly quantified by employing interaction terms in regression models [12]. Previous studies have convincingly shown that two plus two does not equal four when using the SOFA score, as the relation between the failure of different organs and mortality has a multiplicative rather than additive character [7, 13]. However, one must take some extenuating circumstances into account. First, one of the aspects that made the SOFA score so popular is its simplicity and the ability to evaluate it at the bedside. Second, the introduction of a complex clinical tool using machine learning, assessment of statistical interactions and other sophisticated mathematical tools would not have been feasible in 1996. Conversely, the current availability of smartphones, which are much more powerful than the personal computers of that era, means there is great potential for the creation of new prognostic tools, and this should be considered when the decision is made to update the SOFA score [14].
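
The role of an interaction term can be illustrated with a toy logistic model for two failing organs. The coefficients below are purely hypothetical (they are not estimates from this or any other study) and serve only to show the arithmetic: without an interaction term the two odds ratios multiply, and an interaction term lets the joint effect depart from that product.

```python
import math

# Hypothetical log-odds coefficients, chosen only for illustration.
B_ORGAN_A = math.log(1.5)      # failure of organ A alone: OR 1.5
B_ORGAN_B = math.log(2.0)      # failure of organ B alone: OR 2.0
B_INTERACTION = math.log(1.4)  # extra multiplier when both organs fail


def odds_ratio(fail_a: int, fail_b: int, with_interaction: bool = True) -> float:
    """Odds ratio versus a patient with neither organ failing.

    Model: log-odds = b0 + bA*A + bB*B + bAB*A*B, where A and B are 0/1
    indicators. The intercept b0 cancels when the ratio is taken.
    """
    log_or = B_ORGAN_A * fail_a + B_ORGAN_B * fail_b
    if with_interaction:
        log_or += B_INTERACTION * fail_a * fail_b
    return math.exp(log_or)


only_a = odds_ratio(1, 0)                                       # 1.5
only_b = odds_ratio(0, 1)                                       # 2.0
both_no_interaction = odds_ratio(1, 1, with_interaction=False)  # 1.5 * 2.0 = 3.0
both_with_interaction = odds_ratio(1, 1)                        # 1.5 * 2.0 * 1.4 = 4.2
```

Under these assumed values, failure of both organs carries an odds ratio of 3.0 without the interaction term and 4.2 with it, neither of which equals the sum 1.5 + 2.0. A summed score cannot represent either pattern, which is the concern raised in the paragraph above.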

The SOFA score has permeated critical care [15, 16]. Diagnostic criteria of sepsis are now based on a change in the SOFA score [17]. However, if one were to take a step back, the literature raises the following question: do we really need a numerical score to describe organ failure? Even if this is the case, a reliable score would have to be organ-specific (1), independent of therapy (2), reflect acute dysfunction that does not overlap with chronic dysfunction (3) and be reproducible in heterogeneous groups of ICU patients (4) [4]. The results of our study do not support the above conditions with regard to the SOFA score in older patients. The seminal consensus indicates that SOFA should be able to broaden our knowledge about organ failure and facilitate the conduct of clinical trials. On the one hand, a meta-regression of 58 randomised controlled trials showed that the SOFA score measured at one time-point is not an optimal surrogate for mortality [11]. Based on our results, we know that despite its significant association with mortality, the total SOFA score fails to reliably describe multiorgan failure in the older population. On the other hand, the delta SOFA score is well associated with mortality in randomised controlled trials. Unfortunately, in this study we only gathered the worst SOFA score within 24 h of ICU admission, and therefore we are unable to determine whether assessment of SOFA score trends predicts mortality better than a single measurement.

For the past decades, experts in research methods and statistics have repeatedly reminded our community that arbitrary categorisation of data is a waste at best and can lead to harm in the worst-case scenario [18, 19]. If a pattern of physiological parameters (i.e., sequential organ failure assessment) is of interest, nothing stands in the way of plotting raw clinical and laboratory measurements over time and analysing them in their original form. Data will speak for themselves and reveal both the strength and complexity of estimated effects if nonlinearities such as U-shaped relations are allowed at the stage of statistical analysis. Even though the European Medicines Agency encouraged trialists to use the SOFA score as an endpoint, the SOFA score’s capacity to explain mortality, estimated at ≤ 35% when using the delta SOFA score, is far below the 85% bar set by the Food and Drug Administration for surrogate outcomes in oncology [11, 20]. New patient-oriented outcomes, such as days alive and free from organ support or days alive outside the ICU within a predefined period, such as 28 or 90 days, have recently gained popularity. These endpoints have far more promising properties than arbitrary categories of uncertain importance to patients [21, 22]. Better still, longitudinal ordinal models can be used to incorporate all relevant transitions between stages of critical illness and maximise statistical power, though these models require a relatively high level of expertise and effort from the study’s biostatistician [23].

This study has several weaknesses. First, only the maximum SOFA score within the first 24 h after admission to the ICU was available in our dataset, precluding any exploration of changes in the SOFA score over time and their relations with mortality. We also did not record the baseline SOFA score, which would have provided us with valuable information on chronic organ failure in the population of older critically ill patients. Importantly, each of the SOFA components follows a different time trajectory during the ICU stay. Nevertheless, the limitations described above apply to any transformation of the original score. Second, the sample size did not allow for a credible investigation of interactions between different systems and subgroups based on the reason for admission to the ICU. It also led to a low number of patients and wide confidence intervals in the analysis of the highest categories in the renal, hepatic and coagulation components, potentially making the results difficult to interpret. Third, we did not assess interrater variability, although we know that parameters such as the Glasgow Coma Scale are prone to misclassification in critically ill patients. Fourth, raw data such as biomarker concentrations were not collected in the primary study. Fifth, this is a post-hoc analysis and our results should be considered hypothesis-generating. Sixth, these results are generalisable primarily to older adults. Further studies developed by the VIP project group will help to address this issue more precisely in the future; however, an optimal way to assess SOFA score performance in the population of older ICU patients would be to design a large prospective study focused on SOFA score validation in a dedicated cohort.

Our study has several strengths. We were able to include almost four thousand patients ≥ 80 years old from an international, prospectively enrolled cohort, and the completeness of relevant data was exceptionally high. Outcomes included both ICU and 30-day mortality, mitigating the risk of biases arising from hospital wards’ discharge policies. Our study finished recruitment before the COVID-19 pandemic; thus, our results are applicable to a broad population of older patients routinely treated in intensive care units.

Conclusion

Different components of the SOFA score have different prognostic implications for older critically ill adults. The cardiovascular component of the SOFA score needs revision. Future research should explicitly test the utility of the SOFA score with reference to other methods of organ function assessment.