Background

Decisions to withhold or withdraw life-sustaining treatment (LST) in the intensive care unit (ICU) are considered appropriate if there is no reasonable expectation of beneficial outcome [1]. However, the evaluation of prognostic information and benefit of critical care for the individual patient varies depending on a number of factors which can be related or unrelated to the individual patient, such as cultural norms and resource constraints [2,3,4,5,6]. Patient-related factors comprise the severity of the acute illness, comorbidities and, notably, old age [7, 8]. However, predicting outcome and its benefit for very old patients and making appropriate decisions about LST constitute a major challenge due to the heterogeneity of multimorbidity and the variable perception of functional impairments at an advanced age [9,10,11]. This has resulted in a substantial variability of decisions to withhold or withdraw LST in critical care [3, 12, 13].

We have recently identified distinct phenotypes of very old patients (age ≥ 80 years) from the multinational VIP2 study cohort by using clustering analysis of clinical characteristics available on admission to the ICU [14, 15]. This method provided the opportunity to explore complex patterns of clinical features to draw a nuanced picture of this patient population with regard to prognosis [16]. In a subgroup of VIP2 patients without limitations of LST, short-term mortality was found to be highest (up to 57% within 30 days) for phenotypes with marked geriatric features, i.e., frailty, multimorbidity and functional or cognitive impairments. In contrast, 30-day mortality in a phenotype composed of nonagenarians with low sequential organ failure assessment (SOFA) scores was less than 10%, which defied traditional views on the benefit of LST in that age group [16].

This new study sets out to investigate whether the decisions to withhold or withdraw LST in the ICU depend on the patients' clinical phenotype in the VIP2 cohort. We compare the influence of phenotypes on these decisions with the impact of the cultural context (country) which was shown to play a significant role in a similar cohort [12]. This analysis of practice patterns is needed to support timely discussions with patients and their families about care trajectories for critical and potentially terminal conditions [17].

Methods

The Very elderly Intensive care Patient (VIP2) study was a prospective observational study examining the influence of geriatric characteristics on survival in patients aged 80 years or older admitted with acute conditions to ICUs in 22 countries [14]. The participating ICUs recruited consecutive patients who met the above demographic and clinical criteria during any 6-month period between May 2018 and May 2019. National coordinators obtained ethics committee approval in their respective countries. Case report forms and the database were hosted on a secure server located on the campus of Aarhus University (Denmark).

Clustering analysis was applied to the VIP2 study cohort to delineate groups (phenotypes) of patients with similar demographic (age, gender, residence), acute (SOFA score and subscores) and geriatric characteristics (frailty, multimorbidity and polypharmacy, functional and cognitive impairments) recorded on admission to the ICU [15]. Decisions to limit LST were recorded as withholding or withdrawing LST in the VIP2 study. Sensitivity analyses were performed with respect to the inclusion of patients with limitations of LST and the number of phenotypical categories [15].

This new descriptive study includes all patients from the VIP2 cohort who were classified into one of seven distinct phenotypes and who stayed in ICU for more than 1 h. The flowchart for obtaining this sample is depicted in Fig. 1.

Fig. 1 Flowchart for the study sample obtained from the VIP2 study cohort

Descriptive characteristics are reported as median with interquartile range (IQR) for continuous variables and as proportions (percentages) for nominal variables. Odds ratios with 95% confidence intervals were calculated for the binary outcome variables, i.e. withholding or withdrawing LST, for each phenotype, with the phenotype having the highest rate of these outcomes serving as reference. One-way ANOVA was used to examine differences in continuous variables and Fisher’s exact test was used for nominal variables. The area under the receiver-operating characteristic (AUROC) curve was determined for logistic regression models to assess their discriminatory performance for the binary classification of outcome. Statistical analyses were performed using R (version 4.1.1, www.r-project.org) and Python 3 (Python Software Foundation, Beaverton, OR, USA).
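As an illustration of the odds-ratio calculation against a reference phenotype, the following minimal Python sketch computes an odds ratio with a Wald-type 95% confidence interval from a 2×2 table. The interval method is an assumption, since the text does not state which method was used, and the function name and the counts in the example are hypothetical and not taken from the VIP2 data.

```python
import math


def odds_ratio_ci(events, non_events, ref_events, ref_non_events, z=1.96):
    """Odds ratio of a group versus a reference group with a Wald-type CI.

    Assumption: the interval is built on the log odds ratio (Woolf method);
    the study itself does not specify the interval method.
    """
    cells = (events, non_events, ref_events, ref_non_events)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    # Odds in the group divided by odds in the reference group.
    odds_ratio = (events * ref_non_events) / (non_events * ref_events)
    # Standard error of the log odds ratio.
    se = math.sqrt(sum(1.0 / c for c in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 of 100 patients with a withholding decision in one
# phenotype versus 40 of 100 in the reference phenotype.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 40, 60)
print(f"OR = {or_value:.3f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 corresponds to odds that differ significantly from the reference phenotype at the 5% level under this approximation. For sparse tables, an exact method may be preferable.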

Results

This study included 1268 patients from the VIP2 study cohort with or without limitations of LST who were assigned to one of seven phenotypes [15]. The incidence of decisions to withhold or withdraw LST was 26.5% and 8.1%, respectively. Overall mortality in this population was 17.7% in ICU and 27.1% within 30 days. The mortality at 30 days after withholding or withdrawing LST was 34.5% and 88.3%, respectively.

The demographic and clinical characteristics of phenotypes are shown in Table 1. Phenotypes A and G represent the extreme ends of the spectrum of the SOFA score, most geriatric features and mortality. Mortality in the ICU was significantly higher for phenotypes A, B and C after decisions to withhold LST. Statistically significant differences in 30-day mortality were detected for phenotypes A–E but not for the geriatric phenotypes F and G (Table 1).

Table 1 Demographic and clinical characteristics of phenotypes

Table 2 shows the distribution of phenotypes and the incidence of limitations of LST in the patient cohorts from countries which contributed more than 3% of the study population each.

Table 2 Characteristics of patient cohorts from countries which contributed more than 3% of the study population

Phenotypes F and G and the group of oldest old patients (phenotype C) were found to be associated with the highest rates and largest odds for withholding decisions (Tables 1, 3). Phenotypes F and C did not differ significantly from phenotype G which had the highest overall rate for limitations of LST and served as reference. Regarding withdrawal of LST, phenotype A showed the smallest odds that differed significantly from the reference phenotype G (Table 3).

Table 3 Odds ratios (OR) with 95% confidence intervals for decisions to limit LST

To investigate the relationship between phenotypes and limitations of LST in more detail, we examined the odds for withholding further LST in patients during noninvasive and invasive ventilation. Patients on noninvasive ventilation in phenotype D and patients on invasive ventilation in phenotype B had significantly lower odds than the reference phenotype G for withholding decisions when treated at these levels of organ support (Table 4). Of note, we did not perform a similar analysis for withdrawing decisions due to the small number of patients with that type of decision.

Table 4 Odds ratios (OR) with 95% confidence intervals for decisions to withhold LST in patients on noninvasive and invasive ventilation

Next, we compared the discriminatory performance of logistic regression models based on phenotype, cultural context (country), ventilatory support and the prior occurrence of withholding decisions to predict limitations of LST. Figure 2 shows the receiver-operating characteristic curves and AUROC data for these models. Using phenotype or country alone did not yield good discrimination, i.e. AUROC values were below 0.8 for both types of decisions. Better discrimination was achieved by combining phenotype with country. Adding the history of withholding decisions resulted in good discrimination, with an AUROC value of 0.83 for decisions to withdraw LST (Fig. 2).
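To make the AUROC metric concrete, the sketch below computes it directly from its probabilistic definition: the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. This is a plain-Python illustration under stated assumptions, not the code used in the study; the function name and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted probabilities (e.g. from a logistic regression model)
    labels: 1 if the decision (e.g. withdrawing LST) occurred, else 0
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # correctly ordered pair
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(positives) * len(negatives))


# Hypothetical predicted probabilities for four patients.
print(auroc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 0.75
```

A value of 0.5 corresponds to discrimination no better than chance and 1.0 to perfect separation. The pairwise loop is quadratic in the number of patients, so a rank-based implementation would be preferable for large cohorts.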

Fig. 2 Discriminatory performance of logistic regression models, as depicted by receiver-operating characteristic (ROC) curves, for decisions to withhold (A) or withdraw (B) LST

Discussion

Limiting LST can be an important step to adjust the extent of critical care to the individual needs of patients. Due to the uncertainty about beneficial outcome, notably in very old patients, there is no evidence-based framework to guide these decisions. A more detailed understanding of the factors involved can increase the awareness of biases and may reduce the variability of decision-making [13]. In this context, the objective of this study was to investigate the role of clinical phenotypes in decisions to withhold or withdraw LST in the VIP2 study [14, 15]. These phenotypes represent combinations of demographic, acute and geriatric characteristics on admission to the ICU and are available for early discussions about likely trajectories in critical care.

Two of the phenotypes (F, G) are characterised by enhanced geriatric features. Phenotype C includes the oldest old patients, but without prominent geriatric characteristics and with only moderate SOFA scores. Importantly, the largest odds for decisions to withhold LST were found in these three phenotypes. This confirms previous studies showing an association of such decisions with the perception of poor performance status [7]. Although frailty and other geriatric impairments were shown to correlate with worse survival and functional outcome [18,19,20,21,22], there is no strong evidence for such a correlation with age itself [23]. The new findings in this study suggest a propensity among medical professionals to limit the perceived burden of interventional therapies for the oldest old, independently of acute and chronic conditions. Importantly, mortality after 30 days was not significantly increased for the geriatric phenotypes F and G after withholding LST. This indicates coherence of predictions with the actual outcome in these phenotypes. However, there was a significant increase in mortality after withholding LST in phenotype C, which calls into question the value of the above approach for this particular group of oldest old patients.

For patients on ventilatory support, the nongeriatric phenotypes B and D had a lower probability of withholding additional organ support. Patients in both phenotypes scored high for the respiratory component of the SOFA score on admission to ICU [15]. Thus, ventilatory support was one of the reasons, if not the main reason, for admission to the ICU, and continuation of organ support until remission of respiratory failure might have been a major objective. This reasoning, however, does not apply to phenotype G, which had the highest rate of invasive ventilation, but on a background of enhanced geriatric characteristics, which eventually led to a higher rate of decisions to limit LST.

We have recently examined the relationship between single patient characteristics (age, gender, SOFA score, single geriatric features) and decisions to limit LST for the VIP2 patient cohort [24]. No individual characteristic showed meaningful discrimination for withholding decisions, i.e. none reached an AUROC value greater than 0.6. The small increment in discrimination gained by using phenotypes instead of single features illustrates both the complexity of choosing patients for withholding decisions and the need for additional information to predict these decisions with better accuracy. Regarding withdrawal of LST in that previous study [24], the SOFA score had the largest influence on these decisions with an AUROC value of 0.66. This level of discrimination is in the range of that of the phenotype-based model in the new study and reflects the prominent role of the SOFA score in delineating phenotypes with regard to withdrawing decisions [15]. This particularly applies to phenotype A with the lowest SOFA score and the lowest rate and odds for withdrawing LST.

What additional information could be required for predicting limitations of LST more accurately? Candidate parameters are cultural norms, the response to treatment or the lack thereof, and the occurrence of adverse events. Moreover, fluctuating resource constraints and preferences of individual stakeholders may have an additional impact on decision-making [25,26,27]. Although these parameters were not explicitly documented in the VIP2 study, we approximated cultural norms by the geographic location (country) of the participating ICUs and showed differences in the incidence of limitations of LST between countries. Ventilatory support and decisions to withhold further LST were used as surrogate markers for assessing the course of critical care in the ICU. In comparison to the patients' phenotype, country as a variable showed better or at least similar discrimination for predicting withholding or withdrawing decisions. The combination of phenotype and country in a regression model led to a marked increase in discrimination. Adding the prior withholding of LST as an additional variable resulted in good discrimination for predicting withdrawal of LST.

The above results emphasise the contribution of both patient-related factors and cultural norms to decisions about LST in very old patients. However, because discrimination was only moderate for our models, factors yet to be specified, such as variable characteristics of individual stakeholders, are likely to influence these decisions. Managing multiple factors influencing decision-making in critical care can be challenging. This has been illustrated by the controversies about triage during the COVID-19 pandemic, when the variable interpretation of patient-related information as well as diverse cultural attitudes led to variations in care [28, 29].

Our study has several limitations. The VIP2 study was not designed to analyse decisions to withhold or withdraw LST as an outcome. Patients' preferences and other contextual data were not recorded and, thus, were not available for our analysis. Our study focused on phenotyped patients, who constitute less than 50% of the eligible study population. The impact of variables other than phenotype on limitations of LST might be different in nonphenotyped patients. Moreover, follow-up was limited to survival at 30 days. Data on survival beyond that time and on quality of life, which may be impaired by new disabilities and post-ICU syndrome [30], could further support decision-making about LST in very old patients. Lastly, patients for the VIP2 study were mostly recruited in Europe [14]. Therefore, the findings on decisions to limit LST remain to be confirmed for other geographic regions [16].

Conclusions

Our study demonstrates the role of clinical phenotypes in decisions to limit LST in very old ICU patients. Combining phenotypes with cultural factors and information about the course of critical care resulted in good discrimination for predicting withholding and withdrawing decisions. These findings can guide further research into the biases and preferences involved in decision-making about LST. Future studies should also analyse the impact of withholding LST on the self-perceived quality of life in ICU survivors to further personalise these decisions.