Introduction

Postoperative mortality has decreased significantly over the last few decades. The European Surgical Outcomes Study (EuSOS) reported an overall in-hospital mortality of 4% in patients undergoing non-cardiac surgery across 28 European nations in 2011 [1]. More specifically for the Dutch surgical population, an all-cause mortality rate of 1.85% was reported for major surgery performed during 1991–2005 [2]. The TRACE investigators published a 30-day mortality of 0.5% in major non-cardiac surgery patients in the period 2016–2018 [3].

Current improvements in perioperative care aim to reduce failure to rescue, thus lowering the proportion of preventable deaths due to unnoticed complications in the postoperative period [4]. Besides death, postoperative complications are associated with adverse functional outcome after surgery. The challenge is to improve postoperative survival and prevent new disability, especially in selected groups of high risk patients. Standardized risk assessment tools can be used to distinguish patients at higher risk for postoperative complications and death, and are recommended by international societies for perioperative risk stratification [5].

Various risk assessment tools for the prediction of adverse postoperative outcome have been published [6]. In 2016, Le Manach and others developed and validated the PreOperative Score to predict Post-Operative Mortality (POSPOM) [7]. A strength of POSPOM is that it is based on a large cohort consisting of more than 5.5 million patients who underwent heterogeneous surgical procedures originating from the French National Hospital Discharge Data Base (NHDBB), which makes it representative for the European surgical population.

The performance of the POSPOM score has been evaluated mostly in specific surgical populations, including patients undergoing emergency abdominal surgery, [8] radical cystectomy, [9] vascular surgery, [10, 11] and elderly patients with hip fractures [12]. Two studies have evaluated POSPOM in large heterogeneous surgical populations, using single center retrospective data [13, 14]. However, both studies did not address the performance of POSPOM in terms of sensitivity, specificity, negative predictive values (NPV) and positive predictive values (PPV), which limits the clinical applicability.

In this study we validated the POSPOM score in a heterogeneous Dutch cohort of nearly 2500 non-cardiac surgery patients by using data from the control cohort of the multicenter stepped-wedge cluster randomized interventional TRACE (routine posTsuRgical Anesthesia visit to improve patient outComE) study [3]. In addition, we studied the potential of the POSPOM score to predict the development of postoperative complications. The aim of the study was to investigate to what extent the POSPOM score is able to predict in-hospital mortality and major postoperative complications in the TRACE cohort.

Methods

Study design and participants

This study was based on the TRACE (routine posTsuRgical Anesthesia visit to improve patient outComE) study. Details on design and analysis of the TRACE study have been previously reported [15]. TRACE was a prospective, multicenter, stepped-wedge, cluster-randomized interventional study performed between October 2016 and August 2019 in nine academic and non-academic hospitals in The Netherlands. The effects of a routine postoperative visit by an anesthesiologist on the incidence of postoperative complications and mortality was assessed. For the current study, participants that originated from the control cohort (non-intervention) of the TRACE study were included. The study was ethically approved by the Human Subjects Committee of Amsterdam UMC, location VUmc Amsterdam (number NL56004.029.16, 29-06-2016) and registered with the Netherlands Trial Register (NTR5506). The Clinical Research Unit of the Amsterdam UMC took responsibility for the monitoring of patient inclusion and data registration. All study participants signed informed consent.

Study data and variables

The POSPOM score has been originally developed by Le Manach and others and is composed of a set of preoperative variables including age, medical history and type of surgery [7]. For each variable a certain number of points is assigned to the final score, which varies between 0 and 70 points and equates to a probability of in-hospital mortality. To calculate a POSPOM score for each patient, relevant preoperative variables were extracted from the TRACE database.

Type of surgery was defined in the original TRACE database, however the list of procedures was less detailed compared to POSPOM. Therefore appropriate modifications were made in the surgical classification of POSPOM to calculate the POSPOM score for all study patients. First, because of the in- and exclusion criteria of the TRACE study, day case procedures and cardiac and orthopedic trauma surgery were not included. Due to the absence of day case procedures, the POSPOM surgery groups ‘other orthopedic’ and ‘minor gastrointestinal’ were not represented in the study cohort. Second, within TRACE no subdivision was made between minor and major surgery. Hence, for urologic, vascular and hepatic procedures this subdivision was based on duration of surgery; > 120 min was considered as major surgery for urologic and vascular procedures and > 180 min for hepatic procedures.

Study endpoints

The primary study endpoint was in-hospital mortality. The secondary study endpoints were the occurrence of any complication, severity of a complication graded according to the Clavien-Dindo classification, [16] and type of complication (i.e. infectious, cardiac/transfusion, pulmonary, venous thromboembolic, renal, neurological, surgical, ileus, delirium and other).

Statistical analysis

The size of the cohort was determined within the sample size calculation of the TRACE study of which detailed descriptions have been published previously [15]. Continuous variables were expressed as mean ± standard deviation (SD) and in case of a non-symmetric distribution as median with interquartile range (IQR). Categorical variables were described with frequencies and percentages. For the calculation of the discriminating power of POSPOM, receiver operating characteristic (ROC) curves with corresponding C-statistics (i.e. area under the curve (AUC) values) were calculated for primary and secondary study endpoints [17]. For the description of the performance of the POSPOM score sensitivity, specificity, negative predictive values (NPV), and positive predictive values (PPV) were calculated from true-positive, false-positive, true-negative, and false-negative values for the outcome variables in-hospital mortality and major complication (Clavien-Dindo grade III-V). To assess the mean calibration (or calibration-in-the-large) the average predicted risk of in-hospital mortality was compared with the overall observed rate of in-hospital mortality [18]. In addition, the difference between predicted and observed in-hospital mortality rates per POSPOM score group were calculated. A calibration plot for in-hospital mortality was constructed with corresponding calibration curve intercept and slope; the predicted probability was based on the average POSPOM score per tentile. All analyses were performed using SPSS Statistics version 26.0 (IBM, Armonk, NY, USA) and GraphPad Prism version 8.2.1 (GraphPad Software, San Diego, California, USA).

Results

Study population

In total, 5473 patients were included in the TRACE study cohort. After removal of 283 drop-outs (i.e. withdrawal of informed consent before operation, operation cancelled and/or not meeting inclusion criteria) and 2700 patients originating from the intervention cohort, 2490 patients were included in the analysis. Median age was 65 (IQR 12) years, 46.9% of patients were female, and a majority (61.7%) was classified as ASA class 2. Active cancer or diabetes mellitus were the most common comorbidities. Major gastrointestinal surgery was the most frequently performed procedure, followed by arthroplasty and spine surgery, and minor urologic surgery (Table 1).

Table 1 Study characteristics

Incidence of mortality and postoperative complications

The distribution of in-hospital mortality according to POSPOM score is presented in Fig. 1. Based on the median POSPOM score of 24 (IQR 8) points, the predicted risk of in-hospital mortality was 1.3% in our study population. The observed in-hospital mortality rate was 0.5% (14 patients). Of these patients POSPOM scores varied between 24 and 36 points, which equates a probability of in-hospital mortality of 1.3 and 32.9% respectively. POSPOM values less than 24 points were associated with 0% observed mortality (Fig. 1).

Fig. 1
figure 1

Distribution of patients with mortality among POSPOM scores. No deaths were observed below a POSPOM score of 24 points and above a POSPOM score of 36 points. Abbreviations: POSPOM PreOperative Score to predict PostOperative Mortality

Six hundred and seventy-six patients (27.1%) had at least one postoperative complication. A majority (39.3%) of complications was scored Clavien-Dindo (CD) grade II at highest (requiring pharmacological treatment), followed by 33.4% CD grade I (any deviation from the normal postoperative course, without the need for pharmacological treatment or surgical, endoscopic and radiological interventions) and 15.7% CD grade III (requiring surgical, endoscopic or radiological intervention). Forty-two patients (6.2%) had a life-threating complication requiring ICU-management. The distribution of the observed in-hospital complication rates among POSPOM scores is presented in Fig. 2. Of all postoperative complications, the incidence of other complications (not further defined) was highest (45.1%), followed by infectious (36.5%) and cardiac/transfusion (27.5%).

Fig. 2
figure 2

Distribution of patients with complications among POSPOM scores. Complications are classified according to the Clavien-Dindo classification of surgical complications. Abbreviations: POSPOM PreOperative Score to predict PostOperative Mortality, CD Clavien-Dindo

Validation of POSPOM for mortality and postoperative complications

Discrimination

The concordance between POSPOM calculated mortality and observed mortality was strong (C-statistic 0.86 (95% CI, 0.78–0.93). The corresponding Receiver Operating Characteristics (ROC) curve is shown in Fig. 3. A cutoff value of a POSPOM score of 24 points equated to a sensitivity of 100%, a specificity of 48%, a PPV of 1.1% and an NPV of 100%. Despite a strong C-statistic for postoperative mortality, the low incidence of in-hospital mortality resulted in low PPV and high NPV values. Other cutoff values of the POSPOM score did not result in improvement of the PPV (Table 2).

Fig. 3
figure 3

ROC curve: predicted versus observed in-hospital mortality for POSPOM. ROC curve with a corresponding C-statistic of 0.86 (95% CI, 0.78–0.93), which implies strong discriminating power. Abbreviations: ROC Receiver Operating Characteristics, POSPOM  PreOperative Score to predict PostOperative Mortality

Table 2 Performance of different POSPOM cutoff values for the prediction of mortality

The overall discriminatory ability of POSPOM for any postoperative complication was poor (C-statistic 0.63 (95% CI, 0.60–0.65)) but improved according to severity (C-statistic 0.65 (95% CI, 0.61–0.70) for CD grade III-V and C-statistic 0.73 (95% CI, 0.67–0.80) for CD grade IV-V; Fig. 4). For different types of complications, the discrimination was lowest for neurological complications (C-statistic 0.54 (95% CI, 0.40–0.68)) and highest for pulmonary complications (C-statistic 0.73 (95% CI, 0.69–0.78)). For the prediction of major complications, the cutoff value of 38 points showed the highest PPV of 20.0%, nevertheless with a corresponding high NPV of 93.5% (Table 3).

Fig. 4
figure 4

C-statistic values for different outcomes. Complications are classified according to the Clavien-Dindo classification of surgical complications. Abbreviations: CD Clavien-Dindo

Table 3 Performance of different POSPOM cutoff values for the prediction of major complication

Calibration

The average predicted risk of in-hospital mortality was higher than the overall observed mortality rate, which indicated that POSPOM overestimated risk in general (‘calibration-in-the-large’). In addition, for each particular POSPOM score group the predicted risk of in-hospital mortality exceeded the observed percentage of in-hospital mortality (Table 4 in Appendix), especially in the groups with higher POSPOM score. The calibration plot for in-hospital mortality (Fig. 5) showed poor calibration of POSPOM with a fitting curve slope of 0.17 and a negative y-axis interception, which suggests too extreme estimated risks and overestimation respectively.

Fig. 5
figure 5

Calibration plot for in-hospital mortality. The slope of the fitted line is 0.1704 and the y-axis intercept is 0.00 (95% CI, − 0.001395-0.001345)

Discussion

In this large cohort of Dutch non-cardiac surgery patients, we found a strong discrimination for the prediction of in-hospital mortality with POSPOM. This suggests that POSPOM scores are capable to rank patients from low to high risk for in-hospital mortality during their perioperative course [17]. On the other hand, the poor agreement between predicted and observed in-hospital mortality rates indicated poor calibration; i.e. POSPOM systematically overestimated the risk of in-hospital mortality in general [18]. For the prediction of complications, the discrimination of POSPOM was poor to fair according to the severity of the complication.

The strong C-statistic (> 0.8) of POSPOM for the prediction of in-hospital mortality found in this surgical cohort is in line with two previously published validation studies [8, 9]. Two studies reported a mean observed mortality rate (‘calibration-in-the-large’) lower than predicted by POSPOM which is in accordance with our results [9, 12]. One study found an observed mortality similar to predicted [11]. Of all preceding validation studies, two studies examined calibration more closely via a calibration plot [8, 14]. Both studies found underestimation of in-hospital mortality in the POSPOM risk groups with a low risk of in-hospital mortality, whereas in the higher risk groups POSPOM overestimated mortality. In our patients, an overestimation of in-hospital mortality for all POSPOM groups was observed. The subgroup of patients with high POSPOM scores were overrepresented compared to the derivation cohort of Le Manach and others, nevertheless the observed mortality rate of 0.5% in the TRACE cohort was comparable with the mortality rate of 0.47% in the derivation cohort. An explanation for this finding may be the decrease in surgery-related risks over the years; patients in the original POSPOM cohort underwent surgery in 2010 compared to 2016 until 2018 in the TRACE cohort.

The low incidence of postoperative in-hospital death resulted in a clinically irrelevant low PPV for the prediction of mortality. For the prediction of a major complication, the PPV values were homogenous for the majority of the POSPOM cutoff value groups, which resulted in poor differentiation for the prediction of a major complication. The highest PPV of 20% was found for the POSPOM cutoff value of 38 points. Nevertheless, because of the low number of patients in this subgroup, the use of this cutoff value to distinguish patients at higher risk will not be of added value in clinical practice.

Our study was limited by the low absolute number of deaths which negatively influences the quality of the validation of the POSPOM score in this cohort. Validation of the POSPOM score in a population with a higher a priori chance of a complicated postoperative course may have led to different results. As described in the method section, unavoidable modifications of the surgical classification were made which may have influenced the calculation of the final POSPOM score. In addition, cardiac and orthopedic trauma patients were absent in our cohort. Strengths of the TRACE control cohort are its prospective and multicenter design, large size and heterogeneous surgery patient group.

In our study cohort the performance of the POSPOM score to predict in-hospital mortality and complications was poor. In a population with a higher a priori chance of a complicated postoperative hospital stay, using POSPOM as a standardized risk assessment tool may be of added value. However, despite all the risk models that have been developed in the last couple of decades, there is a lack of evidence with regard to the clinical consequences of a higher predicted risk of in-hospital death or complications. The role of these risk models in clinical practice has still to be determined. Besides, because of the low mortality rates in current European healthcare, it may be more interesting to focus on the prediction of complications and relevant outcomes which lead to disability instead of mortality.

Conclusion

In this study POSPOM showed strong discrimination for the prediction of in-hospital mortality, however the calibration of POSPOM was poor. In addition, for the prediction of complications the performance of POSPOM was poor. This limits the use of the risk score in clinical practice.