Introduction

The Physiological and Operative Severity Score for the enUmeration of Mortality and Morbidity (POSSUM) has been successfully used as a tool to provide risk-adjusted operative morbidity and mortality rates for comparisons of surgeon and hospital performance [1–21]. Increased awareness of the hospital and surgeon volume effect has contributed to the use of such tools. Its applicability has been further studied for various highly specialized procedures, including vascular [6, 9, 13, 22–26], pulmonary [27], head and neck [28, 29], orthopedic [30], emergency [7], esophageal [17], and liver procedures [5], and all of these applications have been derived from the original POSSUM [2].

There is limited literature on how POSSUM performs in patients undergoing pancreatoduodenectomy (PD). One study that used an adaptation, the Portsmouth-POSSUM, which analyzes mortality, found that this model appeared satisfactory for predicting mortality risk, but that the original POSSUM overestimated morbidity and mortality for PD [31]. These findings indicate that modifications are needed prior to further application. Furthermore, the study was hampered by the small number of patients and the fact that the Portsmouth-POSSUM does not analyze morbidity. Two larger studies of the original POSSUM in pancreatic surgery showed mixed results [32, 33].

The aim of the present study was to evaluate the predictive properties of POSSUM for morbidity in patients undergoing PD for periampullary neoplasms, and to identify specific risk factors associated with morbidity. The adapted version of POSSUM, the Portsmouth-POSSUM, which is used in the prediction of mortality, was not analyzed because mortality is generally very low in high-volume centers.

Patients and methods

All patients who underwent PD for malignant and benign disease from January 1993 to April 2006 were included. Patients were selected from our prospective database, and some of the variables needed to calculate POSSUM were collected retrospectively (Table 1). All patients were operated on by the same surgical staff during the study period.

Table 1 Physiological and operative severity assessment for the POSSUM system

Surgical procedure and complications

A PD was performed as previously described [34]. Briefly, an en bloc resection of the duodenum, pancreatic head, bile duct, and gallbladder was performed, and the pylorus was preferably preserved. Only lymph nodes surrounding the pancreas anteriorly and posteriorly, in the hepatoduodenal ligament, and right of the common hepatic artery and portal vein and superior mesenteric vein were removed. If limited involvement of the portal vein or superior mesenteric vein was found, a (wedge) resection of the vein was performed with curative intent. The three anastomoses were generally made by bringing the proximal jejunal limb up along the retroperitoneum behind the mesenteric vessels or through the mesocolon. The pancreaticojejunostomy was generally constructed as an end-to-side anastomosis with a single-layer 3-0 PDS running suture including the pancreatic duct. The hepaticojejunostomy was performed by a single-layer 3-0 PDS running suture, as was the gastrojejunostomy/duodenojejunostomy. Morbidity was re-evaluated according to the criteria described by Copeland et al. [2].

In the present study, delayed gastric emptying, pancreatic leakage, and postpancreatectomy hemorrhage were registered according to the definitions recently proposed by the International Study Group of Pancreatic Surgery [35, 36].

Statistical analysis

A linear analysis was used to evaluate the predictive properties of POSSUM. For linear analysis as described by Whiteley et al. [18], patients were divided into groups according to their predicted risk of morbidity. The number of patients falling into each such group was multiplied by the average risk of morbidity in that group to give the predicted morbidity of the group. This type of analysis allows each group to be considered separately.
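The grouping step described above can be illustrated with a minimal sketch. It assumes ten equal-width risk bands (the band boundaries are not stated in the text), and the function names `linear_analysis` and `op_ratio` as well as the toy cohort are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict


def linear_analysis(predicted_risks, outcomes, n_bands=10):
    """Group patients into equal-width bands of predicted risk.

    predicted_risks: per-patient predicted probability of morbidity (0..1)
    outcomes: per-patient observed morbidity (1 = complication, 0 = none)
    Returns one row per non-empty band: (band index, n, predicted, observed).
    """
    bands = defaultdict(list)
    for risk, outcome in zip(predicted_risks, outcomes):
        # A risk of exactly 1.0 is placed in the top band.
        band = min(int(risk * n_bands), n_bands - 1)
        bands[band].append((risk, outcome))

    rows = []
    for band in sorted(bands):
        members = bands[band]
        n = len(members)
        mean_risk = sum(r for r, _ in members) / n
        predicted = n * mean_risk  # number of patients x average risk
        observed = sum(o for _, o in members)
        rows.append((band, n, predicted, observed))
    return rows


def op_ratio(rows):
    """Overall observed-to-predicted (O:P) ratio across all bands."""
    total_observed = sum(row[3] for row in rows)
    total_predicted = sum(row[2] for row in rows)
    return total_observed / total_predicted


# Hypothetical toy cohort (NOT study data): four patients in two bands.
toy_risks = [0.25, 0.25, 0.75, 0.75]
toy_outcomes = [1, 0, 1, 1]
toy_rows = linear_analysis(toy_risks, toy_outcomes)
print(toy_rows)
print(op_ratio(toy_rows))
```

An O:P ratio below 1.00 means that fewer complications were observed than the model predicted, that is, the model overpredicts overall.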

Statistical calculations were performed with SPSS software (Chicago, IL). A value of P < 0.05 was considered significant. If missing data for a variable did not exceed 10%, the values were imputed in the database to maximize data extraction. A separate “missing data analysis” was performed to ensure that the data were missing at random. Analysis of specific risk factors associated with morbidity was done by the univariate method. Binomial variables were compared with the chi-square test. Categorical variables were compared with a reference variable by logistic regression. Continuous variables were also analyzed by logistic regression.

Results

A total of 652 consecutive patients who underwent PD for various disorders during the study period were included in the present study (Table 2). There were nine postoperative deaths (1.4%). One or more complications were seen in 332 of 652 patients (50.9%). Missing data for the analyzed variables never exceeded 10%.

Table 2 Characteristics of patients undergoing surgery for periampullary neoplasms

By means of linear analysis to compare predicted morbidity with observed morbidity, an observed-to-predicted (O:P) ratio of 0.88 was found (Fig. 1). POSSUM underpredicted actual morbidity in patients at low risk and overpredicted it in patients deemed to be at high risk. The model had a significantly poor fit (χ2 = 30.24; 8 degrees of freedom [df]; P < 0.001).
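As a rough check on the reported fit statistic, the sketch below computes the upper-tail probability of χ2 = 30.24 with 8 df and shows how a grouped goodness-of-fit statistic could be formed. The text does not name the test; a Hosmer-Lemeshow-type statistic is assumed here, and both function names and the example groups in the test data are hypothetical.

```python
import math


def hosmer_lemeshow_statistic(groups):
    """Grouped goodness-of-fit statistic (assumed Hosmer-Lemeshow form).

    groups: iterable of (n, observed_events, expected_events) per risk group.
    """
    stat = 0.0
    for n, observed, expected in groups:
        mean_risk = expected / n
        stat += (observed - expected) ** 2 / (expected * (1.0 - mean_risk))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; exact for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("the closed form used here requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Statistic and df as reported in the text.
p_value = chi2_sf_even_df(30.24, 8)
print(p_value < 0.001)
```

Eight degrees of freedom would correspond to ten risk groups under the usual groups-minus-two convention, which is an inference rather than something the text states.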

Fig. 1

Calibration curve of surgical morbidity (symbols with 95% confidence interval) showing significant deviation from the diagonal line, which represents a perfect predictive ability when the observed to expected ratio is 1.00. The bars represent the number of patients in each risk group. (O:P ratio = 0.88, χ2 = 30.24, 8 degrees of freedom, P < 0.001, indicating significantly poor fit)

Preoperative and perioperative variables associated with morbidity

The results of the univariate and multivariate analyses for preoperative and perioperative variables associated with morbidity are shown in Table 3. One factor from the original POSSUM, pulmonary history, was found to be an independent predictor of morbidity in the present data set (odds ratio [OR] 2.05, 95% confidence interval [CI] 1.15–3.67). Stepwise logistic regression also found that ampulla of Vater adenocarcinoma (OR 1.73, 95% CI 1.07–2.80) was independently associated with morbidity. This factor is not incorporated in POSSUM.
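The relation between a logistic regression coefficient and the reported odds ratios can be sketched as follows. The sketch assumes Wald-type intervals, which are symmetric on the log scale; the text does not state the interval method, and the function names are hypothetical. Only the ORs and CIs quoted above are used as inputs.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_ci(beta, se, z=Z_95):
    """OR and Wald CI from a logistic regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def implied_beta_se(lower, upper, z=Z_95):
    """Recover (beta, SE) from a reported CI, assuming a Wald interval."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2.0, (log_hi - log_lo) / (2.0 * z)


# ORs and intervals as reported in the text.
reported = [
    ("pulmonary history", 2.05, 1.15, 3.67),
    ("ampulla of Vater adenocarcinoma", 1.73, 1.07, 2.80),
]
for label, reported_or, lo, hi in reported:
    beta, se = implied_beta_se(lo, hi)
    # The OR implied by the interval should be close to the reported OR.
    print(label, round(math.exp(beta), 2), "vs reported", reported_or)
```

If the OR implied by the midpoint of the log-scale interval matches the reported OR to rounding, the reported figures are internally consistent under the Wald assumption.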

Table 3 Univariate and multivariate analysis of variables found to be significantly associated with morbidity

Discussion

In the present study POSSUM failed to accurately predict morbidity. The results of the study cast serious doubt on the reproducibility of POSSUM in highly specialized procedures such as pancreatoduodenectomy. Modifications are needed prior to its application for a comparative audit in pancreatic surgery in high-volume centers.

Auditing instruments for evaluation of treatment outcome and quality of care between hospitals are required nowadays. Predicting morbidity with POSSUM has been evaluated in a general surgical population to enable a fair comparison between the population of individual surgeons and individual hospitals. The POSSUM system has recently undergone significant critical appraisal [37]. Copeland et al. [2], who described the original system and its application to general surgical patients, have reinforced its application for auditing outcomes in general and orthopedic surgery, comparing outcomes between units and for comparison of surgeons within an individual department, as well as monitoring for a change in an individual surgeon’s performance over a period of years. There is no question concerning the usefulness of POSSUM for general surgery.

Khan et al. [31] were the first to evaluate POSSUM for pancreatic surgery, and they found that the model overestimated morbidity in a low-volume hospital. Their study was limited by the small number of patients. A more recent and larger study performed by Pratt et al. [33] found that the original POSSUM was a good predictor of morbidity and that the model had an excellent fit. Their study was conducted in a high-volume center, and they used the same statistical analysis methods applied in the present study. A possible reason for the different findings in our study and theirs could be the use of different definitions for what constitutes a postoperative complication. For example, the International Study Group on Pancreatic Fistula found that several definitions for pancreatic leakage after pancreaticoduodenectomy exist, and the reported range of 2–50% underscores this variation [36]. This is also the case for delayed gastric emptying and postoperative hemorrhage [35, 38]. Together these three complications represent the majority of complications after pancreatic surgery, and differences in definition could explain the varied results of POSSUM.

In contrast, another large study performed by Tamijmarane et al. [32] found that POSSUM underestimated morbidity. Their study was performed in a high-volume hospital. The present study is the largest to date, and it found that POSSUM overestimates morbidity and has a significant lack of fit.

There are some known drawbacks to POSSUM [39], where pitfalls may be encountered in both data collection and data analysis. Data collection seems like a straightforward process, but methods have to be standardized if results are going to be reproducible. The physiological score is obviously subject to change over time, especially in nonelective urgent procedures. This was not a factor in the present study, which involves only elective procedures. Another problem could arise if the surgeon were to select the worst physiological score in order to show a positive result. Again, this is virtually impossible in the present study because the procedures were all performed electively, and the patients are presumed to have been physiologically stable throughout the preoperative assessment. Furthermore, patients selected for a pancreatic resection are always subjected to intensive screening.

Missing data are another important problem in data collection. Some tests included in the POSSUM are not indicated in otherwise healthy individuals. Performing all these preoperative investigations is not in keeping with the hospital guidelines affecting the present study population. Therefore this study, like many others, scored these variables as 1. However, missing data never exceeded 10% of the variables analyzed in the present study. Also, analysis of the missing variables, including a separate analysis of patients with the complete POSSUM work-up, showed that these data were indeed missing at random and did not influence the fit of the model.

Problems in data analysis can be due to the homogeneous nature of some variables. The operative score in the present study is homogeneous because the POSSUM is calculated for one procedure and thus does not vary much. In addition, all patients had the same operative severity score and the same mode of surgery—consistent with a single procedure—and they also had the same peritoneal soiling score. Only blood loss and the presence of malignancy differed among these patients.

Another point of discussion is which analysis method is best suited for POSSUM. Copeland et al. [2] have shown that exponential analysis continues to be predictive of mortality associated with general surgery. With linear analysis, small sample size can result in inaccurate results, and large samples will allow more accurate analysis of goodness-of-fit. Thus in the present study linear analysis was used because of the large sample size. Of interest, exponential analysis of the data from the present study (results not shown) yielded similar results.

Other highlighted potential pitfalls in the use of the POSSUM system include the classification of ECG abnormalities and the difficulty in establishing the exact operative blood loss [10].

Many patients undergoing surgery for periampullary neoplasms have major co-morbidity, which could strongly influence their risk of postoperative morbidity. This characteristic is not apparent in the POSSUM score in the present study because multivariate analysis did not find an association between these variables and postoperative morbidity. Technical complications do not seem to be influenced by preoperative factors, but they can reflect the extent of surgery and, perhaps, the surgeon’s judgment. And as found in the present study and noted by many other authors, the degree of fibrosis of the pancreatic remnant (e.g., nondilated duct) seems to contribute significantly to the morbidity rate [40–46].

For most surgeons, their area of expertise dictates their highest-risk operative procedures. And many specialists have adapted POSSUM scoring as a way of allowing for case mix in their complex, high-risk operations. Separate equations have also been developed in specialized procedures. However, most adapted models are pending external validation. The question remains whether specialized surgeons could not simply rely on regression analysis of their own case mix in order to compare individual or hospital results.

The outcome of the present study raises the question of whether a specialized POSSUM score has any place in pancreatic surgery because it is questionable whether an adequate model can be developed. It is also doubtful whether surgeons and clinicians are waiting for another “adapted model,” as logistic regression analysis of their own data can be used for a similar purpose. Furthermore, the use of models that overpredict or underpredict morbidity may have grave consequences. Nevertheless, surgical audits are of the utmost importance, and if the use of POSSUM is desirable, our results point to a need for a new equation based on the variables that are unique to this procedure.